(XML)Vocabularies getting out of hand

Stuart G. Poss Stuart.Poss at USM.EDU
Wed Dec 1 18:45:12 CET 1999


> Jean-Marc writes:
> >Just say for the moment that we'll have several vocabularies by large
> >domains:

To which Kevin Thiele responds:

> Getting messy - I can't imagine that  ...

> I think this whole idea of a universalist vocabulary (even several
> universes) is becoming silly.

I believe the point was that we need to plan for a mechanism that handles a wide
variety of vocabularies consistently and with reasonable efficiency in an
extensible format, not that we would actually mandate any universal format.  Any
recommendation, once implemented, will necessarily require us to forgo some
possible schemes.  Let's just be sure that useful ideas are not lost as we
collectively thrash things out.

> Leigh comments:
> Even a core set for mammals, insects, nematodes, sponges... would probably
> be a null set, except for cellular characters.

The fact that not every pair of taxa can be compared on a particular basis of
comparison doesn't mean that establishing a number of "vocabularies/name spaces"
for particular levels of generality is a useless approach.  Considering that such
a common set, or "core or intersection", might actually comprise several thousand
"protein" characters alone, this may be more fruitful than one would first
suspect.  I would agree, however, that it will likely take a potentially large
number of "namespaces" to tease out all the interesting/potentially informative
features in comparing widely disparate taxa, especially since one would like to
characterize features along more than one basis of comparison (e.g. biochemical
as well as morphological or ecological).  We need not settle on a "final
solution" as to name usage here, but only develop a mechanism that can be
modified through time, according to conventional usage, so as to encompass a wide
array of complex associations, some not yet discovered.
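
As a rough illustration of the "core or intersection" idea, the sketch below
treats each vocabulary as a plain set of terms and computes what is shared across
taxon groups for one basis of comparison.  The group names and character terms
are invented placeholders, not proposed vocabulary, and the function name core is
my own.

```python
# Hypothetical vocabularies keyed by (basis of comparison, taxon group).
# Every term below is an invented placeholder, not a proposed standard.
vocabularies = {
    ("morphology", "mammals"): {"hair", "limb", "cell membrane"},
    ("morphology", "insects"): {"wing", "limb", "cell membrane"},
    ("morphology", "sponges"): {"spicule", "cell membrane"},
    ("biochemistry", "mammals"): {"cytochrome c", "actin", "keratin"},
    ("biochemistry", "insects"): {"cytochrome c", "actin", "chitin synthase"},
    ("biochemistry", "sponges"): {"cytochrome c", "actin"},
}


def core(basis):
    """Return the terms shared by every taxon group for one basis of comparison."""
    sets = [terms for (b, _), terms in vocabularies.items() if b == basis]
    return set.intersection(*sets) if sets else set()
```

With these placeholder sets the morphological core is nearly empty while the
biochemical one is not, which is the contrast drawn above.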

If I have at least partially digested the references on the RDF syntax
correctly, we seem to need a mechanism to handle a wide variety of
subject, predicate, object relations relating to both taxa and characters.  The
extent of these statements will likely arrange itself according to the context
(taxa) in question, at least to a large degree.  Consequently it seems to me that
we need some mechanism to specify taxa as a qualifier, or at least the "taxonomic
range" (extent) over which the basis of comparison (character) applies.
Likewise, we also need some mechanism (set of name spaces?) for communicating the
basis (bases) of comparison implicit for a given character (e.g. shape, molecular
structure, fine structure, gross morphology, scale of comparison, physiology,

From my perspective, for statements of the form Taxon "A" HAS character "X", it
would seem we need distinct "name spaces" (if my present conception of this term
is the one being used by XML programmers) for both the taxon (subject) and
character (object) parts of the dialog.  It remains unclear to me whether this
implies that we may also need a separate set of "name spaces" for the reverse
mappings (of the form character "X" can be found in/identified in/is
representative of Taxon "A").  My suspicion is that we do, since such relations
will usually not be isomorphic.  However, this may depend upon the discipline
(context).  Likewise, the nature of the qualifiers for the predicates themselves
(such as "is found in", "has", "has many of", "is a protein of", "has the shape
of", "is a synapomorphy for", etc.) is even less clear to me, as is whether
there are any universal or common rules regarding such predicates that might
usefully constrain the ways the taxon and character "vocabularies" are defined
and used as an XML "grammar" potentially emerges.  It is also not yet clear to me
how we can use XML to construct the more complicated grammars implicit in
higher-order relations of the type: Taxon "A" has character "X" according to
Investigator(s) "Y" using observational method "Z" at time "T" (for rapidly
evolving retroviruses or seasonal breeding colors, flowering condition, or
molecular pathways perhaps).
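
To make the shape of such statements concrete, here is a minimal sketch in Python
rather than XML or RDF.  It assumes that every name carries the namespace it was
drawn from and that the qualifiers ("Y", "Z", "T") ride along as optional fields
on the statement.  The class names, field names, and namespace labels are all my
own inventions for illustration, not part of any existing standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """A name qualified by the namespace (vocabulary) it is drawn from."""
    namespace: str
    name: str


@dataclass(frozen=True)
class Statement:
    """A subject-predicate-object relation plus optional qualifiers."""
    subject: Term
    predicate: Term
    obj: Term
    investigator: Optional[str] = None  # "Y" in the text above
    method: Optional[str] = None        # "Z"
    time: Optional[str] = None          # "T"


# Hypothetical namespace labels; none of these comes from a real standard.
TAXA, CHARS, PREDS = "taxa", "characters", "predicates"

has = Term(PREDS, "has")

statements = [
    Statement(Term(TAXA, "A"), has, Term(CHARS, "X"),
              investigator="Y", method="Z", time="T"),
    Statement(Term(TAXA, "B"), has, Term(CHARS, "X")),
]


def taxa_with(character):
    """Reverse lookup: which taxa are recorded as having this character?"""
    return {s.subject for s in statements
            if s.predicate == has and s.obj == character}
```

Here taxa_with(Term(CHARS, "X")) returns both taxa, so the reverse relation is
one-to-many in this toy case.  Whether the reverse mappings deserve namespaces of
their own is not something this sketch settles.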

With respect to "messy" namespaces/vocabularies, I suggest we let the computers
handle the large lists, multiple spellings, homonymy of terms, etc.  They are
particularly good at it, and we will need a great many to characterize the
millions of taxa, billions of molecules, and innumerable different bases for
character definitions.  I believe we need to concern ourselves more with how a
consistent grammar for a "metadata language about characters and taxa" can be
constructed (and implemented on the WWW) to facilitate consistent human input and
precise meaning.  Which specific character strings we ultimately use to formulate
the tags or namespaces might take a bit longer to resolve, particularly should
foreign colleagues feel strongly that they should be given Unicode rather than
solely ASCII representations.
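
As one possible sketch of letting the computer absorb spelling variants and
homonyms, the Python below keys a lookup table on (namespace, normalized
spelling).  Variant spellings inside one namespace then resolve to the same
concept, while the same string in two namespaces can resolve to different ones.
The namespaces, spellings, and concept identifiers are invented for illustration.
The only library call used is unicodedata.normalize from the Python standard
library.

```python
import unicodedata


def normalize(term):
    """Fold case and Unicode composition so variant spellings compare equal."""
    return unicodedata.normalize("NFC", term).casefold().strip()


# Hypothetical lookup: (namespace, normalized spelling) -> concept identifier.
# Namespaces, spellings, and identifiers are all invented for illustration.
CONCEPTS = {}


def register(namespace, concept_id, *spellings):
    """Record every given spelling as a name for one concept in one namespace."""
    for spelling in spellings:
        CONCEPTS[(namespace, normalize(spelling))] = concept_id


register("morphology", "morph:0001", "colour", "color", "Färbung")
register("morphology", "morph:0002", "scale")    # e.g. a fish scale
register("measurement", "meas:0001", "scale")    # e.g. scale of comparison


def resolve(namespace, spelling):
    """Return the concept identifier for a spelling, or None if unknown."""
    return CONCEPTS.get((namespace, normalize(spelling)))
```

Under these assumptions "Colour" and "COLOR" resolve to the same concept, and
"scale" resolves differently in the two namespaces.  Non-ASCII spellings are
handled by the same path as ASCII ones.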

