Jean-Marc writes:
Just say for the moment that we'll have several vocabularies by large domains:
To which Kevin Thiele responds:
Getting messy - I can't imagine that ...
I think this whole idea of a universalist vocabulary (even several universes) is becoming silly.
I believe the point was that we need to plan for a mechanism that handles a wide variety of vocabularies consistently and with reasonable efficiency in an extensible format, not that we would actually mandate any universal format. Any recommendation, once implemented, will however necessarily require us to forgo some possible schemes. Let's just be sure that useful ideas are not lost as we collectively thrash things out.
Leigh comments:
Even a core set for mammals, insects, nematodes, sponges... would probably be a null set, except for cellular characters.
The fact that not every pair of taxa can be compared on a particular basis of comparison does not mean that establishing a number of "vocabularies/name spaces" for particular levels of generality is a useless approach. Considering that such a common set, or "core or intersection", might actually amount to several thousand "protein" characters alone, this might be more fruitful than one would first suspect. I would agree, however, that it will likely take a large number of "namespaces" to tease out all the interesting and potentially informative features in comparing widely disparate taxa, especially since one would like to characterize features along more than one basis of comparison (e.g. biochemical as well as morphological or ecological). We need not settle on a "final solution" as to name usage here, but only develop a mechanism that can be modified through time, according to conventional usage, so as to encompass a wide array of complex associations, some not yet discovered.
If I have at least partially digested the references on the RDF syntax correctly, we seem to need a mechanism to handle a wide variety of subject, predicate, object relations relating to both taxa and characters. The extent of these statements will likely arrange itself according to the context (taxa) in question, at least to a large degree. Consequently it seems to me that we need some mechanism to specify taxa as a qualifier, or at least the "taxonomic range" (extent) over which the basis of comparison (character) applies. Likewise, we also need some mechanism (set of name spaces?) for communicating the basis (or bases) of comparison implicit for a given character (e.g. shape, molecular structure, fine structure, gross morphology, scale of comparison, physiology, etc.).
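To make the idea of a "taxonomic range" qualifier concrete, here is a minimal sketch in Python rather than in RDF itself. Everything in it is an assumption made for illustration: the `Statement` fields, the toy `PARENT` classification, and the example taxa and characters are hypothetical, not part of any proposed vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """A subject, predicate, object relation with two qualifiers."""
    subject: str          # a taxon
    predicate: str        # e.g. "has"
    obj: str              # a character
    basis: str            # basis of comparison, e.g. "gross morphology"
    taxonomic_range: str  # taxon over which the character is meaningful


# Hypothetical toy classification: child taxon -> parent taxon.
PARENT = {
    "Felis catus": "Mammalia",
    "Mammalia": "Animalia",
    "Apis mellifera": "Insecta",
    "Insecta": "Animalia",
}


def within(taxon: Optional[str], ancestor: str) -> bool:
    """Return True if taxon is the ancestor itself or descends from it."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)  # None once we pass the root
    return False


def is_applicable(stmt: Statement) -> bool:
    """A statement only makes sense if its subject lies inside its range."""
    return within(stmt.subject, stmt.taxonomic_range)


s1 = Statement("Felis catus", "has", "fur", "gross morphology", "Mammalia")
s2 = Statement("Apis mellifera", "has", "fur", "gross morphology", "Mammalia")
```

With this toy data, `is_applicable(s1)` holds and `is_applicable(s2)` does not, because the character is declared meaningful only within Mammalia. A real mechanism would need an agreed source for the classification itself, which this sketch simply assumes.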
From my perspective, for statements of the form Taxon "A" HAS character "X", it would seem we need distinct "name spaces" (if my present conception of this term is the one being used by XML programmers) for both the taxon (subject) and character (object) parts of the dialog. It remains unclear to me whether this implies that we also need a potentially separate set of "name spaces" for the reverse mappings (of the form character "X" can be found in/identified in/is representative of Taxon "A"). My suspicion is that we do, since such relations will usually not be isomorphic. However, this may depend upon the discipline (context). Likewise, the nature of the qualifiers for the predicates themselves (such as "is found in", "has", "has many of", "is a protein of", "has the shape of", "is a synapomorphy for", etc.) is even more unclear to me, as is whether there are any universal or common rules regarding such predicates that might usefully constrain the ways the taxon and character "vocabularies" are defined and used as an XML "grammar" potentially emerges. It is also not yet clear to me how we can use XML to construct the more complicated grammars implicit in higher order relations of the type: Taxon "A" has character "X" according to Investigator(s) "Y" using observational method "Z" at time "T" (for rapidly evolving retroviruses, seasonal breeding colors, flowering condition, or molecular pathways, perhaps).
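As a rough illustration of the higher order statement above, the sketch below attaches investigator, method, and time to a basic taxon/character assertion, with taxa and characters drawn from separate name spaces. It is written in Python, not XML or RDF, and all names in it (`QName`, `Assertion`, the `urn:example:` identifiers, `taxa_with`) are hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple, Optional


class QName(NamedTuple):
    """A term qualified by the name space it belongs to."""
    namespace: str
    local: str


class Assertion(NamedTuple):
    """Taxon PREDICATE character, optionally qualified by provenance."""
    taxon: QName
    predicate: str
    character: QName
    investigator: Optional[str] = None
    method: Optional[str] = None
    time: Optional[str] = None


# Hypothetical name space identifiers, one for taxa and one for characters.
TAXA = "urn:example:taxa"
CHARS = "urn:example:characters"

assertions = [
    Assertion(QName(TAXA, "A"), "has", QName(CHARS, "X"),
              investigator="Y", method="Z", time="T"),
    Assertion(QName(TAXA, "B"), "has", QName(CHARS, "X")),
]


def taxa_with(character: QName) -> List[str]:
    """Reverse lookup: which taxa are asserted to have this character?"""
    return sorted(a.taxon.local for a in assertions
                  if a.predicate == "has" and a.character == character)
```

Here `taxa_with(QName(CHARS, "X"))` finds both "A" and "B", so "X is representative of A" does not follow from "A has X". That is at least consistent with the suspicion that the forward and reverse relations are not isomorphic and may need their own predicates.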
With respect to "messy" name spaces/vocabularies, I suggest we let the computers handle the large lists, multiple spellings, homonymy of terms, etc. They are particularly good at it, and we will need a great many of them to characterize the millions of taxa, billions of molecules, and innumerable different bases for character definitions. I believe we need to concern ourselves more with how a consistent grammar for a "metadata language about characters and taxa" can be constructed (and implemented on the WWW) to facilitate consistent human input and precise meaning. Which specific character strings we ultimately use to formulate the tags or namespaces might take a bit longer to resolve, particularly should foreign colleagues feel strongly that they should be given Unicode rather than solely ASCII representations.
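A small sketch of what "letting the computers handle it" might look like: variant spellings are folded onto one canonical identifier, and homonyms are kept apart by name space. It is written in Python using only the standard `unicodedata` module; the vocabulary entries, name space labels, and identifiers such as `morph:color` are invented for illustration.

```python
import unicodedata
from typing import Optional


def key(label: str) -> str:
    """Normalize a label so equivalent spellings compare equal.

    NFC makes precomposed and combining-accent forms identical;
    casefold() removes case distinctions.
    """
    return unicodedata.normalize("NFC", label).casefold().strip()


# Hypothetical vocabulary: (name space, normalized label) -> canonical id.
VOCAB = {
    ("morphology", key("color")): "morph:color",
    ("morphology", key("colour")): "morph:color",    # spelling variant
    ("morphology", key("scale")): "morph:scale",     # e.g. a fish scale
    ("morphology", key("écaille")): "morph:scale",   # non-ASCII synonym
    ("measurement", key("scale")): "meas:scale",     # homonym, other sense
}


def resolve(namespace: str, label: str) -> Optional[str]:
    """Look up the canonical identifier for a label within a name space."""
    return VOCAB.get((namespace, key(label)))
```

For example, `resolve("morphology", "Colour")` and `resolve("morphology", "color")` return the same identifier, while "scale" resolves differently in the "morphology" and "measurement" name spaces. The human effort then goes into agreeing on the grammar and the canonical identifiers, not into policing spellings.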
Stuart