Based on playing with LSIDs as part of the Taxonomic Search Engine (http://darwin.zoology.gla.ac.uk/~rpage/portal), and following some of the recent efforts on developing taxonomic name schema (such as the LinneanCore), I've become concerned that the implications of LSIDs have not been fully thought through.
Specifically, I worry that by focusing on schema for names in their present format, the community is making more work for itself, and missing the point (and potential) of LSIDs.
Metadata
--------
LSIDs are not simply identifiers; they come with associated metadata. For example, the LSID urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:20012729-1 has associated metadata which can be viewed directly at http://ipni.org.lsid.zoology.gla.ac.uk/authority/metadata/?lsid=urn:lsid:ipn... , or by using an LSID resolver (http://biopathways.ibm.nebiogrid.org/resolver/urn:lsid:ipni.org.lsid.zoology... ).
This metadata is in RDF (Resource Description Framework), and provides information about the name, and links to other resources (via LSIDs). For example, the above record for *Poissonia heterantha* has a link to its basionym, *Tephrosia heterantha*.
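As a small illustration of what a client has to work with, here is a minimal Python sketch that splits an LSID into its parts, assuming the general pattern urn:lsid:authority:namespace:object with an optional trailing revision. The names `Lsid` and `parse_lsid` are made up for this example and do not come from any LSID library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of an LSID (illustrative type, not from a library)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.strip().split(":")
    # Expect the two fixed prefixes plus three or four components.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    # Reject empty components such as 'urn:lsid::Id:1'.
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return Lsid(*parts[2:])


lsid = parse_lsid("urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:20012729-1")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```

The authority part is what a resolver would use to locate the service that returns the data and metadata for the identifier.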
We've been here before
----------------------
It seems to me that there is an assumption that all we do with LSIDs is stick them in an XML document as an identifier ("GUID"), and our work is done. I feel this rather misses the point.
The key point is that, if we serve LSIDs, we need to serve metadata about the names. So, we need a standard for the metadata. But, hang on, we've just spent energy on a standard for our data...? So, once a schema is agreed, someone then has to create a new schema for the metadata for LSIDs...? And, how do these schema relate...? Hmmmm.
One response to these ideas might be to simply serve a document based on one of the current schema (such as the LinneanCore) as metadata. But I think this is a poor solution that doesn't exploit the potential of RDF metadata.
RDF
---
RDF is very cool, in that there are tools that can take RDF and reason about it. For instance, given metadata for the LSIDs urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:20012728-1 (Poissonia heterantha), and urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:944651-1 (Coursetia heterantha), we can infer that these two names are synonyms, because they share the same basionym.
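The inference described above can be sketched in a few lines of Python, treating the metadata as plain (subject, predicate, object) triples rather than using a real RDF toolkit. The predicate name `hasBasionym` and the identifier used for the basionym are placeholders invented for this example; the actual metadata may use different terms and a real LSID for Tephrosia heterantha.

```python
# Hypothetical predicate name; the served metadata may use another term.
HAS_BASIONYM = "hasBasionym"

POISSONIA = "urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:20012728-1"
COURSETIA = "urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:944651-1"
# Placeholder identifier for Tephrosia heterantha, not a real LSID.
TEPHROSIA = "urn:lsid:example.org:names:tephrosia-heterantha"

# Metadata for the two names, reduced to (subject, predicate, object) triples.
triples = {
    (POISSONIA, HAS_BASIONYM, TEPHROSIA),
    (COURSETIA, HAS_BASIONYM, TEPHROSIA),
}


def basionym_of(name, triples):
    """Return the basionym stated for a name, or None if there is none."""
    for subject, predicate, obj in triples:
        if subject == name and predicate == HAS_BASIONYM:
            return obj
    return None


def are_synonyms(a, b, triples):
    """Two distinct names are inferred to be synonyms if they share a basionym."""
    basionym_a = basionym_of(a, triples)
    basionym_b = basionym_of(b, triples)
    return a != b and basionym_a is not None and basionym_a == basionym_b


print(are_synonyms(POISSONIA, COURSETIA, triples))
```

Nothing in the data says outright that the two names are synonyms; the relationship falls out of the two basionym statements.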
If we use RDF, we get this kind of ability for "free." There is a lot of work in the semantic web, ontology, and bioinformatics communities about making inferences like this. Isn't this the kind of thing we want to do, rather than simply pass XML documents around? Wouldn't it be nice to take two or more LSIDs and work out their relationship, automatically (where possible)? Client databases that stored LSIDs could work out whether names were synonyms in a standard way, without actually having to be told that the names are synonyms.
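For a client database holding many LSIDs, the same idea extends from pairs to groups. The sketch below is only illustrative: `hasBasionym` and every identifier under example.org are invented, and keying a name with no basionym statement on itself is an assumption of this example, not something the metadata guarantees.

```python
from collections import defaultdict

# Hypothetical predicate name; the served metadata may use another term.
HAS_BASIONYM = "hasBasionym"


def synonym_groups(names, triples):
    """Group the stored names that resolve to the same basionym."""
    basionym = {s: o for s, p, o in triples if p == HAS_BASIONYM}
    groups = defaultdict(set)
    for name in names:
        # Assumption: a name with no basionym statement is keyed on itself.
        groups[basionym.get(name, name)].add(name)
    # Only groups with more than one member say anything about synonymy.
    return [sorted(group) for group in groups.values() if len(group) > 1]


poissonia = "urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:20012728-1"
coursetia = "urn:lsid:ipni.org.lsid.zoology.gla.ac.uk:Id:944651-1"
tephrosia = "urn:lsid:example.org:names:tephrosia-heterantha"  # placeholder
unrelated = "urn:lsid:example.org:names:unrelated-name"  # placeholder

triples = {
    (poissonia, HAS_BASIONYM, tephrosia),
    (coursetia, HAS_BASIONYM, tephrosia),
}

for group in synonym_groups([poissonia, coursetia, unrelated], triples):
    print(group)
```

The client stores only identifiers; the grouping is recomputed from whatever metadata the LSID authorities serve.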
A radical view
--------------
Instead of developing schema for exchanging information that are specific to taxonomy, a radical approach would be to adopt LSIDs as identifiers and standardise the associated metadata (which would convey all the information about the name). By adopting RDF we can tap into a lot of existing work, as well as existing external standards (e.g., Dublin Core, and the emerging use of LSIDs in bioinformatics). It also offers an opportunity to serve up information on taxonomic names in a much more useful form than a simple XML document.
I wonder if an opportunity is being missed here.
Regards
Rod
--
--------------------------------------------------------
Professor Roderic D. M. Page
Editor Elect, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxonomic names at http://darwin.zoology.gla.ac.uk/~rpage/portal