Rod Page wrote:
Perhaps to try and clarify my previous post (not a good idea to write before the coffee has kicked in), as a straw man what's wrong with the following approach:
- Data providers serve up their own records, each identified by a GUID
Agree, although for some areas there might be some control over _who_ is a data provider to prevent overlap?
- At a minimum a "record" is metadata about that object, using
existing vocabulary as much as possible. This means Dublin Core for basic stuff (title, creator, etc.), Prism for bibliographic details, Basic Geo (WGS84) for geographic co-ordinates, Darwin Core for other specimen stuff, etc.
- Such records may refer to other objects (e.g., a sequence may refer
to a publication and a museum voucher specimen), all by way of GUIDs.
- Each record carries a Creative Commons license specifying what you
can do with the data. The present system where museums have some text saying what can and can't be done is too clumsy, and putting the GBIF data usage agreement in the way of searching GBIF is way too cumbersome.
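To make the four points above concrete, here is a minimal sketch of what such a record might look like as a data structure. The `Record` class, the `urn:example:` identifiers, and the metadata values are all hypothetical illustrations, not an existing schema or real data; only the vocabulary prefixes (Dublin Core, Darwin Core) come from the post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One provider-served record (hypothetical structure)."""
    guid: str                                     # identifies the record
    metadata: dict = field(default_factory=dict)  # terms from existing vocabularies
    references: tuple = ()                        # GUIDs of other records
    license: str = ""                             # e.g. a Creative Commons license


def unresolved_references(records):
    """Return referenced GUIDs that none of the given records supplies."""
    known = {r.guid for r in records}
    return sorted({ref for r in records for ref in r.references if ref not in known})


# Hypothetical example: a sequence pointing at a voucher specimen and a publication.
specimen = Record(
    guid="urn:example:specimen:1",
    metadata={"dwc:catalogNumber": "hypothetical-001"},
    license="CC BY",
)
sequence = Record(
    guid="urn:example:sequence:1",
    metadata={"dc:title": "hypothetical sequence"},
    references=("urn:example:specimen:1", "urn:example:publication:1"),
    license="CC BY",
)

missing = unresolved_references([specimen, sequence])
```

Here `missing` lists the publication GUID, since no record for it was supplied. In practice that GUID would be resolved against whichever provider serves the publication.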
If we do this (and we're pretty close to this already), then I think we've got the core of a useful resource. Partly because all of this is technically easy to do, and is already being done in other areas. So, why not just do it?
I think the role of GUIDs is to (a) unambiguously identify digital objects, and (b) tell us where to get that digital record.
Many of the issues that crop up here are, in my opinion, not really about GUIDs. Regarding the issue of taxonomic concepts, I think this is something of a red herring. If concepts are essentially a pairing of name and use, then given a GUID for a name and a publication, there will be huge numbers of these, and they will occur in all sorts of domains (taxonomy, ecology, development, conservation, medicine, etc.). They will also exist in databases (e.g., GenBank, TreeBASE, etc.). But if they are pairings of names and usage (e.g., publication, database), then given a name GUID and a usage GUID, do we need anything else, really? Names, publications, and specimens are primary, concepts are secondary.
Interesting - you could end up with a sort of hashed or concatenated GUID (because people don't like dealing with composite keys). This might help with specimens which are duplicates of other specimens from the same collecting event - no need to unambiguously ID the event itself centrally if you can ID the collector and the ID he/she gave it ... although IDs for collectors seem like they will be a long way off.
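As a rough sketch of the two options, assuming a concept is just a (name GUID, usage GUID) pair: the function names and the `urn:example:` identifiers below are made up for illustration, and name-based UUIDs (version 5) are only one possible way of hashing the pair.

```python
import uuid


def concatenated_concept_id(name_guid, usage_guid, sep="|"):
    """Composite key as a single string; assumes sep never occurs in a GUID."""
    return f"{name_guid}{sep}{usage_guid}"


def hashed_concept_id(name_guid, usage_guid):
    """Deterministic version 5 UUID derived from the pair.

    The name GUID is hashed into a namespace first, then the usage GUID
    is hashed within that namespace, so no separator character is needed.
    """
    name_namespace = uuid.uuid5(uuid.NAMESPACE_URL, name_guid)
    return uuid.uuid5(name_namespace, usage_guid)


# Hypothetical identifiers, for illustration only.
name_guid = "urn:example:name:1"
usage_guid = "urn:example:publication:1"

concept_a = hashed_concept_id(name_guid, usage_guid)
concept_b = hashed_concept_id(name_guid, usage_guid)
```

Anyone holding the same two GUIDs derives the same identifier without a central registry, which is the attraction. The cost is that the hashed form cannot be split back into its parts, whereas the concatenated form can.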
It seems to me the core notion of a concept is knowing what somebody meant when using a name. This is (a) a problem of inference, and (b) probably intractable in most cases (who knows what ecologist "x" meant by species "y" in 1910, regardless of what classification he or she may have claimed to be using).
One example where this inference is more straightforward, especially given 1-4 above, is if observations are linked to specimens. For example, in TreeBASE there are various occurrences of "Apomys datae" in different studies. Given that these names are linked to sequences, which are linked to specimens, we can infer that these different occurrences are the same thing. This kind of inference is made much easier when data is openly available. I also think that inference of concepts will be domain-specific, localised to particular questions, and not necessarily something taxonomic databases need to themselves support. Do we seriously think we can build a database of the usage of all names and what they meant? I suggest this knowledge will emerge over time on the foundations of 1-4 above. Since taxonomists don't control how names are used, and probably use names a lot less than other biologists (after all, we provide names so biologists can communicate), I don't think it is the role of taxonomists to document every concept.
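The kind of inference described above can be sketched as a simple grouping, assuming each name occurrence is linked to a sequence and each sequence to a specimen. All identifiers below are hypothetical placeholders, not actual TreeBASE, GenBank, or museum records.

```python
from collections import defaultdict


def group_by_specimen(occurrences, sequence_to_specimen):
    """Group (study, name) occurrences by the specimen behind their sequence.

    Occurrences whose sequence has no known specimen are left out,
    because nothing can be inferred about them from these links.
    """
    groups = defaultdict(list)
    for study, name, sequence_id in occurrences:
        specimen = sequence_to_specimen.get(sequence_id)
        if specimen is not None:
            groups[specimen].append((study, name))
    return dict(groups)


# Hypothetical data: three studies using the same name.
occurrences = [
    ("study-A", "Apomys datae", "seq-1"),
    ("study-B", "Apomys datae", "seq-2"),
    ("study-C", "Apomys datae", "seq-3"),
]
sequence_to_specimen = {"seq-1": "specimen-1", "seq-2": "specimen-1"}

same_thing = group_by_specimen(occurrences, sequence_to_specimen)
```

In this made-up case, studies A and B trace back to the same specimen, so their uses of the name can be treated as the same thing. Study C stays unresolved until its sequence is linked to a specimen.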
The KISS principle (Keep It Simple, Stupid) would seem to apply here. Why not focus on something that is achievable, has value, and can be linked to what people in other areas are doing (such as bioinformatics, the Semantic Web, etc.)?
In other words, maybe it's time to stop thinking like taxonomists...
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom
Phone: +44 141 330 4778
Fax: +44 141 330 2792
email: r.page@bio.gla.ac.uk
web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
Subscribe to Systematic Biology through the Society of Systematic Biologists Website: http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org
*** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk