Re: Globally Unique Identifier
Richard Pyle wrote:
Yes, certainly, a GUID within the context of an XML document is pretty well defined by the schema, the DTD, or just its loose association with other elements in the document.
But what about if one appears in a journal article, a citation in a policy document, etc?
Well...that's partly why I emphasized that I think GUIDs should be for computer-computer data exchange only. But even if printed for a pair of human eyes to read, surely there would be *some* stated context. E.g., "ITIS TSN 1234567"; "BPBM 123456"; "GBIF Specimen ID 9876543"; "ICZN NameID 92AB5B37-70E9-4f05-9E97-CBABD08513ED"; etc....
So formalize that a little and you might have something more consistently machine parsable like: ITIS.ORG:TSN:1234567; BPBM.EDU:something:123456; GBIF.ORG:Specimen:9876543, ...
Add in the system identifier for resolution (urn:lsid:...) and you have LSIDs. The result is a far more consistent, legible and widely useful mechanism for referencing objects. Allowing an author to arbitrarily provide the context for identifiers gets us little further along.
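As a rough illustration of that step, here is a minimal Python sketch that turns the authority:namespace:object triples above into LSID-style strings and parses them back. It assumes the general layout urn:lsid:authority:namespace:object with an optional trailing revision; the helper names (LSID, to_lsid, parse_lsid) are made up for this example, and the identifiers are the hypothetical ones from this thread, not real resolvable records.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parsed pieces of an LSID-style identifier (illustrative only)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def to_lsid(authority: str, namespace: str, object_id: str) -> str:
    """Prefix an authority:namespace:object triple with the urn:lsid scheme."""
    return f"urn:lsid:{authority}:{namespace}:{object_id}"


def parse_lsid(text: str) -> LSID:
    """Split an LSID-style string into its components.

    Raises ValueError if the string does not have the assumed shape.
    """
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing urn:lsid prefix: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in: {text!r}")
    # An absent or empty sixth field is treated as "no revision".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


if __name__ == "__main__":
    lsid = to_lsid("ITIS.ORG", "TSN", "1234567")
    print(lsid)
    print(parse_lsid(lsid))
```

The point of the sketch is only that the context travels inside the string: a loose form such as "ITIS TSN 1234567" is rejected by the parser, while the formalized form can be split without any surrounding document.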
Have you seen how LSIDs and DOIs are being used in electronic publications?
It would be nice to be able to provide a unique identifier as perhaps a footnote for a scientific name mentioned in a document.
How hard would it be in such cases to include within the Methods section of the document something to the effect of "All taxon IDs listed in this paper refer to GBIF Specimen IDs, which can be resolved at gbif.net"? If the problem is one involving a pair of human eyes reading a number, then the problem can be solved in the context of a pair of human eyes reading the context.
Sure, but do that consistently, by all authors? And do it in a way that is without ambiguity? Machine parsable (for electronic publication)? Easily reusable in other documents?
Or perhaps a system might be developed that provided an LSID for a DiGIR query document, so the dataset could be completely recreated just by hitting on the LSID (yes, one is under construction). One could imagine simply passing the LSID to another infrastructure that, say, estimated potential distribution, or highlighted relevant news reports from an AP feed mentioning the species for which the query was created. Using a simple, meaningless GUID buys us none of this potential, and forces us to always use a wrapper to provide a contextual basis on how to interpret the identifier.
I guess my question is, why *must* the wrapper be integral to the ID itself? Why can't the contextual basis be established around the ID, at the time the ID is presented/transferred, as needed? If the cost of embedding the context within the GUID is that all links to, say, Bishop Museum ichthyology GUIDs for specimens become useless if the collection is transferred to another institution and the embedded DomainName terminated, then I say put the burden of context establishment on the ID exchange system ("presentation layer"), rather than embedded within the ID itself.
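To make the "presentation layer" alternative concrete, here is a minimal Python sketch in which the ID carries no domain name and the context is looked up separately at exchange time. Everything in it is an assumption for illustration: the ContextualID and ContextRegistry names, the "BPBM" context label, and the example.org URLs are hypothetical and do not describe any existing resolver.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ContextualID:
    """A bare ID plus a context label supplied by the exchange layer."""
    context: str  # e.g. "BPBM"; stated around the ID, not inside it
    value: str    # the opaque identifier itself


class ContextRegistry:
    """Hypothetical presentation layer: maps a context label to a resolver."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, str] = {}

    def register(self, context: str, base_url: str) -> None:
        # Re-registering a context simply replaces its resolver.
        self._resolvers[context] = base_url.rstrip("/")

    def resolve(self, cid: ContextualID) -> str:
        try:
            base = self._resolvers[cid.context]
        except KeyError:
            raise LookupError(f"no resolver for context {cid.context!r}") from None
        return f"{base}/{cid.value}"


if __name__ == "__main__":
    registry = ContextRegistry()
    specimen = ContextualID("BPBM", "123456")

    registry.register("BPBM", "https://old-host.example.org/specimens")
    print(registry.resolve(specimen))

    # The collection moves: only the registry entry changes, not the ID.
    registry.register("BPBM", "https://new-host.example.org/specimens")
    print(registry.resolve(specimen))
```

The trade-off shows up in the unknown-context case: the ID survives a change of host, but a reader who receives the bare value without the registry (or without the context label) has nothing to resolve it against.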
It must be so that it can be reused outside the original context of the document that contained the identifier. I believe there are mechanisms in the LSID spec for dealing with this problem - but I need to go back through it to be sure.
Aloha, Rich
Kia ora, Dave V.
Dave Vieglais