Re: Globally Unique Identifier

24 Sep 2004

      Richard Pyle wrote:
...
...
Yes, certainly, a GUID within the context of an XML document is pretty
well defined by the schema, DTD or just its loose association with
other elements in the document.
But what about if one appears in a journal article, a citation in a
policy document, etc?
Well...that's partly why I emphasized that I think GUIDs should be for
computer-computer data exchange only.  But even if printed for a pair of
human eyes to read, surely there would be *some* stated context.  E.g.,
"ITIS TSN 1234567"; "BPBM 123456"; "GBIF Specimen ID 9876543"; "ICZN NameID
92AB5B37-70E9-4f05-9E97-CBABD08513ED"; etc....
So formalize that a little and you might have something more
consistently machine parsable like: ITIS.ORG:TSN:1234567;
BPBM.EDU:something:123456;GBIF.ORG:Specimen:9876543, ...
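
As a rough illustration of what "machine parsable" could mean here, the
sketch below splits a semicolon-separated run of identifiers written in
the authority:namespace:id form used above. The names ScopedId and
parse_scoped_ids are made up for this example, and lower-casing the
authority is an assumption (domain names are case-insensitive), not
something any of the organisations mentioned have specified.

```python
from typing import List, NamedTuple


class ScopedId(NamedTuple):
    """Hypothetical container for one authority:namespace:id triple."""
    authority: str
    namespace: str
    object_id: str


def parse_scoped_ids(text: str) -> List[ScopedId]:
    """Parse a ';'-separated list of authority:namespace:id strings."""
    ids = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';' or doubled separators
        parts = chunk.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected authority:namespace:id, got {chunk!r}")
        authority, namespace, object_id = parts
        # Assumption: the authority is a domain name, so case is not significant.
        ids.append(ScopedId(authority.lower(), namespace, object_id))
    return ids


if __name__ == "__main__":
    example = "ITIS.ORG:TSN:1234567; BPBM.EDU:something:123456;GBIF.ORG:Specimen:9876543"
    for scoped in parse_scoped_ids(example):
        print(scoped)
```

A free-text form such as "ITIS TSN 1234567" is rejected by this parser,
which is the point: the delimiter convention is what makes the context
recoverable without a human reading it.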

Add in the system identifier for resolution (urn:lsid:...) and you have
LSIDs.  The result is a far more consistent, legible and widely useful
mechanism for referencing objects.  Allowing an author to arbitrarily
provide the context for identifiers gets us little further along.
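
To make the step from the triple to an LSID concrete, here is a minimal
sketch that builds and parses strings of the form
urn:lsid:authority:namespace:object, with an optional trailing revision.
The helper names (Lsid, format_lsid, parse_lsid) are hypothetical, the
example identifiers are the illustrative ones from this thread rather
than registered LSIDs, and the sketch only checks the shape of the
string; it does no resolution.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Hypothetical holder for the parts of an LSID."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def format_lsid(lsid: Lsid) -> str:
    """Render the parts as urn:lsid:authority:namespace:object[:revision]."""
    base = f"urn:lsid:{lsid.authority}:{lsid.namespace}:{lsid.object_id}"
    return base if lsid.revision is None else f"{base}:{lsid.revision}"


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its parts, checking only its overall shape."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn' and 'lsid' prefixes are compared case-insensitively.
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty component in {text!r}")
    revision = (parts[5] or None) if len(parts) == 6 else None
    # Assumption: the authority is a domain name, so it is lower-cased.
    return Lsid(parts[2].lower(), parts[3], parts[4], revision)


if __name__ == "__main__":
    lsid = parse_lsid("urn:lsid:GBIF.ORG:Specimen:9876543")
    print(lsid)
    print(format_lsid(lsid))
```

Because the authority and namespace travel with the object identifier,
the string can be lifted out of one document and dropped into another
without any surrounding explanation.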

Have you seen how LSIDs and DOIs are being used in electronic publications?
...
...
It would be nice to be able to provide a unique
identifier as perhaps a footnote for a scientific name mentioned in a
document.
How hard would it be in such cases to include within the Methods section of
the document, something to the effect of "All taxon IDs listed in this paper
refer to GBIF Specimen IDs, which can be resolved at gbif.net".  If the
problem is one involving a pair of human eyes reading a number, then the
problem can be solved in the context of a pair of human eyes reading the
context.
Sure, but do that consistently, by all authors?  And do it in a way that
is without ambiguity?  Machine parsable (for electronic publication)?
Easily reusable in other documents?
...
...
Or perhaps a system might be developed that provided an LSID
for a DiGIR query document, so the dataset could be completely recreated
just by hitting on the LSID (yes, one is under construction).  One could
imagine simply passing the LSID to another infrastructure that, say,
estimated potential distribution, or highlighted relevant news reports
from an AP feed mentioning the species for which the query was created.
 Using a simple, meaningless GUID buys us none of this potential, and
forces us to always use a wrapper to provide a contextual basis on how
to interpret the identifier.
I guess my question is, why *must* the wrapper be integral to the ID itself?
Why can't the contextual basis be established around the ID, at the time the
ID is presented/transferred, as needed?  If the cost of embedding the
context within the GUID is that all links to, say, Bishop Museum ichthyology
GUIDs for specimens become useless if the collection is transferred to
another institution and the embedded DomainName terminated, then I say put
the burden of context establishment on the ID exchange system
("presentation layer"), rather than embedded within the ID itself.
It must be so that it can be reused outside the original context of the
document that contained the identifier.  I believe there are mechanisms
in the LSID spec for dealing with this problem - but I need to go back
through it to be sure.
...
Aloha,
Rich
Kia ora,
   Dave V.

Dave Vieglais
