Globally Unique Identifier

Fri Sep 24 21:04:00 CEST 2004

Richard Pyle wrote:
>>Yes, certainly, a GUID within the context of an XML document is pretty
>>well defined by the schema, DTD or just its loose association with
>>other elements in the document.
>>
>>But what if one appears in a journal article, a citation in a
>>policy document, etc?
>
>
> Well...that's partly why I emphasized that I think GUIDs should be for
> computer-computer data exchange only.  But even if printed for a pair of
> human eyes to read, surely there would be *some* stated context.  E.g.,
> "ITIS TSN 1234567"; "BPBM 123456"; "GBIF Specimen ID 9876543"; "ICZN NameID
> 92AB5B37-70E9-4f05-9E97-CBABD08513ED"; etc....
>

So formalize that a little and you might have something more
consistently machine parsable like: ITIS.ORG:TSN:1234567;
BPBM.EDU:something:123456;GBIF.ORG:Specimen:9876543, ...
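
A minimal sketch of how such a string could be parsed, in Python. It
assumes each identifier is exactly AUTHORITY:NAMESPACE:ID and that
identifiers are separated by semicolons, as in the line above. The
ScopedId type and the parse_scoped_ids function are hypothetical names
used only for illustration; they are not part of any ITIS, BPBM or GBIF
system.

```python
from typing import List, NamedTuple


class ScopedId(NamedTuple):
    """One identifier with its context carried along (hypothetical type)."""
    authority: str   # e.g. "ITIS.ORG"
    namespace: str   # e.g. "TSN"
    object_id: str   # e.g. "1234567"


def parse_scoped_ids(text: str) -> List[ScopedId]:
    """Split a semicolon-separated list of AUTHORITY:NAMESPACE:ID strings."""
    result = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon or stray whitespace
        parts = chunk.split(":")
        # Assumption: exactly three non-empty, colon-separated fields.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected AUTHORITY:NAMESPACE:ID, got {chunk!r}")
        result.append(ScopedId(*parts))
    return result


ids = parse_scoped_ids(
    "ITIS.ORG:TSN:1234567; BPBM.EDU:something:123456;GBIF.ORG:Specimen:9876543"
)
for scoped in ids:
    print(scoped.authority, scoped.namespace, scoped.object_id)
```

A bare "1234567" gives a parser nothing to work with, whereas each of
these can be split into its context and its local ID with no
surrounding prose.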

Add in the system identifier for resolution (urn:lsid:...) and you have
LSIDs.  The result is a far more consistent, legible and widely useful
mechanism for referencing objects.  Allowing an author to arbitrarily
provide the context for identifiers gets us little further along.
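
As a rough illustration, here is a Python sketch that builds and splits
strings of the form urn:lsid:authority:namespace:object, with an
optional trailing revision field. The function names to_lsid and
parse_lsid are hypothetical, the example authority is only a
placeholder taken from the list above, and the sketch does not attempt
the resolution step at all.

```python
from typing import Optional, Tuple


def to_lsid(authority: str, namespace: str, object_id: str,
            revision: Optional[str] = None) -> str:
    """Join the fields into urn:lsid:authority:namespace:object[:revision]."""
    fields = [authority, namespace, object_id]
    if revision is not None:
        fields.append(revision)
    # Assumption for this sketch: fields are non-empty and contain no colons.
    if any(not f or ":" in f for f in fields):
        raise ValueError("fields must be non-empty and must not contain ':'")
    return ":".join(["urn", "lsid"] + fields)


def parse_lsid(lsid: str) -> Tuple[str, str, str, Optional[str]]:
    """Split an LSID string back into (authority, namespace, object, revision)."""
    parts = lsid.split(":")
    # The "urn:lsid" prefix is compared case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty field in LSID: {lsid!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return authority, namespace, object_id, revision


lsid = to_lsid("gbif.org", "Specimen", "9876543")
print(lsid)              # urn:lsid:gbif.org:Specimen:9876543
print(parse_lsid(lsid))  # ('gbif.org', 'Specimen', '9876543', None)
```

The point is that the string round-trips on its own: whoever receives
it can recover the authority and namespace without the document it was
printed in.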

Have you seen how LSIDs and DOIs are being used in electronic publications?

>
>>It would be nice to be able to provide a unique
>>identifier as perhaps a footnote for a scientific name mentioned in a
>>document.
>
>
> How hard would it be in such cases to include within the Methods section of
> the document, something to the effect of "All taxon IDs listed in this paper
> refer to GBIF Specimen IDs, which can be resolved at gbif.net".  If the
> problem is one involving a pair of human eyes reading a number, then the
> problem can be solved in the context of a pair of human eyes reading the
> context.
>

Sure, but do that consistently, by all authors?  And do it in a way that
is without ambiguity?  Machine parsable (for electronic publication)?
Easily reusable in other documents?

>
>>Or perhaps a system might be developed that provided an LSID
>>for a DiGIR query document- so the dataset could be completely recreated
>>just by hitting on the LSID (yes, one is under construction).  One could
>>imagine simply passing the LSID to another infrastructure that say,
>>estimated potential distribution, or highlighted relevant news reports
>>from an AP feed mentioning the species for which the query was created.
>>  Using a simple, meaningless GUID buys us none of this potential, and
>>forces us to always use a wrapper to provide a contextual basis on how
>>to interpret the identifier.
>
>
> I guess my question is, why *must* the wrapper be integral to the ID itself?
> Why can't the contextual basis be established around the ID, at the time the
> ID is presented/transferred, as needed?  If the cost of embedding the
> context within the GUID is that all links to, say, Bishop Museum ichthyology
> GUIDs for specimens become useless if the collection is transferred to
> another institution and the embedded DomainName terminated, then I say put
> the burden of context establishment on the ID exchange system
> ("presentation layer"), rather than embedded within the ID itself.
>

The wrapper must be integral so that the ID can be reused outside the
original context of the document that contained it.  I believe there are
mechanisms in the LSID spec for dealing with the problem of a collection
changing hands - but I need to go back through it to be sure.

> Aloha,
> Rich
>

Kia ora,
   Dave V.