Re: Globally Unique Identifier

31 Oct 2004

      ...
Richard Pyle writes:
...
"a central question, which Donald included in his
PowerPoint file, is whether the GUID is assigned to the physical
object, or to the electronic representation (data record).  Most of my
comments have been from the standpoint that the GUID applies to the
physical specimen. If it is the electronic records that we wish to
uniquely identify, then it seems to me that the <objectID> component
of an LSID should apply to the physical specimen, and multiple
database records should be uniquely identified using the <version>
component."
I think this is not practical. Do you mean those GLOPP organism-
interaction-data that have specimen voucher information cannot be
published/referenced in GBIF until I figure out whether a collection
has digitized them (most have never been digitized elsewhere!)?
Not necessarily.  I don't think the issue is whether or not the collection
has been digitized, but rather whether GUIDs have already been assigned to
the vouchers you want to document in the GLOPP dataset. So, if your question
is more along the lines of "do I need to check to see if GUIDs have already
been issued to voucher specimens that I cite, before I issue new GUIDs",
then my answer -- in the long run, at least -- would be, "well....yes!"
That's sort of the fundamental point of the GUIDs, isn't it?  But I don't
see this as being necessarily burdensome. For example, if your GLOPP dataset
included unambiguous pointers to specific voucher specimens (e.g., via
InstitutionCode+CollectionCode+CatalogNumber), then it *should* be a
relatively quick and straightforward process to find out if GUIDs have
already been assigned (if it's not quick & easy, then the GUID service would
be horribly inadequate!)  If, on the other hand, the GLOPP dataset does not
provide unambiguous pointers to specific voucher specimens, then the
"vouchered" aspect of those specimen citations seems unsupported, in which
case your GUIDs would need to be assigned to virtual/unvouchered "specimens"
(analogous to observation records), which would hence not be duplicates.
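
A minimal sketch of that check-before-issue step, assuming a lookup service
keyed on InstitutionCode+CollectionCode+CatalogNumber. The GuidRegistry class,
its method names, and the sample codes are hypothetical, and an in-memory
dictionary stands in for whatever the real GUID service would be:

```python
import uuid


class GuidRegistry:
    """Hypothetical in-memory stand-in for a GUID lookup service."""

    def __init__(self):
        # (institution, collection, catalog number) -> GUID already issued
        self._by_voucher = {}

    @staticmethod
    def _key(institution_code, collection_code, catalog_number):
        # Normalise case and surrounding whitespace so that trivially
        # different spellings of the same voucher pointer still match.
        return (
            institution_code.strip().upper(),
            collection_code.strip().upper(),
            catalog_number.strip().upper(),
        )

    def find_or_issue(self, institution_code, collection_code, catalog_number):
        """Return (guid, newly_issued); reuse an existing GUID if there is one."""
        key = self._key(institution_code, collection_code, catalog_number)
        existing = self._by_voucher.get(key)
        if existing is not None:
            return existing, False
        guid = str(uuid.uuid4())
        self._by_voucher[key] = guid
        return guid, True


registry = GuidRegistry()
first, first_is_new = registry.find_or_issue("INST", "COLL", "0001")
again, again_is_new = registry.find_or_issue("inst", "coll", " 0001 ")
```

Here the second call finds the GUID issued by the first, so no redundant GUID
is created for the same voucher.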
...
Or if I
find they have not been, then when the collection starts to digitize them,
would they have to use a new version of the GLOPP LSID for those that
have already been published in GLOPP?
I would hope that if you assigned GUIDs to GLOPP-relevant voucher specimens
that belong to a collection that is not yet digitized, you would do the
manager of that collection the courtesy of providing a listing of the
GUIDs you created for the specific relevant specimens. I would further hope
that, when that collection is eventually digitized, the manager would have
the wherewithal to assign new GUIDs only to those specimens that did not yet
have them.

But, as someone who has worked in a natural history collection for nearly
two decades (and who bore witness to the transition of the collection from
non-digitized to digitized), I certainly do understand the "realities" of
this, and fully recognize that my optimistic perspective is likely to be
overly idealistic.  This is why I feel that duplicate assignment of GUIDs is
inevitable (that is, two different numbers for one object; not duplicate
GUIDs), and MUST be accommodated in any GUID system that is developed.  My
main point is that such "redundant" GUID issuance should be minimized (i.e.,
never done intentionally), and quickly/easily identified as such whenever it
is discovered.

So....if/when the situation does come up that (for example) GLOPP assigns
GUIDs to vouchers on behalf of a non-digitized collection, and that
collection later (inadvertently) assigns redundant GUIDs to the same set
of specimens, the eventual discovery of this duplication should be
accommodated by a mechanism for "retiring" one of the IDs into "objective
synonymy" of the other ID, and automated systems should be implemented in
the resolver service that "auto-forward" the retired ID to the active ID.
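
One way the "retire and auto-forward" behavior could look, as a rough sketch
only. The Resolver class, its method names, and the sample IDs are all
hypothetical, and a real resolver service would of course sit behind a network
interface rather than a dictionary:

```python
class Resolver:
    """Hypothetical resolver that forwards retired IDs to the active ID."""

    def __init__(self):
        self._records = {}   # active GUID -> data record
        self._forward = {}   # retired GUID -> GUID it was retired in favor of

    def register(self, guid, record):
        self._records[guid] = record

    def active_id(self, guid):
        # Follow the forwarding chain until an ID that was never retired.
        while guid in self._forward:
            guid = self._forward[guid]
        return guid

    def retire(self, retired, active):
        """Put `retired` into synonymy of `active` once a duplicate is found."""
        target = self.active_id(active)
        if self.active_id(retired) == target:
            raise ValueError("IDs already resolve to the same object")
        # Drop the redundant record; the retired ID itself is never reused.
        self._records.pop(retired, None)
        self._forward[retired] = target

    def resolve(self, guid):
        return self._records[self.active_id(guid)]


resolver = Resolver()
resolver.register("glopp-id-1", {"catalog_number": "0001"})
resolver.register("collection-id-9", {"catalog_number": "0001"})
resolver.retire("collection-id-9", "glopp-id-1")
record = resolver.resolve("collection-id-9")  # forwarded to "glopp-id-1"
```

Because retire() always forwards to the end of the chain, an ID retired earlier
keeps resolving even if the ID it pointed at is itself retired later.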

If your question is more about whether the collection, when it later becomes
digitized, should use the same <objectID> as was assigned for the GLOPP
dataset, but qualify that same <objectID> with a unique version number --
then my answer is, "I don't know".  That is sort of the question I was
trying to ask ('though I didn't ask it very effectively). Basically, I was
asking whether it would make sense to pin the <objectID> portion of a GUID
to the physical object, and to use the <version> component as a unique
identifier for electronic representations thereof.
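
To make the idea concrete, here is a small sketch that splits LSIDs of the form
urn:lsid:<authority>:<namespace>:<objectID>[:<version>] and groups the versions
found for each object. The authority, namespace, and version strings in the
example are made up, and whether <version> ought to be used this way is exactly
the open question:

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]


def parse_lsid(text):
    """Split an LSID string into its components; the version is optional."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    version = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], version)


def versions_by_object(lsids):
    """Map each (authority, namespace, objectID) to the versions seen for it."""
    groups = defaultdict(list)
    for text in lsids:
        lsid = parse_lsid(text)
        key = (lsid.authority, lsid.namespace, lsid.object_id)
        groups[key].append(lsid.version)
    return dict(groups)


# Hypothetical: one physical specimen, two electronic records of it.
records = [
    "urn:lsid:example.org:specimen:0001:glopp",
    "urn:lsid:example.org:specimen:0001:collection",
]
grouped = versions_by_object(records)
```

Under this reading, both records share one <objectID> for the physical
specimen and differ only in <version>.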
...
The same applies to taxonomic data - most revisions contain voucher
data.
Same solution, I think.

For the most part, though -- I see these as "growing pains" of a GUID system
during its first years of existence.  I would predict that two decades from
now, if one were to do an analysis of redundant GUIDs, one would find the
bulk of those having been issued relatively early on.

Aloha,
Rich