Richard Pyle writes:
"a central question, which Donald included in his PowerPoint file, is whether the GUID is assigned to the physical object, or to the electronic representation (data record). Most of my comments have been from the standpoint that the GUID applies to the physical specimen. If it is the electronic records that we wish to uniquely identify, then it seems to me that the <objectID> component of an LSID should apply to the physical specimen, and multiple database records should be uniquely identified using the <version> component."
I think this is not practical. Do you mean that those GLOPP organism-interaction data that have specimen voucher information cannot be published/referenced in GBIF until I figure out whether a collection has digitized them (most have never been digitized elsewhere!)?
Not necessarily. I don't think the issue is whether or not the collection has been digitized, but rather whether GUIDs have already been assigned to the vouchers you want to document in the GLOPP dataset. So, if your question is more along the lines of "do I need to check to see if GUIDs have already been issued to voucher specimens that I cite, before I issue new GUIDs", then my answer -- in the long run, at least -- would be, "well....yes!" That's sort of the fundamental point of the GUIDs, isn't it? But I don't see this as being necessarily burdensome. For example, if your GLOPP dataset included unambiguous pointers to specific voucher specimens (e.g., via InstitutionCode+CollectionCode+CatalogNumber), then it *should* be a relatively quick and straightforward process to find out if GUIDs have already been assigned (if it's not quick & easy, then the GUID service would be horribly inadequate!) If, on the other hand, the GLOPP dataset does not provide unambiguous pointers to specific voucher specimens, then the "vouchered" aspect of those specimen citations seems unsupported, in which case your GUIDs would need to be assigned to virtual/unvouchered "specimens" (analogous to observation records), and hence non-duplicate.
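As a rough illustration of the "check before you issue" step described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `GuidRegistry` class, its in-memory dictionary (standing in for whatever shared GUID lookup service eventually exists), the normalization rule, and the use of random UUIDs as the GUID form. It only shows the idea of looking up InstitutionCode+CollectionCode+CatalogNumber first and minting a new GUID only when nothing is found.

```python
import uuid
from typing import Dict, Optional, Tuple

# (InstitutionCode, CollectionCode, CatalogNumber)
SpecimenKey = Tuple[str, str, str]


class GuidRegistry:
    """Hypothetical in-memory stand-in for a shared GUID lookup service."""

    def __init__(self) -> None:
        self._guid_by_key: Dict[SpecimenKey, str] = {}

    @staticmethod
    def _normalize(key: SpecimenKey) -> SpecimenKey:
        # Assumption: codes match after trimming whitespace and ignoring case.
        institution, collection, catalog = key
        return (
            institution.strip().upper(),
            collection.strip().upper(),
            catalog.strip().upper(),
        )

    def find(self, key: SpecimenKey) -> Optional[str]:
        """Return the GUID already issued for this specimen, if any."""
        return self._guid_by_key.get(self._normalize(key))

    def find_or_assign(self, key: SpecimenKey) -> Tuple[str, bool]:
        """Return (guid, created); mint a GUID only if none exists yet."""
        existing = self.find(key)
        if existing is not None:
            return existing, False
        guid = str(uuid.uuid4())
        self._guid_by_key[self._normalize(key)] = guid
        return guid, True


if __name__ == "__main__":
    registry = GuidRegistry()
    # Placeholder codes, not a real collection.
    first, created_first = registry.find_or_assign(("INST", "COLL", "0001"))
    second, created_second = registry.find_or_assign(("inst", "coll", "0001 "))
    print(created_first, created_second, first == second)
```

A dataset without an unambiguous specimen pointer has no key to look up, which is the case where the GUID would go to an unvouchered record instead.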
Or, if I find they have not been, then when the collection starts to digitize them, would they have to use a new version of the GLOPP LSID for those that have already been published in GLOPP?
I would hope that if you assigned GUIDs to GLOPP-relevant voucher specimens that belong to a collection that is not yet digitized, you would do the courtesy of providing the manager of that collection with a listing of the GUIDs you created for the specific relevant specimens. I would further hope that, when that collection is eventually digitized, the manager would have the wherewithal to assign new GUIDs only to those specimens that did not yet have them.
But, as someone who has worked in a natural history collection for nearly two decades (and who bore witness to the transition of the collection from non-digitized to digitized), I certainly do understand the "realities" of this, and fully recognize that my optimistic perspective is likely to be overly idealistic. This is why I feel that duplicate assignment of GUIDs is inevitable (that is, two different numbers for one object; not duplicate GUIDs), and MUST be accommodated in any GUID system that is developed. My main point is that such "redundant" GUID issuance should be minimized (i.e., never done intentionally), and quickly/easily identified as such whenever it is discovered.
So....if/when the situation does come up that (for example) GLOPP assigns GUIDs to vouchers on behalf of a non-digitized collection, and that collection later (inadvertently) assigns redundant GUIDs to the same set of specimens, the eventual discovery of this duplication should be accommodated by a mechanism for "retiring" one of the IDs into "objective synonymy" of the other ID, and the resolver service should automatically "auto-forward" the retired ID to the active ID.
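To make the "retire and auto-forward" idea concrete, here is a minimal Python sketch. The `Resolver` class and its method names are hypothetical, and the GUID strings are placeholders; no existing resolver service is being described. The sketch only assumes that a retired ID stores a pointer to the ID that is still active, and that resolution follows that pointer.

```python
from typing import Dict


class Resolver:
    """Hypothetical resolver that forwards retired GUIDs to active ones."""

    def __init__(self) -> None:
        # Maps each retired GUID to the GUID it was synonymized with.
        self._forward: Dict[str, str] = {}

    def resolve(self, guid: str) -> str:
        """Follow forwarding pointers until an active GUID is reached."""
        while guid in self._forward:
            guid = self._forward[guid]
        return guid

    def retire(self, retired: str, active: str) -> None:
        """Record that `retired` is a redundant ID for the object of `active`."""
        target = self.resolve(active)
        if target == retired:
            # Forwarding an ID to itself (directly or via a chain) would loop.
            raise ValueError("cannot retire a GUID into itself")
        self._forward[retired] = target

    def is_retired(self, guid: str) -> bool:
        return guid in self._forward


if __name__ == "__main__":
    resolver = Resolver()
    # Placeholder IDs: one issued by GLOPP, one issued later by the collection.
    resolver.retire("guid:collection-0001", "guid:glopp-0001")
    print(resolver.resolve("guid:collection-0001"))
```

Because `retire` always points at an ID that is active at that moment, a lookup on any retired ID ends at an active one, even if that ID is itself retired later.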
If your question is more about whether the collection, when it later becomes digitized, should use the same <objectID> as was assigned for the GLOPP dataset, but qualify that same <objectID> with a unique version number -- then my answer is, "I don't know". That is sort of the question I was trying to ask ('though I didn't ask it very effectively). Basically, I was asking whether it would make sense to pin the <objectID> portion of a GUID to the physical object, and to use the <version> feature as a unique identifier for electronic representations thereof.
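For illustration only, here is a small Python sketch of what that reading would look like in practice. It assumes the usual colon-separated LSID layout (`urn:lsid:<authority>:<namespace>:<objectID>` with an optional trailing version component). The `example.org` authority, the `specimens` namespace, and the version labels are made up, and whether two records would even share an authority and namespace is an open question that the sketch simply assumes away.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its components (simplified validation)."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or any(not p for p in parts):
        raise ValueError(f"not a well-formed LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a well-formed LSID: {text!r}")
    version = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], version)


def same_physical_object(a: str, b: str) -> bool:
    """Assumption: equal authority, namespace and objectID mean one specimen."""
    return parse_lsid(a)[:3] == parse_lsid(b)[:3]


if __name__ == "__main__":
    # Hypothetical IDs: one physical specimen, two electronic records of it.
    glopp_record = "urn:lsid:example.org:specimens:12345:glopp"
    museum_record = "urn:lsid:example.org:specimens:12345:museum"
    print(same_physical_object(glopp_record, museum_record))
    print(parse_lsid(glopp_record).version, parse_lsid(museum_record).version)
```

Under this reading the two records are distinguished only by the version component, while everything before it identifies the specimen itself.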
The same applies to taxonomic data - most revisions contain voucher data.
Same solution, I think.
For the most part, though -- I see these as "growing pains" of a GUID system during its first years of existence. I would predict that two decades from now, if one were to do an analysis of redundant GUIDs, one would find the bulk of those having been issued relatively early on.
Aloha, Rich