Thanks, Donald.
There are different situations (use cases) in which we may require GUIDs. The service characteristics of GUIDs may not always be the same.
I agree, and this is true at two levels. At one level, the GUID needs may be different for different data domains (i.e., top-level object categories: Taxon Names, References, Specimens, Agents, etc.). At another level, the GUID needs may differ for different use cases *within* a data domain.
- Users can assume that data records relate to the same real-world object if and only if they share the same GUID.
I think that we have generally assumed that all our GUID uses will require this additional function, but this is the part that will get very expensive to implement.
Why would it be so much more expensive to implement?
I believe that we should take a case-by-case decision on whether the additional infrastructure is really required to support this level of integration.
I'm not sure I follow. Are you saying that it requires additional infrastructure to support the notion that "Users can assume that data records relate to the same real-world object if and only if they share the same GUID"? But under what circumstances could users ever assume that data records relate to the same real-world object when they do *not* share the same GUID? It would seem to me that it would require a human brain to reliably ascertain that GUID A refers to the same real-world object as GUID B. And once that determination is made, the "expensive" part is already done. At that point, the GUIDs should be (objectively) "synonymized", so that future users who reference either GUID do not have to repeat the same expensive decision-making process to ascertain their equivalence.
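To make the "synonymized" idea concrete, here is a minimal sketch of one way the record of such decisions could be kept, assuming a simple in-memory union-find structure. The class name `GuidSynonymy`, its methods, and the example GUID strings are all hypothetical and not part of any existing GUID service.

```python
class GuidSynonymy:
    """Hypothetical registry recording which GUIDs were judged equivalent."""

    def __init__(self):
        # Maps each known GUID to its parent; a root GUID maps to itself.
        self._parent = {}

    def _find(self, guid):
        # Register unseen GUIDs as their own root.
        self._parent.setdefault(guid, guid)
        root = guid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every GUID on the path directly at the root.
        while self._parent[guid] != root:
            self._parent[guid], guid = root, self._parent[guid]
        return root

    def synonymize(self, a, b):
        """Record a (human) decision that GUIDs a and b denote the same object."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_object(self, a, b):
        """True only if a and b have been linked, directly or transitively."""
        return self._find(a) == self._find(b)


registry = GuidSynonymy()
registry.synonymize("guid:A", "guid:B")  # the expensive decision, made once
registry.synonymize("guid:B", "guid:C")
```

Once the decision has been recorded, `registry.same_object("guid:A", "guid:C")` answers without anyone repeating the comparison, while two GUIDs that were never linked are treated as different objects.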
Maybe I've completely misunderstood your point, but I guess what I'm struggling to understand is, when presented with two data objects with different GUIDs, under what circumstances could a user "assume" anything other than that they are not the same real-world object?
As an example, I can see excellent reasons for us at least to try for benefit 3 for specimen records, and also probably in some way for taxon names and publications (whatever actual form the identifiers for these may take). However (I think, like Rod) I am not totally convinced that we will need to manage GUIDs offering benefit 3 for taxon concepts.
O.K., now I think I understand where you are coming from (i.e., the "fuzzy" nature of taxonomic concept GUIDs, and what they really represent). As I have argued during the TCS/LC discussions, my feeling is that a Taxonomic Concept object should not receive a GUID that is disconnected from the [TaxonName]+[Publication] pair of GUIDs. In other words, I think that a Concept should be uniquely identified via a name-usage instance (specifically, the name-usage instance that defined the concept). But I may not be in the majority on this.
We should use GUIDs (with benefits 1 and 2) for taxon concepts, and our taxon concepts should reference names and publications which probably have the stronger kind of GUIDs (with benefits 1-3). The coupling of a name GUID and a publication GUID would then be a unique combination that would still allow concepts to be compared.
Exactly... so why assign a different GUID to the Concept? Or, perhaps put a different way, why assign a different GUID to the *name* (in the Taxonomer paradigm, in which names do not exist as stand-alone objects independent of usage instances)?
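The pairing discussed above can be sketched as a composite key, assuming a concept is identified by nothing more than its name GUID and publication GUID. The type `ConceptKey`, the helper `share_name`, and the GUID strings are hypothetical illustrations, not an existing TCS or Taxonomer structure.

```python
from typing import NamedTuple


class ConceptKey(NamedTuple):
    """Hypothetical identifier for a concept: a name-usage instance."""
    name_guid: str         # GUID of the taxon name
    publication_guid: str  # GUID of the publication that used the name


def share_name(a: ConceptKey, b: ConceptKey) -> bool:
    """Two concepts are comparable by name when their name GUIDs match."""
    return a.name_guid == b.name_guid


# Two usages of the same name in different publications: distinct concepts.
usage_1 = ConceptKey("name:0001", "pub:1900")
usage_2 = ConceptKey("name:0001", "pub:1950")

# The pair itself serves as the lookup key; no separate concept GUID is needed.
concepts = {usage_1: "concept as defined in pub:1900"}
```

Here `usage_1 != usage_2` because the publications differ, yet `share_name(usage_1, usage_2)` still holds, so the concepts remain comparable through the shared name GUID.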
This is just an example, but indicates that we need to think hard about the use cases for each data object class and its associated GUIDs.
Agreed!
Aloha, Rich
Richard L. Pyle, PhD
Ichthyology, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html