Different reasons for different GUIDs (Was: GUIDs, LSIDs, and metadata)

Tue Sep 13 15:30:21 CEST 2005

Thanks, Donald.

> There are different situations (use cases) in which we may require GUIDs.
> The service characteristics of GUIDs may not always be the same.

I agree, and this is true at two levels.  At one level, the GUID needs may
be different for different data domains (i.e., top-level object categories:
Taxon Names, References, Specimens, Agents, etc.).  At another level, the
GUID needs for different use cases *within* a data domain may differ.

> 3. Users can assume that data records relate to the same real-world object
> if and only if they share the same GUID.
>
> I think that we have generally assumed that all our GUID uses will require
> this additional function, but this is the part that will get very
> expensive to implement.

Why would it be so much more expensive to implement?

> I believe that we should take a case-by-case decision on
> whether the additional infrastructure is really required to support this
> level of integration.

I'm not sure I follow.  Are you saying that it requires additional
infrastructure to support the notion that "Users can assume that data
records relate to the same real-world object if and only if they share the
same GUID"?  But under what circumstances could users ever assume that data
records relate to the same real-world object when they do *not* share the
same GUID?  It would seem to me that it would require a human brain to
reliably ascertain that GUID A refers to the same real-world object as GUID
B.  And once that determination is made, the "expensive" part is already
done. At that point, the GUIDs should be (objectively) "synonymized", so
that future users who reference either GUID do not have to repeat the same
expensive decision-making process to ascertain their equivalence.
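
As a rough illustration of the "synonymize once, reuse afterwards" idea
above, here is a minimal Python sketch.  The registry class, its method
names, and the example GUID strings are all assumptions made for
illustration; they are not part of any existing GUID or LSID service.  The
human decision that two GUIDs are equivalent is recorded once, and later
lookups resolve either GUID to the same canonical identifier.

```python
class GuidSynonymRegistry:
    """Hypothetical registry of GUIDs judged to denote the same object."""

    def __init__(self):
        # Maps each known GUID to its parent; a root maps to itself.
        self._parent = {}

    def canonical(self, guid):
        """Return the canonical GUID for the group that `guid` belongs to."""
        # A GUID never seen before is its own canonical identifier.
        self._parent.setdefault(guid, guid)
        root = guid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every GUID on the walk directly at the root.
        while self._parent[guid] != root:
            next_guid = self._parent[guid]
            self._parent[guid] = root
            guid = next_guid
        return root

    def synonymize(self, guid_a, guid_b):
        """Record a (human) decision that both GUIDs denote the same object."""
        root_a = self.canonical(guid_a)
        root_b = self.canonical(guid_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_object(self, guid_a, guid_b):
        """True only if the two GUIDs have been synonymized, directly or not."""
        return self.canonical(guid_a) == self.canonical(guid_b)


registry = GuidSynonymRegistry()
registry.synonymize("guid:A", "guid:B")  # the expensive judgment, made once
registry.synonymize("guid:B", "guid:C")
print(registry.same_object("guid:A", "guid:C"))  # True
print(registry.same_object("guid:A", "guid:D"))  # False
```

Without a recorded synonymy, the sketch treats two different GUIDs as two
different objects, which is the only assumption a user could safely make.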

Maybe I've completely misunderstood your point, but I guess what I'm
struggling to understand is, when presented with two data objects with
different GUIDs, under what circumstances could a user "assume" anything
other than that they are not the same real-world object?

> As an example, I can see excellent reasons for us at least to try for
> benefit 3 for specimen records, and also probably in some way for taxon
> names and publications (whatever actual form the identifiers for these may
> take).  However (I think, like Rod) I am not totally convinced that we
> will need to manage GUIDs offering benefit 3 for taxon concepts.

O.K., now I think I understand where you are coming from (i.e., the "fuzzy"
nature of taxonomic concept GUIDs, and what they really represent).  As I
have argued during the TCS/LC discussions, my feeling is that a Taxonomic
Concept object should not receive a GUID that is disconnected from the
[TaxonName]+[Publication] pair of GUIDs. In other words, I think that a
Concept should be uniquely identified via a name-usage instance
(specifically, the name-usage instance that defined the concept). But I may
not be in the majority on this.
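
A minimal Python sketch of identifying a Concept by its defining name-usage
instance, rather than by a separately minted GUID, might look like the
following.  The `NameUsage` class, the `define_concept` helper, and the
GUID strings are hypothetical and only meant to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameUsage:
    """One taxon name as used in one publication (hypothetical model)."""
    name_guid: str         # GUID of the taxon name
    publication_guid: str  # GUID of the publication that used the name


# Concepts are keyed by the name-usage instance that defined them,
# so no separate concept GUID is minted in this sketch.
concepts = {}


def define_concept(name_guid, publication_guid, note):
    """Register a concept under its defining name-usage instance."""
    usage = NameUsage(name_guid, publication_guid)
    # Keep the first definition; the same pair always maps to one concept.
    concepts.setdefault(usage, note)
    return usage


original = define_concept("name:0001", "pub:0042", "original description")
revised = define_concept("name:0001", "pub:0077", "later revision")

# Same name, different publication: two distinct concepts.
print(original == revised)  # False
# Same name and same publication: the same concept.
print(original == NameUsage("name:0001", "pub:0042"))  # True
```

Because the pair is frozen and hashable, two concepts can be compared
simply by comparing their name and publication GUIDs.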

> We should use
> GUIDs (with benefits 1 and 2) for taxon concepts, and our taxon concepts
> should reference names and publications which probably have the stronger
> kind of GUIDs (with benefits 1-3).  The coupling of a name GUID and a
> publication GUID would then be a unique combination that would still allow
> concepts to be compared.

Exactly....so why assign a different GUID to the Concept?  Or, perhaps put a
different way, why assign a different GUID to the *name* (in the Taxonomer
paradigm that names do not exist as stand-alone objects independent of usage
instances)?

> This is just an example, but indicates that we need to think hard about
> the use cases for each data object class and its associated GUIDs.

Agreed!

Aloha,
Rich

Richard L. Pyle, PhD
Ichthyology, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html