Re: Basic requirements for a basic GUID infrastructure
Hi Ricardo,
I also think that the specimen group has a similar concern about how many "kinds" of GUIDs we have. For example, we can issue GUIDs to a survey or collecting event, a lot, an individual, parts of an individual, tissues, etc. However, Donald suggested a neat solution to break down the problem into smaller manageable pieces, which is to develop a shared ontology and relate each record in our specimen network to the classes in the ontology. So in the end, we only have one kind of GUID and many metadata services lying on top of that doing the various mappings.
This sounds almost *exactly* like what I am thinking for taxonomic data. The one unit that is shared by all consumers of taxonomic data is the NameUsage instance. It lies at the heart of indexing efforts, spelling variant documentation, basionym creation, new combination creation, and concept definitions (among others). How these fundamental units of taxonomy are used and cross-linked can be determined by overlying metadata services, via a shared ontology.
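To make that layering a bit more concrete, here is a rough sketch of how I picture it, not a proposal for an actual implementation. Everything in it is an assumption made for illustration: the class names (NameUsage, MetadataService), the ontology class labels ("Basionym", "ConceptDefinition"), and the use of a random UUID as the identifier. The only point is that there is one kind of identified unit, and each overlying service records its own mapping from that identifier to classes in the shared ontology.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class NameUsage:
    """The single shared unit: a name as used in a particular publication."""
    guid: str
    name: str
    publication: str


class MetadataService:
    """Hypothetical overlay service mapping GUIDs to shared-ontology classes."""

    def __init__(self, label):
        self.label = label
        self._classes = {}  # guid -> set of ontology class names

    def assert_class(self, guid, ontology_class):
        # Record that this service treats the identified unit as an
        # instance of the given ontology class.
        self._classes.setdefault(guid, set()).add(ontology_class)

    def classes_of(self, guid):
        # Return a copy so callers cannot mutate the service's state.
        return set(self._classes.get(guid, set()))


# One kind of GUID, issued once for the NameUsage instance...
usage = NameUsage(str(uuid.uuid4()), "Aus bus", "Smith 1995")

# ...and independent services layering their own interpretation on top.
nomenclature = MetadataService("nomenclature")
concepts = MetadataService("concepts")
nomenclature.assert_class(usage.guid, "Basionym")
concepts.assert_class(usage.guid, "ConceptDefinition")
```

Neither service needs to know about the other; they only need to agree on the identifier and on the ontology vocabulary.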
In my opinion, this kind of solution lays down a number of requirements for an underlying GUID infrastructure:
1) Identifiers deliver you to objects (where feasible).
2) Identifiers deliver you to object metadata.
3) Each object should wear its own identifier (desirable).
4) Identifiers are issued in a decentralized manner, i.e., by independent agents.
5) Ids identify digital objects, not physical ones.
6) Other requirements I missed?
Then I pose the question, would the above assumptions fulfill the requirements of the candidate solutions laid out in the previous taxon concepts and names discussion?
I *think* so, but I would need clarification on some of the vocabulary:
- What is an "object" in the context of taxonomy, and how is it different from object metadata? Is it necessarily a physical object? Or might it be an abstract object, such as a Code-compliant taxon name (independent of how or where that name is used or appears within some form of documentation)?
- What does "wear" mean in #3?
- I understand the need for #4, but how, then, are GUIDs re-used by multiple data providers? For example, the NameUsage instance of "Aus bus" in the publication "Smith 1995" might be replicated as a data record in many different data providers. Would each provider automatically create its own GUID for this same NameUsage instance? Or would GBIF establish tools to help locate and re-use a GUID that has already been established by another data provider for this NameUsage instance?
- I assume that #5 also applies to abstract objects (like the notion of a TaxonName independent of its involvement with any particular NameUsage instance), such that the identifier refers to the digital record established for the physical/abstract object. Does this sound correct?
- Also related to #5 (and your next paragraph), there is a much tighter connection between a physical object and its digital representation in the realm of specimens, than there is in the realm of taxonomy. It is unlikely that Bishop Museum will assign large numbers of GUIDs for digital records representing physical specimens housed at the Smithsonian. On the other hand, it is VERY likely that Bishop Museum and Smithsonian will share MANY taxon names and NameUsage instances across their two datasets.
You might argue that the current taxon names and concepts discussion would require a centralized ID issuing authority.
I would not argue that taxonomy GUIDs should necessarily be centralized. But I *would* argue that tools be established to encourage re-use of existing GUIDs, rather than encourage proliferation of multiple GUIDs for multiple digital objects that refer to the identical physical/abstract object.
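As a minimal sketch of the kind of re-use tool I have in mind (purely illustrative, and all of the names are hypothetical: GuidRegistry, find, find_or_mint, and the "urn:uuid:" identifier form are my own assumptions, not anything GBIF has proposed), a provider would first look up whether a GUID already exists for a given NameUsage instance and would mint a new one only if nothing is found. The matching here is a naive comparison of whitespace- and case-normalized strings; real matching of names and citations would be much harder than this.

```python
import uuid


class GuidRegistry:
    """Hypothetical shared lookup index for NameUsage GUIDs."""

    def __init__(self):
        self._by_key = {}  # (normalized name, normalized publication) -> GUID

    @staticmethod
    def _key(name, publication):
        # Naive normalization: collapse runs of whitespace and ignore case.
        def norm(text):
            return " ".join(text.split()).casefold()
        return (norm(name), norm(publication))

    def find(self, name, publication):
        # Return the existing GUID for this NameUsage, or None if unknown.
        return self._by_key.get(self._key(name, publication))

    def find_or_mint(self, name, publication):
        # Re-use an existing GUID when there is one; otherwise mint a new one.
        key = self._key(name, publication)
        if key not in self._by_key:
            self._by_key[key] = f"urn:uuid:{uuid.uuid4()}"
        return self._by_key[key]


registry = GuidRegistry()

# Two providers holding records for the same NameUsage instance end up
# sharing one GUID, even though their records differ in case and spacing.
bishop = registry.find_or_mint("Aus bus", "Smith 1995")
smithsonian = registry.find_or_mint("aus  bus", "smith 1995")
```

Note that a lookup index of this sort is not the same thing as a centralized issuing authority: whichever provider encounters the NameUsage instance first still issues the GUID itself, and the index only helps everyone else find it.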
Aloha, Rich
Richard Pyle