Basic requirements for a basic GUID infrastructure

Tue Nov 15 11:05:50 CET 2005

Hi Ricardo,

>     I also think that the specimen group has a similar concern about how
> many "kinds" of GUIDs we have. For example, we can issue GUIDs to a
> survey or collecting event, a lot, an individual, parts of an
> individual, tissues, etc. However, Donald suggested a neat solution to
> break down the problem into smaller manageable pieces, which is to
> develop a shared ontology and relate each record in our specimen network
> to the classes in the ontology. So in the end, we only have one kind of
> GUID and many metadata services lying on top of that doing the various
> mappings.

This sounds almost *exactly* like what I am thinking for taxonomic data.
The one unit that is shared by all consumers of taxonomic data is the
NameUsage instance.  It lies at the heart of indexing efforts, spelling
variant documentation, basionym creation, new combination creation, and
concept definitions (among others).  How these fundamental units of taxonomy
are used and cross-linked can be determined by overlying metadata services,
via a shared ontology.
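
To make that concrete, here is a rough sketch in Python of one kind of GUID with the cross-links kept in a separate layer. All of it is assumption for illustration: the "nu:" identifiers, the relation labels (which are not taken from any published ontology), and the second usage record "Xus bus" in "Jones 2000" are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NameUsage:
    """The shared unit: a name as it appears in one publication."""
    guid: str         # the single kind of identifier (placeholder format)
    name: str         # name string as used
    publication: str  # where the usage appears


# The GUID layer: plain NameUsage records, with no taxonomic opinion attached.
usages = {
    "nu:1": NameUsage("nu:1", "Aus bus", "Smith 1995"),
    "nu:2": NameUsage("nu:2", "Xus bus", "Jones 2000"),  # hypothetical record
}

# An overlying metadata service: (subject GUID, relation, object GUID).
# The relation labels stand in for classes/properties of a shared ontology.
links: List[Tuple[str, str, str]] = [
    ("nu:2", "newCombinationOf", "nu:1"),
    ("nu:2", "hasBasionym", "nu:1"),
]


def related(guid: str, relation: str) -> List[str]:
    """Return GUIDs linked from `guid` by `relation`."""
    return [obj for subj, rel, obj in links if subj == guid and rel == relation]


basionym_guids = related("nu:2", "hasBasionym")
basionym_names = [usages[g].name for g in basionym_guids]
```

The point of the split is that the NameUsage records never change when a new service (indexing, spelling variants, concepts) adds its own links on top of them.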

> In my opinion, this kind of solution lays down a number of
> requirements for an underlying GUID infrastructure:
>
> 1) Identifiers deliver you to objects (where feasible).
> 2) Identifiers deliver you to object metadata.
> 3) Each object should wear its own identifier (desirable).
> 4) Identifiers are issued in a decentralized manner, i.e., by
> independent agents.
> 5) Ids identify digital objects, not physical ones.
> 6) Other requirements I missed?
>
>     Then I pose the question, would the above assumptions fulfill the
> requirements of the candidate solutions laid out in the previous taxon
> concepts and names discussion?

I *think* so, but I would need clarification on some of the vocabulary:

- What is an "object" in the context of taxonomy, and how is it different
from object metadata?  Is it necessarily a physical object? Or might it be
an abstract object, such as a Code-compliant taxon name (independent of how
or where that name is used or appears within some form of documentation)?

- What does "wear" mean in #3?

- I understand the need for #4, but how, then, are GUIDs re-used by multiple
data providers? For example, the NameUsage instance of "Aus bus" in the
publication "Smith 1995" might be replicated as a data record by many
different data providers. Would each provider automatically create its own
GUID for this same NameUsage instance?  Or would GBIF establish tools to
help locate and re-use a GUID that has already been established by another
data provider for this NameUsage instance?

- I assume that #5 also applies to abstract objects (like the notion of a
TaxonName independent of its involvement with any particular NameUsage
instance), such that the identifier refers to the digital record established
for the physical/abstract object.  Does this sound correct?

- Also related to #5 (and your next paragraph), there is a much tighter
connection between a physical object and its digital representation in the
realm of specimens, than there is in the realm of taxonomy.  It is unlikely
that Bishop Museum will assign large numbers of GUIDs for digital records
representing physical specimens housed at the Smithsonian.  On the other
hand, it is VERY likely that Bishop Museum and Smithsonian will share MANY
taxon names and NameUsage instances across their two datasets.

>     You might argue that the current taxon names and concepts discussion
> would require a centralized ID issuing authority

I would not argue that taxonomy GUIDs should necessarily be centralized.
But I *would* argue that tools be established to encourage re-use of
existing GUIDs, rather than encourage proliferation of multiple GUIDs for
multiple digital objects that refer to the identical physical/abstract
object.
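
As a minimal sketch of the kind of re-use tool I have in mind (in Python, and purely hypothetical: the GuidRegistry class, the matching rule, and the use of random UUIDs are my assumptions, not an existing GBIF service), a provider would look up a NameUsage instance before issuing anything new:

```python
import uuid
from typing import Dict, Optional, Tuple


def usage_key(name: str, publication: str) -> Tuple[str, str]:
    """Reduce a NameUsage instance to a comparable key.

    Assumed matching rule: collapse whitespace and ignore case.
    Real matching of names and citations would be much harder.
    """
    def norm(text: str) -> str:
        return " ".join(text.split()).casefold()

    return (norm(name), norm(publication))


class GuidRegistry:
    """Hypothetical shared lookup: one GUID per NameUsage instance."""

    def __init__(self) -> None:
        self._guids: Dict[Tuple[str, str], str] = {}

    def find(self, name: str, publication: str) -> Optional[str]:
        """Return the existing GUID, or None if nobody has issued one."""
        return self._guids.get(usage_key(name, publication))

    def find_or_issue(self, name: str, publication: str) -> str:
        """Re-use an existing GUID; issue a new one only if none exists."""
        key = usage_key(name, publication)
        if key not in self._guids:
            self._guids[key] = str(uuid.uuid4())
        return self._guids[key]


registry = GuidRegistry()
bishop_guid = registry.find_or_issue("Aus bus", "Smith 1995")
smithsonian_guid = registry.find_or_issue("aus  bus", "Smith 1995")
```

Here both providers end up holding the same GUID for "Aus bus" in "Smith 1995". Issuing stays decentralized, since any provider can create the first GUID; only the lookup is shared.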

Aloha,
Rich