Topic 3: GUIDs for Taxon Names and Taxon Concepts

Richard Pyle deepreef at BISHOPMUSEUM.ORG
Fri Nov 4 03:21:20 CET 2005


> > By "Namestring", do you include author, or just taxonomic nomenclatural
> > elements?  In either case, why even bother establishing a GUID for
> > taxonomic names at all?  Why not just use the namestring itself?
>
> Oh come on, you can't be serious ;-) Names aren't unique, GUIDs are.

You didn't answer my question: By "Namestring", do you include author, or
just taxonomic nomenclatural elements? If you do not include author details
(as I suspect from your comment above), then you are right: namestrings are
non-unique because of homonyms.  If you do include author elements, then
namestrings can approach uniqueness.
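
A minimal sketch of the homonym point, in Python. The two records are
illustrative only (Ficus is a well-known homonym shared by a plant genus and
a gastropod genus), and the function name and record layout are assumptions
made for this example, not anyone's actual schema.

```python
from collections import defaultdict

# Illustrative records: (bare namestring, authorship, group).
# Ficus is used for both a plant genus and a gastropod genus.
RECORDS = [
    ("Ficus", "L.", "plants"),
    ("Ficus", "Röding, 1798", "gastropods"),
]


def group_by_key(records, with_author):
    """Group records by namestring, optionally including authorship in the key."""
    groups = defaultdict(list)
    for name, author, group in records:
        key = f"{name} {author}" if with_author else name
        groups[key].append(group)
    return dict(groups)


# Without authorship the two genera collapse onto one key (a homonym).
bare = group_by_key(RECORDS, with_author=False)
# With authorship the keys are distinct, so the namestring approaches uniqueness.
full = group_by_key(RECORDS, with_author=True)

print(bare)
print(full)
```

Even with authorship in the key, variations in how the author is cited would
still split one name across several strings, which is why the namestring only
"approaches" uniqueness.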

So, let me ask you a seemingly simple question: If you feel that any
namestring (with the possible exception of orthographic variants) should
receive a GUID, then does the GUID represent the namestring per se? Or, does
it represent the name "object" (with implied secondary data such as
authorship, type specimen, etc.)?  If the namestring per se, then again I
ask why do we need GUIDs at all?  If the answer is "to disambiguate
homonyms", then it seems the GUID represents the abstract name "object",
rather than the sequence of text characters.

So, if a "Name GUID" is intended to represent a name "object", then what is
the definition and scope of such objects that would be candidates for GUIDs?
All names? Scientific names only?  What sort of attributes would these name
objects have?

> > Moreover, I don't think anyone has suggested that GUIDs be assigned
> > "solely" to basionyms.
>
> Er, I quote from your earlier post.
>
>  > I completely agree -- but again, what gets a "Name" GUID? (as opposed to a
>  > "usage" GUID or a "concept" GUID)  Only basionyms? (I hope!)  Or also
>  > different combinations? (I hope not!) Or also spelling variants? (I
>  > *really* hope not!!)

I'll say it again: I don't think anyone has suggested that GUIDs be assigned
"solely" to basionyms. As you said yourself: "The idea that only the
resolution system needs to be able to distinguish between specimens, taxon
names, etc., seems unfortunate."  If I understand this sentence correctly,
you are saying that it's important to think of distinct "domains" of GUIDs.
For example, publications are a different domain from specimens, and taxon
names are a different domain still.  Please note in my quote above my
qualification of "Name" GUID.  I was talking about GUIDs within the Taxon
name "domain" only.  To say that I hope that "Name" GUIDs are only applied
to basionyms is *NOT* the same as saying that non-basionyms should not have
*any* GUID -- just that I don't think that non-basionyms fall within the
scope of what I believe should be treated as a name-object (i.e., candidate
for a "Name GUID").

Stated another way, if you dissect the informational components of
nomenclature, basionyms represent a (mostly) unambiguous and (mostly)
objectively-definable unit object.  Nomenclature beyond basionyms
(combinations, orthographic variants, etc.) *can* be thought of as
additional instances of name objects, but they can *also* be thought of as
attributes of name usage instances (a different domain from name objects).
My preference is the latter (not because I'm a zoologist, but because I see
it as more reflective of how nomenclatural information is really
structured).
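
To make the two "domains" concrete, here is a rough Python sketch of the
structure I have in mind. The class names, field names, and the "source"
strings are all hypothetical, chosen only for illustration. The example uses
Felis leo Linnaeus, 1758, and its later combination Panthera leo.

```python
import uuid
from dataclasses import dataclass


def new_guid() -> str:
    """Stand-in GUID generator; any globally unique scheme would do."""
    return str(uuid.uuid4())


@dataclass(frozen=True)
class NameObject:
    """Name domain: one record per basionym (original name)."""
    guid: str
    epithet: str
    original_genus: str
    author: str
    year: int


@dataclass(frozen=True)
class NameUsage:
    """Usage domain: one record per appearance of a name in some source."""
    guid: str
    name_guid: str      # points back to the basionym's Name GUID
    name_as_used: str   # the combination/spelling is an attribute of the usage
    source: str         # hypothetical placeholder for a publication reference


leo = NameObject(new_guid(), "leo", "Felis", "Linnaeus", 1758)

usages = [
    NameUsage(new_guid(), leo.guid, "Felis leo", "hypothetical source A"),
    NameUsage(new_guid(), leo.guid, "Panthera leo", "hypothetical source B"),
]

# Both usages carry their own (usage-domain) GUID, yet resolve to one Name GUID.
name_guids = {u.name_guid for u in usages}
print(len(name_guids))
```

Under this arrangement the new combination still has a GUID; it simply lives
in the usage domain rather than the name domain.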

> For me the point is that if you have a GUID, I can map to you and add
> value to my projects. Without a GUID, it's a pain.

But if the units to which we assign GUIDs are non-comparable, how can we
cross-map them (without it being a pain)?

> If it hurts don't do it! Put another way, why not make the mapping
> purely nomenclatural -- this name string also occurs in ITIS, and don't
> make any claims that whatever ITIS means by that name is what you mean.

I can do that just fine without GUIDs. My hope is that GUIDs will enable us
to exchange and harness information MUCH more effectively than we currently
can. In reference to your idea of how GUIDs should be assigned, you said it
yourself in your previous post: "we're pretty close to this already".  If
building an infrastructure of GUIDs only allows us to do incrementally more
than we can already do now, then I wonder whether we need to bother with GUIDs
at all.  I would like to see the infrastructure established that allows us
to do much more than we can right now -- and that largely hinges on the
reusability aspect of GUIDs.

> Alternatively, let users decide what mappings they want to make between
> you and ITIS. Expertise is rare and widely distributed. I think you're
> making things harder for yourself than they need to be.

No, I'm trying to help foster a foundation for information exchange that
will make things easier for future generations of biologists (and
non-biologists).  With a few simple standards and conventions and
definitions, I think we can develop a system of GUIDs that lets us harness
biological information much more effectively than is possible now.

> >> If we think of the scientific literature, many papers have at least two
> >> GUIDs (DOI and PubMed), both of which are useful, and which serve
> >> different purposes.
> >
> > What is the value of expanding that to potentially hundreds, or
> > thousands of GUIDs for each paper -- one GUID from each database that
> > has a table of literature records?  Wouldn't our information exchange
> > be much more efficient if we all adopted (or at least mapped to) either
> > the DOI or the PubMed GUID (or both)? If so, why doesn't this same logic
> > apply to taxon names?
>
> I guess my point was that this is a case where more than one GUID
> exists for the "same" thing (a publication), and we manage to cope.
> Yes, "one GUID to rule them all" might be nice, but I just don't buy
> that it's going to happen any time soon. Just look at the discussion
> this topic has generated...

Well, maybe we need two different discussion lists:  1) How to use GUIDs to
make what we already do a bit easier to manage (low cost, fast
implementation, incremental improvement); and 2) How to use GUIDs to
dramatically change the way biological data is exchanged (higher cost,
slower implementation, fundamental improvements).

> Again, I think we make a mistake if we equate GUIDs with solving all
> our problems. Rather, I suggest they give us an infrastructure on which
> to build tools for mapping names/concepts/whatever.

I think we are in full agreement on this.  I don't think that GUIDs can
solve all of our problems any more than you do.  However, I think they offer
the potential of solving a lot more problems than you seem to be reaching
for.

In a later post, you wrote:

>   GUIDs enable you to explicitly identify objects, that's all.

We agree on this point -- but if I understand your position correctly, the
"objects" are the records in a database.  In my view, the "objects" are
definable, abstract nomenclatural objects that are shared across many, many
databases.  I also agree with your prior comment that converting LUIDs to
GUIDs using the format "<my database server>:<my data type>:<my id>" is very
easy.  I just don't think it allows us to do much more than we already can
do (as magnificently demonstrated by the taxonomic web portals that you and
the uBio folks have developed).
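
A small Python sketch of that conversion, and of why it only goes so far. The
server names, data type labels, local ids, and the shared identifier below
are all made up for illustration.

```python
def luid_to_guid(server: str, data_type: str, local_id) -> str:
    """Build a GUID of the form <server>:<data type>:<local id>."""
    return f"{server}:{data_type}:{local_id}"


# Two hypothetical databases, each holding its own record for the same name.
guid_a = luid_to_guid("db-a.example.org", "name", 1234)
guid_b = luid_to_guid("db-b.example.org", "taxon_name", 98)

# The GUIDs are globally unique, but nothing in them says the two records
# are about the same nomenclatural object.
print(guid_a)
print(guid_b)

# That knowledge still has to be supplied by a separate cross-mapping,
# here to a hypothetical shared name-object identifier.
shared_name_object = {
    guid_a: "shared-name-object-1",
    guid_b: "shared-name-object-1",
}
refer_to_same_name = shared_name_object[guid_a] == shared_name_object[guid_b]
print(refer_to_same_name)
```

The conversion itself is one line; the cross-mapping table is where the real
work remains, which is the part a shared name-object GUID would remove.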

Aloha,
Rich



