Globally Unique Identifier

Chuck Miller Chuck.Miller at MOBOT.ORG
Thu Sep 23 19:09:57 CEST 2004


Mahalo for your informative discussion, Rich.

A few questions.  You're pretty active on this so maybe you can help me out.

What about duplicate specimens?  Although a specimen may be MO 1234, K 5678
and P AABB, they may in fact all be SMITH 10001 and duplicates of the exact
same specimen, not different specimens. Is that one GUID or 3? When
attempting to use world-wide specimen records via GBIF for biodiversity
counts and species analyses, these duplicates artificially inflate the
counts significantly in some cases.
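
To make the inflation concrete, here is a minimal Python sketch. The four records and the choice of (collector, collector number) as the grouping key are assumptions for illustration only; real data would need far more careful matching of collector names.

```python
from collections import defaultdict

# Hypothetical sheets: (herbarium code, catalog number, collector, collector number)
records = [
    ("MO", "1234", "SMITH", "10001"),
    ("K", "5678", "SMITH", "10001"),
    ("P", "AABB", "SMITH", "10001"),
    ("MO", "9999", "JONES", "42"),
]


def group_duplicates(records):
    """Group herbarium sheets by (collector, collector number)."""
    groups = defaultdict(list)
    for herbarium, catalog, collector, number in records:
        # Normalise case and whitespace so "Smith " and "SMITH" land together.
        key = (collector.strip().upper(), number.strip())
        groups[key].append(f"{herbarium} {catalog}")
    return dict(groups)


groups = group_duplicates(records)
sheet_count = len(records)      # what a naive count of records reports
gathering_count = len(groups)   # distinct collecting events
```

Here a naive count sees four specimens where there are only two gatherings. Whether each sheet or each gathering gets the GUID is exactly the open question.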

What about triplicate names?  IPNI is often given as the example for a set
of name records.  But, IPNI can have three records for the same exact name
and reference--one from IK, APNI and Grey Cards.  IPNI has no plans to ever
deduplicate these records due to the nature of the creation of the IPNI
collaboration.  So, do the three duplicate records get three GUIDs?

Where are the GUIDs actually to be perpetually located after they are
assigned?  Are all the originating organizations supposed to modify their
databases to add the GUID attribute and then build a mechanism to send out
their records and then receive the GUID back from somewhere and finally
update their records with it so the record+GUID can then in turn be
published from their database onto the web?

Couldn't agree more on the need for a single index/GUIDs to all references,
but beyond that is needed the single database containing all the GUIDS plus
the standard abbreviations and descriptions for them.  Nobody has this
database.  There are subsets like BPH and TL2.  But no single, definitive
list of all references, online, in one place with GUIDs.  This science needs
that in the worst way.

If a concept is Name+Reference, then don't IPNI and Tropicos contain
millions of concept records?

Thanks,
Chuck Miller
Chief Information Officer
Missouri Botanical Garden

-----Original Message-----
From: Richard Pyle [mailto:deepreef at BISHOPMUSEUM.ORG]
Sent: Thursday, September 23, 2004 6:29 PM
To: TDWG-SDD at LISTSERV.NHM.KU.EDU
Subject: Re: Globally Unique Identifier


I want to start by wholeheartedly endorsing Wouter's plea for
non-information-bearing (meaningless) GUIDs.  This feature is CRITICAL to
the long-term success of any GUID system.  It is absolutely imperative that
there NEVER be any motivation to change the content of a GUID (i.e., it
should be permanent).  If the GUID itself contains any information
whatsoever, there may be motivation to change that information at a later
time.

For this reason, I had initially preferred the DOI approach, but over time,
I am gradually warming up to the LSID approach.  While components of an LSID
do, indeed, represent information, they represent the one piece of
information that I think may legitimately belong embedded within a GUID:
context.  That is, the context, or domain, of the GUID itself.  The context
in this case would be the "issuer" of the GUID -- not necessarily the
current "owner" of the GUID (see more discussion on this below).  Though the
organization that issued a GUID may eventually disappear, the fact that the
organization was the one to issue the GUID in the first place will never
change, and thus represents a permanent and unchanging component of the
GUID.  Without the context portion, the GUID itself is really nothing more
than a random string of characters.  In summary, I'm warming up to the LSID
approach because it represents embedded context, without the risk of
temptation to change the content of a GUID after it has been issued.

Regarding Donald's PPT file, I have a couple of comments and questions:
(Assumes Title slide is "Slide 1")

Slide 2:
You note there is "No reliable mechanism" to relate the same record from
different providers to each other.  But in the context of DarwinCore, the
combination of [InstitutionCode]+[CollectionCode]+[CatalogNumber] should
represent a virtual GUID (provided that the Global Provider Registry ensures
no duplication of [InstitutionCode]). I do realize that words like "should"
and "reliable" are critical here. Perhaps the DarwinCore implementation
should enforce the requirement of uniqueness of
[CollectionCode]+[CatalogNumber] within a single [InstitutionCode], and
further ensure globally unique [InstitutionCode] values via the Global
Provider Registry.
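
As a rough illustration of such a "virtual GUID", the Python sketch below joins the three DarwinCore fields into one key and reports any key that occurs more than once. The ":" separator, the function names and the sample records are assumptions of mine, not part of DarwinCore or of any provider software.

```python
from collections import Counter


def soft_id(record):
    """Join the three DarwinCore fields into one key (':' separator is an assumption)."""
    parts = (
        record["InstitutionCode"],
        record["CollectionCode"],
        record["CatalogNumber"],
    )
    return ":".join(p.strip() for p in parts)


def find_collisions(records):
    """Return the soft identifiers that are shared by more than one record."""
    counts = Counter(soft_id(r) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical records; the second and third collide.
records = [
    {"InstitutionCode": "BPBM", "CollectionCode": "I", "CatalogNumber": "1234"},
    {"InstitutionCode": "BPBM", "CollectionCode": "I", "CatalogNumber": "5678"},
    {"InstitutionCode": "BPBM", "CollectionCode": "I", "CatalogNumber": "5678"},
]
collisions = find_collisions(records)
```

A check like this only enforces uniqueness within the records it sees; global uniqueness still rests on the registry keeping [InstitutionCode] values distinct.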

Slide 3:
Wouldn't most of the problems indicated in the first four bulleted points be
largely solved by the Global Provider Registry? Using the [InstitutionCode]
would allow lookup in the registry for a (current/active) metadata URL, and
the metadata URL would provide information on where to access a particular
[CollectionCode]+[CatalogNumber] piece of data.

The issue of specimens changing numbers and/or collections is problematic,
of course.

The issue of versioning is a bit dicey, in my mind (e.g., at what resolution
of information change)?  Some things, like changing taxonomic determinations
(i.e., "real" changes) need to be handled in a robust way.  Other things,
like the correction of typos and different styles of representing the exact
same information (e.g., R.L. Pile==>R.L. Pyle; or R.L. Pyle==>Pyle, R.L.)
probably don't need to be versioned.  Other sorts of changes (e.g., the
elaboration of previously existing information, such as the addition of
retroactively-generated georeference coordinates) fall somewhere in-between.

Slide 4:
We should all get behind SEEK in addressing these issues (Taxon concept
mapping). Ultimately, we minimally need a GUID pool for References
(inclusive of unpublished works), and a GUID pool for what I call
"Protonyms" (original creations of IC_N Code-compliant names).  The union of
these two GUIDs (what I would call "Assertions") would itself represent a
GUID to a "potential concept" (Berendsohn). (Note: my preference would be to
define Protonyms as a subtype of Assertions, and therefore Protonym GUIDs
would be a subset drawn from the same pool as Assertion GUIDs -- but this is
a technical discussion for another time).

Slide 5:
Nice summary!!

Slide 6:
Good stuff here, but I'll respond with some of my personal opinions:

- RevisionID: see points of concern already expressed above

- Specimen Record LSIDs: I gather from subsequent slides that you recognize
two alternative approaches: having the "owner" of a specimen assign the LSID
within the context of their own <domainName>, or adopting GBIF as the
international standard issuer for ALL specimen GUIDs. In other words, GBIF
would represent the centralized issuer of GUIDs for all biological
specimens, and the biological specimen community would/should rally around
GBIF for this purpose, and adopt GBIF specimen GUIDs as their own.  I
personally have no problem with this (I do not live in fear of "Big Brother"
centralization when it serves the benefit of all, as I believe it would in
this case) -- but I know there are many who might have a problem with it,
and therefore it might not garner widespread adoption without large volumes
of "fuss".

If, on the other hand, each organization issues its own GUIDs for its own
set of specimens, then the question is when, if ever, GBIF would assign a
specimen GUID?  Perhaps as a surrogate for institutions that lack the
technological ability to assign their own LSIDs?  But I wonder, how many
institutions that could serve electronic data of their holdings to the
internet would lack the ability to assign their own LSIDs?

As you've outlined in subsequent slides, I see two alternative paths:  A)
Get the biological world to rally around GBIF as the centralized provider of
GUIDs for specimens for all collections; or B) Have each
collection/institution issue its own set of LSIDs for its own specimens, and
have GBIF adopt those LSIDs for its own internal purposes.  I could get
behind either approach, but I see danger in the adoption of a mixture of
these two approaches. I'll defer elaboration, but a lot of it has to do with
potential confusion about whether the GUID applies fundamentally to the
physical specimen, or the electronic conglomeration of data associated with
the specimen. Also, I think we should avoid the risk of assigning two
separate GUIDs for the same "single data element" (sensu your Slide 5).

- Name record LSIDs:  I understand the example of an IPNI LSID for a plant
name, and presumably there would be analogous "Catalog of Fishes" LSIDs for
each fish name, etc.  But I don't think that would be a wise approach.
Unlike specimen records, where there are fairly unambiguous "owner"
institutions (or at least "original owner" institutions that issued a GUID),
taxonomic aggregators (IPNI, ITIS, Species2000, GBIF, uBio, etc.) are most
certainly not owners of the taxonomic names that they include in their
databases.  We would want to avoid the risk of duplicate GUIDs for the same
name, and thus the need for mapping, e.g., an IPNI GUID for a name to its
ITIS equivalent.  Again, I can't help but think that the world will be a
better place if we can avoid assigning multiple GUIDs to the same "single
data element".

One approach would be to rally around GBIF, and rely on them to issue GUIDs
for all taxon names.  However, I also recognize that we do not exist in a
political/personality vacuum with regards to "ownership" of taxonomic names,
or the electronic representations thereof.  Therefore, the closest thing
that exists to an "owner" of a taxonomic name is the Commission of
Nomenclature (and its respective Code of Nomenclature) under which the name
was established.  Thus, when it comes to assigning GUIDs for names (not
concepts), I would propose the following:

urn:lsid:ICZN.org:TaxonName:XXXXXX (all zoological names)
urn:lsid:ICBN.org:TaxonName:XXXXXX (all botanical names)
urn:lsid:ICNB[or LBSN??].org:TaxonName:XXXXXX (all bacteriological names)
urn:lsid:ICTV[or ICVCN??].org:TaxonName:XXXXXX (all virus names)

In an ideal world, we'd get to the point where there would be a need for
only one registrar of nomenclature, e.g.:
urn:lsid:BioCode.org:TaxonName:XXXXXXX

Or, perhaps:
urn:lsid:gbif.net:TaxonName:XXXXXXX

But I don't think we're quite there yet.

In any case, the idea would be for the taxon name aggregators to adopt the
unambiguously unique GUID for each taxon name.

Taxonomic concepts are a whole 'nother ball of wax....

Slide 8:
I actually prefer this approach (GBIF as the central issuer of specimen
GUIDs), for a variety of reasons.  One of the main reasons is that it would
assure uniqueness of an integer within a given <namespace> (e.g.,
Specimens), which would make things a bit easier for those of us who like to
use integers as primary keys in databases.  In other words, it avoids the
possibility of urn:lsid:bishopmuseum.org:Specimen:1234567 colliding with
urn:lsid:usnm.gov:Specimen:1234567, when reducing the GUID to just its
integer component for local application purposes (where context can be
enforced by other means).  However, I should point something out regarding
the "Advantage" part of this slide, which is that the "problem" of
transferring record locations doesn't exist, provided that the <domainName>
component of the LSID is taken as the issuer of the GUID, not as the current
owner of the specimen.  In other words, if Bishop Museum assigned GUID
urn:lsid:bishopmuseum.org:Specimen:1234567 to a specimen, and then gave that
specimen to Smithsonian, then Smithsonian would retain the complete GUID
intact as: urn:lsid:bishopmuseum.org:Specimen:1234567.
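
The collision point can be sketched in a few lines of Python. The parser below assumes the general LSID layout urn:lsid:<authority>:<namespace>:<object>[:<revision>]; the class and function names are my own invention and it does no validation beyond counting the parts.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str            # the issuer, e.g. bishopmuseum.org
    namespace: str            # e.g. Specimen
    object_id: str            # e.g. 1234567
    revision: Optional[str] = None


def parse_lsid(text):
    """Split an LSID string into its components; raise ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(p == "" for p in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


a = parse_lsid("urn:lsid:bishopmuseum.org:Specimen:1234567")
b = parse_lsid("urn:lsid:usnm.gov:Specimen:1234567")

same_integer = a.object_id == b.object_id   # the local keys collide
same_guid = a == b                          # the full GUIDs do not
```

With institution-issued LSIDs the integer alone is not a safe local key, whereas a single central issuer would make the object part unique within a namespace. Either way the authority component records who issued the GUID and stays fixed if the specimen changes hands.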

The danger comes when you try to use the <domainName> component as metadata
to represent the current location of the specimen and/or its electronically
represented data.  This is where Wouter's original point about 'meaningless'
GUIDs comes into play. If the whole point of using LSIDs is to embed the
"current location" information within the ID itself so that applications can
retrieve additional data associated with the GUID directly, then I have some
concerns (mostly addressed already).

Why is there a reference to urn:lsid:gbif.net:TaxonConcept:106734 at the top
of this slide???

Slide 9:
Again, I'm not sure I understand on this slide why there is a reference to
urn:lsid:ipni.org:TaxonName:82090-3:1.1
Also, in this model, what function does the LSID serve that is not met by
the concatenated [InstitutionCode]+[CollectionCode]+[CatalogNumber] (in the
context of Global Provider Registry)?

Slide 10 (taxon concepts and literature):
This message is already getting too long... :-)
I already touched on this above under "Slide 4".  I definitely agree that we
need a GUID system for References.  This should include more than just
published references. It doesn't quite exist yet among the existing
Reference registrars (as far as I can tell) to accommodate the specific
needs of taxonomists (e.g. referring to a subsection of a reference as
representing an original taxonomic description), so I do see a need to
create a Reference GUID system specific to biology.  I could rant for pages
on this, but I'll summarize simply with a plea to *DEFINE* a Concept GUID as
an intersection between a Name GUID and a Reference GUID (i.e., what I
would call an "Assertion").  Not all Name-Reference combinations will be
worthy of recognition as a distinct "Concept", but all are *potentially*
representative of a concept (Berendsohn), and thus all should be drawn from
the same pool of GUIDs as Concept GUIDs.  In other words, "Concepts" should
be thought of as a subtype of Name-Reference instances.  I would go further
to suggest (as I did above) that "Name" GUIDs should also be a subtype of
Name-Reference instances (non-exclusive of Concept subtype instances), using
the Name-Reference instance that represents the Code-recognized original
description of the name as the "handle" to the Name.

By this approach, you need only two GUID object classes <objectClass>: one
for References, and one for Name-Reference intersections (Assertions). The
latter of these could serve as the source for both Concept GUIDs and Name
GUIDs.
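
A minimal sketch of that two-class model, in Python. The field names, the placeholder GUID strings and the rule "an Assertion with no protonym pointer is itself the Protonym" are assumptions made for illustration, not a worked-out schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    guid: str
    citation: str


@dataclass(frozen=True)
class Assertion:
    """A Name-Reference intersection: a name as it is used in one reference."""
    guid: str
    name_string: str
    reference_guid: str
    # GUID of the Assertion holding the original description of this name.
    # None means this Assertion is itself the original description (Protonym).
    protonym_guid: Optional[str] = None

    @property
    def is_protonym(self):
        return self.protonym_guid is None

    @property
    def name_guid(self):
        """The 'handle' to the Name: the GUID of its Protonym Assertion."""
        return self.guid if self.is_protonym else self.protonym_guid


# Placeholder data: one original description and one later usage of the name.
original_ref = Reference("ref-1", "Author A, original description")
later_ref = Reference("ref-2", "Author B, later revision")
protonym = Assertion("assert-1", "Aus bus", original_ref.guid)
later_use = Assertion("assert-2", "Aus bus", later_ref.guid, protonym_guid="assert-1")
```

Both Assertions resolve to the same Name handle while keeping their own GUIDs, so either could later be recognized as a Concept without introducing a third pool of identifiers.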

Last Slide:

My own answers to your questions:

1) Are LSIDs the most appropriate technology?

        I'm increasingly coming to that conclusion.

2) Should identifiers be assigned and resolved centrally or via a fully
distributed model (or should providers have the option of using either
model)?

        I think the best option would be central.  The next option would be
fully distributed.  Leaving it as an option would, in my opinion, be a BIG
mistake.

3) Which objects should receive identifiers?

        Specimens, References, Name-Reference intersections (Assertions),
and perhaps Agents.  [TaxonNames and Concepts can be subsets of
Name-Reference intersections].

3a) Should we develop a set of object classes for biodiversity informatics
and assign identifiers to instances of all of these?

        I think so, yes. Of course, it depends a bit on who you mean by
"we".  I'm thinking sensu lato.

3b) Should identifiers be associated with real world objects (e.g.
specimens), or with digitised records representing them (e.g. perhaps
multiple records representing different digitisation attempts by different
researchers for the same specimen), or both?

        I would say definitely real-world objects (treating things like
Code-recognized original descriptions of taxon names, and citable references
as "real-world objects").  I do NOT think we should have separate GUIDs for
digital representations thereof.  Alternative digital representations are
simply clutter that will eventually be weeded out of the system, once we all
get organized on this stuff, and harness the power of the internet to
implement a global editing/QA system.

4) What should be done about existing records without identifiers?

        As far as I know, ALL records are currently without identifiers
(unless someone established a widely accepted GUID system and I missed the
announcement...)

4a) Should they be left alone?

        Ultimately, no.

4b) Should they all be updated with identifiers?

        Ultimately, yes.

4c) Should the provider software be modified to generate "soft" identifiers
(ones which we cannot guarantee in all cases to be unique) based e.g. on the
combination of InstitutionCode, CollectionCode and CatalogNumber?

        As an interim solution, perhaps.  See my comments under "Slide 2"
above.

5) Are revision identifiers a useful feature?

        I would like to think not.  If the information is truly dynamic over
time (e.g., re-determinations of taxonomic identity of specimens), then
individual instances should probably receive their own set of GUIDs (as
opposed to versions of the "parent" GUID).  If the information is static
over time, and changes represent objective corrections, then I don't see a
real need to track that within the context of a GUID (record edit history
may or may not need to be tracked, but this seems to me to be a separate
issue from GUIDs).

5b) How many providers will be able to provide and handle them?

        If versioning is incorporated, then it should be designed such that
a "default" version is provided automatically when versioning is not
handled.


Sorry for the long post, but I feel that this issue is extremely important
at this point in bioinformatics history.

Aloha,
Rich

Richard L. Pyle, PhD
Natural Sciences Database Coordinator, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://www.bishopmuseum.org/bishop/HBS/pylerichard.html

> -----Original Message-----
> From: TDWG - Structure of Descriptive Data
> [mailto:TDWG-SDD at LISTSERV.NHM.KU.EDU]On Behalf Of Donald Hobern
> Sent: Thursday, September 23, 2004 6:22 AM
> To: TDWG-SDD at LISTSERV.NHM.KU.EDU
> Subject: Re: Globally Unique Identifier
>
>
> This is precisely one of the key questions we need to address with any
> identifier framework we adopt.  I think we could easily use LSIDs in a
> way that should overcome your concerns, and I think that the built-in
> mechanisms for discovery and metadata access within the LSID model are
> really exciting.
>
> I have just put together a PowerPoint presentation to explain some of
> what I think we could achieve with globally unique identifiers and
> particularly with LSIDS.  It can be downloaded from:
>
> http://circa.gbif.net/Public/irc/gbif/dadi/library?l=/architecture/globallyuniqueidentifier/
>
> It may be clearest if you go through it as a slide show rather than in
> edit mode.
>
> Thanks,
>
> Donald
>
> ---------------------------------------------------------------
> Donald Hobern (dhobern at gbif.org)
> Programme Officer for Data Access and Database Interoperability Global
> Biodiversity Information Facility Secretariat Universitetsparken 15,
> DK-2100 Copenhagen, Denmark
> Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
> ---------------------------------------------------------------
>
>
> -----Original Message-----
> From: TDWG - Structure of Descriptive Data
> [mailto:TDWG-SDD at LISTSERV.NHM.KU.EDU] On Behalf Of Wouter Addink
> Sent: 23. september 2004 17:38
> To: TDWG-SDD at LISTSERV.NHM.KU.EDU
> Subject: Re: Globally Unique Identifier
>
> It seems that DOI allows for any existing IDs to be used as part of
> the unique identifier. That seems to me as a fast to adopt short term
> solution but not a good idea for the long term. At first sight I very
> much liked the LSID specification, but the longer I think about it, the
> less I like some parts. What I think is missing in the LSID specification
> is that the unique identifier should be 'meaningless' apart from being an
> identifier to become time independent (and to avoid possible political
> problems). Any solution with a URN I can think of has some meaning, which
> makes solutions like a MAC-address generated GUID favorable in my opinion.
> And any meaning you need (like an authority of an object) can be specified
> in metadata instead of using it in the identifier. What is not very clear
> to me in the LSID specification is where the LSID generated by a
> LSIDAssigningService is actually stored.
>
> Wouter Addink
>
> ----- Original Message -----
> From: "Gregor Hagedorn" <G.Hagedorn at BBA.DE>
> To: <TDWG-SDD at LISTSERV.NHM.KU.EDU>
> Sent: Wednesday, September 08, 2004 6:20 PM
> Subject: Re: Globally Unique Identifier
>
>
> >I am not quite sure, but to me it seems with "GUID" you refer to the
> >numeric, MAC-address generated GUID type. I have nothing against
> >these. However, any URN in my view is a GUID that has most of the
> >properties you mention:
> >
> >> - it is guaranteed to be unique globally, and can be created anywhere,
> >> anytime by any server or client machine
> >> - it has no meaning as to where the data is physically located and will
> >> therefore not confuse any user about this
> >
> >> - most id mechanisms, especially URI/URN ids require a 'governing
> >> body' to handle namespaces/urls to ensure every URN is unique,
> >> whereas a GUID is always unique
> >
> > The governing body is restricted to the primary web address, and in
> > most cases such an address is already available. Being a member of a
> > governmental institution that explicitly forbids the use without
> > prior consent, and forbids the use of its domain name once you are
> > no longer working for them, I realize some potential for problem.
> >
> >> I do think a URL of some kind would be useful for things such as
> >> global searches of multiple databases, as this will allow the
> >> search to go directly to the data source where the name, reference,
> >> etc comes from.  But this should not be part of its ID.  Maybe a
> >> name/id should have several forms, a GUID for an ID and a URL + a
> >> GUID for a fully specified name.
> >>
> >> What are the current thoughts on these ideas?
> >
> > A GUID is only part of the problem. The other half of the problem is
> > actually getting at the resource. URN schemes like DOI or LSID (I
> > prefer the latter) intend to define resolution mechanisms. That makes
> > the URN not yet a URL - in my view the good comes with the good,
> > location and reorganization independence.
> >
> > I believe GBIF should install such an LSID resolver, which is why in
> > the UBIF proxy model, under Links, I propose to support a general
> > URL (including potentially URNS), a typed LSID and a typed DOI. This
> > could be simplified to have just a URN (LSID and DOI are URNs), but
> > that would then require string parsing to determine and recognize
> > the preferred resolvable GUID types. Comments on splitting/not
> > splitting this are welcome!
> >
> > There may be some need to define a non-resolvable URN/numeric GUID
> > as well. However, that would not be under the linking question. Is
> > it correct that linking requires resolvability, or am I thinking
> > into a wrong direction?
> >
> > Gregor
> >>
> >
> >
> > ----------------------------------------------------------
> > Gregor Hagedorn (G.Hagedorn at bba.de)
> > Institute for Plant Virology, Microbiology, and Biosafety Federal
> > Research Center for Agriculture and Forestry (BBA)
> > Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
> > 14195 Berlin, Germany           Fax: +49-30-8304-2203
> >
> > Often wrong but never in doubt!

------_=_NextPart_001_01C4A1CA.D36788A0
Content-Type: text/html;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Diso-8859-1">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
5.5.2654.45">
<TITLE>RE: Globally Unique Identifier</TITLE>
</HEAD>
<BODY>

<P><FONT SIZE=3D2>Mahalo for your informative discussion, Rich.</FONT>
</P>

<P><FONT SIZE=3D2>A few questions.&nbsp; You're pretty active on this =
so maybe you can help me out.</FONT>
</P>

<P><FONT SIZE=3D2>What about duplicate specimens?&nbsp; Although a =
specimen may be MO 1234, K 5678 and P AABB, they may in fact all be =
SMITH 10001 and duplicates of the exact same specimen, not different =
specimens. Is that one GUID or 3? When attempting to use world-wide =
specimen records via GBIF for biodiversity counts and species analyses, =
these duplicates artificially inflate the counts significantly in some =
cases.</FONT></P>

<P><FONT SIZE=3D2>What about triplicate names?&nbsp; IPNI is often =
given as the example for a set of name records.&nbsp; But, IPNI can =
have three records for the same exact name and reference--one from IK, =
APNI and Grey Cards.&nbsp; IPNI has no plans to ever deduplicate these =
records due to the nature of the creation of the IPNI =
collaboration.&nbsp; So, do the three duplicate records get three =
GUIDs?&nbsp; </FONT></P>

<P><FONT SIZE=3D2>Where are the GUIDs actually to be perpetually =
located after they are assigned?&nbsp; Are all the originating =
organizations supposed to modify their databases to add the GUID =
attribute and then build a mechanism to send out their records and then =
receive the GUID back from somewhere and finally update their records =
with it so the record+GUID can then in turn be published from their =
database onto the web?</FONT></P>

<P><FONT SIZE=3D2>Couldn't agree more on the need for a single =
index/GUIDs to all references, but beyond that is needed the single =
database containing all the GUIDS plus the standard abbreviations and =
descriptions for them.&nbsp; Nobody has this database.&nbsp; There are =
subsets like BPH and TL2.&nbsp; But no single, definitive list of all =
references, online, in one place with GUIDs.&nbsp; This science needs =
that in the worst way.</FONT></P>

<P><FONT SIZE=3D2>If a concept is Name+Reference, then don't IPNI and =
Tropicos contain millions of concept records?</FONT>
</P>

<P><FONT SIZE=3D2>Thanks,</FONT>
<BR><FONT SIZE=3D2>Chuck Miller</FONT>
<BR><FONT SIZE=3D2>Chief Information Officer</FONT>
<BR><FONT SIZE=3D2>Missouri Botanical Garden</FONT>
<BR><FONT SIZE=3D2>&nbsp; </FONT>
<BR><FONT SIZE=3D2>-----Original Message-----</FONT>
<BR><FONT SIZE=3D2>From: Richard Pyle [<A =
HREF=3D"mailto:deepreef at BISHOPMUSEUM.ORG">mailto:deepreef at BISHOPMUSEUM.O=
RG</A>] </FONT>
<BR><FONT SIZE=3D2>Sent: Thursday, September 23, 2004 6:29 PM</FONT>
<BR><FONT SIZE=3D2>To: TDWG-SDD at LISTSERV.NHM.KU.EDU</FONT>
<BR><FONT SIZE=3D2>Subject: Re: Globally Unique Identifier</FONT>
</P>
<BR>

<P><FONT SIZE=3D2>I want to start by wholeheartedly endorsing Wouter's =
plea for non-information-bearing (meaningless) GUIDs.&nbsp; This =
feature is CRITICAL to the long-term success of any GUID system.&nbsp; =
It is absolutely imperative that there NEVER be any motivation to =
change the content of a GUID (i.e., it should be permanent).&nbsp; If =
the GUID itself contains any information whatsoever, there may be =
motivation to change that information at a later time.</FONT></P>

<P><FONT SIZE=3D2>For this reason, I had initially preferred the DOI =
approach, but over time, I am gradually warming up to the LSID =
approach.&nbsp; While components of an LSID do, indeed, represent =
information, they represent the one piece of information that I think =
may legitimately belong embedded within a GUID: context.&nbsp; That is, =
the context, or domain, of the GUID itself.&nbsp; The context in this =
case would be the &quot;issuer&quot; of the GUID -- not necessarily the =
current &quot;owner&quot; of the GUID (see more discussion on this =
below).&nbsp; Though the organization that issued a GUID may eventually =
disappear, the fact that the organization was the one to issue the GUID =
in the first place will never change, and thus represents a permanent =
and unchanging component of the GUID.&nbsp; Without the context =
portion, the GUID itself is really nothing more than a random string of =
characters.&nbsp; In summary, I'm warming up to the LSID approach =
because it represents embedded context, without the risk of temptation =
to change the content of a GUID after it has been issued.</FONT></P>

<P><FONT SIZE=3D2>Regarding Donald's PPT file, I have a couple of =
comments and questions: (Assumes Title slide is &quot;Slide =
1&quot;)</FONT>
</P>

<P><FONT SIZE=3D2>Slide 2:</FONT>
<BR><FONT SIZE=3D2>You note there is &quot;No reliable mechanism&quot; =
to relate the same record from different providers to each other.&nbsp; =
But in the context of DarwinCore, the combination of =
[InstitutionCode]+[CollectionCode]+[CatalogNumber] should represent a =
virtual GUID (provided that the Global Provider Registry ensures no =
duplication of [InstitutionCode]). I do realize that words like =
&quot;should&quot; and &quot;reliable&quot; are critical here. Perhaps =
the DarwinCore implementation should enforce the requirement of =
uniqueness of [CollectionCode]+[CatalogNumber] within a single =
[InstitutionCode], and further ensure globally unique [InstitutionCode] =
values via the Global Provider Registry.</FONT></P>

<P><FONT SIZE=3D2>Slide 3:</FONT>
<BR><FONT SIZE=3D2>Wouldn't most of the problems indicated in the first =
four bulleted points be largely solved by the Global Provider Registry? =
Using the [InstitutionCode] would allow lookup in the registry for a =
(current/active) metadata URL, and the metadata URL would provide =
information on where to access a particular =
[CollectionCode]+[CatalogNumber] piece of data.</FONT></P>

<P><FONT SIZE=3D2>The issue of specimens changing numbers and/or =
collections is problematic, of course.</FONT>
</P>

<P><FONT SIZE=3D2>The issue of versioning is a bit dicey, in my mind =
(e.g., at what resolution of information change)?&nbsp; Some things, =
like changing taxonomic determinations (i.e., &quot;real&quot; changes) =
need to be handled in a robust way.&nbsp; Other things, like the =
correction of typos and different styles of representing the exact same =
information (e.g., R.L. Pile=3D=3D&gt;R.L. Pyle; or R.L. =
Pyle=3D=3D&gt;Pyle, R.L.) probably don't need to be versioned.&nbsp; =
Other sorts of changes (e.g., the elaboration of previously existing =
information, such as the addition of retroactively-generated =
georeference coordinates) fall somewhere in-between.</FONT></P>

<P><FONT SIZE=3D2>Slide 4:</FONT>
<BR><FONT SIZE=3D2>We should all get behind SEEK in addressing these =
issues (Taxon concept mapping). Ultimately, we minimally need a GUID =
pool for References (inclusive of unpublished works), and a GUID pool =
for what I call &quot;Protonyms&quot; (original creations of IC_N =
Code-compliant names).&nbsp; The union of these two GUIDs (what I would =
call &quot;Assertions&quot;) would itself represent a GUID to a =
&quot;potential concept&quot; (Berendsohn). (Note: my preference would =
be to define Protonyms as a subtype of Assertions, and therefore =
Protonym GUIDs would be a subset drawn from the same pool as Assertion =
GUIDs -- but this is a technical discussion for another =
time).</FONT></P>

<P><FONT SIZE=3D2>Slide 5:</FONT>
<BR><FONT SIZE=3D2>Nice summary!!</FONT>
</P>

<P><FONT SIZE=3D2>Slide 6:</FONT>
<BR><FONT SIZE=3D2>Good stuff here, but I'll respond with some of my =
personal opinions:</FONT>
</P>

<P><FONT SIZE=3D2>- RevisionID: see points of concern already expressed =
above</FONT>
</P>

<P><FONT SIZE=3D2>- Specimen Record LSIDs: I gather from subsequent =
slides that you recognize two alternative approaches: having the =
&quot;owner&quot; of a specimen assign the LSID within the context of =
their own &lt;domainName&gt;, or adopting GBIF as the international =
standard issuer for ALL specimen GUID. In other words, GBIF would =
represent the centralized issuer of GUIDs for all biological specimens, =
and the biological specimen community would/should rally around GBIF =
for thus purpose, and adopt GBIF specimen GUIDs as their own.&nbsp; I =
personally have no problem with this (I do not live in fear of =
&quot;Big Brother&quot; centralization when it serves the benefit of =
all, as I believe it would in this case) -- but I know there are many =
who might have a problem with it, and therefore it might not garner =
widespread adoption without large volumes of =
&quot;fuss&quot;.</FONT></P>

<P><FONT SIZE=3D2>If, on the other hand, each organization issues its =
own GUIDs for its own set of specimens, then the question is when, if =
ever, GBIF would assign a specimen GUID?&nbsp; Perhaps as a surrogate =
for institutions that lack the technological ability to assign their =
own LSIDs?&nbsp; But I wonder, how many institutions that could serve =
electronic data of their holdings to the internet would lack the =
ability to assign their own LSIDs?</FONT></P>

<P><FONT SIZE=3D2>As you've outlined in subsequent slides, I see two =
alternative paths:&nbsp; A) Get the biological world to rally around =
GBIF as the centralized provider of GUIDs for specimens for all =
collections; or B) Have each collection/institution issue its own set =
of LSIDs for its own specimens, and have GBIF adopt those LSIDs for its =
own internal purposes.&nbsp; I could get behind either approach, but I =
see danger in the adoption of a mixture of these two approaches. I'll =
defer elaboration, but a lot of it has to do with potential confusion =
about whether the GUID applies fundamentally to the physical specimen, =
or the electronic conglomeration of data associated with the specimen. =
Also, I think we should avoid the risk of assigning two separate GUIDs =
for the same &quot;single data element&quot; (sensu your Slide =
5).</FONT></P>

<P><FONT SIZE=3D2>- Name record LSIDs:&nbsp; I understand the example =
of an IPNI LSID for a plant name, and presumably there would be =
analogous &quot;Catalog of Fishes&quot; LSIDs for each fish name, =
etc.&nbsp; But I don't think that would be a wise approach. Unlike =
specimen records, where there are fairly unambiguous &quot;owner&quot; =
institutions (or at least &quot;original owner&quot; institutions that =
issued a GUID), taxonomic aggregators (IPNI, ITIS, Species2000, GBIF, =
uBio, etc.) are most certainly not owners of the taxonomic names that =
they include in their databases.&nbsp; We would want to avoid the risk =
of duplicate GUIDs for the same name, and thus the need for mapping, =
e.g., an IPNI GUID for a name to its ITIS equivalent.&nbsp; Again, I =
can't help but think that the world will be a better place if we can =
avoid assigning multiple GUIDs to the same &quot;single data =
element&quot;.</FONT></P>

<P><FONT SIZE=3D2>One approach would be to rally around GBIF, and rely =
on them to issue GUIDs for all taxon names.&nbsp; However, I also =
recognize that we do not exist in a political/personality vacuum with =
regards to &quot;ownership&quot; of taxonomic names, or the electronic =
representations thereof.&nbsp; Therefore, the closest thing that exists =
to an &quot;owner&quot; of a taxonomic name is the Commission of =
Nomenclature (and its respective Code of Nomenclature) under which the =
name was established.&nbsp; Thus, when it comes to assigning GUIDs for =
names (not concepts), I would propose the following:</FONT></P>

<P><FONT SIZE=3D2>urn:lsid:ICZN.org:TaxonName:XXXXXX (all zoological =
names)</FONT>
<BR><FONT SIZE=3D2>urn:lsid:ICBN.org:TaxonName:XXXXXX (all botanical =
names)</FONT>
<BR><FONT SIZE=3D2>urn:lsid:ICNB[or LBSN??].org:TaxonName:XXXXXX (all =
bacteriological names)</FONT>
<BR><FONT SIZE=3D2>urn:lsid:ICTV[or ICVCN??].org:TaxonName:XXXXXX (all =
virus names)</FONT>
</P>
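
<P><FONT SIZE=3D2>To make the layout of these identifiers concrete, =
here is a minimal Python sketch that splits an LSID into the components =
used throughout this message (issuer domain, object class, object ID, =
and an optional revision, as in urn:lsid:ipni.org:TaxonName:82090-3:1.1). =
The names Lsid and parse_lsid are hypothetical and are not taken from =
any existing LSID library; the sketch only checks the overall shape of =
the string.</FONT></P>

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str            # issuer's domain name, e.g. "ICZN.org"
    namespace: str            # object class, e.g. "TaxonName"
    object_id: str            # identifier within that namespace
    revision: Optional[str]   # optional revision component, or None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]'."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    # A missing or empty sixth component means "no revision given".
    revision = (parts[5] or None) if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)
```

<P><FONT SIZE=3D2>Under the proposal above, the authority component of =
a name LSID would identify the nomenclatural Code that governs the =
name, so an aggregator could route on parse_lsid(...).authority without =
owning the identifier itself.</FONT></P>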

<P><FONT SIZE=3D2>In an ideal world, we'd get to the point where there =
would be a need for only one registrar of nomenclature, e.g.: =
urn:lsid:BioCode.org:TaxonName:XXXXXXX</FONT></P>

<P><FONT SIZE=3D2>Or, perhaps:</FONT>
<BR><FONT SIZE=3D2>urn:lsid:gbif.net:TaxonName:XXXXXXX</FONT>
</P>

<P><FONT SIZE=3D2>But I don't think we're quite there yet.</FONT>
</P>

<P><FONT SIZE=3D2>In any case, the idea would be for the taxon name =
aggregators to adopt the unambiguously unique GUID for each taxon =
name.</FONT></P>

<P><FONT SIZE=3D2>Taxonomic concepts are a whole 'nother ball of =
wax....</FONT>
</P>

<P><FONT SIZE=3D2>Slide 8:</FONT>
<BR><FONT SIZE=3D2>I actually prefer this approach (GBIF as the central =
issuer of specimen GUIDs), for a variety of reasons.&nbsp; One of the =
main reasons is that it would assure uniqueness of an integer within a =
given &lt;namespace&gt; (e.g., Specimens), which would make things a =
bit easier for those of us who like to use integers as primary keys in =
databases.&nbsp; In other words, it avoids the possibility of =
urn:lsid:bishopmuseum.org:Specimen:1234567 colliding with =
urn:lsid:usnm.gov:Specimen:1234567, when reducing the GUID to just its =
integer component for local application purposes (where context can be =
enforced by other means).&nbsp; However, I should point something out =
regarding the &quot;Advantage&quot; part of this slide, which is that =
the &quot;problem&quot; of transferring record locations doesn't exist, =
provided that the &lt;domainName&gt; component of the LSID is taken as =
the issuer of the GUID, not as the current owner of the specimen.&nbsp; =
In other words, if Bishop Museum assigned GUID =
urn:lsid:bishopmuseum.org:Specimen:1234567 to a specimen, and then gave =
that specimen to Smithsonian, then Smithsonian would retain the =
complete GUID intact as: =
urn:lsid:bishopmuseum.org:Specimen:1234567.</FONT></P>
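
<P><FONT SIZE=3D2>Both points in this paragraph can be sketched in a =
few lines of Python. This is only an illustration under stated =
assumptions: SpecimenRecord, integer_key and transfer are hypothetical =
names, and the current holder is modelled as ordinary metadata kept =
next to the LSID rather than inside it.</FONT></P>

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpecimenRecord:
    lsid: str            # assigned once by the issuer, never rewritten
    current_holder: str  # ordinary metadata, free to change over time


def integer_key(lsid: str) -> int:
    """Reduce an LSID to its trailing integer for use as a local key."""
    return int(lsid.rsplit(":", 1)[1])


def transfer(record: SpecimenRecord, new_holder: str) -> SpecimenRecord:
    """Move a specimen to a new holder; the LSID stays intact."""
    return replace(record, current_holder=new_holder)


bishop = "urn:lsid:bishopmuseum.org:Specimen:1234567"
usnm = "urn:lsid:usnm.gov:Specimen:1234567"

# Distinct as full LSIDs, but identical once reduced to the integer part.
assert bishop != usnm
assert integer_key(bishop) == integer_key(usnm)

# After a transfer the holder changes and the identifier does not.
moved = transfer(SpecimenRecord(bishop, "Bishop Museum"), "Smithsonian")
assert moved.lsid == bishop
```

<P><FONT SIZE=3D2>With a single central issuer the integer collision =
cannot occur within one namespace; with distributed issuers, local =
applications have to keep the issuer domain alongside the =
integer.</FONT></P>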

<P><FONT SIZE=3D2>The danger comes when you try to use the =
&lt;domainName&gt; component as metadata to represent the current =
location of the specimen and/or its electronically represented =
data.&nbsp; This is where Wouter's original point about 'meaningless' =
GUIDs comes into play. If the whole point of using LSIDs is to embed =
the &quot;current location&quot; information within the ID itself so =
that applications can retrieve additional data associated with the GUID =
directly, then I have some concerns (mostly addressed =
already).</FONT></P>

<P><FONT SIZE=3D2>Why is there a reference to =
urn:lsid:gbif.net:TaxonConcept:106734 at the top of this =
slide???</FONT>
</P>

<P><FONT SIZE=3D2>Slide 9:</FONT>
<BR><FONT SIZE=3D2>Again, I'm not sure I understand on this slide why =
there is a reference to urn:lsid:ipni.org:TaxonName:82090-3:1.1</FONT>
<BR><FONT SIZE=3D2>Also, in this model, what function does the LSID =
serve that is not met by the concatenated =
[InstitutionCode]+[CollectionCode]+[CatalogNumber] (in the context of =
Global Provider Registry)?</FONT></P>

<P><FONT SIZE=3D2>Slide 10 (taxon concepts and literature):</FONT>
<BR><FONT SIZE=3D2>This message is already getting too long... =
:-)</FONT>
<BR><FONT SIZE=3D2>I already touched on this above under &quot;Slide =
4&quot;.&nbsp; I definitely agree that we need a GUID system for =
References.&nbsp; This should include more than just published =
references. It doesn't quite exist yet among the existing Reference =
registrars (as far as I can tell) to accommodate the specific needs of =
taxonomists (e.g. referring to a subsection of a reference as =
representing an original taxonomic description), so I do see a need to =
create a Reference GUID system specific to biology.&nbsp; I could rant =
for pages on this, but I'll summarize simply with a plea to *DEFINE* a =
Concept GUID as an intersection between a Name GUID and a Reference =
GUID (i.e., what I would call an &quot;Assertion&quot;).&nbsp; Not all =
Name-Reference combinations will be worthy of recognition as a distinct =
&quot;Concept&quot;, but all are *potentially* representative of a =
concept (Berendsohn), and thus all should be drawn from the same pool =
of GUIDs as Concept GUIDs.&nbsp; In other words, &quot;Concepts&quot; =
should be thought of as a subtype of Name-Reference instances.&nbsp; I =
would go further to suggest (as I did above) that &quot;Name&quot; =
GUIDs should also be a subtype of Name-Reference instances =
(non-exclusive of Concept subtype instances), using the Name-Reference =
instance that represents the Code-recognized original description of =
the name as the &quot;handle&quot; to the Name.</FONT></P>

<P><FONT SIZE=3D2>By this approach, you need only two GUID object =
classes &lt;objectClass&gt;: one for References, and one for =
Name-Reference intersections (Assertions). The latter of these could =
serve as the source for both Concept GUIDs and Name GUIDs.</FONT></P>
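
<P><FONT SIZE=3D2>One possible way to picture the two object classes is =
the Python sketch below. It is a sketch under assumptions, not a =
proposed schema: the class and field names are hypothetical, the =
example.org GUIDs are placeholders, and the choice to mark a Protonym =
as an Assertion whose name handle points at itself is just one way of =
encoding the "handle" idea described above.</FONT></P>

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Reference:
    guid: str
    citation: str


@dataclass(frozen=True)
class Assertion:
    """A Name-Reference intersection."""
    guid: str
    name_guid: str        # GUID of the Assertion that is the original description
    reference_guid: str   # GUID of the Reference in which the name is used
    is_concept: bool = False  # flagged when worth recognizing as a Concept

    @property
    def is_protonym(self) -> bool:
        # The original description is the "handle" to the Name itself.
        return self.guid == self.name_guid


def name_guids(assertions: Iterable[Assertion]) -> Set[str]:
    return {a.guid for a in assertions if a.is_protonym}


def concept_guids(assertions: Iterable[Assertion]) -> Set[str]:
    return {a.guid for a in assertions if a.is_concept}


# Placeholder data: one name, its original description, one later usage.
original = Assertion("urn:lsid:example.org:Assertion:1",
                     "urn:lsid:example.org:Assertion:1",
                     "urn:lsid:example.org:Reference:10", is_concept=True)
later = Assertion("urn:lsid:example.org:Assertion:2",
                  "urn:lsid:example.org:Assertion:1",
                  "urn:lsid:example.org:Reference:11", is_concept=True)
pool = [original, later]
```

<P><FONT SIZE=3D2>Here name_guids(pool) and concept_guids(pool) are both =
subsets of the single Assertion GUID pool, and the two subsets may =
overlap, which is the non-exclusive subtype relationship described =
above.</FONT></P>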

<P><FONT SIZE=3D2>Last Slide:</FONT>
</P>

<P><FONT SIZE=3D2>My own answers to your questions:</FONT>
</P>

<P><FONT SIZE=3D2>1) Are LSIDs the most appropriate technology?</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; I'm =
increasingly coming to that conclusion.</FONT>
</P>

<P><FONT SIZE=3D2>2) Should identifiers be assigned and resolved =
centrally or via a fully distributed model (or should providers have =
the option of using either model)?</FONT></P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; I think =
the best option would be central.&nbsp; The next option would be fully =
distributed.&nbsp; Leaving it as an option would, in my opinion, be a =
BIG mistake.</FONT></P>

<P><FONT SIZE=3D2>3) Which objects should receive identifiers?</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Specimens, =
References, Name-Reference intersections (Assertions), and perhaps =
Agents.&nbsp; [TaxonNames and Concepts can be subsets of Name-Reference =
intersections].</FONT></P>

<P><FONT SIZE=3D2>3a) Should we develop a set of object classes for =
biodiversity informatics and assign identifiers to instances of all of =
these?</FONT></P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; I think =
so, yes. Of course, it depends a bit on who you mean by =
&quot;we&quot;.&nbsp; I'm thinking sensu lato.</FONT>
</P>

<P><FONT SIZE=3D2>3b) Should identifiers be associated with real world =
objects (e.g. specimens), or with digitised records representing them =
(e.g. perhaps multiple records representing different digitisation =
attempts by different researchers for the same specimen), or =
both?</FONT></P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; I would =
say definitely real-world objects (treating things like Code-recognized =
original descriptions of taxon names, and citable references as =
&quot;real-world objects&quot;).&nbsp; I do NOT think we should have =
separate GUIDs for digital representations thereof.&nbsp; Alternative =
digital representations are simply clutter that will eventually be =
weeded out of the system, once we all get organized on this stuff, and =
harness the power of the internet to implement a global editing/QA =
system.</FONT></P>

<P><FONT SIZE=3D2>4) What should be done about existing records without =
identifiers?</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; As far as =
I know, ALL records are currently without identifiers (unless someone =
established a widely accepted GUID system and I missed the =
announcement...)</FONT>
</P>

<P><FONT SIZE=3D2>4a) Should they be left alone?</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =
Ultimately, no.</FONT>
</P>

<P><FONT SIZE=3D2>4b) Should they all be updated with =
identifiers?</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =
Ultimately, yes.</FONT>
</P>

<P><FONT SIZE=3D2>4c) Should the provider software be modified to =
generate &quot;soft&quot; identifiers (ones which we cannot guarantee =
in all cases to be unique) based e.g. on the combination of =
InstitutionCode, CollectionCode and CatalogNumber?</FONT></P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; As an =
interim solution, perhaps.&nbsp; See my comments under &quot;Slide =
2&quot; above.</FONT>
</P>
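
<P><FONT SIZE=3D2>As a hedged illustration of what such an interim =
"soft" identifier might look like, the Python sketch below concatenates =
the three fields named in the question and reports any values that =
occur more than once. The function names, the ":" separator and the =
sample codes are assumptions made for the example; nothing here is =
existing provider software.</FONT></P>

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triplet = Tuple[str, str, str]  # (InstitutionCode, CollectionCode, CatalogNumber)


def soft_identifier(institution_code: str,
                    collection_code: str,
                    catalog_number: str) -> str:
    """Concatenate the three fields; uniqueness is NOT guaranteed."""
    parts = (institution_code, collection_code, catalog_number)
    return ":".join(part.strip() for part in parts)


def find_collisions(records: Iterable[Triplet]) -> List[str]:
    """Return the soft identifiers shared by more than one record."""
    counts = Counter(soft_identifier(*record) for record in records)
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical sample: two records share the same triplet.
records = [
    ("BPBM", "I", "1234"),
    ("BPBM", "I", "1234"),
    ("USNM", "Fishes", "1234"),
]
```

<P><FONT SIZE=3D2>Running find_collisions over a provider's records =
before publishing would at least show where the soft identifier fails =
to be unique.</FONT></P>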

<P><FONT SIZE=3D2>5) Are revision identifiers a useful feature?</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; I would =
like to think not.&nbsp; If the information is truly dynamic over time =
(e.g., re-determinations of taxonomic identity of specimens), then =
individual instances should probably receive their own set of GUIDs (as =
opposed to versions of the &quot;parent&quot; GUID).&nbsp; If the =
information is static over time, and changes represent objective =
corrections, then I don't see a real need to track that within the =
context of a GUID (record edit history may or may not need to be =
tracked, but this seems to me to be a separate issue from =
GUIDs).</FONT></P>

<P><FONT SIZE=3D2>5b) How many providers will be able to provide and =
handle them?</FONT>
</P>

<P><FONT SIZE=3D2>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; If =
versioning is incorporated, then it should be designed such that a =
&quot;default&quot; version is provided automatically when versioning =
is not handled.</FONT></P>
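
<P><FONT SIZE=3D2>A minimal Python sketch of that fallback follows. It =
assumes, purely for illustration, that revisions are stored in the =
order they were added and that the "default" is the most recent one; =
split_revision, resolve and the example.org identifier are hypothetical =
names, not part of any LSID resolver.</FONT></P>

```python
from typing import Dict, Optional, Tuple

# base LSID -> {revision label -> record}, in the order revisions were added
Store = Dict[str, Dict[str, str]]


def split_revision(lsid: str) -> Tuple[str, Optional[str]]:
    """Separate an optional trailing revision from the base LSID."""
    parts = lsid.split(":")
    if len(parts) == 6:
        return ":".join(parts[:5]), parts[5]
    if len(parts) == 5:
        return lsid, None
    raise ValueError(f"not an LSID: {lsid!r}")


def resolve(store: Store, lsid: str) -> str:
    """Return the requested revision, or the default when none is given."""
    base, revision = split_revision(lsid)
    revisions = store[base]
    if revision is None:
        # Assumption: the default is the most recently added revision.
        revision = list(revisions)[-1]
    return revisions[revision]


store: Store = {
    "urn:lsid:example.org:Specimen:1": {"1": "first record", "2": "edited record"},
}
```

<P><FONT SIZE=3D2>A client that ignores versioning asks for the base =
LSID and still gets a usable answer, while a version-aware client can =
append the revision it wants.</FONT></P>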
<BR>

<P><FONT SIZE=3D2>Sorry for the long post, but I feel that this issue =
is extremely important at this point in bioinformatics history.</FONT>
</P>

<P><FONT SIZE=3D2>Aloha,</FONT>
<BR><FONT SIZE=3D2>Rich</FONT>
</P>

<P><FONT SIZE=3D2>Richard L. Pyle, PhD</FONT>
<BR><FONT SIZE=3D2>Natural Sciences Database Coordinator, Bishop =
Museum</FONT>
<BR><FONT SIZE=3D2>1525 Bernice St., Honolulu, HI 96817</FONT>
<BR><FONT SIZE=3D2>Ph: (808)848-4115, Fax: (808)847-8252</FONT>
<BR><FONT SIZE=3D2>email: deepreef at bishopmuseum.org <A =
HREF=3D"http://www.bishopmuseum.org/bishop/HBS/pylerichard.html" =
TARGET=3D"_blank">http://www.bishopmuseum.org/bishop/HBS/pylerichard.htm=
l</A></FONT>
</P>

<P><FONT SIZE=3D2>&gt; -----Original Message-----</FONT>
<BR><FONT SIZE=3D2>&gt; From: TDWG - Structure of Descriptive Data =
</FONT>
<BR><FONT SIZE=3D2>&gt; [<A =
HREF=3D"mailto:TDWG-SDD at LISTSERV.NHM.KU.EDU">mailto:TDWG-SDD at LISTSERV.NH=
M.KU.EDU</A>]On Behalf Of Donald Hobern</FONT>
<BR><FONT SIZE=3D2>&gt; Sent: Thursday, September 23, 2004 6:22 =
AM</FONT>
<BR><FONT SIZE=3D2>&gt; To: TDWG-SDD at LISTSERV.NHM.KU.EDU</FONT>
<BR><FONT SIZE=3D2>&gt; Subject: Re: Globally Unique Identifier</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; This is precisely one of the key questions we =
need to address with any </FONT>
<BR><FONT SIZE=3D2>&gt; identifier framework we adopt.&nbsp; I think we =
could easily use LSIDs in a </FONT>
<BR><FONT SIZE=3D2>&gt; way that should overcome your concerns, and I =
think that the built-in </FONT>
<BR><FONT SIZE=3D2>&gt; mechanisms for discovery and metadata access =
within the LSID model are </FONT>
<BR><FONT SIZE=3D2>&gt; really exciting.</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; I have just put together a PowerPoint =
presentation to explain some of </FONT>
<BR><FONT SIZE=3D2>&gt; what I think we could achieve with globally =
unique identifiers and </FONT>
<BR><FONT SIZE=3D2>&gt; particularly with LSIDS.&nbsp; It can be =
downloaded from:</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; <A =
HREF=3D"http://circa.gbif.net/Public/irc/gbif/dadi/library?l=3D/architec=
ture/globallyuniqueidentifier/" =
TARGET=3D"_blank">http://circa.gbif.net/Public/irc/gbif/dadi/library?l=3D=
/architecture/globallyuniqueidentifier/</A></FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; It may be clearest if you go through it as a =
slide show rather than in </FONT>
<BR><FONT SIZE=3D2>&gt; edit mode.</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; Thanks,</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; Donald</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; =
---------------------------------------------------------------</FONT>
<BR><FONT SIZE=3D2>&gt; Donald Hobern (dhobern at gbif.org)</FONT>
<BR><FONT SIZE=3D2>&gt; Programme Officer for Data Access and Database =
Interoperability Global </FONT>
<BR><FONT SIZE=3D2>&gt; Biodiversity Information Facility Secretariat =
Universitetsparken 15, </FONT>
<BR><FONT SIZE=3D2>&gt; DK-2100 Copenhagen, Denmark</FONT>
<BR><FONT SIZE=3D2>&gt; Tel: +45-35321483&nbsp;&nbsp; Mobile: =
+45-28751483&nbsp;&nbsp; Fax: +45-35321480</FONT>
<BR><FONT SIZE=3D2>&gt; =
---------------------------------------------------------------</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; -----Original Message-----</FONT>
<BR><FONT SIZE=3D2>&gt; From: TDWG - Structure of Descriptive Data =
</FONT>
<BR><FONT SIZE=3D2>&gt; [<A =
HREF=3D"mailto:TDWG-SDD at LISTSERV.NHM.KU.EDU">mailto:TDWG-SDD at LISTSERV.NH=
M.KU.EDU</A>] On Behalf Of Wouter Addink</FONT>
<BR><FONT SIZE=3D2>&gt; Sent: 23. september 2004 17:38</FONT>
<BR><FONT SIZE=3D2>&gt; To: TDWG-SDD at LISTSERV.NHM.KU.EDU</FONT>
<BR><FONT SIZE=3D2>&gt; Subject: Re: Globally Unique Identifier</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; It seems that DOI allows for any existing IDs =
to be used as part of </FONT>
<BR><FONT SIZE=3D2>&gt; the unique identifier. That seems to me as a =
fast to adopt short term </FONT>
<BR><FONT SIZE=3D2>&gt; solution but not a good idea for the long term. =
At first sight I very </FONT>
<BR><FONT SIZE=3D2>&gt; much liked the LSID specification, but the =
longer I think about it, </FONT>
<BR><FONT SIZE=3D2>&gt; the less I like some parts. What I think is =
missing in the LSID </FONT>
<BR><FONT SIZE=3D2>&gt; specification is that the unique identifier =
should be 'meaningless' </FONT>
<BR><FONT SIZE=3D2>&gt; apart from being an identifier to become time =
independent (and to </FONT>
<BR><FONT SIZE=3D2>&gt; avoid possible political problems). Any solution =
with a URN I can </FONT>
<BR><FONT SIZE=3D2>&gt; think of has some meaning, which makes solutions =
like a MAC-address </FONT>
<BR><FONT SIZE=3D2>&gt; generated GUID favorable in my opinion. And any =
meaning you need </FONT>
<BR><FONT SIZE=3D2>&gt; (like an authority of an object) can be specified =
in metadata instead </FONT>
<BR><FONT SIZE=3D2>&gt; of using it in the identifier. What is not very =
clear to me in the </FONT>
<BR><FONT SIZE=3D2>&gt; LSID specification is where the LSID generated =
by a </FONT>
<BR><FONT SIZE=3D2>&gt; LSIDAssigningService is actually stored.</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; Wouter Addink</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; ----- Original Message -----</FONT>
<BR><FONT SIZE=3D2>&gt; From: &quot;Gregor Hagedorn&quot; =
&lt;G.Hagedorn at BBA.DE&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; To: &lt;TDWG-SDD at LISTSERV.NHM.KU.EDU&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; Sent: Wednesday, September 08, 2004 6:20 =
PM</FONT>
<BR><FONT SIZE=3D2>&gt; Subject: Re: Globally Unique Identifier</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;I am not quite sure, but to me it seems =
with &quot;GUID&quot; you refer to the&nbsp; </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;numeric, MAC-address generated GUID type. I =
have nothing against&nbsp; </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;these. However, any URN in my view is a =
GUID that has most of the&nbsp; </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;properties you mention:</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; - it is guaranteed to be unique =
globally, and can be created </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; anywhere, anytime by any server or =
client machine</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; - it has no meaning as to where the =
data is physically located </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; and will therefore not confuse any =
user about this</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; - most id mechanisms, especially =
URI/URN ids require a 'governing </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; body' to handle namespaces/urls to =
ensure every URN is unique, </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; whereas a GUID is always unique</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; The governing body is restricted to the =
primary web address, and in </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; most cases such an address is already =
available. Being a member of a </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; governmental institution that explicitly =
forbids the use without </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; prior consent, and forbids the use of its =
domain name once you are </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; no longer working for them, I realize some =
potential for problems.</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; I do think a URL of some kind would be =
useful for things such as </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; global searches of multiple databases, =
as this will allow the </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; search to go directly to the data =
source where the name, reference, </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; etc comes from.&nbsp; But this should =
not be part of its ID.&nbsp; Maybe a </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; name/id should have several forms, a =
GUID for an ID and a URL + a </FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; GUID for a fully specified =
name.</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt; What are the current thoughts on these =
ideas?</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; A GUID is only part of the problem. The =
other half of the problem is </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; actually getting at the resource. URN =
schemes like DOI or LSID (I </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; prefer the latter) intend to define =
resolution mechanisms. That makes </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; the URN not yet a URL - in my view the =
good comes with the good, </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; location and reorganization =
independence.</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; I believe GBIF should install such an LSID =
resolver, which is why in </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; the UBIF proxy model, under Links, I =
propose to support a general </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; URL (including potentially URNS), a typed =
LSID and a typed DOI. This </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; could be simplified to have just a URN =
(LSID and DOI are URNs), but </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; that would then require string parsing to =
determine and recognize </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; the preferred resolvable GUID types. =
Comments on splitting/not </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; splitting this are welcome!</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; There may be some need to define a =
non-resolvable URN/numeric GUID </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; as well. However, that would not be under =
the linking question. Is </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; it correct that linking requires =
resolvability, or am I thinking </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; in the wrong direction?</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; Gregor</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;&gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; =
----------------------------------------------------------</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; Gregor Hagedorn (G.Hagedorn at bba.de)</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; Institute for Plant Virology, =
Microbiology, and Biosafety Federal </FONT>
<BR><FONT SIZE=3D2>&gt; &gt; Research Center for Agriculture and =
Forestry (BBA)</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; Koenigin-Luise-Str. =
19&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Tel: =
+49-30-8304-2220</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; 14195 Berlin, =
Germany&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =
Fax: +49-30-8304-2203</FONT>
<BR><FONT SIZE=3D2>&gt; &gt;</FONT>
<BR><FONT SIZE=3D2>&gt; &gt; Often wrong but never in doubt!</FONT>
</P>

</BODY>
</HTML>

