identifiers for geologic samples

Chuck Miller Chuck.Miller at MOBOT.ORG
Sat Jan 28 08:25:32 CET 2006


Although the Internet may change in the future and none of us have a crystal
ball, some method for turning names (gbif.org) into network addresses
(192.38.28.79) will be required.  For the foreseeable future that method on
the Internet will almost certainly be DNS. The cascade of the global DNS
server network is key to making the Internet work.  No matter what URL you
put into your browser, the DNS network finds its way to the IP address of
the server.  Trillions of dollars of commerce now depend upon this global
standard. I think the analogy is more like the teletype machines and the
ASCII codes they used.  Although we have Unicode now, it still includes
ASCII. In communications, the new must continue to support the old.  This
can be seen in the W3C projects that continue to build upon the previous
standards.

To replace the Internet's global DNS locator service with something unique
to biodiversity seems like a complex, expensive and long-term proposition.
For the sake of getting things done in a timely manner, I think we need to
keep things simple and leverage the pieces of the puzzle that already work.
Implementing a GUID scheme is going to be tough enough without tackling a
replacement for DNS.

An issue that needs to be decided by the workshop is how much "abstraction"
of a GUID is absolutely necessary if it must also be locatable through the
Internet.  That is, is a compromise needed to allow embedding of domain
names in order to enable use of the DNS to locate GUIDs?  Surely there is
insufficient time, funds, and staff to embark upon creation of a master
switchboard (database) where the locations of millions of GUIDs are recorded
and updated in perpetuity.

Chuck Miller
Missouri Botanical Garden


-----Original Message-----
From: Roderic Page
To: TDWG-GUID at LISTSERV.NHM.KU.EDU
Sent: 1/28/2006 2:58 AM
Subject: Re: [TDWG-GUID] identifiers for geologic samples

On 28 Jan 2006, at 01:02, Richard Pyle wrote:

> The more I think about it, the more I think this is the sort of system
> that would work well for our field.  A centralized issuer (which could
> issue blocks of thousands or millions of numbers at a time),

The major problem I see with this is that a central registry may be a
rate-limiting step because it has to allocate blocks. It would also
decide the format of the last part of the identifier (which the
provider might not find desirable), and it may well lead to lots of
wasted identifiers (e.g., it allocates 100,000 to me, but I use 3 of
them).

Would it not be better to devolve this? You can still have a central
registry. For example, Handles and DOIs work by having a central
registry for the prefix (e.g., "1018") and the provider is responsible
for allocating the suffix locally.
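
To make the split concrete, here is a minimal Python sketch of
prefix/suffix allocation. It is only an illustration: "1018" is the
example prefix above, while the counter-based suffix, the LocalMinter
class, and the split_handle function are hypothetical names of mine,
and nothing here talks to a real Handle or DOI registry.

```python
from itertools import count
from typing import Tuple


def split_handle(handle: str) -> Tuple[str, str]:
    """Split 'prefix/suffix' at the first slash; the suffix may itself contain slashes."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {handle!r}")
    return prefix, suffix


class LocalMinter:
    """Hypothetical provider-side allocator: the registry assigns only the prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._next = count(1)  # assumed suffix scheme: a simple local counter

    def mint(self) -> str:
        # The provider chooses the suffix locally, with no call to the registry.
        return f"{self.prefix}/{next(self._next)}"


minter = LocalMinter("1018")
first = minter.mint()
second = minter.mint()
```

The central registry is contacted once, to obtain the prefix, so no
blocks of numbers are reserved and none are wasted.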


> I'm not sure how wise it would be to create a new syntax standard,
> rather than go with one of the ones we've discussed.  But if (for
> example) using LSID, I personally think it would be preferable to
> establish a highly generic form, such as:
>
> urn:lsid:gbif.org:BioGUID:12345

Without wishing to preempt some of the things I'm going to present at
the workshop, I'm going off LSIDs a little because of their reliance on
the Internet DNS. Apart from the hassle of mucking with the DNS records
to set them up (I suspect not every provider is going to find this easy
to do), it assumes that the Internet in its present form is going to be
here forever, and it also embeds information in the identifier (e.g.,
"gbif.org") that currently has meaning, but over time may lose
meaning, or worse, be positively misleading (say if GBIF goes belly up
and somebody else serves the data).
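
As a rough illustration of what is embedded, the following Python
sketch splits an LSID into its colon-separated parts (authority,
namespace, object, and an optional revision). The parse_lsid function
and the Lsid type are hypothetical names for this example, and the
check is deliberately loose rather than a full validator.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # a DNS name, e.g. "gbif.org"
    namespace: str
    object_id: str
    revision: Optional[str]  # optional trailing component


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
        or not all(parts[2:5])
    ):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:gbif.org:BioGUID:12345")
```

Here lsid.authority is "gbif.org": the domain name is a permanent part
of the identifier, whoever ends up serving the data.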

Handles (including DOIs) and ARK have no information in the identifier
(perhaps not strictly true for some DOIs, but that's by choice not
design), and also in principle don't need the internet. In the future
some other mode of information transport may come along, and they could
still be used.

While it might be hard to imagine the Internet and the DNS going away,
if anybody has a 5 1/4" floppy lying around, they'll be aware of how
hard it is to get information off it these days as 5 1/4" drives are
scarce as hen's teeth -- the only one in my department is in an old PC
that is connected to the network. The digital library community seem
particularly sensitive to these issues, which is perhaps why they use
handles, DOIs, and ARK.

Regards

Rod



------------------------------------------------------------------------
Professor Roderic D. M. Page
Editor, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page at bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org
Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
Find out what we know about a species at http://ispecies.org


