BioGUIDs and the Internet Analogy

Mon Sep 27 11:21:21 CEST 2004

> Perhaps it would be useful to look at the issues being discussed about
> a bio identifier/locator/GUID in comparison to the same things that are
> needed for Internet communications.

I've long thought that parts of the DNS system would be extremely useful to
emulate in some aspects of bioinformatics data management (particularly
taxonomic names; see below).

> IP addresses have to be unique world-wide to make the Internet work.
> The Internet Corporation for Assigned Names and Numbers (ICANN -
> www.icann.org) provides that uniqueness by assigning all the IP numbers
> in unique blocks or ranges of numbers to "Internet Registries".

...exactly the way that I envision an organization like GBIF would be
charged with the task of issuing UIDs for certain biological objects.

> There are Regional, National and Local Internet Registries that subdivide
> and "license" IP addresses to ISPs, who in turn license IP addresses to
> organizations.

There could be a useful analog for this in bioinformatics (particularly in
terms of individual institutions serving as regional registries for specimen
UIDs, or IC_N Commissions serving as "regional" registries for taxon name
UID assignment) -- but there doesn't necessarily have to be.

> So, there is a hierarchy of how the "unique identifiers" are managed.
> There is in fact a central authority, but it delegates to decentralized
> authorities.

To emulate this in bioinformatics, the "hierarchy" would be achieved simply
by allowing block-assignment of UIDs to various players -- but the important
point here is that only *one* organization ensures uniqueness (in the case
of the Internet, of IP addresses).  The data to which those UIDs apply would
be, for the most part, the responsibility of the UID recipient, not the UID
issuer (in my world view). Thus: centralized issuance; delegated application.
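
A rough sketch of "centralized issuance; delegated application", in Python.
Everything named here (CentralIssuer, UidBlock, the recipient names, the
block sizes, plain integers as UIDs) is made up for illustration; none of it
is an existing GBIF service or a proposed format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UidBlock:
    """A contiguous, half-open range [start, end) of UIDs held by one recipient."""
    recipient: str
    start: int
    end: int

    def __contains__(self, uid: int) -> bool:
        return self.start <= uid < self.end


@dataclass
class CentralIssuer:
    """Hypothetical central body; its only job is to keep blocks from overlapping."""
    next_free: int = 1
    blocks: list = field(default_factory=list)

    def issue_block(self, recipient: str, size: int) -> UidBlock:
        if size <= 0:
            raise ValueError("block size must be positive")
        block = UidBlock(recipient, self.next_free, self.next_free + size)
        self.next_free = block.end  # the next block starts where this one ends
        self.blocks.append(block)
        return block

    def owner_of(self, uid: int):
        """Return the recipient whose block contains uid, or None if unissued."""
        for block in self.blocks:
            if uid in block:
                return block.recipient
        return None


issuer = CentralIssuer()
museum = issuer.issue_block("ExampleMuseum", 1000)         # UIDs 1..1000
commission = issuer.issue_block("ExampleCommission", 500)  # UIDs 1001..1500

# Delegated application: the recipient, not the issuer, attaches data to a UID.
specimens = {museum.start: "specimen record kept by the museum"}
```

The issuer never sees the specimen or name data; it only records which
block went to whom, which is all it needs to guarantee uniqueness.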

> Is there an analogy for BioGUIDs to have a central body who divvies out
> the unique numbers (like IP addresses) to decentralized bodies or large
> organizations?

GBIF seems to me to be the principal contender.

> Since IP addresses are hard to memorize (and so too would be a BioGUID),
> "domain names" are used. Starting with a domain name, you can first find
> the name and/or IP address of a device, called the Domain Name Server,
> that can locate the IP address of other computers.  This is a form of
> indirect addressing.  ICANN also manages the top-level namespace for the
> Internet. They decide what the valid domain "extensions" are (like .com,
> .uk) so that everybody, everywhere knows where to look them up.  Then, the
> domain name extensions are separated among the Regional, National, and
> Local Internet Registries around the world.  There is a scheme for where
> to find the IP addresses for every domain extension (e.g. .com is on the
> ARIN registry, .com.uk is on the ).
> Then there is a layer of Domain Registrars who have been accredited by
> ICANN to assign domain names for the domain extensions - e.g. tdwg.org.
> The domain name registrars are told by the owner of the domain where to
> find their particular Domain Name Servers which may be many to enable
> redundancy - Primary, Secondary, Tertiary, etc.  These redundant Domain
> Name Servers synchronize with each other at particular times of day and
> may be located all around the world.  They are the main "switchboard" for
> a particular organization's computer names and associated IP addresses.
> Then the individual organization can create multiple computers for the
> domain name - e.g. www.tdwg.org - and add them to the Domain Name Server
> listing.  There can be many computers for a domain, for instance:
> info.tdwg.org, www2.tdwg.org, myname.tdwg.org.  Each of these can be a
> different computer with a different IP address.  The redundant Domain
> Name Servers all contain the list of all these names and what IP
> addresses they are.

This is analogous in many ways to how I would envision a global taxonomic
name service.  UIDs are assigned by a centralized body (e.g., GBIF; or by
the IC_N Commissions) to individual names.  Analogous to multiple redundant
Domain Name Servers (DNS) would be Taxon Name Servers (TNS).  Rather than
being administered by one organization (e.g., GBIF, ITIS, Species 2000,
uBio, etc.), these TNSs would be replicated on dozens or hundreds of servers
all over the world, and kept synchronized within some reasonable time
unit.  Changes to any one replicate would be automatically propagated to all
replicates (either chaotically, or more strictly through one or a few
defined "hubs").  Instead of Domain names as surrogates for IP addresses,
there would be fully qualified "Basionyms" (e.g.,
"OriginalGenusName.OriginalSpeciesName.OriginalAuthor.DescriptionYear.Page.OtherOriginalCitationDetailsAsNeeded")
serving as representations of the less-human-friendly GUIDs (analogous to IP
addresses).  Ideally, this system wouldn't be limited to just taxonomic
names, but extended to all taxonomic concepts, so that the "Domain Name"
analogue would be extended to something like:

"OriginalGenusName.OriginalSpeciesName.OriginalAuthor.DescriptionYear.Page.OtherOriginalCitationDetailsAsNeeded_AppliedGenusName.AppliedSpeciesSpelling.ConceptAuthor.ConceptYear.Page.OtherConceptCitationDetailsAsNeeded"
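
To make the DNS analogy concrete, here is a minimal Python sketch of the
lookup and the hub-style propagation described above. It is only an
illustration under assumptions of my own: the names TaxonNameServer, Hub,
basionym_key and concept_key are hypothetical, the taxon "Aus bus Smith
1900" and the GUID strings are placeholders, and a real service would need
conflict handling and a network layer that this leaves out.

```python
def basionym_key(genus, species, author, year, page):
    """Join the original-description parts into a dotted, domain-name-like key."""
    parts = [genus, species, author, str(year), str(page)]
    # Dots are the separator, so strip them (and spaces) from each part.
    return ".".join(p.replace(".", "").replace(" ", "") for p in parts)


def concept_key(original_key, applied_key):
    """Concept key: original name and applied usage joined by an underscore."""
    return original_key + "_" + applied_key


class TaxonNameServer:
    """One replica: maps human-readable keys to opaque GUIDs."""

    def __init__(self):
        self.records = {}

    def resolve(self, key):
        """Return the GUID for key, or None if this replica does not know it."""
        return self.records.get(key)


class Hub:
    """Hypothetical hub that pushes every change to all attached replicas."""

    def __init__(self):
        self.replicas = []

    def attach(self, server):
        self.replicas.append(server)

    def publish(self, key, guid):
        for server in self.replicas:
            server.records[key] = guid


hub = Hub()
honolulu, copenhagen = TaxonNameServer(), TaxonNameServer()
hub.attach(honolulu)
hub.attach(copenhagen)

name = basionym_key("Aus", "bus", "Smith", 1900, 12)
hub.publish(name, "guid-0001")  # placeholder GUID, issued centrally

usage = concept_key(name, basionym_key("Xus", "bus", "Jones", 1950, 7))
hub.publish(usage, "guid-0002")  # a concept gets its own GUID
```

After the two publish calls, either replica resolves the same key to the
same GUID, which is the property the DNS comparison is meant to capture.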

> The players in the Internet networking fabric all now play by these
> layered rules.  They all know them and follow them in order to keep the
> Internet running.  This stuff happens out of sight to everyone but the
> networking people and we all take it for granted and assume it is simple.
> But, it's invisible not because it's simple, but rather because it's
> disciplined.

Excellent synopsis, and (in my opinion) an excellent model to follow for
at least taxonomic names/concepts data.  Perhaps also for specimen data (but
it seems less intuitive for that).  This comes back to my earlier question
about whether it is vital that all bioinformatics GUIDs be of the same
scheme, or whether different schemes might be optimal for different classes
of objects.

Aloha,
Rich

Richard L. Pyle, PhD
Natural Sciences Database Coordinator, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://www.bishopmuseum.org/bishop/HBS/pylerichard.html