Re: BioGUIDs and the Internet Analogy
Perhaps it would be useful to look at the issues being discussed about a bio identifier/locator/GUID in comparison to the same things that are needed for Internet communications.
I've long thought that parts of the DNS system would be extremely useful to emulate in some aspects of bioinformatics data management (particularly taxonomic names; see below).
IP addresses have to be unique world-wide to make the Internet work. The Internet Corporation for Assigned Names and Numbers (ICANN-
www.icann.org)
provides that uniqueness by assigning all the IP numbers in unique blocks
or
ranges of numbers to "Internet Registries".
...exactly the way that I envision an organization like GBIF would be charged with the task of issuing UIDs for certain biological objects.
There are Regional, National and Local Internet Registries that subdivide
and
"license" IP addresses to ISPs, who in turn license IP addresses to
organizations.
There could be a useful analog for this in bioinformatics (particularly in terms of individual institutions serving as regional registries for specimen UIDs, or IC_N Commissions serving as "regional" registries for taxon name UID assignment) -- but there doesn't necessarily have to be.
So, there is a hierarchy of how the "unique identifiers" are managed.
There is
in fact a central authority, but it delegates to decentralized
authorities.
To emulate this in bioinformatics, the "hierarchy" would be achieved simply by allowing block-assignment of UIDs to various players -- but the important point here is that only *one* organization ensures uniqueness (in the case of Internet, of ISPs). The data to which those UIDs apply would be, for the most part, the responsibility of the UID recipient, not the UID issuer (in my world view). Thus: centralized issuance; delegated application.
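As a rough illustration of "centralized issuance; delegated application", here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names (CentralIssuer, Registry), the use of plain integers as UIDs, and the block sizes are hypothetical, not part of any existing GBIF or ICANN system. The one property it tries to show is that uniqueness follows from the central body never handing out overlapping blocks, while each recipient assigns UIDs inside its own block independently.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """A delegated body (hypothetical) that assigns UIDs from its own block."""
    name: str
    start: int  # first UID in the block (inclusive)
    end: int    # last UID in the block (inclusive)
    next_uid: int = field(init=False)

    def __post_init__(self):
        self.next_uid = self.start

    def assign(self) -> int:
        """Return the next unused UID in this registry's block."""
        if self.next_uid > self.end:
            raise RuntimeError(f"{self.name} has exhausted its block")
        uid = self.next_uid
        self.next_uid += 1
        return uid


class CentralIssuer:
    """The single body that guarantees uniqueness: blocks never overlap."""

    def __init__(self):
        self._next_free = 1

    def issue_block(self, registry_name: str, size: int) -> Registry:
        start = self._next_free
        end = start + size - 1
        self._next_free = end + 1
        return Registry(registry_name, start, end)


issuer = CentralIssuer()
museum = issuer.issue_block("museum", 1000)         # block 1..1000
commission = issuer.issue_block("commission", 500)  # block 1001..1500

# Each recipient applies UIDs to its own data without consulting the issuer.
specimen_uid = museum.assign()
name_uid = commission.assign()
print(specimen_uid, name_uid)
```

The issuer keeps no record of what each UID is applied to; that stays with the recipient, which is the division of responsibility described above.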
Is there an analogy for BioGUIDs to have a central body who divvies out
the
unique numbers (like IP addresses) to decentralized bodies or large
organizations?
GBIF seems to me to be the principal contender.
Since IP addresses are hard to memorize (and so too would be a BioGUID),
"domain names"
are used. Starting with a domain name, you can first find the name and/or
IP address
of a device, called the Domain Name Server, that can locate the IP address
of other
computers. This is a form of indirect addressing. ICANN also manages the
top-level
namespace for the Internet. They decide what the valid domain "extensions"
are (like
.com, .uk) so that everybody, everywhere knows where to look them up.
Then, the domain
name extensions are separated among the Regional, National, and Local
Internet Registries
around the world. There is a scheme for where to find the IP addresses
for every domain
extension (e.g. .com is on the ARIN registry, .com.uk is on the ). Then there is a layer of Domain Registrars who have been accredited by
ICANN to assign
domain names for the domain extensions - e.g. tdwg.org. The domain name registrars are told by the owner of the domain where to
find their particular
Domain Name Servers which may be many to enable redundancy - Primary,
Secondary, Tertiary,
etc. These redundant Domain Name Servers synchronize with each other at
particular times
of day and may be located all around the world. They are the main
"switchboard" for a
particular organization's computer names and associated IP addresses. Then the individual organization can create multiple computers for the
domain name - e.g.
www.tdwg.org - and add them to the Domain Name Server listing. There can
be many computers
for a domain, for instance: info.tdwg.org, www2.tdwg.org, myname.tdwg.org.
Each of these
can be a different computer with a different IP address. The redundant
Domain Name Servers
all contain the list of all these names and what IP addresses they are.
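The indirect addressing and redundancy described above can be sketched in a few lines of Python. This is only a toy model, not how real DNS software is implemented: the NameServer class and resolve function are hypothetical, and the addresses are taken from the 192.0.2.0/24 range reserved for documentation rather than from any real assignment. It shows each redundant server holding a full copy of the name list, so a lookup still succeeds when one server is unreachable.

```python
# Hypothetical records for one domain; addresses are documentation-only.
ZONE = {
    "www.tdwg.org": "192.0.2.10",
    "info.tdwg.org": "192.0.2.11",
    "www2.tdwg.org": "192.0.2.12",
}


class NameServer:
    def __init__(self, label, records, online=True):
        self.label = label
        self.records = dict(records)  # each server holds its own full copy
        self.online = online

    def lookup(self, hostname):
        if not self.online:
            raise ConnectionError(f"{self.label} is unreachable")
        return self.records[hostname]


def resolve(hostname, servers):
    """Try each redundant server in order and return the first answer."""
    for server in servers:
        try:
            return server.lookup(hostname)
        except ConnectionError:
            continue  # fall through to the next redundant server
    raise LookupError(f"no server could resolve {hostname}")


servers = [
    NameServer("primary", ZONE, online=False),  # simulate an outage
    NameServer("secondary", ZONE),
]
address = resolve("info.tdwg.org", servers)
print(address)
```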
This is analogous in many ways to how I would envision a global taxonomic name service. UIDs are assigned by a centralized body (e.g., GBIF; or by the IC_N Commissions) to individual names. Analogous to multiple redundant Domain Name Servers (DNS) would be Taxon Name Servers (TNS). Rather than being administered by one organization (e.g., GBIF, ITIS, Species 2000, uBio, etc.), these TNSs would be replicated on dozens or hundreds of servers all over the world, and kept synchronized within some reasonable time unit. Changes to any one replicate would be automatically propagated to all replicates (either chaotically, or more strictly through one or a few defined "hubs"). Instead of Domain names as surrogates for IP addresses, there would be fully qualified "Basionyms" (e.g., "OriginalGenusName.OriginalSpeciesName.OriginalAuthor.DescriptionYear.Page.OtherOriginalCitationDetailsAsNeeded") serving as representations of the less-human-friendly GUIDs (the analogues of IP addresses). Ideally, this system wouldn't be limited to just taxonomic names, but extended to all taxonomic concepts, so that the "Domain Name" analogue would be extended to something like:
"OriginalGenusName.OriginalSpeciesName.OriginalAuthor.DescriptionYear.Page.OtherOriginalCitationDetailsAsNeeded_AppliedGenusName.AppliedSpeciesSpelling.ConceptAuthor.ConceptYear.Page.OtherConceptCitationDetailsAsNeeded"
The players in the Internet networking fabric all now play by these
layered rules.
They all know them and follow them in order to keep the Internet running.
This
stuff happens out of sight to everyone but the networking people and we
all take it
for granted and assume it is simple. But, it's invisible not because it's
simple,
but rather because it's disciplined.
Excellent synopsis, and (in my opinion) an excellent model to follow for at least taxonomic names/concepts data. Perhaps also for specimen data (but it seems less intuitive for that). This comes back to my earlier question about whether it is vital that all bioinformatics GUIDs be of the same scheme, or whether different schemes might be optimal for different classes of objects.
Aloha, Rich
Richard L. Pyle, PhD Natural Sciences Database Coordinator, Bishop Museum 1525 Bernice St., Honolulu, HI 96817 Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef@bishopmuseum.org http://www.bishopmuseum.org/bishop/HBS/pylerichard.html