Re: BioGUIDs and the Internet Analogy

27 Sep 2004

      ...
[For me, the bottom line---which however I nowhere state below---is:
There is /so much/ existing free infrastructure source code---e.g.
http://www-124.ibm.com/developerworks/oss/lsid/--- and (apparently)
funding, and (manifestly) professionally designed specifications for
LSID concerns that I am horrified at the prospect of adopting anything
else if LSID comes even close to being what the community needs.
I can certainly understand that perspective, and that's one of the main
reasons I am still semi-supportive of the LSID approach (i.e., existing
code). My major concern has to do with the "Authority"/"Resolver" domain
portion of the LSID, and the need (or non-need) for it to be an active,
accessible domain in order to resolve the LSID. I'm also VERY concerned
about there ever being a temptation to change an ID for an object (e.g., a
specimen given from one Museum to another) -- unless it is understood that
the non-"ObjectID" portion is really thought of as metadata of sorts, and
ObjectID itself is globally unique by itself. I'll need to read the LSID
spec in more detail, and give it some more thought, before I comment further
on this.
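
To make the concern concrete, here is a minimal sketch that splits an LSID into its parts, assuming the urn:lsid:<authority>:<namespace>:<object>[:<revision>] layout given in the LSID specification. The museum domains and the specimen number are hypothetical, invented for illustration only.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str            # the "Authority"/"Resolver" domain portion
    namespace: str
    object_id: str            # the "ObjectID" portion
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its colon-separated components."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical case: one specimen, before and after moving between museums.
before = parse_lsid("urn:lsid:museum-a.example.org:specimens:12345")
after = parse_lsid("urn:lsid:museum-b.example.org:specimens:12345")

print(before == after)                      # the full identifiers differ
print(before.object_id == after.object_id)  # only the ObjectID matches
```

If the authority is rewritten when a specimen changes hands, the two strings no longer compare as equal, and matching on the ObjectID alone is only safe if that portion is globally unique by itself.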
...
Or,
let's forget about LSID and instead of deploying what satisfies 98% of
the needs in six months, we could roll our own and deploy what satisfies
80% of the needs in a few years...
If that were really the balance, then it would be a no-brainer.  My concern
would be adopting (effectively committing to) a scheme that satisfies 80% of
the needs in six months, instead of being patient and picking a system that
accommodates 98% of the needs a few years from now.  If I had confidence that
we could implement LSIDs in a "test-drive" mode for a couple of years,
without being fully committed to them, I'd be much more comfortable.  But as
someone who spends a considerable amount of time trying to undo the damage
of "legacy" solutions to data problems that were hastily conceived, I'm
trying to be cautious.
...
The design goals of TCP/IP and DNS, and their implementation, intersect
the requirements of Bio UUIDs only in a very small set, in fact, deep
down perhaps not at all.
These protocols and the associated address syntax were designed
primarily for /routing/, not in any way designed to guarantee that a
datum twice received has any connection between the two occurrences.
IP addresses are in no way persistent.
IP addresses are not globally unique, albeit in several small and varied
ways:
I don't think anyone (in this thread) was suggesting actually *USING* TCP/IP
and DNS for BioGUIDs (at least I wasn't).  Rather, I was looking to it as a
source of ground-truthed schemes for reliably managing globally distributed
information.  For instance, would DNS synchronization/propagation serve as a
useful model for globally distributed, synchronized taxonomic registries? Or
would the taxonomic registry work more effectively with one or a few
centralized "masters" with which a larger set of replicates kept
synchronized?  I also think, as I explained earlier, that the hierarchical
approach with centralized block ID issuance and local application might be
instructive to a bioscheme.
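
As a rough sketch of what "centralized block ID issuance and local application" could look like: a central registry hands out non-overlapping blocks of numbers, and each local issuer assigns IDs from its own block without contacting the center again until the block runs out. The class names, the block size, and the integer IDs are all assumptions made for illustration; they are not taken from any existing scheme.

```python
class CentralRegistry:
    """Hands out non-overlapping blocks of integer IDs (hypothetical)."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self._next_start = 0
        self.holders = {}  # block start -> name of the issuer holding it

    def issue_block(self, holder):
        start = self._next_start
        self._next_start += self.block_size
        self.holders[start] = holder
        return range(start, start + self.block_size)


class LocalIssuer:
    """Assigns IDs locally from blocks obtained from the central registry."""

    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        self._ids = iter(())  # empty until the first block is requested

    def next_id(self):
        try:
            return next(self._ids)
        except StopIteration:
            # Current block exhausted (or none yet): ask the center for another.
            self._ids = iter(self.registry.issue_block(self.name))
            return next(self._ids)


registry = CentralRegistry(block_size=3)
museum_a = LocalIssuer("museum-a", registry)
museum_b = LocalIssuer("museum-b", registry)
issued = [museum_a.next_id(), museum_b.next_id(), museum_a.next_id()]
print(issued)
```

The only coordination happens when a block is requested, so uniqueness comes from the center while day-to-day assignment stays local.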
...
In fact, if the UUIDs are meant to be semantically opaque it matters not
one whit who or how these matters are settled.
Can you elaborate on what you mean by "semantically opaque"? I think I
understand -- but the last thing this thread needs is ambiguity about the
meaning of terms (i.e., "opaque semantics"....  :-)  )
...
Exceptions to that are
social, not technical. ("If you don't let me decide X, I am not going to
use your scheme". "OK, then you won't participate in its benefits.
That's fine with me")
...and as I said before, the real challenge in establishing universally
adopted BioGUIDs is not going to be technical; it's going to be
social/political.
...
...
...
So, there is a hierarchy of how the "unique identifiers" are managed.
There is
...
in fact a central authority, but it delegates to decentralized
authorities.
But this is mainly to distribute costs and speed issuance. It has
nothing to do with the naming scheme. The number of organizations to be
issued Bio GUIDs surely is several orders of magnitude less than those
to be issued IPv6 addresses. So I doubt any IPv6 issuance mechanisms are
instructive, at least in their purpose (and hence, if well implemented,
in their implementation).
So are you saying that, because BioGUID traffic will be orders of magnitude
smaller than internet domain traffic, there does not need to be delegation
to decentralized authorities?  If so, then we are in full agreement.
...
...
GBIF seems to me to be the principle contender.
I enthusiastically agree. Also the /principal/ contender. [Sorry,
couldn't resist. My fingers slip on that one sometimes too.]
Ouch.... :-)
...
Not exactly. There is one scheme in case your application can't resolve
it in a more nearly "local" facility. There are /lots/ of ways to find
an IP address from a domain name. All those which comply fully with the
DNS protocol, however, can make available two pieces of metadata: the
TTL of the record it is offering, and the IP address of a machine at
which you can find an authoritative record of the assignment of the dns
name to the IP address. This protocol /might/, but you hope on
performance grounds usually /doesn't/, lead you up as far as the root
servers, and the "one scheme to bind them all". If there is any lesson
here at all, it is that name resolution protocols matter, but resolution
implementations don't. Yet another attribute on which DNS/IP and LSID
are not distinguishable.
This seems to be a fundamental point of confusion (for me anyway).  Are the
domain names embedded within LSIDs information-bearing in the sense that
they are necessarily the internet domain at which the LSID is resolved?  I
guess I should read and understand Section 13.3 of the LSID spec before
commenting further.
...
More often, only when the TTLs expire, there being no motivation to do
otherwise.
O.K., there's a great analogy that may be useful if implementing a
distributed system of synchronized/mirrored biological data servers:  should
they remain in synch at fixed time intervals? In real time with each data
transaction? Or, should some sort of TTL feature be incorporated in data?
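
The third option, a TTL carried with the data, might be sketched as follows. This is only an illustration under stated assumptions: the replica class, the fetch callback, and the record key and names are hypothetical, and the master is stood in for by a plain dictionary.

```python
import time


class TTLReplica:
    """Local copy of records that is refreshed only after a TTL expires."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # callable: key -> (value, ttl_seconds)
        self._clock = clock    # injectable so expiry can be simulated
        self._cache = {}       # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]      # still within its TTL: no trip to the master
        value, ttl = self._fetch(key)
        self._cache[key] = (value, now + ttl)
        return value


# Hypothetical master registry and a simulated clock.
master = {"taxon:1": "Aus bus"}
now = [0.0]


def fetch_from_master(key):
    return master[key], 60.0   # each record is served with a 60-second TTL


replica = TTLReplica(fetch_from_master, clock=lambda: now[0])
print(replica.get("taxon:1"))  # first request goes to the master
master["taxon:1"] = "Aus cus"  # the master changes
now[0] = 30.0
print(replica.get("taxon:1"))  # replica still serves its unexpired copy
now[0] = 61.0
print(replica.get("taxon:1"))  # TTL expired, so the replica refetches
```

The trade-off is visible in the middle request: the replica can be out of date for up to one TTL, in exchange for not contacting the master on every read.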

Much to think about.  But time for me to get some work done....

Aloha,
Rich

Richard Pyle