Topic 1: What do we mean by "GUID"?

Wed Oct 12 07:49:24 CEST 2005

I agree with all of Rod's points.  I'd like to add some details about
LSID resolution as well.

In current practice, LSID relies on the DNS to locate the authority that
is to be used to resolve (localize) an LSID.  This can be considered a
minor weakness (in that the DNS name is part of the identifier), but
also a major strength (the DNS is by far the most robust system we have
for persistent naming).  For an LSID that might be issued by, e.g.,
gbif.org, a change in gbif.org's ability to maintain the authority can
be fixed by simply pointing gbif.org's SRV record for the lsid service
to a new authority.  This mechanism is widely understood and used, and
it amounts to a distributed resolution mechanism.
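
To make the DNS step concrete, here is a rough sketch in Python of how a
client could derive the SRV lookup name from an LSID.  The "_lsid._tcp"
service label is how I read the spec, and the example LSID (namespace
"taxon", object "12345") is made up for illustration.  The helper names
are mine, not part of any LSID library, and the actual DNS query is left
out.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(lsid: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = lsid.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {lsid!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def srv_query_name(lsid: str) -> str:
    """DNS name whose SRV record would point at the resolving authority.

    Assumes the '_lsid._tcp' service label; no DNS lookup is performed.
    """
    return f"_lsid._tcp.{parse_lsid(lsid).authority}"


# Hypothetical LSID, for illustration only.
print(srv_query_name("urn:lsid:gbif.org:taxon:12345"))
```

Repointing that one SRV record is all it takes to move the authority
elsewhere; the LSIDs themselves do not change.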

In addition, however, the LSID spec also describes the use of the DDDS
(Dynamic Delegation Discovery System) to allow the use of a
centralized registry of resolvers.  The DDDS uses NAPTR records to
associate the NID "lsid" with a particular DNS server, which is then
queried for more NAPTR records that specify rewriting rules to obtain
the DNS name of the authority that should be used for a given LSID.  For
example, if the authority in an LSID is set to 'gbif.org', the rewriting
rules might turn that into the authority
'gbif.org.lsid.lsidauthority.org'. This allows the central
IANA-registered owner of the "lsid" NID to create a service that
overrides the DNS in particular cases to provide an alternative
authority in a centralized manner.  It also allows the use of non-DNS
based authority strings (e.g., myauthority).  This approach to resolution
is centralized in exactly the same manner that the centralized DOI and
ARK registries are.  However, as far as I can tell there are no LSID
resolvers that utilize this capability, and I don't know if the "lsid"
NID has been registered with a NAPTR record or not.  But nevertheless,
the LSID spec provides for both distributed and centralized authority
resolution, and so is a superset of the capabilities in DOI and ARK.
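
As a rough illustration of the rewriting step, here is a Python sketch
that applies a NAPTR-style substitution expression to an authority
string.  The rule shown is hypothetical: I wrote it to reproduce the
'gbif.org.lsid.lsidauthority.org' example above, and as far as I know no
such record is actually published.  The function name is mine, and only
the "i" flag is handled.

```python
import re
from typing import Optional


def apply_naptr_regexp(rule: str, value: str) -> Optional[str]:
    """Apply a NAPTR-style '!pattern!replacement!flags' rule to value.

    Returns the rewritten string, or None if the pattern does not match.
    """
    delim = rule[0]
    fields = rule.split(delim)
    if len(fields) != 4 or fields[0] != "":
        raise ValueError(f"malformed substitution expression: {rule!r}")
    _, pattern, replacement, flags = fields
    re_flags = re.IGNORECASE if "i" in flags else 0
    match = re.search(pattern, value, re_flags)
    if match is None:
        return None
    # Expand backreferences such as \1 in the replacement.
    return match.expand(replacement)


# Hypothetical rule: append the central registry's domain to any authority.
RULE = r"!^(.+)$!\1.lsid.lsidauthority.org!"

print(apply_naptr_regexp(RULE, "gbif.org"))
print(apply_naptr_regexp(RULE, "myauthority"))
```

Note that the same rule handles a non-DNS authority string such as
'myauthority', which is what makes the centralized registry possible.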

It also has the advantage of being Internet standards-based for all of
its resolution mechanisms.

Matt

Roderic Page wrote:
> I think this is a very nice statement of the issues.
>
> My own view is that ARK is interesting, but I'm not sure ARK is the best
> way forward. Persistence is a (perhaps the) key issue, and it is a
> social one not a technological one, as the DOI people make very clear.
> DOIs only work because the publishing industry has invested in the
> infrastructure to support them.
>
> In some ways, DOIs and ARK are very similar. If I use the DOI resolver
> to resolve a DOI
>
> http://dx.doi.org/10.1086/303303
> \---------------/ \-----/ \----/
>         |            |      |
>         |          Name    Name
>   Name mapping     Assigning
>   Authority        Authority Number (NAAN)
>   Hostport (NMAH)
>
> then I have a URL very like an ARK, where the authority assigning the
> name (such as a publisher, in this case the University of Chicago) is
> different from the authority that makes the identifier actionable (doi.org).
> One could imagine that if DOI.org were to fall over, one could
> substitute another authority, such as doi.reborn.org. Indeed some
> publishers almost do essentially this, for example
> http://www.journals.uchicago.edu/cgi-bin/resolve?id=doi:10.1086/303303
> (although this will only resolve local DOIs). ARK simply makes this
> possibility explicit. LSIDs are more strongly tied to the DNS (the
> uniqueness of an LSID is partly guaranteed by using Internet domain
> names), although they do have limited support for foreign authorities
> (other providers that can serve metadata for objects that those
> providers don't actually own).
>
> ARK also adds the ability to retrieve a statement of commitment. I'm
> less impressed by this, as a statement is all very well, but will
> service providers actually honour it? I guess this is an issue of trust.
> I suspect that users' ratings of service providers will be much more
> accurate than a rating provided by a service provider.
>
> One issue not on this list is who generates GUIDs? ARKs and DOIs require
> some degree of centralisation because both require unique identifiers
> for organisations providing data (e.g., 10.1086 identifies the
> University of Chicago Press). This in itself requires some degree of
> service commitment. LSIDs are decentralised, in that the unique
> identifier for an organisation is provided by the DNS. If, for example,
> GBIF took on the role of providing unique identifiers for organisations,
> but then closed due to funding issues (heaven forbid), then we have a
> problem. If the DNS goes belly up, then we will have much more pressing
> issues to worry about...
>
>
> Regards
>
> Rod
>
> On 11 Oct 2005, at 15:37, Donald Hobern wrote:
>
>
>     [ I will be trying to provide some structure to discussions in this
>     mailing list by raising specific topics and looking for comments.
>      Please keep the Topic number in responses ]
>
>     Topic 1: What do we mean by GUID?
>
>     The most fundamental thing that we need to establish as we consider
>     a GUID implementation is a definition for “GUID” in this context.
>      We have been using a number of terms to describe the identifiers we
>     need (unique, resolvable, persistent, etc.).
>
>     I’ve been spending some time following up on Rod Page’s
>     recommendation that we consider the use of Archival Resource Keys
>     (ARK) from the California Digital Library (see
>     http://wiki.gbif.org/guidwiki/wikka.php?wakka=ARK).  The CDL web
>     site includes an excellent overview of this GUID model, which also
>     serves as an excellent introduction to the issues involved.  I would
>     urge you all to read this document (it’s only nine pages long!):
>
>     http://www.cdlib.org/inside/diglib/ark/arkcdl.pdf
>
>     This document arrives at the following problem definition for
>     persistent, actionable identifiers:
>
>     1 The goal: long-term actionable identifiers.
>     a Requirement: that identifiers deliver you to objects (where
>     feasible).
>     b Requirement: that identifiers deliver you to object metadata.
>     c Desirable: each object should wear its own identifier.
>     d Requirement: that identifiers deliver you to statements of
>     commitment.
>     2 The problem: URLs break for some objects (that is, associations
>     between URLs and objects are not maintained), and we have no way to
>     tell which ones will or won’t break.
>     3 Why URLs break: because objects are moved, removed, and replaced –
>     completely normal activities – and the provider in each case
>     demonstrates insufficient commitment to update indirection tables,
>     or to plan identifier assignment carefully. Persistence is in the
>     mission of few organizations.
>     4 Conventional hypothesis: use indirect names (PURLs, URNs, Handles)
>     instead of URLs; what worked for DNS should work for digital object
>     references.  Wrong. Indirection is spectacularly successful and
>     elegant in DNS, but it’s a side issue in the provision of digital
>     object persistence.
>
>     This document clearly identifies issues around provider service
>     commitments as the key problem that needs solving.  The construction
>     of ARKs seeks to address this in a couple of ways.  It separates the
>     role of Name Assigning Authority (i.e. who initially assigns the
>     identifier) from that of the Name Mapping Authority (i.e. who is
>     able to map the identifier to the data object at any particular
>     time).  It also defines a simple standard relationship between three
>     things: the data object, the metadata for the object, and a
>     commitment statement from the provider as to what aspects of
>     persistence are guaranteed.
>
>     ARK is a technology that we have not really considered up to this
>     point.  My question for discussion is what, if anything, is missing
>     or wrong about the problem definition provided in this document?  If
>     we agree that it provides a crisp definition of what we need, that
>     in itself will be a major step forward.
>
>     Please provide your thoughts.
>
>     Donald
>
>     ---------------------------------------------------------------
>     Donald Hobern (dhobern at gbif.org)
>     Programme Officer for Data Access and Database Interoperability
>     Global Biodiversity Information Facility Secretariat
>     Universitetsparken 15, DK-2100 Copenhagen, Denmark
>     Tel: +45-35321483   Mobile: +45-28751483   Fax: +45-35321480
>     ---------------------------------------------------------------
>
>
> Professor Roderic D. M. Page
> Editor, Systematic Biology
> DEEB, IBLS
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QP
> United Kingdom
>
> Phone: +44 141 330 4778
> Fax: +44 141 330 2792
> email: r.page at bio.gla.ac.uk
> web: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html
>
> Subscribe to Systematic Biology through the Society of Systematic
> Biologists Website: http://systematicbiology.org
> Search for taxon names at http://darwin.zoology.gla.ac.uk/~rpage/portal/
>
>
>
>

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matt Jones
jones at nceas.ucsb.edu                         Ph: 907-789-0496
National Center for Ecological Analysis and Synthesis (NCEAS)
UC Santa Barbara     http://www.nceas.ucsb.edu/ecoinformatics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~