random notes on LSIDs

Mon Sep 27 11:21:06 CEST 2004

Many thanks for jumping in on this, Dave!

> As I think Dave V. and others have pointed out, when an LSID is resolved,
> DNS is used to find the LSID authority.  The LSID authority then provides
> information about how the LSID can be served up (e.g. HTTP, SOAP, FTP),
> and where to get the data behind the LSID and associated metadata.  If I
> start serving up LSIDs with the authority learningsite.com and later
> decide that I'm sick of serving up LSIDs, somebody else can take over
> serving up the data and the metadata.  However, I (or they) still bear the
> responsibility of running the authority which points to the data. If my
> lsids have an authority like lsid.learningsite.com
> (urn:lsid:lsid.learningsite.com:foo:bar) then someone else can take over
> the authority by taking over lsid.learningsite.com and I can still have
> www.learningsite.com, mail.learningsite.com, etc... for myself. So, with a
> little planning, it's not so hard to deal with an authority going away as
> long as the people running it are responsible.
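
For reference, the example LSID above splits into its parts roughly as in
this minimal Python sketch.  It assumes the usual layout
urn:lsid:<authority>:<namespace>:<object>[:<revision>]; the names LSID and
parse_lsid are my own, not part of any LSID toolkit, and no DNS lookup of the
authority is attempted.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of an LSID (illustrative type, not from the LSID toolkit)."""
    authority: str            # DNS name used to locate the LSID authority
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


lsid = parse_lsid("urn:lsid:lsid.learningsite.com:foo:bar")
print(lsid.authority)  # the host someone else could take over
```

Because only the authority component is a DNS name, handing over
lsid.learningsite.com leaves www.learningsite.com and the rest untouched.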

O.K., that clears things up a great deal -- but it also reinforces my concerns
about LSIDs for specimen data.  What happens when Bishop Museum sends a
specimen (or an entire collection -- but not all of its collections) to the
Smithsonian? If Bishop Museum still maintained lsid.bishopmuseum.org for its
other collections, then the specimen would presumably need a new LSID based
on lsid.Smithsonian.gov.  Is there a protocol for a request coming into
lsid.bishopmuseum.org to be automatically re-routed to lsid.Smithsonian.gov
for just those specimens flagged as transferred?  If so, then I would feel
(slightly) more comfortable with LSIDs if the issuing organizations would
agree to use some sort of independently unique value for <ObjectID>, so that
that portion would be preserved along with the specimen object.
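
To make the question concrete, here is a rough Python sketch of the kind of
per-object re-routing I have in mind.  Everything in it is hypothetical: I
do not know whether the LSID protocol offers anything like this, and the
TRANSFERRED table, the "specimens" namespace, the object ID and the
current_lsid function are invented purely for illustration.

```python
from typing import Dict, Tuple

# Hypothetical forwarding table kept by the original authority:
# (namespace, object ID) -> authority that now holds the specimen.
TRANSFERRED: Dict[Tuple[str, str], str] = {
    ("specimens", "BPBM-0001"): "lsid.Smithsonian.gov",
}


def current_lsid(lsid: str,
                 transferred: Dict[Tuple[str, str], str] = TRANSFERRED) -> str:
    """Return the LSID under its current authority, keeping the object ID."""
    parts = lsid.split(":")
    if len(parts) < 5 or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    authority, namespace, object_id = parts[2:5]
    # Only objects flagged as transferred get a new authority.
    new_authority = transferred.get((namespace, object_id), authority)
    return ":".join(["urn", "lsid", new_authority, namespace, object_id]
                    + parts[5:])


print(current_lsid("urn:lsid:lsid.bishopmuseum.org:specimens:BPBM-0001"))
print(current_lsid("urn:lsid:lsid.bishopmuseum.org:specimens:BPBM-0002"))
```

The point of the sketch is that only the authority portion changes; the
<ObjectID> travels with the specimen.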

Of course, there is also the problem I alluded to earlier, where a specimen
object with a GUID is divided and its parts need new IDs - but this is a
general problem which will need to be dealt with no matter which GUID scheme
is adopted.

> * data and metadata
>
> With LSIDs there's a big difference between the data and metadata of an
> LSID - and I think this is going to be the biggest challenge in deciding
> how to use them in our context.  What's the data?  What's the metadata?
> With gene sequences, the datum is the sequence, the metadata are things
> like contact information, who did the sequencing, taxonomic information
> about the thing sequenced, etc.  There's an LTER site using LSIDs for
> their data sets.  The LSID data is the data set itself, and the metadata
> is what you'd expect - a description of the data set, who the
> investigators were, that sort of thing.  NCBI has pubmed LSIDs - they're
> not serving up the articles yet, but there's associated metadata in there.
> For these things the division between data and metadata is fairly clear.
> However, what is the data for taxa?  What is the metadata?

VERY difficult question.  I perceive taxa (names & concepts) as artificial
constructs, without unambiguously objective reality; and as such, everything
(except the UID itself) is metadata.  However, original descriptions of
taxon names do have an (almost) unambiguously objective reality, as do
documented statements about the taxonomic concepts to which those statements
apply.  But even so, most of the attributes we might think of as data
elements for objects like an original description of a taxon name could
also be interpreted as metadata.

> Here's another interesting thing about data and metadata in LSIDs.  When
> you issue an LSID you're promising that the DATA behind that LSID never
> changes.

What about typographical/transcriptional errors?  Can they be corrected?

> * client stack versus authority server
>
> The LSID folks provide two batches of code - an authority server, for
> people who want to serve up LSIDs themselves, and an LSID Client stack
> - which can be used by organizations to provide access to their LSIDs
> and/or proxy LSIDs provided by other organizations.  It may make sense for
> an organization like GBIF to build a service using the Client Stack to
> support both their own LSIDs and those served by other organizations.  The
> Client Stack has a caching mechanism which supports expiration information
> from the primary authority, so the primary authority can update where the
> LSID may be resolved and metadata of that authority.

Does the expiration apply to the domain, or to the individual object?  In
other words, can a defined set of ObjectIDs within one domain's LSID pool
be re-directed, without having to redirect all calls to that LSID domain?
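
What I am hoping for is the per-object case, which might look something like
this Python sketch.  It is only an assumption about how such a cache could
behave, not a description of the actual Client Stack: the ResolutionCache
class, the example.org locations and the "specimens" namespace are all made
up.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResolutionCache:
    """Cache of LSID -> location, with expiration tracked per LSID
    rather than per authority domain (hypothetical design)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, lsid: str, location: str, ttl_seconds: float) -> None:
        # Each LSID carries its own expiry time.
        self._entries[lsid] = (location, self._clock() + ttl_seconds)

    def get(self, lsid: str) -> Optional[str]:
        entry = self._entries.get(lsid)
        if entry is None:
            return None
        location, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller re-asks the primary authority.
            del self._entries[lsid]
            return None
        return location


cache = ResolutionCache()
cache.put("urn:lsid:lsid.bishopmuseum.org:specimens:1",
          "http://data.example.org/specimens/1", ttl_seconds=3600)
print(cache.get("urn:lsid:lsid.bishopmuseum.org:specimens:1"))
```

With expiry tracked this way, a transferred specimen's entry could lapse and
be looked up afresh while every other LSID from the same domain stays cached.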

Thanks again for your very useful insights!

Aloha,
Rich