Re: random notes on LSIDs

27 Sep 2004

Many thanks for jumping in on this, Dave!
...
As I think Dave V. and others have pointed out, when an LSID is resolved,
DNS is used to find the LSID authority.  The LSID authority then provides
information about how the LSID can be served up (e.g. HTTP, SOAP, FTP),
and where to get the data behind the LSID and associated metadata.  If I
start serving up LSIDs with the authority learningsite.com and later
decide that I'm sick of serving up LSIDs, somebody else can take over
serving up the data and the metadata.  However, I (or they) still bear the
responsibility of running the authority which points to the data. If my
LSIDs have an authority like lsid.learningsite.com
(urn:lsid:lsid.learningsite.com:foo:bar) then someone else can take over
the authority by taking over lsid.learningsite.com and I can still have
www.learningsite.com, mail.learningsite.com, etc... for myself. So, with a
little planning, it's not so hard to deal with an authority going away as
long as the people running it are responsible.
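
To make the pieces concrete, here is a minimal sketch that splits an LSID into its parts and builds the DNS name a resolver would query for the authority. It assumes the usual urn:lsid:authority:namespace:object[:revision] layout and assumes authorities are advertised through a DNS SRV record named _lsid._tcp.<authority>; the names Lsid, parse_lsid and srv_query_name are hypothetical, and no network lookup is performed.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    # Host names are case-insensitive in DNS, so normalise the authority.
    return Lsid(parts[2].lower(), parts[3], parts[4], revision)


def srv_query_name(lsid: Lsid) -> str:
    """DNS name to look up for the authority (assumed SRV naming convention)."""
    return f"_lsid._tcp.{lsid.authority}"
```

With the example above, parse_lsid("urn:lsid:lsid.learningsite.com:foo:bar") gives the authority lsid.learningsite.com, so only that one host name has to be handed over when the authority changes hands.
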
O.K., that clears up things a great deal -- but also reinforces my concerns
about LSIDs for specimen data.  What happens when Bishop Museum sends a
specimen (or an entire collection -- but not all of the collections) to
Smithsonian? If Bishop Museum still maintained lsid.bishopmuseum.org for its
other collections, then the specimen would presumably need a new LSID based
on lsid.Smithsonian.gov.  Is there a protocol for a request coming into
lsid.bishopmuseum.org to be automatically re-routed to lsid.Smithsonian.gov
for just those specimens flagged as transferred?  If so, then I would feel
(slightly) more comfortable with LSIDs if the issuing organizations would
agree to use some sort of independently unique value for <ObjectID>, so that
that portion would be preserved along with the specimen object.
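
The kind of per-specimen re-routing I am asking about could be pictured as a small lookup kept by the original authority. This is only a sketch of the idea, not something I know the LSID specification to define; the table, the "specimen" namespace, the object number and the function name are all hypothetical.

```python
from typing import Dict

# Authority that originally issued the LSIDs.
LOCAL_AUTHORITY = "lsid.bishopmuseum.org"

# Hypothetical table: LSIDs flagged as transferred, mapped to the
# authority that now serves them. The LSID string itself is unchanged.
TRANSFERRED: Dict[str, str] = {
    "urn:lsid:lsid.bishopmuseum.org:specimen:12345": "lsid.Smithsonian.gov",
}


def serving_authority(lsid: str, transferred: Dict[str, str] = TRANSFERRED) -> str:
    """Return the authority that should answer for this LSID.

    Only LSIDs present in the table are re-routed; everything else
    stays with the issuing authority.
    """
    return transferred.get(lsid, LOCAL_AUTHORITY)
```

Under this scheme a request for the transferred specimen would be pointed at lsid.Smithsonian.gov, while every other Bishop Museum LSID would still be answered locally.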

Of course, there is also the problem I alluded to earlier, where a specimen
object with a GUID is fractioned and in need of new IDs - but this is a
general problem which will need to be dealt with no matter the GUID scheme
that is adopted.
...
* data and metadata
With LSIDs there's a big difference between the data and metadata of an
LSID - and I think this is going to be the biggest challenge in deciding
how to use them in our context.  What's the data?  What's the metadata?
With gene sequences, the datum is the sequence, the metadata are things
like contact information, who did the sequencing, taxonomic information
about the thing sequenced, etc.  There's an LTER site using LSIDs for
their data sets.  The LSID data is the data set itself, and the metadata
is what you'd expect - a description of the data set, who the
investigators were, that sort of thing.  NCBI has pubmed LSIDs - they're
not serving up the articles yet, but there's associated metadata in there.
For these things the division between data and metadata is fairly clear.
However, what is the data for taxa?  What is the metadata?
VERY difficult question.  I perceive taxa (names & concepts) as artificial
constructs, without unambiguously objective reality; and as such, everything
(except the UID itself) is metadata.  However, original descriptions of
taxon names do have an (almost) unambiguously objective reality, as do
documented statements about taxonomic concepts to which those statements
apply.  But even still, most of the attributes we might think of as data
elements for objects like an original description of a taxon name could
also be interpreted as metadata.
...
Here's another interesting thing about data and metadata in LSIDs.  When
you issue an LSID you're promising that the DATA behind that LSID never
changes.
What about typographical/transcriptional errors?  Can they be corrected?
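
One possible reading, offered as an assumption rather than settled policy: if the bytes behind an LSID may never change, then even a one-character correction would have to be published under a new revision, using the optional revision component at the end of the LSID. The sketch below compares checksums to decide whether a new revision is needed; the function names and the integer revision numbering are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Checksum of the bytes served as the DATA of an LSID."""
    return hashlib.sha256(data).hexdigest()


def revision_after_edit(old: bytes, new: bytes, revision: int) -> int:
    """Keep the revision if the bytes are identical, otherwise bump it."""
    if fingerprint(old) == fingerprint(new):
        return revision
    return revision + 1


def lsid_for(base: str, revision: int) -> str:
    """Append the revision to an LSID of the form urn:lsid:authority:namespace:object."""
    return f"{base}:{revision}"
```

Under that assumption, fixing a transposed letter in the data would turn urn:lsid:lsid.learningsite.com:foo:bar:1 into urn:lsid:lsid.learningsite.com:foo:bar:2, and the original bytes would remain available under revision 1.
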
...
* client stack versus authority server
The LSID folks provide two batches of code - an authority server, for
people who want to serve up LSIDs themselves, and an LSID Client stack
- which can be used by organizations to provide access to their LSIDs
and/or proxy LSIDs provided by other organizations.  It may make sense for
an organization like GBIF to build a service using the Client Stack to
support both their own LSIDs and those served by other organizations.  The
Client Stack has a caching mechanism which supports expiration information
from the primary authority, so the primary authority can update where the
LSID may be resolved and metadata of that authority.
Does the expiration apply to the domain, or to the individual object?  In
other words, can a defined set of ObjectIDs within one domain's LSID pool
be re-directed, without having to redirect all calls to that LSID domain?
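
To show what I mean by expiration on the individual object, here is a toy cache keyed on the full LSID, so that each entry carries its own expiry time. Whether the actual Client Stack caches at this granularity is exactly the open question; the class, its methods and the example locations are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResolutionCache:
    """Caches where each LSID resolves, with a separate expiry per LSID."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, lsid: str, location: str, ttl_seconds: float) -> None:
        """Remember a location until ttl_seconds from now."""
        self._entries[lsid] = (location, self._clock() + ttl_seconds)

    def get(self, lsid: str) -> Optional[str]:
        """Return the cached location, or None if absent or expired."""
        entry = self._entries.get(lsid)
        if entry is None:
            return None
        location, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller asks the primary authority again.
            del self._entries[lsid]
            return None
        return location
```

With per-object entries like these, a short expiry on just the transferred specimens would let them be re-pointed quickly without touching the rest of the domain's LSIDs.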

Thanks again for your very useful insights!

Aloha,
Rich

Richard Pyle