Many thanks for jumping in on this, Dave!
As I think Dave V. and others have pointed out, when an LSID is resolved, DNS is used to find the LSID authority. The LSID authority then provides information about how the LSID can be served up (e.g. HTTP, SOAP, FTP), and where to get the data behind the LSID and associated metadata. If I start serving up LSIDs with the authority learningsite.com and later decide that I'm sick of serving up LSIDs, somebody else can take over serving up the data and the metadata. However, I (or they) still bear the responsibility of running the authority which points to the data. If my LSIDs have an authority like lsid.learningsite.com (urn:lsid:lsid.learningsite.com:foo:bar) then someone else can take over the authority by taking over lsid.learningsite.com and I can still have www.learningsite.com, mail.learningsite.com, etc. for myself. So, with a little planning, it's not so hard to deal with an authority going away as long as the people running it are responsible.
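To make the first step of that resolution concrete, here is a minimal sketch that splits an LSID into its parts and derives the DNS name a resolver would look up. It assumes the usual urn:lsid:authority:namespace:object[:revision] layout and the `_lsid._tcp` SRV label as I understand them from the LSID specification; the names `Lsid`, `parse_lsid` and `srv_name` are made up for illustration and come from no existing library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID (hypothetical helper type, not from any LSID library)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = urn.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {urn!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {urn!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    # DNS names are case-insensitive, so the authority is normalised to lower case.
    return Lsid(authority.lower(), namespace, object_id, revision)


def srv_name(lsid: Lsid) -> str:
    """DNS name to query for the authority (assumes the _lsid._tcp SRV convention)."""
    return f"_lsid._tcp.{lsid.authority}"
```

With the example above, parse_lsid("urn:lsid:lsid.learningsite.com:foo:bar") gives the authority lsid.learningsite.com, and only that host name has to change hands for someone else to take over the authority.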
O.K., that clears up things a great deal -- but also reinforces my concerns about LSIDs for specimen data. What happens when Bishop Museum sends a specimen (or an entire collection -- but not all of the collections) to Smithsonian? If Bishop Museum still maintained lsid.bishopmuseum.org for its other collections, then the specimen would presumably need a new LSID based on lsid.Smithsonian.gov. Is there a protocol for a request coming into lsid.bishopmuseum.org to be automatically re-routed to lsid.Smithsonian.gov for just those specimens flagged as transferred? If so, then I would feel (slightly) more comfortable with LSIDs if the issuing organizations would agree to use some sort of independently unique value for <ObjectID>, so that that portion would be preserved along with the specimen object.
Of course, there is also the problem I alluded to earlier, where a specimen object with a GUID is fractioned and in need of new IDs - but this is a general problem which will need to be dealt with no matter the GUID scheme that is adopted.
- data and metadata
With LSIDs there's a big difference between the data and metadata of an LSID - and I think this is going to be the biggest challenge in deciding how to use them in our context. What's the data? What's the metadata? With gene sequences, the datum is the sequence, the metadata are things like contact information, who did the sequencing, taxonomic information about the thing sequenced, etc. There's an LTER site using LSIDs for their data sets. The LSID data is the data set itself, and the metadata is what you'd expect - a description of the data set, who the investigators were, that sort of thing. NCBI has PubMed LSIDs - they're not serving up the articles yet, but there's associated metadata in there. For these things the division between data and metadata is fairly clear. However, what is the data for taxa? What is the metadata?
VERY difficult question. I perceive taxa (names & concepts) as artificial constructs, without unambiguously objective reality; and as such, everything (except the UID itself) is metadata. However, original descriptions of taxon names do have an (almost) unambiguously objective reality, as do documented statements about taxonomic concepts to which those statements apply. But even still, most of the attributes we might think of as data elements for objects like an original description of a taxon name could also be interpreted as metadata.
Here's another interesting thing about data and metadata in LSIDs. When you issue an LSID you're promising that the DATA behind that LSID never changes.
What about typographical/transcriptional errors? Can they be corrected?
- client stack versus authority server
The LSID folks provide two batches of code - an authority server, for people who want to serve up LSIDs themselves, and an LSID Client Stack - which can be used by organizations to provide access to their LSIDs and/or proxy LSIDs provided by other organizations. It may make sense for an organization like GBIF to build a service using the Client Stack to support both their own LSIDs and those served by other organizations. The Client Stack has a caching mechanism which supports expiration information from the primary authority, so the primary authority can update where the LSID may be resolved and metadata of that authority.
Does the expiration apply to the domain, or to the individual object? In other words, can a defined set of ObjectIDs within one domain's LSID pool be re-directed, without having to redirect all calls to that LSID domain?
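To show what I mean by expiration applying to the individual object, here is a toy sketch of a cache keyed on the full LSID rather than on the authority, so that one object's entry can lapse and be re-resolved while its neighbours stay cached. This is purely illustrative and is not a description of how the Client Stack actually behaves; the class `ResolutionCache` and its methods are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResolutionCache:
    """Toy cache of LSID -> endpoint with a separate expiry per LSID.

    Hypothetical illustration only; not the LSID Client Stack's real cache.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, lsid: str, endpoint: str, ttl_seconds: float) -> None:
        """Remember where this one LSID resolves, until its own TTL runs out."""
        self._entries[lsid] = (endpoint, self._clock() + ttl_seconds)

    def get(self, lsid: str) -> Optional[str]:
        """Return the cached endpoint, or None if it is missing or expired."""
        entry = self._entries.get(lsid)
        if entry is None:
            return None
        endpoint, expires_at = entry
        if self._clock() >= expires_at:
            # Only this object's entry is dropped; others under the same
            # authority are untouched.
            del self._entries[lsid]
            return None
        return endpoint
```

If expiration worked at this level, a transferred specimen could be given a short lifetime and picked up at its new home on the next lookup, while everything else under the same authority stayed put. Whether the real caching mechanism is this fine-grained is exactly what I'm asking.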
Thanks again for your very useful insights!
Aloha, Rich