[tdwg-guid] First step in implementing LSIDs?[Scanned]

Sat Jun 2 23:21:33 CEST 2007

On 6/2/07, Weitzman, Anna <WEITZMAN at si.edu> wrote:
> Bob,
>
> Thanks for the explanation.
>
> Excuse my lack of knowledge about this--but I am trying to understand this in a way that taxonomists like myself will need to (and I need to understand it in terms that I can use to explain it to other taxonomists).  So much of what we are doing now in TDWG is so foreign to taxonomists, and I fear that you are going to completely leave us (even those of us who are 'relatively technically inclined') behind--which I don't think is helpful.

Use scenarios all have to come from the users, so things like your 1-7
are valuable.
>
> Your explanation does help (though I think my calling it a parent LSID vs. resolving to something in RDF is somewhat semantic--if the resolver does not allow all of the things mentioned in 1-7 (and so on) to resolve to the same "something" that relates them all to the same 'parent' (a term that taxonomists will understand) specimen it really isn't going to work--but I assume that you CS guys have that sorted and I just need to read and ask more questions so that I can translate it somehow into terminology that I and other taxonomists understand).

I think which of your 7 things deserve GUIDs is not so much about
GUIDs but about what the community wants to do with them, especially
across applications. The simplest use of course is to make sure that
two mentions are talking about the same thing. That's the minimal
guarantee of GUIDs. If your data or document and my data or document
mention the same GUID, then they must be referring to the same object
(digital or physical). No resolution at all is needed for that use
case.

Anything further than that is basically a community issue. For
example, if you want to use GUIDs to help guarantee that two Taxon
Concepts are described in the same publication, then you need GUIDs on
pubs and both TCs need to cite the same publication GUID. [An object
can have more than one GUID, but no two objects can have the same
GUID]. But does your GUID resolution need to say on what page the
description for the TC appears? Not necessarily.
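
The no-resolution use case can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function
names and the example.org identifiers are made up, and the
normalization rule (prefix and authority compared case-insensitively,
the rest case-sensitively) is my reading of the LSID spec, not
something TDWG prescribes here.

```python
import re

# LSID shape: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
LSID_RE = re.compile(
    r"urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?", re.IGNORECASE
)


def normalize_lsid(lsid):
    """Lower-case "urn", "lsid" and the authority; leave the rest alone.

    Assumption: only those first three components are case-insensitive.
    """
    parts = lsid.split(":")
    return ":".join([p.lower() for p in parts[:3]] + parts[3:])


def lsids_in(text):
    """Return the set of normalized LSIDs mentioned anywhere in text."""
    # Strip punctuation that merely ends a sentence or parenthesis.
    return {normalize_lsid(m.rstrip(".,;)")) for m in LSID_RE.findall(text)}


def shared_objects(doc_a, doc_b):
    """LSIDs mentioned by both documents, i.e. objects both talk about."""
    return lsids_in(doc_a) & lsids_in(doc_b)


if __name__ == "__main__":
    # Hypothetical documents and identifiers, for illustration only.
    mine = "Holotype urn:lsid:example.org:specimens:12345 was examined."
    yours = ("See URN:LSID:Example.org:specimens:12345, "
             "and urn:lsid:example.org:specimens:999.")
    print(shared_objects(mine, yours))
```

Nothing here contacts a resolver: agreement on the identifier string
alone is what tells us the two documents mention the same object.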

More generally, in the case of LSIDs each community has to ask three
questions, the first of which is usually---but not always---easy:

   (a) Under what circumstances do we want to distinguish two objects
from one another?

   (b) What information about an object do we want to say fundamentally
will never change, for the entire future of the universe, even if
there are no people, computers, or anything but black holes left,
not even the object itself, if it was physical?

   (c) What information do we want to associate with an object that
might change as our views about it---scientific or curatorial---
themselves might change?

For (a), the LSID spec requires a guarantee that nowhere, nohow,
nowhen will the same LSID be issued twice. LSID resolvers are obliged
to offer the stuff in (b) as LSID resolution data, but TDWG takes no
position about how. LSID resolvers are required by TDWG to offer the
stuff in (c) as LSID resolution metadata, represented in RDF. TDWG
may ultimately require that this be done using the TDWG ontology. This
is less onerous than it might seem because that ontology is
extensible. Or rather, it is no more onerous than RDF is in the first
place. An important special case of (c) is the question of what, if
any, are the relationships among the things we say about these objects
and about other objects.
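
To make the split between (a), (b) and (c) concrete, here is a toy
in-memory model in Python. It is a sketch, not a real LSID resolver:
the class and method names, the "ex:" predicates and the example.org
identifiers are all hypothetical, and plain tuples stand in for RDF
triples rather than any actual RDF serialization.

```python
from dataclasses import dataclass, field


@dataclass
class LsidRecord:
    lsid: str
    data: bytes  # (b): fixed once issued, never edited afterwards
    # (c): (subject, predicate, object) tuples standing in for RDF triples
    metadata: set = field(default_factory=set)


class ToyResolver:
    """Illustrative only; a real resolver would serve RDF over a network."""

    def __init__(self):
        self._records = {}

    def issue(self, lsid, data):
        # (a): the same LSID must never be issued twice.
        if lsid in self._records:
            raise ValueError(f"LSID already issued: {lsid}")
        self._records[lsid] = LsidRecord(lsid, bytes(data))

    def get_data(self, lsid):
        # (b): always returns exactly the bytes given at issue time.
        return self._records[lsid].data

    def get_metadata(self, lsid):
        # (c): whatever is currently asserted about the object.
        return sorted(self._records[lsid].metadata)

    def assert_relation(self, subject, predicate, obj):
        self._records[subject].metadata.add((subject, predicate, obj))

    def retract_relation(self, subject, predicate, obj):
        self._records[subject].metadata.discard((subject, predicate, obj))


if __name__ == "__main__":
    resolver = ToyResolver()
    record = "urn:lsid:example.org:records:12345"
    resolver.issue(record, b"catalog record as first published")
    # Views change: the relation can be added or retracted freely.
    resolver.assert_relation(record, "ex:describesSpecimen",
                             "urn:lsid:example.org:specimens:777")
    print(resolver.get_data(record))
    print(resolver.get_metadata(record))
```

The design point is that relations between objects live entirely in
the metadata, so they can be revised without touching either the
identifier or the persistent data.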

One can see from this that the LSID resolution metadata holds the most
interesting, useful, and potentially complex information about most
biodiversity digital objects and records in catalogs of physical
objects or events. The term metadata here is confusing to database
folks. It would have been closer to the model in most people's heads
had that stuff been called the data and the persistent stuff called
something else.

From the beginning, and still, I've believed that adopting RDF as the
exchange format for the only interesting part of LSID resolution---the
metadata---is technically sweet but running much further ahead of the
TDWG membership than XML Schema did. This is at least in part because,
although RDF is over 10 years old, the enterprise tools are only now
emerging. Further, a lot of XML instance documents make sense both to
machine and human readers without much knowledge of XML-Schema,
whereas the corresponding thing for RDF is, in my opinion, much less
the case. I believe that for several more years TDWG communities will
need fairly high-powered programmers to turn answers to the questions
(a-c) above into actual LSID resolvers and applications exploiting
them.

Meanwhile, my students and I are happy to join in funding proposals to
be the interim code jockeys. Hah, hah, just serious.

Bob
p.s.
I might be wrong about the time frame if Wasabi can replace or has
replaced Steve Perry, who I understand has gone to industry.

>
> So, to follow on that line: 'all' we, in INOTAXA, have to do is assign LSIDs within INOTAXA (temporarily at least); that we come up with the ontology for that in the Taxonomic literature interest group (but I assume that it will be better if they are similar to those for similar objects described by every other interest group since nearly everything that we will assign LSIDs to will relate to other interest groups); and that the resolver, once we have all this designed will be able to relate all of the things I referred to together?
>
> Finally, what does what you just said mean for Rich's question about whether the LSID applies to the specimen or the data record that describes it?  Following your logic, isn't it really better if we think of them each as having an LSID and making sure that we can bring all of them together somehow?  Or, perhaps the specimen does not have an 'official' LSID, but it should have some sort of GUID that allows the institution that holds it to link the specimen to the record that has the LSID (if only that were true--our Entomology Dept. gave up requiring GUIDs on specimens that match to records in the database--even for types--years ago and are only now starting to see that this was not a wise decision!).  In the latter case, clearly Rich needs to think of the LSID as applying to the record and not to the specimen, correct?
>
> Thanks,
> Anna
>
> Anna L. Weitzman, PhD
> Botanical and Biodiversity Informatics Research
> National Museum of Natural History
> Smithsonian Institution
>
> office: 202.633.0846
> mobile: 202.415.4684
> weitzman at si.edu
>
> ________________________________
>
> From: Bob Morris [mailto:morris.bob at gmail.com]
> Sent: Sat 02-Jun-07 3:29 PM
> To: Weitzman, Anna
> Cc: Richard Pyle; Paul Kirk; Jason Best; tdwg-guid at lists.tdwg.org
> Subject: Re: [tdwg-guid] First step in implementing LSIDs?[Scanned]
>
>
>
> On 6/2/07, Weitzman, Anna <WEITZMAN at si.edu> wrote:
> >[... 7 examples omitted]
> >
> > Either each of these (1-7) will need to have its own LSID (or an equivalent in the case of the specimen itself) or they will all need to have the same LSID.  If the former, they will all have to resolve to the same parent LSID--is this for the specimen or the record in its home database?--in order for the overall biodiversity information system to really work.
> >
> >
>
> Two different objects cannot have the same LSID by definition. [This
> is more or less the sole overarching point of GUIDs].
>
> I don't know what is meant by "parent LSID", but TDWG requires that an
> LSID resolution service  return its metadata in RDF, the Resource
> Description Framework semantic web language. By its design, RDF is
> especially good at expressing relations between things it describes,
> so there is plenty of room for the LSID metadata to express whatever
> relations between these examples each of its resolution services might
> wish to. Furthermore, the emergent TDWG ontology standards (see
> TDWG-TAG) support some particularly convenient ways to do this,
> should the various interest groups be motivated to visit this
> question. That would be a Good Thing, so that different resolvers of
> similar objects might actually offer similar, or at least as to
> relations, easily comparable, metadata. Still, each subgroup is likely
> to need to thrash these issues out separately. The TCS group is
> historically ahead of everybody else in this regard, since they
> expressed a fixed set of relations among Taxon Concepts more or less
> ab initio.
>
>
>