On 6/2/07, Weitzman, Anna WEITZMAN@si.edu wrote:
Bob,
Thanks for the explanation.
Excuse my lack of knowledge about this--but I am trying to understand this in a way that taxonomists like myself will need to (and I need to understand it in terms that I can use to explain it to other taxonomists). So much of what we are doing now in TDWG is so foreign to taxonomists, and I fear that you are going to completely leave us (even those of us who are 'relatively technically inclined') behind--which I don't think is helpful.
Use scenarios all have to come from the users, so things like your 1-7 are valuable.
Your explanation does help (though I think my calling it a parent LSID vs. resolving to something in RDF is somewhat semantic--if the resolver does not allow all of the things mentioned in 1-7 (and so on) to resolve to the same "something" that relates them all to the same 'parent' (a term that taxonomists will understand) specimen it really isn't going to work--but I assume that you CS guys have that sorted and I just need to read and ask more questions so that I can translate it somehow into terminology that I and other taxonomists understand).
I think which of your 7 things deserve GUIDs is not so much about GUIDs but about what the community wants to do with them, especially across applications. The simplest use of course is to make sure that two mentions are talking about the same thing. That's the minimal guarantee of GUIDs. If your data or document and my data or document mention the same GUID, then they must be referring to the same object (digital or physical). No resolution at all is needed for that use case. Anything further than that is basically a community issue. For example, if you want to use GUIDs to help guarantee that two Taxon Concepts are described in the same publication, then you need GUIDs on pubs and both TCs need to use the same GUID for that publication. [An object can have more than one GUID, but no two objects can have the same GUID]. But does your GUID resolution need to say on what page the description for the TC appears? Not necessarily.
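As a rough illustration of that minimal guarantee, here is a small Python sketch. Every identifier in it (the `example.org` LSIDs, `GuidRegistry`, `shared_guids`) is made up for the example and is not part of any TDWG or LSID software. It only shows that comparing GUIDs needs no resolver, and that several GUIDs may point at one object while one GUID may never point at two.

```python
# All identifiers below are hypothetical and exist only for this illustration.
doc_a = {"urn:lsid:example.org:specimens:123", "urn:lsid:example.org:pubs:9"}
doc_b = {"urn:lsid:example.org:pubs:9", "urn:lsid:example.org:names:77"}


def shared_guids(mentions_a, mentions_b):
    """GUIDs mentioned by both documents; each one names a single shared object."""
    return mentions_a & mentions_b


class GuidRegistry:
    """Toy registry: many GUIDs may identify one object, never the reverse."""

    def __init__(self):
        self._object_for_guid = {}

    def assign(self, guid, object_key):
        """Bind a GUID to an object, refusing to rebind it to a different one."""
        current = self._object_for_guid.get(guid)
        if current is not None and current != object_key:
            raise ValueError(f"{guid} already identifies {current}")
        self._object_for_guid[guid] = object_key

    def same_object(self, guid_a, guid_b):
        """True only when both GUIDs are known and identify the same object."""
        a = self._object_for_guid.get(guid_a)
        b = self._object_for_guid.get(guid_b)
        return a is not None and a == b


# No resolution step is involved: a plain set intersection is enough.
print(shared_guids(doc_a, doc_b))
```

The registry is only there to make the bracketed rule concrete; a real community would decide for itself where such bindings are recorded.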
More generally, in the case of LSIDs each community has to ask three questions, the first of which is usually---but not always---easy:
(a) Under what circumstances do we want to distinguish two objects from one another?
(b) What information about an object do we want to say fundamentally will never change, for the entire future of the universe, even if there are no people, computers, or anything but black holes left, including the object itself, if it was physical?
(c) What information do we want to associate with an object that might change as our views about it---scientific or curatorial---themselves might change?
For (a), the LSID spec requires a guarantee that nowhere, nohow, nowhen will the same LSID be issued twice. LSID resolvers are obliged to offer the stuff in (b) as LSID resolution data, but TDWG takes no position about how. LSID resolvers are required by TDWG to offer the stuff in (c) as LSID resolution metadata, represented in RDF. TDWG may ultimately require that this be done using the TDWG ontology. This is less onerous than it might seem because that ontology is extensible. Or rather, it is no more onerous than RDF is in the first place. An important special case of (c) is the question of what, if any, are the relationships among the things we say about these objects and about other objects.
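To make the three questions a bit more concrete, here is a hedged Python sketch of a toy resolver. The LSID layout `urn:lsid:authority:namespace:object[:revision]` is the standard one; everything else (`Lsid`, `parse_lsid`, `ToyResolver`, its method names, and the plain dict standing in for RDF metadata) is an assumption made for this example, not the interface of any real resolver.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Lsid:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing urn:lsid prefix: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty component in: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(parts[2], parts[3], parts[4], revision)


class ToyResolver:
    """Hypothetical resolver keeping (b) fixed data apart from (c) metadata."""

    def __init__(self):
        self._data: Dict[Lsid, bytes] = {}
        self._metadata: Dict[Lsid, Dict[str, str]] = {}

    def issue(self, lsid: Lsid, data: bytes) -> None:
        # (a): the same LSID is never issued twice.
        if lsid in self._data:
            raise ValueError(f"LSID already issued: {lsid}")
        self._data[lsid] = data  # (b): never replaced after this point
        self._metadata[lsid] = {}

    def get_data(self, lsid: Lsid) -> bytes:
        return self._data[lsid]

    def set_metadata(self, lsid: Lsid, key: str, value: str) -> None:
        # (c): may be revised as scientific or curatorial views change.
        if lsid not in self._data:
            raise KeyError(lsid)
        self._metadata[lsid][key] = value

    def get_metadata(self, lsid: Lsid) -> Dict[str, str]:
        # A real resolver would return RDF; a dict copy stands in for it here.
        return dict(self._metadata[lsid])
```

A possible use, again with invented values: issue `parse_lsid("urn:lsid:example.org:specimens:123")` once with some fixed bytes, then call `set_metadata` as often as opinions about the object change. The data returned by `get_data` stays the same throughout.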
One can see from this that the LSID resolution metadata holds the most interesting, useful, and potentially complex information about most biodiversity digital objects and records in catalogs of physical objects or events. The term metadata here is confusing to database folks. It would have been closer to the model in most people's heads had that stuff been called the data and the persistent stuff called something else.
From the beginning, and still, I've believed that adopting RDF as the exchange format for the only interesting part of LSID resolution---the metadata---is technically sweet but running much further ahead of the TDWG membership than XML Schema did. This is at least in part because, although RDF is over 10 years old, the enterprise tools are only now emerging. Further, a lot of XML instance documents make sense both to machine and human readers without much knowledge of XML Schema, whereas the corresponding thing for RDF is, in my opinion, much less the case. I believe that for several more years TDWG communities will need fairly high-powered programmers to turn answers to the questions (a-c) above into actual LSID resolvers and applications exploiting them.
Meanwhile, my students and I are happy to join in funding proposals to be the interim code jockeys. Hah, hah, just serious.
Bob
p.s. I might be wrong about the time frame if Wasabi can replace or has replaced Steve Perry, who I understand has gone to industry.
So, to follow on that line: 'all' we, in INOTAXA, have to do is assign LSIDs within INOTAXA (temporarily at least) and come up with the ontology for that in the Taxonomic Literature interest group (though I assume that it will be better if they are similar to those for similar objects described by every other interest group, since nearly everything that we will assign LSIDs to will relate to other interest groups); and then the resolver, once we have all this designed, will be able to relate all of the things I referred to together?
Finally, what does what you just said mean for Rich's question about whether the LSID applies to the specimen or the data record that describes it? Following your logic, isn't it really better if we think of them each as having an LSID and making sure that we can bring all of them together somehow? Or, perhaps the specimen does not have an 'official' LSID, but it should have some sort of GUID that allows the institution that holds them to link the specimen to the record that has the LSID (if only that were true--our Entomology Dept. gave up requiring GUIDs on specimens that match to records in the database--even for types--years ago and are only now starting to see that this was not a wise decision!). In the latter case, clearly Rich needs to think of the LSID as applying to the record and not to the specimen, correct?
Thanks, Anna
Anna L. Weitzman, PhD
Botanical and Biodiversity Informatics Research
National Museum of Natural History
Smithsonian Institution
office: 202.633.0846
mobile: 202.415.4684
weitzman@si.edu
From: Bob Morris [mailto:morris.bob@gmail.com]
Sent: Sat 02-Jun-07 3:29 PM
To: Weitzman, Anna
Cc: Richard Pyle; Paul Kirk; Jason Best; tdwg-guid@lists.tdwg.org
Subject: Re: [tdwg-guid] First step in implementing LSIDs?[Scanned]
On 6/2/07, Weitzman, Anna WEITZMAN@si.edu wrote:
[... 7 examples omitted]
Either each of these (1-7) will need to have its own LSID (or an equivalent in the case of the specimen itself) or they will all need to have the same LSID. If the former, they will all have to resolve to the same parent LSID--is this for the specimen or the record in its home database?--in order for the overall biodiversity information system to really work.
Two different objects cannot have the same LSID by definition. [This is more or less the sole overarching point of GUIDs].
I don't know what is meant by "parent LSID", but TDWG requires that an LSID resolution service return its metadata in RDF, the Resource Description Framework semantic web language. By its design, RDF is especially good at expressing relations between things it describes, so there is plenty of room for the LSID metadata to express whatever relations between these examples each of its resolution services might wish to. Furthermore, the emergent TDWG ontology standards (see TDWG-TAG) support some particularly convenient ways to do this, should the various interest groups be motivated to visit this question. That would be a Good Thing, so that different resolvers of similar objects might actually offer metadata that is similar or, at least as to relations, easily comparable. Still, each subgroup is likely to need to thrash these issues out separately. The TCS group is historically ahead of everybody else in this regard, since they expressed a fixed set of relations among Taxon Concepts more or less ab initio.
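A minimal Python sketch of the kind of relations RDF metadata can carry, using plain (subject, predicate, object) tuples instead of an RDF library. The LSIDs, the `http://example.org/terms/` vocabulary, and the predicate names are all invented for illustration; they are not the omitted examples 1-7 and not terms from the TDWG ontology.

```python
# Hypothetical vocabulary and identifiers, used only for this sketch.
EX = "http://example.org/terms/"
SPECIMEN = "urn:lsid:example.org:specimens:123"
RECORD = "urn:lsid:example.org:records:r1"
IMAGE = "urn:lsid:example.org:images:i1"
PUB = "urn:lsid:example.org:pubs:p1"

triples = [
    (RECORD, EX + "describes", SPECIMEN),
    (IMAGE, EX + "depicts", SPECIMEN),
    (PUB, EX + "cites", RECORD),
]


def related_to(target, statements):
    """Every subject that reaches `target` by following statements, transitively."""
    found = set()
    frontier = [target]
    while frontier:
        node = frontier.pop()
        for subject, _predicate, obj in statements:
            if obj == node and subject not in found:
                found.add(subject)
                frontier.append(subject)
    return found


def to_ntriples(statements):
    """Serialize IRI-only statements in N-Triples form, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


print(sorted(related_to(SPECIMEN, triples)))
print(to_ntriples(triples))
```

Here nothing is a "parent LSID": the record, the image, and the publication each keep their own identifier, and the statements about them are what tie them back to the one specimen.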