[Tdwg-tag] Why we should not use LSID

Wed May 3 15:17:58 CEST 2006

Donald wrote:

> Considering SDD as an example, my interpretation of this is that the RDFS or
> other definitions for classes such as Character, State or Modifier should be
> accessible through URLs.  This does not mean that the instances of these
> classes (i.e. what I might call the individual SDD data elements) need
> themselves to be accessible in the same way.  
> 
> I do understand that Gregor is concerned about our ability to reason over
> the data as well as to validate the underlying documents.  However, if we do
> choose to use technologies such as OWL to support such reasoning, I do not
> believe that we can expect to reason over fully federated data.  Surely we
> would expect to resolve the data (through LSIDs, PURLs, or whatever else)
> and then process them locally?  We should certainly consider whether this is
> an issue, but we should keep it separate from the main issue identified by
> the TAG.

I fail to understand the distinction between class and instance. As far as I 
understand, it is arbitrary and depends on the purpose. The class 
"SDD.Character" is expressed in an XML instance document (a W3C schema 
document, using instances of classes expressed in the W3C schema for schemas). 
So is the class "flower color", which is an instance (or "data" in Donald's 
sense) expressed in an SDD instance document. 

However, a taxon description makes use of "flower color" in the sense of a 
class definition, i.e. all descriptions using this term with a specific value 
are instances of the class flower color.
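
A minimal sketch of this point, assuming a toy list of "is an instance of" 
statements (the description entries and helper names are hypothetical, not 
taken from SDD). It only shows that one term can be an instance from one 
perspective and a class from another:

```python
# Illustrative "is an instance of" statements; all entries are hypothetical.
TYPE_OF = [
    ("flower color", "SDD.Character"),
    ("description 1: flower color = red", "flower color"),
    ("description 2: flower color = white", "flower color"),
]


def instances_of(term):
    """Everything declared to be an instance of `term`."""
    return [subject for subject, cls in TYPE_OF if cls == term]


def classes_of(term):
    """Every class that `term` is declared to be an instance of."""
    return [cls for subject, cls in TYPE_OF if subject == term]


def plays_both_roles(term):
    """True if `term` is used as a class and also as an instance."""
    return bool(instances_of(term)) and bool(classes_of(term))
```

Here plays_both_roles("flower color") holds, because the term is an instance 
of "SDD.Character" and at the same time the class of the two descriptions.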

"Flower color" can be generalized to "color of flower-like structure". This 
often makes a lot of sense: e.g. in Compositae (sunflower, etc.) many people 
will give an answer about the inflorescence rather than about a single flower, 
and even botany students get confused about the cyathium of Euphorbia. So we 
do want to make use of reasoning engines when processing taxon identification 
queries.

Exactly the same generalization relationships hold for taxa.
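
As a rough illustration of the kind of reasoning meant here, the sketch below 
walks a generalization hierarchy so that a query on the general term also 
finds descriptions recorded under a more specific one. The hierarchy, the 
taxa and the observations are made up for the example, and a real reasoner 
(e.g. over OWL) would do far more than this:

```python
# Hypothetical generalization hierarchy: specific term -> more general term.
SUBCLASS_OF = {
    "flower color": "color of flower-like structure",
    "inflorescence color": "color of flower-like structure",
}

# Hypothetical descriptions: (taxon, character used, recorded value).
OBSERVATIONS = [
    ("Taxon A", "flower color", "yellow"),
    ("Taxon B", "inflorescence color", "yellow"),
    ("Taxon C", "flower color", "red"),
]


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or generalizes to it."""
    while term is not None:
        if term == ancestor:
            return True
        term = SUBCLASS_OF.get(term)
    return False


def matching_taxa(character, value):
    """Taxa with `value` recorded for `character` or any specialization."""
    return sorted(
        {
            taxon
            for taxon, observed, recorded in OBSERVATIONS
            if recorded == value and is_a(observed, character)
        }
    )
```

A query for "color of flower-like structure" = "yellow" then returns both 
Taxon A and Taxon B, while a query for "flower color" = "yellow" returns only 
Taxon A.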

What I am trying to get at is this: my perception is that those charged with 
developing the GBIF software make the distinction from their own perspective. 
That is a good thing, but we might be going down a path that prevents us from 
ever changing the perspective, by requesting the use of LSIDs for what, from 
the software development perspective, is currently perceived as an instance. 
If my information is correct, this would prevent the use of standard reasoners 
to answer questions such as those I posed in my talk prepared for TAG-1 (which 
I cannot post to the email list, since only 200 kB are allowed). 

> I also suspect that LSIDs may be a really good way for us to handle many of
> our “controlled” vocabularies.  Obviously those vocabularies which make up
> the definition of classes and their properties may need URL access, but in
> many other contexts (including, I would have thought, SDD) it may be more
> sensible to treat the vocabulary terms as data objects.  This will allow us
> to extend them with all kinds of metadata.

> I would say that the major reason the GUID meetings avoided adopting PURLs
> was simply that they give us no clean separation between the identifier and
> the owner and location of the document.  LSIDs (provided we sort out
> appropriate best practices for how they are constructed) may, among other
> things, give us an intermediate layer we can conveniently manage to handle
> this.

Can you explain this? I believe a PURL does exactly this. If I have 
purl.org/xyz, it is an id, but the document may be anywhere and may in fact 
move.
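
A minimal sketch of the indirection I have in mind, assuming a simple lookup 
table in place of a real PURL server (the Resolver class and the example.org 
and example.net locations are hypothetical):

```python
class Resolver:
    """Toy stand-in for a resolution service: stable id -> current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        # Registering an identifier again models the document moving.
        self._locations[identifier] = location

    def resolve(self, identifier):
        return self._locations[identifier]


resolver = Resolver()
resolver.register("purl.org/xyz", "http://example.org/docs/xyz.xml")
first_location = resolver.resolve("purl.org/xyz")

# The document moves to another host; the identifier itself is untouched.
resolver.register("purl.org/xyz", "http://example.net/archive/xyz.xml")
second_location = resolver.resolve("purl.org/xyz")
```

The identifier cited elsewhere stays "purl.org/xyz" throughout; only the 
entry held by the resolver changes.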

Thanks!

Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn at bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Königin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203