[tdwg-tag] Typing in LSID RDF

Steve Baskauf steve.baskauf at vanderbilt.edu
Wed Dec 2 17:50:41 CET 2009


In anticipation of actually being able to use LSIDs now that the TDWG 
GUID Applicability Statement is moving toward becoming a standard, I 
have been trying to figure out what exactly I would be putting in RDF 
for particular types of LSIDs that I would be likely to create.  In 
particular, I have been thinking about the Recommendation "Objects in 
the biodiversity information domain that are identified by an LSID 
should be typed using the TDWG ontology or other accepted vocabularies 
in accordance with the TDWG common architecture." and how it would apply 
to the types of resources for which I want to create LSIDs.  There are 
three types of things that I'm concerned about: herbarium specimens, 
digital still images (which could be either images of individual 
organisms in the wild or images of herbarium specimens), and individuals 
(in the sense of a URIref object of dwc:individualID).

(The following examples assume the standard namespace declarations.)

1. For an herbarium specimen, it seems like I could use either:
<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
or
<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/Specimen"/>

In the spirit of the new DwC standard, it seems like the first 
alternative would be better since specimen-related elements are now 
categorized by the more generic term Occurrence.  I could then assert 
the physical nature of the specimen by
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
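
As a minimal sketch of what the complete RDF for such a specimen might 
look like, with the namespace declarations written out and the typing 
expressed with rdf:type (the typing property from the RDF namespace): 
the LSID urn:lsid:example.org:specimen:12345 is a made-up placeholder, 
not a real identifier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- Hypothetical LSID: authority, namespace, and object ID are placeholders -->
  <rdf:Description rdf:about="urn:lsid:example.org:specimen:12345">
    <!-- Generic Occurrence-style typing from the TDWG ontology -->
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <!-- Physical nature of the resource, from the DCMI Type Vocabulary -->
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
  </rdf:Description>
</rdf:RDF>
```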

2. For a digital image of either a live plant in the wild or of an 
herbarium specimen, I could use:
<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
or
<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/DigitalImage"/>

Again in the spirit of the new DwC standard, the first alternative would 
seem to be preferable, with
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
to establish that the resource was an image.  However, if I do that 
there is nothing in the RDF to indicate that the LSID refers to a 
digital image rather than a 35mm slide gathering dust somewhere.  Under 
that option, it seems like the only ways for an application resolving 
the LSID to learn whether the still image is digital or film (physical) 
are to try a getData() call and see whether it returns data, or for the 
metadata to include information about file format.
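
A sketch of the file-format option, assuming the image is a JPEG and 
that a plain dc:format literal holding a MIME type would be an 
acceptable hint (both are my assumptions, as is the placeholder LSID):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- Hypothetical LSID for an image -->
  <rdf:Description rdf:about="urn:lsid:example.org:image:67890">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
    <!-- Assumed hint: a MIME type suggests a digital file rather than film -->
    <dc:format>image/jpeg</dc:format>
  </rdf:Description>
</rdf:RDF>
```

A client would still have to infer "digital" from the presence of a 
MIME type, which is a convention rather than an explicit statement.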

3. For the individual organism (let's say a particular tree in my back 
yard), I am at a loss as to what to use for rdf:type.  None of the types 
in the TDWG ontology seem right for this.  There should be a type for it 
if I'm going to use dwc:individualID as a property of my live plant 
images (which I intend to do), given that dwc:individualID "may be a 
global unique identifier...". 
I can assert the physical nature of the tree by
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
but that really only tells me that I can't retrieve the tree over the 
Web and does nothing to let me know that I'm talking about the tree and 
not a specimen from it or (film) picture of it.

I could impart additional information by using basisOfRecord.  That 
would clarify what kind of Occurrence the herbarium specimen was 
(PreservedSpecimen), but would not help with knowing that the images 
were digital [i.e. that they would return data for a getData() call], 
since the suggested value of basisOfRecord for a digital image would be 
"StillImage" - the same information that's in the dcterms:type 
statement.  It seems to me that basisOfRecord might not even be 
appropriately applied to individual organisms, since the example values 
of basisOfRecord seem to provide information about the nature of the 
token that we use to know that an organism was present somewhere, rather 
than that the resource is the organism itself.  But it seems from 
previous discussion that basisOfRecord can mean pretty much anything 
that the people using it decide it should mean, so maybe I could just 
start using "Individual" as a value for basisOfRecord. 
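
To make the redundancy concrete, here is a sketch of how basisOfRecord 
might sit alongside the typing statements for a specimen and for an 
image.  The LSIDs are made-up placeholders, and expressing 
dwc:basisOfRecord as a plain literal is my assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <!-- Specimen: basisOfRecord adds something the type statements lack -->
  <rdf:Description rdf:about="urn:lsid:example.org:specimen:12345">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
    <dwc:basisOfRecord>PreservedSpecimen</dwc:basisOfRecord>
  </rdf:Description>
  <!-- Image: basisOfRecord only repeats what dcterms:type already says -->
  <rdf:Description rdf:about="urn:lsid:example.org:image:67890">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
    <dwc:basisOfRecord>StillImage</dwc:basisOfRecord>
  </rdf:Description>
</rdf:RDF>
```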

I hope that there is a relatively straightforward answer to this 
question.  I'm talking about three very common and uncomplicated things 
that as a very practical matter need to be labeled with LSIDs.  If the 
typing of LSIDs is really intended to achieve this goal:  "Machine and 
human clients that retrieve the metadata associated with an LSID will 
use the associated typing information to decide how to process the 
metadata and any associated data.", then the object of a <rdf:type 
rdf:resource="http://xxx"/> statement should allow the client to make 
this decision in a clear and uncomplicated way.  A client resolving the 
LSID of a specimen should know to expect typical metadata about 
specimens but know that there is nothing deliverable via the web.  A 
client resolving the LSID of an individual should know to expect 
metadata that might make it possible to put a dot on a map and to locate 
any specimens or images that exist for that individual.  A client 
resolving the LSID of a digital image should expect that there will be a 
(data) representation for a human to look at on a computer monitor and 
be able to access metadata that would tell them how they would be 
allowed to use that image if they wanted.  It is not clear to me how 
this goal (deciding how to process data/metadata) would be achieved 
using the current TDWG ontology.  The creation of an 
"http://rs.tdwg.org/ontology/voc/Individual" class might help at least 
in the case of individuals in the environment.
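
A sketch of how that might look, assuming the proposed 
"http://rs.tdwg.org/ontology/voc/Individual" class existed (it does not 
at present) and using made-up placeholder LSIDs.  The image points to 
the tree through dwc:individualID:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <!-- The tree itself, typed with the proposed (hypothetical) class -->
  <rdf:Description rdf:about="urn:lsid:example.org:individual:1001">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/Individual"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
  </rdf:Description>
  <!-- A live-plant image that refers to the tree by its identifier -->
  <rdf:Description rdf:about="urn:lsid:example.org:image:67890">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
    <dwc:individualID rdf:resource="urn:lsid:example.org:individual:1001"/>
  </rdf:Description>
</rdf:RDF>
```

With a type like this, a client could tell the tree apart from a 
specimen or film picture of it, both of which would also be typed as 
PhysicalObject.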

As a representative of many non-IT-trained potential users of LSIDs and 
as a TDWG "outsider", I urge the TAG to provide clear and simple answers 
to this and other questions related to how we can as soon as possible 
start using HTTP URIs and LSIDs.  The TDWG GUID Applicability Statement 
goes a long way to achieving this, but the requirement that GUIDs be 
resolvable before they can be assigned creates a technical burden that 
could effectively "shut out" small players from using them.  I know few 
herbarium curators and plant photographers who are willing to put in the 
kind of time that I have to try to understand how HTTP URIs/LSIDs work, 
and even I still have major questions.

Thanks for all of the help and hard work that has been put into the 
development of these tools.
Steve Baskauf

--  
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu



