[tdwg-tag] Typing in LSID RDF
Steve Baskauf
steve.baskauf at vanderbilt.edu
Wed Dec 2 17:50:41 CET 2009
In anticipation of actually being able to use LSIDs now that the TDWG
GUID Applicability Statement is moving toward becoming a standard, I
have been trying to figure out what exactly I would be putting in RDF
for particular types of LSIDs that I would be likely to create. In
particular, I have been thinking about the Recommendation "Objects in
the biodiversity information domain that are identified by an LSID
should be typed using the TDWG ontology or other accepted vocabularies
in accordance with the TDWG common architecture." and how it would apply
to the types of resources for which I want to create LSIDs. There are
three types of things that I'm concerned about: herbarium specimens,
digital still images (which could be either images of individual
organisms in the wild or images of herbarium specimens), and individuals
(in the sense of a URIref object of dwc:individualID).
(The following examples assume the standard namespace declarations.)
1. For an herbarium specimen, it seems like I could use either:
<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
or
<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/Specimen"/>
In the spirit of the new DwC standard, it seems like the first
alternative would be better since specimen-related elements are now
categorized by the more generic term Occurrence. I could then assert
the physical nature of the specimen by
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
2. For a digital image of either a live plant in the wild or of an
herbarium specimen, I could use:
<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
or
<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/DigitalImage"/>
Again in the spirit of the new DwC standard, the first alternative would
seem to be preferable, with
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
to establish that the resource was an image. However, if I do that
there is nothing in the RDF to indicate that the LSID refers to a
digital image rather than a 35mm slide gathering dust somewhere. Under
that option, it seems that the only ways for an application resolving
the LSID to learn whether the still image is digital or film (physical)
are to attempt a getData() call and see whether it succeeds, or to find
file format information in the metadata, if the provider included it.
3. For the individual organism (let's say a particular tree in my back
yard), I am at a loss as to what to use for rdf:type. None of the types
in the TDWG ontology seem right for this. There should be a type for it
if I'm going to use dwc:individualID as a property of my live plant
images (which I intend to do), given that dwc:individualID "may be a
global unique identifier...".
I can assert the physical nature of the tree by
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
but that really only tells me that I can't retrieve the tree over the
Web and does nothing to let me know that I'm talking about the tree and
not a specimen from it or (film) picture of it.
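To make the three cases concrete, here is a minimal Python sketch
(standard library only) that parses one possible RDF/XML rendering of
them and lists the type statements found for each subject. The LSIDs
are hypothetical placeholders, the helper name types_by_subject is made
up for illustration, and the tree carries no class statement because no
suitable class is identified above. The sketch writes the typing
property as rdf:type, since RDF defines "type" in the rdf: namespace.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"

# Assumption: every urn:lsid:example.org:... identifier is a placeholder.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="urn:lsid:example.org:specimens:1">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
  </rdf:Description>
  <rdf:Description rdf:about="urn:lsid:example.org:images:1">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
  </rdf:Description>
  <rdf:Description rdf:about="urn:lsid:example.org:individuals:1">
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
  </rdf:Description>
</rdf:RDF>
"""


def types_by_subject(rdf_xml):
    """Map each rdf:about subject to its rdf:type and dcterms:type URIs."""
    root = ET.fromstring(rdf_xml)
    summary = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        summary[subject] = {
            "rdf_type": [
                el.get(f"{{{RDF}}}resource")
                for el in desc.findall(f"{{{RDF}}}type")
            ],
            "dcterms_type": [
                el.get(f"{{{RDF}}}resource")
                for el in desc.findall(f"{{{DCTERMS}}}type")
            ],
        }
    return summary


for subject, types in types_by_subject(RDF_XML).items():
    print(subject, types)
```

In this rendering the specimen and the image share the same rdf:type,
and the specimen and the tree share the same dcterms:type, so neither
property on its own separates all three resources.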
I could impart additional information by using basisOfRecord. That
would clarify what kind of Occurrence the herbarium specimen was
(PreservedSpecimen), but would not help with knowing that the images
were digital [i.e. that they would return data for a getData() call],
since the suggested value of basisOfRecord for a digital image would be
"StillImage" - the same information that's in the dcterms:type
statement. It seems to me that basisOfRecord might not even be
appropriately applied to individual organisms, since the example values
of basisOfRecord seem to provide information about the nature of the
token that we use to know that an organism was present somewhere, rather
than that the resource is the organism itself. But it seems from
previous discussion that basisOfRecord can mean pretty much anything
that the people using it decide it should mean, so maybe I could just
start using "Individual" as a value for basisOfRecord.
I hope that there is a relatively straightforward answer to this
question. I'm talking about three very common and uncomplicated things
that as a very practical matter need to be labeled with LSIDs. If the
typing of LSIDs is really intended to achieve this goal: "Machine and
human clients that retrieve the metadata associated with an LSID will
use the associated typing information to decide how to process the
metadata and any associated data.", then the object of a <rdf:type
rdf:resource="http://xxx"/> statement should allow the client to make
this decision in a clear and uncomplicated way. A client resolving the
LSID of a specimen should know to expect typical metadata about
specimens but know that there is nothing deliverable via the web. A
client resolving the LSID of an individual should know to expect
metadata that might make it possible to put a dot on a map and to locate
any specimens or images that exist for that individual. A client
resolving the LSID of a digital image should expect that there will be a
(data) representation for a human to look at on a computer monitor and
be able to access metadata that would tell them how they would be
allowed to use that image if they wanted. It is not clear to me how
this goal (deciding how to process data/metadata) would be achieved
using the current TDWG ontology. The creation of an
"http://rs.tdwg.org/ontology/voc/Individual" class might help at least
in the case of individuals in the environment.
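The kind of decision described above could be sketched in Python as a
simple lookup from type URI to what a client should expect. This is
only an illustration under assumptions: the EXPECTATIONS table and the
plan_processing function are hypothetical names, the Individual class
is the proposed one and does not exist in the ontology, and the
expectations are paraphrased from the preceding paragraph rather than
taken from any TDWG document.

```python
VOC = "http://rs.tdwg.org/ontology/voc/"

# Assumption: one entry per class a client knows how to handle.
# "has_data" means a getData() call is expected to return something.
EXPECTATIONS = {
    VOC + "Specimen": {
        "has_data": False,
        "expect": "typical specimen metadata; nothing deliverable via the web",
    },
    VOC + "DigitalImage": {
        "has_data": True,
        "expect": "an image representation plus usage-rights metadata",
    },
    # Hypothetical: this class is proposed above, not currently defined.
    VOC + "Individual": {
        "has_data": False,
        "expect": "location metadata and links to specimens or images",
    },
}


def plan_processing(type_uris):
    """Return the expectations for the first recognized type, else None."""
    for uri in type_uris:
        if uri in EXPECTATIONS:
            return EXPECTATIONS[uri]
    return None


# A specific class lets the client decide; a generic one does not.
print(plan_processing([VOC + "DigitalImage"]))
print(plan_processing([VOC + "TaxonOccurrence"]))
```

With a specific class as the type, the client can decide immediately.
With only the generic TaxonOccurrence it finds no match in this table
and would have to inspect other properties to decide what to do.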
As a representative of many non-IT-trained potential users of LSIDs and
as a TDWG "outsider", I urge the TAG to provide clear and simple answers
to this and other questions about how we can start using HTTP URIs and
LSIDs as soon as possible. The TDWG GUID Applicability Statement
goes a long way to achieving this, but the requirement that GUIDs be
resolvable before they can be assigned creates a technical burden that
could effectively "shut out" small players from using them. I know few
herbarium curators and plant photographers who are willing to put in the
kind of time that I have to try to understand how HTTP URIs/LSIDs work,
and even I still have major questions.
Thanks for all of the help and hard work that has been put into the
development of these tools.
Steve Baskauf
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu