In anticipation of actually being able to use LSIDs now that the TDWG GUID Applicability Statement is moving toward becoming a standard, I have been trying to figure out what exactly I would be putting in RDF for particular types of LSIDs that I would be likely to create. In particular, I have been thinking about the Recommendation "Objects in the biodiversity information domain that are identified by an LSID should be typed using the TDWG ontology or other accepted vocabularies in accordance with the TDWG common architecture." and how it would apply to the types of resources for which I want to create LSIDs. There are three types of things that I'm concerned about: herbarium specimens, digital still images (which could be either images of individual organisms in the wild or images of herbarium specimens), and individuals (in the sense of a URIref object of dwc:individualID).
(The following examples assume the standard namespace declarations.)
1. For an herbarium specimen, it seems like I could use either: <rdfs:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/> or <rdfs:type rdf:resource="http://rs.tdwg.org/ontology/voc/Specimen"/>
In the spirit of the new DwC standard, it seems like the first alternative would be better since specimen-related elements are now categorized by the more generic term Occurrence. I could then assert the physical nature of the specimen by <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
2. For a digital image of either a live plant in the wild or of an herbarium specimen, I could use: <rdfs:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/> or <rdfs:type rdf:resource="http://rs.tdwg.org/ontology/voc/DigitalImage"/>
Again in the spirit of the new DwC standard, the first alternative would seem to be preferable, with <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/> to establish that the resource is an image. However, if I do that, there is nothing in the RDF to indicate that the LSID refers to a digital image rather than a 35mm slide gathering dust somewhere. Under that option, it seems like the only way that an application resolving the LSID will know whether the still image is digital or film (physical) is to try a getData() call and see whether it succeeds, or possibly to rely on file format information included in the metadata.
3. For the individual organism (let's say a particular tree in my back yard), I am at a loss as to what to use for rdfs:type. None of the types in the TDWG ontology seem right for this. There should be a type for it if I'm going to use dwc:individualID as a property of my live plant images (which I intend to do), given that dwc:individualID "may be a global unique identifier...". I can assert the physical nature of the tree by <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/> but that really only tells me that I can't retrieve the tree over the Web and does nothing to let me know that I'm talking about the tree and not a specimen from it or a (film) picture of it.
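To make options 1 and 2 above concrete, here is a minimal sketch of what the metadata might look like and how a client could pull the typing statements out of it. Everything specific in it is an assumption for illustration only: the two LSIDs are made up, the namespace declarations are the usual RDF and Dublin Core terms ones, and the typing property is written as rdf:type, the form defined in the RDF namespace. Python's standard-library XML parser is used simply because it needs nothing installed; it is not an RDF toolkit.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
RESOURCE = "{%s}resource" % RDF

# Hypothetical metadata for two made-up LSIDs: a specimen and an image.
EXAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="urn:lsid:example.org:specimens:12345">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
  </rdf:Description>
  <rdf:Description rdf:about="urn:lsid:example.org:images:67890">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
  </rdf:Description>
</rdf:RDF>
"""


def types_by_subject(rdf_xml):
    """Map each described subject to (rdf:type URIs, dcterms:type URIs)."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for desc in root.iter("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)
        rdf_types = [el.get(RESOURCE) for el in desc.findall("{%s}type" % RDF)]
        dc_types = [el.get(RESOURCE) for el in desc.findall("{%s}type" % DCTERMS)]
        result[subject] = (rdf_types, dc_types)
    return result


if __name__ == "__main__":
    for subject, (rdf_types, dc_types) in types_by_subject(EXAMPLE_RDF).items():
        print(subject, rdf_types, dc_types)
```

Under this approach both resources carry the same TDWG type, and only the dcterms:type statement separates the specimen from the image. Nothing in either description separates a digital image from a film one.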
I could impart additional information by using basisOfRecord. That would clarify what kind of Occurrence the herbarium specimen was (PreservedSpecimen), but would not help with knowing that the images were digital [i.e. that they would return data for a getData() call], since the suggested value of basisOfRecord for a digital image would be "StillImage" - the same information that's in the dcterms:type statement. It seems to me that basisOfRecord might not even be appropriately applied to individual organisms, since the example values of basisOfRecord seem to provide information about the nature of the token that we use to know that an organism was present somewhere, rather than that the resource is the organism itself. But it seems from previous discussion that basisOfRecord can mean pretty much anything that the people using it decide it should mean, so maybe I could just start using "Individual" as a value for basisOfRecord.
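Since neither dcterms:type nor basisOfRecord separates a digital image from a film one, the other possibility mentioned earlier is to include file format information. The following is only a sketch of that idea, under the assumption (mine, not anything required by the Applicability Statement) that the provider adds a dcterms:format statement holding a MIME type such as image/jpeg. The LSID and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
STILL_IMAGE = "http://purl.org/dc/dcmitype/StillImage"

# Hypothetical metadata for a made-up image LSID, with an assumed
# dcterms:format statement carrying a MIME type.
IMAGE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="urn:lsid:example.org:images:67890">
    <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/TaxonOccurrence"/>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
    <dcterms:format>image/jpeg</dcterms:format>
  </rdf:Description>
</rdf:RDF>
"""


def is_digital_still_image(rdf_xml):
    """True if the metadata says StillImage and gives an image/* format."""
    root = ET.fromstring(rdf_xml)
    dc_types = [
        el.get("{%s}resource" % RDF) for el in root.iter("{%s}type" % DCTERMS)
    ]
    formats = [
        (el.text or "").strip() for el in root.iter("{%s}format" % DCTERMS)
    ]
    has_image_format = any(f.startswith("image/") for f in formats)
    return STILL_IMAGE in dc_types and has_image_format


if __name__ == "__main__":
    print(is_digital_still_image(IMAGE_RDF))
```

A description of a 35mm slide that carried the same StillImage type but no such format statement would come back False, which is the distinction that the typing alone does not provide.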
I hope that there is a relatively straightforward answer to this question. I'm talking about three very common and uncomplicated things that as a very practical matter need to be labeled with LSIDs. If the typing of LSIDs is really intended to achieve this goal: "Machine and human clients that retrieve the metadata associated with an LSID will use the associated typing information to decide how to process the metadata and any associated data.", then the object of a <rdfs:type rdf:resource="http://xxx"/> statement should allow the client to make this decision in a clear and uncomplicated way. A client resolving the LSID of a specimen should know to expect typical metadata about specimens but know that there is nothing deliverable via the web. A client resolving the LSID of an individual should know to expect metadata that might make it possible to put a dot on a map and to locate any specimens or images that exist for that individual. A client resolving the LSID of a digital image should expect that there will be a (data) representation for a human to look at on a computer monitor and be able to access metadata that would tell them how they would be allowed to use that image if they wanted. It is not clear to me how this goal (deciding how to process data/metadata) would be achieved using the current TDWG ontology. The creation of an "http://rs.tdwg.org/ontology/voc/Individual" class might help at least in the case of individuals in the environment.
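The kind of decision a client would need to make can be sketched roughly as follows. This is not how any existing LSID client works. The function name and return labels are invented, and the Individual class it checks for is the one proposed above, which does not currently exist in the TDWG ontology.

```python
TDWG = "http://rs.tdwg.org/ontology/voc/"
DCMITYPE = "http://purl.org/dc/dcmitype/"


def plan_processing(rdf_types, dc_types):
    """Pick a handling strategy from the typing statements of a resource.

    rdf_types: URIs given as the resource's type in the metadata
    dc_types:  URIs given in dcterms:type statements
    """
    # Proposed (not yet existing) class for an organism in the environment.
    if TDWG + "Individual" in rdf_types:
        return "individual"      # map it, look up linked specimens and images
    if DCMITYPE + "StillImage" in dc_types:
        return "still-image"     # still unknown whether digital or film
    if TDWG + "Specimen" in rdf_types or (
        TDWG + "TaxonOccurrence" in rdf_types
        and DCMITYPE + "PhysicalObject" in dc_types
    ):
        return "specimen"        # metadata only, nothing deliverable
    return "unknown"             # fall back to showing the raw metadata


if __name__ == "__main__":
    print(plan_processing([TDWG + "TaxonOccurrence"],
                          [DCMITYPE + "PhysicalObject"]))
```

Even in this toy form the problem shows up: without an Individual class, a tree typed only as a PhysicalObject falls through to "unknown", and the "still-image" branch cannot tell the client whether a getData() call will return anything.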
As a representative of many non-IT-trained potential users of LSIDs and as a TDWG "outsider", I urge the TAG to provide clear and simple answers to this and other questions related to how we can as soon as possible start using HTTP URIs and LSIDs. The TDWG GUID Applicability Statement goes a long way to achieving this, but the requirement that GUIDs be resolvable before they can be assigned creates a technical burden that could effectively "shut out" small players from using them. I know few herbarium curators and plant photographers who are willing to put in the kind of time that I have to try to understand how HTTP URIs/LSIDs work, and even I still have major questions.
Thanks for all of the help and hard work that has been put into the development of these tools. Steve Baskauf
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu