[tdwg-content] Idea for Discussion, Differentiating between "type's" of identifiers

Steve Baskauf steve.baskauf at vanderbilt.edu
Mon Oct 4 17:41:20 CEST 2010


Although this specific example deals with taxonomic name identifiers, it 
is related to a previous discussion on this list about how we should use 
the dwc:xxxxxID terms and other terms (such as recordedBy and 
identifiedBy) that could have either a string (literal) or URI form.  
Although I don't really want to see an unnecessary proliferation of 
Darwin Core terms, I think that in the interest of clarity (particularly 
where RDF is involved) there either should be multiple terms that make 
it clear what form of identifier is expected, or else there should be an 
understanding that in RDF the default value for such a term is a URI, 
and that the resource it identifies would carry an rdfs:label property 
holding the string form.  I think the former would be preferable to the 
latter. 

I came to this opinion when trying to write RDF describing an herbarium 
specimen.  The collector should be the dwc:recordedBy property of the 
specimen.  Optimally, there would be a database in which known 
collectors were assigned URIs so that "Glen N. Montz", "Glen Montz", "G. 
N. Montz", etc. would all be different labels for the same resource.  
However, realistically, I'm not going to drop what I'm doing to set up 
such a database (even if I were capable of doing it, which I'm not).  So 
I ended up just writing it as 
<dwc:recordedBy>Glen N. Montz</dwc:recordedBy> even though I knew it 
probably wasn't the best approach.  In a large Occurrence database that 
was compiled from the RDF 
created by a lot of people, there might end up being a mixture of 
strings and URIs for dwc:recordedBy properties of the specimens.  It 
seems to me like it would be better to have properties like 
dwc:recordedBy for strings and dwc:recordedByURI for a corresponding URI 
(and I suppose dwc:recordedByLSID if anyone wants to use it).  Of 
course, this would require a number of term additions to DwC and 
clarification in the DwC documentation that the generic version was 
intended for strings. 
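
To make the idea concrete, here is a minimal sketch (in Python) of how a 
data provider might route a raw value to the appropriate term.  It 
assumes the proposed dwc:recordedByURI and dwc:recordedByLSID terms, 
which do not exist in Darwin Core; the function names classify_value and 
term_for are purely illustrative, and the LSID check is a rough pattern 
rather than a full validator.

```python
import re
import uuid
from urllib.parse import urlparse

# dwc:recordedBy is an existing Darwin Core term; the other two are the
# proposed (hypothetical) variants discussed in this message.
LITERAL_TERM = "dwc:recordedBy"
URI_TERM = "dwc:recordedByURI"
LSID_TERM = "dwc:recordedByLSID"

# Rough LSID shape: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(:[^:\s]+)?$", re.IGNORECASE
)


def classify_value(value: str) -> str:
    """Return 'lsid', 'uri', 'uuid', or 'literal' for a raw field value."""
    text = value.strip()
    if LSID_PATTERN.match(text):
        return "lsid"
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    try:
        uuid.UUID(text)  # raises ValueError if text is not a UUID
        return "uuid"
    except ValueError:
        return "literal"


def term_for(value: str) -> str:
    """Pick the property name that matches the form of the value."""
    kind = classify_value(value)
    if kind == "lsid":
        return LSID_TERM
    if kind == "uri":
        return URI_TERM
    # Bare UUIDs and plain names are both treated as strings here.
    return LITERAL_TERM
```

With separate terms, a consumer never has to run this kind of guessing 
itself; the term name already says which form to expect.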

With respect to the example
<dwc:hasScientificNameLSID 
rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
I think you are right that (with the possible exception of rdfs:seeAlso) 
there is an expectation that an rdf:resource attribute will be a 
resolvable URI that produces RDF.  So
<dwc:hasScientificNameLSID>urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010</dwc:hasScientificNameLSID>
is probably better.
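
For anyone who wants to see the difference between the two forms 
programmatically, here is a small sketch using Python's standard 
xml.etree.ElementTree module.  It assumes the usual Darwin Core terms 
namespace (http://rs.tdwg.org/dwc/terms/) and uses the proposed 
hasScientificNameLSID term from this thread, which is not an existing 
DwC term; the helper names lsid_property and is_literal are made up for 
illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"
LSID = ("urn:lsid:catalogueoflife.org:taxon:"
        "24e7d624-60a7-102d-be47-00304854f810:ac2010")

# Use the familiar prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dwc", DWC_NS)


def lsid_property(lsid: str, as_resource: bool) -> ET.Element:
    """Build a dwc:hasScientificNameLSID element in one of the two forms."""
    # hasScientificNameLSID is the term proposed in this thread (hypothetical).
    prop = ET.Element(f"{{{DWC_NS}}}hasScientificNameLSID")
    if as_resource:
        # Object is a resource: the LSID goes in the rdf:resource attribute.
        prop.set(f"{{{RDF_NS}}}resource", lsid)
    else:
        # Object is a plain literal: the LSID is the element text.
        prop.text = lsid
    return prop


def is_literal(prop: ET.Element) -> bool:
    """True if the property value is element text, not an rdf:resource."""
    return f"{{{RDF_NS}}}resource" not in prop.attrib and bool(prop.text)


resource_form = lsid_property(LSID, as_resource=True)
literal_form = lsid_property(LSID, as_resource=False)
print(ET.tostring(resource_form, encoding="unicode"))
print(ET.tostring(literal_form, encoding="unicode"))
```

A consumer that sees the literal form has no reason to try to 
dereference the value, which is the point of preferring it for LSIDs.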

Steve

Peter DeVries wrote:
> I have been thinking about the following pattern. In part after 
> looking at the GBIF vocabulary.
>
> I am not sure if it is even a good idea but might be worth some 
> discussion.
>
> For those fields that have both a string and "ID" form maybe the 
> following pattern might be useful
>
> hasScientificName = string form
> hasScientificNameURI = Resolvable LOD compliant identifier
> hasScientificNameLSID = LSID identifier which could be resolvable once 
> you add the "http:proxy" etc.
>
> This allows all three forms to be included if desired, it also 
> provides a hint as to how the field should be interpreted or resolved.
>
> One group could also provide a mapping service so that each record 
> does not need to include all three forms, but would allow systems
> to find the matching LSID for a given URI or vice versa.
>
> My concern was that it would be difficult to infer how a 
> scientificNameID should be interpreted by other systems.
>
> Is this an LSID, is it a URI, is it a UUID, etc.?
>
> This impacts the structure of the RDF.
>
> * Note that the actual identifiers might not be correct, the example 
> below is more about the form of the RDF
> * For instance, I don't think it is probably correct to see the COL 
> LSID as just a namestring
> * Also in this example the GNI name does not exactly match the string name
>
> <dwc:hasScientificName>Puma concolor (Linnaeus 
> 1771)</dwc:hasScientificName>
> <dwc:hasScientificNameURI 
> rdf:resource="http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8"/>
> <dwc:hasScientificNameLSID 
> rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
>
> Some systems may choke on the LSID form, assuming that it uses a 
> standard resolution mechanism
>
> So it might be best to use this form
>
> <dwc:hasScientificNameLSID>urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010</dwc:hasScientificNameLSID>
>
> - Pete
>
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / 
> GeoSpecies Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
