[tdwg-content] Idea for Discussion, Differentiating between "type's" of identifiers
Steve Baskauf
steve.baskauf at vanderbilt.edu
Mon Oct 4 17:41:20 CEST 2010
Although this specific example deals with taxonomic name identifiers, it
is related to a previous discussion on this list about how we should use
the dwc:xxxxxID terms and other terms (such as recordedBy and
identifiedBy) that could take either a string (literal) or a URI as a
value. I don't want to see an unnecessary proliferation of Darwin Core
terms, but in the interest of clarity (particularly where RDF is
involved) I think there should either be multiple terms that make clear
what form of identifier is expected, or else an understanding that in
RDF the default value for such a term is a URI, whose rdfs:label
property would hold the string form. I think the former would be
preferable to the latter.
I came to this opinion when trying to write RDF describing an herbarium
specimen. The collector should be the dwc:recordedBy property of the
specimen. Optimally, there would be a database in which known
collectors were assigned URIs so that "Glen N. Montz", "Glen Montz", "G.
N. Montz", etc. would all be different labels for the same resource.
However, realistically, I'm not going to drop what I'm doing to set up
such a database (even if I were capable of doing it, which I'm not). So
I ended up just writing it as
<dwc:recordedBy>Glen N. Montz</dwc:recordedBy>
even though I knew it probably wasn't the best approach. In a large
Occurrence database compiled from RDF created by many different people,
there might end up being a mixture of strings and URIs as the values of
dwc:recordedBy for the specimens. It seems to me that it would be
better to have separate properties: dwc:recordedBy for strings and
dwc:recordedByURI for a corresponding URI (and I suppose
dwc:recordedByLSID if anyone wants to use it). Of course, this would
require a number of term additions to DwC and clarification in the DwC
documentation that the generic version is intended for strings.
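
To illustrate the problem with a mixed database, here is a minimal
Python sketch of the guessing a consumer would have to do when a single
property such as dwc:recordedBy or dwc:scientificNameID may hold a
string, an HTTP URI, an LSID, or a UUID. The function name
guess_identifier_form and the two regular expressions are assumptions
made up for this illustration; they are not part of Darwin Core, and the
patterns are deliberately rough.

```python
import re
import uuid

# Rough pattern for an LSID: urn:lsid:authority:namespace:object[:revision]
# (an assumption for illustration, not a full LSID validator).
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(:[^:\s]+)?$", re.IGNORECASE
)
# Anything starting with http:// or https:// is treated as an HTTP URI.
HTTP_PATTERN = re.compile(r"^https?://\S+$", re.IGNORECASE)


def guess_identifier_form(value):
    """Guess whether a value is an LSID, an HTTP URI, a UUID, or a plain string."""
    text = value.strip()
    if LSID_PATTERN.match(text):
        return "lsid"
    if HTTP_PATTERN.match(text):
        return "http-uri"
    try:
        # uuid.UUID raises ValueError for anything that is not 32 hex digits.
        uuid.UUID(text)
        return "uuid"
    except ValueError:
        return "string"
```

With separate terms (recordedBy, recordedByURI, recordedByLSID) the
term name itself would carry this information and no such guessing
would be needed.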
With respect to the example
<dwc:hasScientificNameLSID
rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
I think you are right that (with the possible exception of rdfs:seeAlso)
there is an expectation that the value of an rdf:resource attribute will
be a resolvable URI that returns RDF. So the literal form
<dwc:hasScientificNameLSID>urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010</dwc:hasScientificNameLSID>
is probably better.
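
The two serializations discussed above (LSID as an rdf:resource
attribute versus LSID as a literal) can be produced side by side. This
is only a sketch using Python's standard xml.etree.ElementTree; the
helper name property_element is made up for illustration, and
hasScientificNameLSID is the term proposed in this thread, not an
existing Darwin Core term.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dwc", DWC_NS)


def property_element(local_name, value, as_resource):
    """Build a property element whose value is a resource reference or a literal."""
    elem = ET.Element(f"{{{DWC_NS}}}{local_name}")
    if as_resource:
        # Resource form: consumers may expect this URI to be resolvable.
        elem.set(f"{{{RDF_NS}}}resource", value)
    else:
        # Literal form: the identifier is carried as plain text.
        elem.text = value
    return elem


if __name__ == "__main__":
    lsid = ("urn:lsid:catalogueoflife.org:taxon:"
            "24e7d624-60a7-102d-be47-00304854f810:ac2010")
    for as_resource in (True, False):
        elem = property_element("hasScientificNameLSID", lsid, as_resource)
        print(ET.tostring(elem, encoding="unicode"))
```

The literal form makes no promise about resolvability, which is the
reason for preferring it for LSIDs here.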
Steve
Peter DeVries wrote:
> I have been thinking about the following pattern. In part after
> looking at the GBIF vocabulary.
>
> I am not sure if it is even a good idea but might be worth some
> discussion.
>
> For those fields that have both a string and "ID" form maybe the
> following pattern might be useful
>
> hasScientificName = string form
> hasScientificNameURI = Resolvable LOD compliant identifier
> hasScientificNameLSID = LSID identifier which could be resolvable once
> you add the "http:proxy" etc.
>
> This allows all three forms to be included if desired, it also
> provides a hint as to how the field should be interpreted or resolved.
>
> One group could also provide a mapping service so that each record
> does not need to include all three forms, but would allow systems
> to find the matching LSID for a given URI or vice versa.
>
> My concern was that it would be difficult to infer how a
> scientificNameID should be interpreted by other systems.
>
> Is this an LSID, is it a URI, is it a UUID, etc.?
>
> This impacts the structure of the RDF.
>
> * Note that the actual identifiers might not be correct, the example
> below is more about the form of the RDF
> * For instance, it is probably not correct to treat the COL
> LSID as just a name string
> * Also in this example the GNI name does not exactly match the string name
>
> <dwc:hasScientificName>Puma concolor (Linnaeus 1771)</dwc:hasScientificName>
> <dwc:hasScientificNameURI
> rdf:resource="http://gni.globalnames.org/name_strings/6c3dc35f-d901-5cc5-b9c8-ad241069b9f8"/>
> <dwc:hasScientificNameLSID
> rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
>
> Some systems may choke on the LSID form, assuming that it uses a
> standard resolution mechanism.
>
> So it might be best to use this form
>
> <dwc:hasScientificNameLSID>urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010</dwc:hasScientificNameLSID>
>
> - Pete
>
> ----------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> TaxonConcept Knowledge Base <http://www.taxonconcept.org/> /
> GeoSpecies Knowledge Base <http://lod.geospecies.org/>
> About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
> ------------------------------------------------------------
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu