[tdwg-tag] string literals vs. uris for dwc:recordedBy, dwc:identifiedBy, and dwc:georeferencedBy in RDF

John Wieczorek tuco at berkeley.edu
Sat May 29 17:45:46 CEST 2010


Dear all,

Sorry to have to jump in to the discussion late. I'm just back out of
the field in Jujuy where the winds took out the little connectivity
that once existed.

I'm stepping back to the beginning of the conversation, away from the RDF
representation issue, which seems to be reasonably resolved under the
thread "Darwin Core generic XML with attributes."

The Darwin Core terms Steve has brought to attention are among a
larger set (relationshipAccordingTo, georeferencedBy,
measurementDeterminedBy, recordedBy, identifiedBy) that all are
intended to cover the real world situation of data coming out of
database fields for which there was no sense of persistent global
unique identification. They were meant to hold string literals naming
dcterms:Agents. Nothing proscribes the use of URIs in these
fields. However, although there has been a lot of discussion about how
one might represent both the URI and the string literal for the
recordedBy Agent, no statement of consensus has emerged in the TAG
from this thread or the related thread "Darwin Core
generic XML with attributes" about whether actionable GUIDs for these
same concepts should have their own terms.

As I see it, we got this far:

In the thread "Darwin Core generic XML with attributes" Rod Page said:

"If you want a literal for <dwc:recordedBy> (say for ease of display)
then I think you want a different tag that is expressly defined to do
just that. For example,
http://rs.tdwg.org/ontology/voc/TaxonOccurrence
 has <identifiedTo>  to point to a URI for a taxon, and
<identifiedToString> if you want the literal. I don't know if dwc has
anything equivalent for recordedBy (and can somebody please tell me
why we now have so many vocabularies for the same things?)"

I think what Rod proposes is cleaner than using the same term for two
purposes, as much as I too would like to have my cake and eat it. I
agree with Rod, except that dwc:recordedBy is the string literal
version of the concept, and no URI version (something like a
dwc:recordedByID that refines dcterms:identifier) currently exists. Up to
now no need for one has been expressed, hence no such term was
included in DwC.
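
To make the separation concrete, here is a minimal sketch, using only the
Python standard library, of how a client might tell the two kinds of value
apart if a separate URI-valued term existed. Note that recordedByID is purely
hypothetical here (it is not a Darwin Core term), and the helper name
property_values is likewise an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"

# ASSUMPTION: recordedByID is hypothetical; it is NOT a Darwin Core term.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dwc="{DWC}">
  <dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedByID rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
    <dwc:recordedBy>Steve Baskauf</dwc:recordedBy>
  </dwc:Occurrence>
</rdf:RDF>
""".strip()


def property_values(xml_text):
    """Return (subject, term, kind, value) for each property element."""
    rows = []
    root = ET.fromstring(xml_text)
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        for prop in node:
            term = prop.tag.split("}", 1)[1]  # local name without namespace
            uri = prop.get(f"{{{RDF}}}resource")
            if uri is not None:
                # Empty element with rdf:resource: the object is a URI.
                rows.append((subject, term, "uri", uri))
            else:
                # Element with text content: the object is a string literal.
                rows.append((subject, term, "literal", (prop.text or "").strip()))
    return rows


rows = property_values(DOC)
for row in rows:
    print(row)
```

With one term per purpose, the term name alone tells a client which kind of
value to expect, so it does not need to guess whether two recordedBy values
describe one agent or two.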

In answer to Rod's last question, we have many vocabularies
(unfinished attempts), but only one that is a ratified standard.
That standard isn't an ontology, nor is it complete in describing best
practices for expressing information in all representations. It's
reasonably clear about how to share information in text files, and about
the recommendations for XML, but on RDF it is silent,
awaiting the badly needed ontology work.

Reviewing DwC in light of these discussions, I noted the following:

1) The set of terms (relationshipAccordingTo, georeferencedBy,
measurementDeterminedBy, recordedBy, identifiedBy) all currently
refine an abstract term dwc:accordingTo, defined as "Abstract term to
attribute information to a source." This refinement doesn't really do
anything useful in my opinion. I think the term dwc:accordingTo should
be dropped, and the terms listed above should instead refine
dcterms:contributor.

2) For one of these concepts (nameAccordingTo) it was anticipated that
there would be actionable GUIDs and the term nameAccordingToID was
added to clarify the distinction. nameAccordingTo currently also
refines dwc:accordingTo, but I think this is erroneous, as it is meant
to refer to a publication, not to a dcterms:Agent. This problem will
be resolved if dwc:accordingTo is dropped and nameAccordingTo is
changed to have no refines value.

3) In cases where actionable GUIDs were anticipated to be commonly
used, terms with names ending with "ID" were added specifically for
this purpose (relatedResourceID, resourceID, resourceRelationshipID,
higherGeographyID, parentNameUsageID, nameAccordingToID,
originalNameUsageID, eventID, individualID, occurrenceID,
geologicalContextID, namePublishedInID, measurementID,
acceptedNameUsageID, scientificNameID, locationID, identificationID,
datasetID, taxonConceptID, taxonID). All of these terms refine
dcterms:identifier and many of them are examples illustrating Rod's
recommendation to separate identifiers from string literal
representations of the concepts. If it is deemed that further "ID"
terms are needed, then a public discussion must take place on the
tdwg-content list.
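
As a rough illustration of item 3, the sketch below splits a flat record into
identifier fields and string literal fields by checking each term name against
the "ID" terms listed above. The function name split_record and all record
values are assumptions made up for this example; they are placeholders, not
real data.

```python
# The "ID" terms exactly as listed in item 3 of this message.
ID_TERMS = {
    "relatedResourceID", "resourceID", "resourceRelationshipID",
    "higherGeographyID", "parentNameUsageID", "nameAccordingToID",
    "originalNameUsageID", "eventID", "individualID", "occurrenceID",
    "geologicalContextID", "namePublishedInID", "measurementID",
    "acceptedNameUsageID", "scientificNameID", "locationID",
    "identificationID", "datasetID", "taxonConceptID", "taxonID",
}


def split_record(record):
    """Split a flat term->value record into (identifiers, literals)."""
    identifiers = {k: v for k, v in record.items() if k in ID_TERMS}
    literals = {k: v for k, v in record.items() if k not in ID_TERMS}
    return identifiers, literals


# ASSUMPTION: placeholder values, invented for illustration only.
record = {
    "occurrenceID": "http://herbarium.org/hb123456",
    "recordedBy": "Steve Baskauf",
    "nameAccordingTo": "placeholder citation string",
    "nameAccordingToID": "urn:example:publication:1",
}

identifiers, literals = split_record(record)
print(identifiers)
print(literals)
```

Here nameAccordingTo and nameAccordingToID sit side by side, while recordedBy
has no "ID" counterpart and so can only appear among the literals.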

Dropping dwc:accordingTo and changing the refinements require the
procedure under section 3.3 of the Darwin Core Namespace Policy
(http://rs.tdwg.org/dwc/terms/namespace/index.htm#classesofchanges).
I'll follow the protocol for changes for items 1 and 2 above, and
await further discussion on item 3.
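
For readers wondering what the refinement change in item 1 would buy a
consumer, here is a small sketch. It assumes that "refines" is read the way a
sub-property relation is usually read: a statement made with the narrower term
also implies the same statement with the broader term. The REFINES table
encodes the proposal in item 1, not the current state of Darwin Core, and the
function name with_inferred is hypothetical.

```python
# ASSUMPTION: the refinements PROPOSED in item 1, not the current vocabulary.
REFINES = {
    "dwc:relationshipAccordingTo": "dcterms:contributor",
    "dwc:georeferencedBy": "dcterms:contributor",
    "dwc:measurementDeterminedBy": "dcterms:contributor",
    "dwc:recordedBy": "dcterms:contributor",
    "dwc:identifiedBy": "dcterms:contributor",
}


def with_inferred(triples, refines=REFINES):
    """Add (s, broader, o) for every (s, p, o) whose predicate refines broader."""
    result = list(triples)
    for s, p, o in triples:
        broader = refines.get(p)
        while broader is not None:  # follow chains of refinements, if any
            if (s, broader, o) not in result:
                result.append((s, broader, o))
            broader = refines.get(broader)
    return result


triples = [("http://herbarium.org/hb123456", "dwc:recordedBy", "Steve Baskauf")]
inferred = with_inferred(triples)
for triple in inferred:
    print(triple)
```

Under that reading, a generic Dublin Core client that knows nothing about
Darwin Core could still find the recorder as a dcterms:contributor.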

Cheers,

John

On Wed, May 19, 2010 at 10:00 PM, Bob Morris <morris.bob at gmail.com> wrote:
> As Peter Ansell points out, another solution seems to be to use two
> dwc:recordedBy elements:
>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>                 xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
>                 >
> <dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
> <dwc:recordedBy
> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
> <dwc:recordedBy >Steve Baskauf</dwc:recordedBy>
> </dwc:Occurrence>
> </rdf:RDF>
>
>
> I don't think the parse error is about the use of rdf:resource per se.
>  I must confess that I'm having trouble understanding how to conclude
> from the RDF spec why the original is a parse error, but these things
> usually can be understood by trying to show what the triples are, not
> by staring at the spec for some rdf tag.  I'm finding RDF/XML
> increasingly irksome and unpleasant in this kind of task. If one
> unwinds it here,  I bet it is that your failing example has subject
> "http://herbarium.org/hb123456", predicate dwc:recordedBy but is
> trying to have  two objects.
>
> The fact that you want to associate the URI for you with the string
> for you is probably a separate problem, and possibly not expressible
> within dwc itself.
>
> Bob
>
>
> On Wed, May 19, 2010 at 8:59 PM, Steve Baskauf
> <steve.baskauf at vanderbilt.edu> wrote:
>> In the specific case of RDF, having your cake and eating it doesn't work.
>> Paste this:
>>
>> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>                  xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
>>                  >
>> <dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
>> <dwc:recordedBy
>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve
>> Baskauf</dwc:recordedBy>
>> </dwc:Occurrence>
>> </rdf:RDF>
>>
>> into the W3C RDF validator at:
>> http://www.w3.org/RDF/Validator/
>> and it will tell you "The attributes on this property element, are not
>> permitted with any content; expecting end element tag.".  So in RDF/XML, elements
>> having the rdf:resource attribute have to be empty elements.  I tried
>> validating an example where the recordedBy property was included twice, once
>> with a URI object and once with a string literal object.  It validated as
>> "good" RDF, but I think it would be confusing to a linked data client that
>> would really have no clue that both objects represented the same thing and
>> would probably "assume" that the occurrence was recorded by two entities
>> rather than one.
>>
>> A possible solution would be to use dcterms:description as another
>> attribute.  dcterms:description is defined as "An account of the resource.
>> Description may include ...a free-text account of the resource."  I couldn't
>> find a more appropriate Dublin Core term to use as an attribute.  So running this
>> example:
>>
>> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>                 xmlns:dcterms="http://purl.org/dc/terms/"
>>                  xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
>>                  >
>> <dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
>> <dwc:recordedBy
>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
>> dcterms:description="Steve Baskauf" />
>> </dwc:Occurrence>
>> </rdf:RDF>
>>
>> through the validator shows that this RDF asserts the following triples:
>> http://herbarium.org/hb123456  dwc:recordedBy
>> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
>> and that
>> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
>> dcterms:description  "Steve Baskauf"
>>
>> In other words, the occurrence was recorded by me (identified by my URI), and
>> the description of the thing represented by my URI is "Steve Baskauf".
>> That is pretty much a correct representation of the situation, although the
>> whole point of using a URI as the object of a property is for a client to
>> dereference the URI to find out more about the object.  The FOAF file
>> (pointed to by the URI) would provide that information without the
>> dcterms:description attribute.
>>
>> Steve
>>
>>
>> Jim Croft wrote:
>>
>> wondering if
>> <dwc:recordedBy
>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve
>> Baskauf</dwc:recordedBy>
>> is legit?
>>
>> just a have your cake and eat it kinda guy...
>>
>> jim
>>
>> On Thu, May 20, 2010 at 7:41 AM, Kevin Richards
>> <RichardsK at landcareresearch.co.nz> wrote:
>>
>>
>> From my understanding (and after reading the example Bob referred to), the
>> difference is:
>>
>> [referring to external id]
>> <dwc:recordedBy
>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me" />
>>
>> [inline text]
>> <dwc:recordedBy>Steve Baskauf</dwc:recordedBy>
>>
>> Look right?
>>
>> Kevin
>>
>> -----Original Message-----
>> From: tdwg-tag-bounces at lists.tdwg.org
>> [mailto:tdwg-tag-bounces at lists.tdwg.org] On Behalf Of Jim Croft
>> Sent: Thursday, 20 May 2010 9:37 a.m.
>> To: Bob Morris
>> Cc: tdwg-tag at lists.tdwg.org
>> Subject: Re: [tdwg-tag] string literals vs. uris for dwc:recordedBy,
>> dwc:identifiedBy, and dwc:georeferencedBy in RDF
>>
>> Hi Bob - should the same term allow both types of content, or should
>> there be a different term for each?  Does it matter?  Should
>> applications be smart enough to tell the difference and know what to
>> do with it?
>>
>> Not really asking what the specification says, but about purity and
>> wholesomeness of design... :)
>>
>> jim
>>
>> On Thu, May 20, 2010 at 4:26 AM, Bob Morris <morris.bob at gmail.com> wrote:
>>
>>
>> Exactly this example is given in
>> http://web4.w3.org/TR/REC-rdf-syntax/#section-Syntax-property-attributes
>> so I would find it regrettable if DwC does something somewhere that
>> makes this substitution impossible or discouraged,  or encourages tool
>> construction that does so, or encourages documentation to be interpreted in
>> a way that does so.
>>
>> Indeed http://rs.tdwg.org/dwc/rdf/dwcterms.rdf defines its type to be
>> rdf:Property and is silent on any semantics  but that. My own
>> conclusion is that neither the intent nor the outcome of the rdf
>> version of dwcterms discourages what you want, though I suppose the
>> intent part would be clearer if the documentation also said that a URI
>> can always be used, but applications are responsible for interpreting
>> it.
>>
>>
>> On Wed, May 19, 2010 at 11:09 AM, Steve Baskauf
>> <steve.baskauf at vanderbilt.edu> wrote:
>>
>>
>> The definition for the Darwin Core term recordedBy
>> http://rs.tdwg.org/dwc/terms/index.htm#recordedBy
>> says "A list (concatenated and separated) of names ...".  The examples
>> given are string literals.  However, when using this term as a predicate
>> in RDF, it would seem preferable to use a URI to an RDF representation
>> of the entity (if one exists) rather than a string literal.  For
>> example, can I use:
>> <dwc:recordedBy
>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
>> rather than
>> <dwc:recordedBy>Steven J. Baskauf</dwc:recordedBy>
>> ?
>>
>> Steve Baskauf
>> --
>>
>> Steven J. Baskauf, Ph.D., Senior Lecturer
>> Vanderbilt University Dept. of Biological Sciences
>>
>> postal mail address:
>> VU Station B 351634
>> Nashville, TN  37235-1634,  U.S.A.
>>
>> delivery address:
>> 2125 Stevenson Center
>> 1161 21st Ave., S.
>> Nashville, TN 37235
>>
>> office: 2128 Stevenson Center
>> phone: (615) 343-4582,  fax: (615) 343-6707
>> http://bioimages.vanderbilt.edu
>>
>> _______________________________________________
>> tdwg-tag mailing list
>> tdwg-tag at lists.tdwg.org
>> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
>>
>>
>>
>>
>> --
>> Robert A. Morris
>> Emeritus Professor  of Computer Science
>> UMASS-Boston
>> 100 Morrissey Blvd
>> Boston, MA 02125-3390
>> Associate, Harvard University Herbaria
>> email: ram at cs.umb.edu
>> web: http://bdei.cs.umb.edu/
>> web: http://etaxonomy.org/FilteredPush
>> http://www.cs.umb.edu/~ram
>> phone (+1)617 287 6466
>>
>>
>>
>> --
>> _________________
>> Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~
>> http://www.google.com/profiles/jim.croft
>> 'A civilized society is one which tolerates eccentricity to the point
>> of doubtful sanity.'
>>  - Robert Frost, poet (1874-1963)
>>
>>
>>
>>
>>
>>
>>
>
>
>
>


