[tdwg-tag] string literals vs. uris for dwc:recordedBy, dwc:identifiedBy, and dwc:georeferencedBy in RDF

Steve Baskauf steve.baskauf at vanderbilt.edu
Tue Jun 1 15:12:22 CEST 2010


John et al.,
I would cast a vote in favor of adding the "ID" forms for 
dwc:recordedBy, etc. (e.g., dwc:recordedByID).  This would provide a 
straightforward and semantically clear way of differentiating between 
the string literal and URI forms of the terms.  I think that this is 
particularly important if the existing terms are intended to be used 
with string literals. 

In addition to the clarity that this would add, it would also make use 
of the metadata easier.  Earlier in one of the threads, I mentioned 
using the RDF file associated with a GUID as a data source for creating 
XHTML for the human-readable representation through XSLT or AJAX.  I 
have been playing around with this idea a little bit.  In the source RDF 
file:
http://bioimages.vanderbilt.edu/baskauf/66921.rdf
I applied the approach we discussed: representing the values of terms 
such as dwc:recordedBy as URIs, then labeling each URI with the label 
property in a separate description of that URI.
I am a novice XSLT user, so there may be a much more straightforward 
method, but in my test stylesheet:
http://bioimages.vanderbilt.edu/xml/image-generic.xsl
which transforms the RDF file to XHTML, it was a pain to extract the 
label data from the other Description elements to get the text 
representation of the terms.  In contrast, it would have been very 
simple to create the clickable hyperlinks I wanted if the string and 
URI versions had been recorded as separate terms.  (The output of the 
XSLT is used in a test XHTML file via JavaScript:
http://bioimages.vanderbilt.edu/metadata-test.htm?baskauf/66921/metadata/img
where you can see the clickable links I'm talking about.  Eventually, 
when I'm done testing, the XHTML will be used for GUID resolution, as in 
http://bioimages.vanderbilt.edu/baskauf/66921)
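The difference between the two layouts can be sketched outside XSLT. The following minimal Python sketch assumes that the label property is rdfs:label and treats dwc:recordedByID as a hypothetical term; the function names are illustrative only. It builds the same hyperlink from each layout: one needs a second search through the other Description elements, the other reads both values straight off the occurrence.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"  # assumed label vocabulary
DWC = "http://rs.tdwg.org/dwc/terms/"
ME = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"

# Layout A: dwc:recordedBy holds the URI; the name is an rdfs:label
# (assumed) in a separate rdf:Description about that URI.
SHAPE_A = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:dwc="{DWC}">
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedBy rdf:resource="{ME}"/>
  </rdf:Description>
  <rdf:Description rdf:about="{ME}">
    <rdfs:label>Steve Baskauf</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""

# Layout B: literal and URI are separate terms on the same subject
# (dwc:recordedByID is hypothetical here).
SHAPE_B = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dwc="{DWC}">
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedBy>Steve Baskauf</dwc:recordedBy>
    <dwc:recordedByID rdf:resource="{ME}"/>
  </rdf:Description>
</rdf:RDF>"""


def q(ns, local):
    """ElementTree's '{namespace}local' form of a qualified name."""
    return "{" + ns + "}" + local


def link_from_label_lookup(xml_text):
    """Layout A: read the URI, then search every Description for its label."""
    root = ET.fromstring(xml_text)
    uri = root.find(".//" + q(DWC, "recordedBy")).get(q(RDF, "resource"))
    for desc in root.findall(q(RDF, "Description")):
        label = desc.find(q(RDFS, "label"))
        if desc.get(q(RDF, "about")) == uri and label is not None:
            return f'<a href="{uri}">{label.text}</a>'
    return f'<a href="{uri}">{uri}</a>'  # no label found: fall back to the URI


def link_from_separate_terms(xml_text):
    """Layout B: both values sit on the occurrence; no second lookup."""
    occ = ET.fromstring(xml_text).find(q(RDF, "Description"))
    name = occ.find(q(DWC, "recordedBy")).text
    uri = occ.find(q(DWC, "recordedByID")).get(q(RDF, "resource"))
    return f'<a href="{uri}">{name}</a>'


print(link_from_label_lookup(SHAPE_A))
print(link_from_separate_terms(SHAPE_B))
```

Both calls produce the same anchor; the point is only how much searching each layout requires.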

I think that if creating an RDF/XML representation is going to be a 
requirement for issuing GUIDs, then facilitating this kind of use would 
be a good thing.  I'm still waiting to see any real use of those RDF 
data by Linked Data clients, whereas using the RDF/XML for XSLT or AJAX 
has a clear benefit right now.

Steve

John Wieczorek wrote:
> Dear all,
>
> Sorry to have to jump in to the discussion late. I'm just back out of
> the field in Jujuy where the winds took out the little connectivity
> that once existed.
>
> I'm stepping back to the beginning of the conversation, out of the RDF
> representation issue, which seems to be reasonably resolved under the
> thread "Darwin Core generic XML with attributes."
>
> The Darwin Core terms Steve has brought to attention are among a
> larger set (relationshipAccordingTo, georeferencedBy,
> measurementDeterminedBy, recordedBy, identifiedBy) that all are
> intended to cover the real world situation of data coming out of
> database fields for which there was no sense of persistent global
> unique identification. They were meant for literal strings for
> dcterms:Agents. There is nothing to proscribe the use of URIs in these
> fields. However, though there has been a lot of discussion about how
> one might represent the URI and string literal information about the
> recordedBy Agent, there has been no statement of consensus in the TAG
> from discussions in this thread and the related thread "Darwin Core
> generic XML with attributes" about whether actionable GUIDs for these
> same concepts should have their own terms.
>
> As I see it, we got this far:
>
> In the thread "Darwin Core generic XML with attributes" Rod Page said:
>
> "If you want a literal for <dwc:recordedBy> (say for ease of display)
> then I think you want a different tag that is expressly defined to do
> just that. For example,
> http://rs.tdwg.org/ontology/voc/TaxonOccurrence
>  has <identifiedTo>  to point to a URI for a taxon, and
> <identifiedToString> if you want the literal. I don't know if dwc has
> anything equivalent for recordedBy (and can somebody please tell me
> why we now have so many vocabularies for the same things?)"
>
> I think what Rod proposes is cleaner than using the same term for two
> purposes, as much as I too would like to have my cake and eat it. I
> agree with Rod, except that dwc:recordedBy is the string literal
> version of the concept, and no URI version (something like
> dwc:recordedByID refines dcterms:identifier) currently exists. Up to
> now there has not been a need expressed, hence, no such term was
> included in DwC.
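Separating the literal from the identifier can be sketched from the data publisher's side. In this minimal Python sketch, dwc:recordedByID is the hypothetical term discussed here and `occurrence_rdf` is an illustrative helper, not part of any DwC tooling. The database string always goes to dwc:recordedBy; a URI, when one is known, goes to its own term.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dwc", DWC)


def q(ns, local):
    """ElementTree's '{namespace}local' form of a qualified name."""
    return "{" + ns + "}" + local


def occurrence_rdf(occurrence_uri, recorded_by, recorded_by_id=None):
    """Build RDF/XML for one occurrence (hypothetical helper).

    The literal name always goes to dwc:recordedBy; a URI, if known,
    goes to the hypothetical dwc:recordedByID instead of sharing a term.
    """
    root = ET.Element(q(RDF, "RDF"))
    occ = ET.SubElement(root, q(DWC, "Occurrence"), {q(RDF, "about"): occurrence_uri})
    ET.SubElement(occ, q(DWC, "recordedBy")).text = recorded_by
    if recorded_by_id is not None:
        ET.SubElement(occ, q(DWC, "recordedByID"), {q(RDF, "resource"): recorded_by_id})
    return ET.tostring(root, encoding="unicode")


# A legacy row with only a name, and a row where the collector has a URI.
print(occurrence_rdf("http://herbarium.org/hb123456", "Steve Baskauf"))
print(occurrence_rdf("http://herbarium.org/hb123456", "Steve Baskauf",
                     "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"))
```

Legacy records without identifiers still validate unchanged; only records that have a URI gain the extra element.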
>
> In answer to Rod's last question, we have many vocabularies
> (unfinished attempts), but only one that is a ratified standard.
> That standard isn't an ontology, nor is it complete in describing best
> practices for expressing information in all representations. It's
> reasonably clear for how to share information in text files, and about
> the recommendations for XML, but in terms of RDF it is silent,
> awaiting the badly needed ontology work.
>
> Reviewing DwC in light of these discussions, I noted the following:
>
> 1) The set of terms (relationshipAccordingTo, georeferencedBy,
> measurementDeterminedBy, recordedBy, identifiedBy) all currently
> refine an abstract term dwc:accordingTo, defined as "Abstract term to
> attribute information to a source." This refinement doesn't really do
> anything useful in my opinion. I think the term dwc:accordingTo should
> be dropped, and the terms listed above should instead refine
> dcterms:contributor.
>
> 2) For one of these concepts (nameAccordingTo) it was anticipated that
> there would be actionable GUIDs and the term nameAccordingToID was
> added to clarify the distinction. nameAccordingTo currently also
> refines dwc:accordingTo, but I think this is erroneous, as it is meant
> to refer to a publication, not to a dcterms:Agent. This problem will
> be resolved if dwc:accordingTo is dropped and nameAccordingTo is
> changed to have no refines value.
>
> 3) In cases where actionable GUIDs were anticipated to be commonly
> used, terms with names ending with "ID" were added specifically for
> this purpose (relatedResourceID, resourceID, resourceRelationshipID,
> higherGeographyID, parentNameUsageID, nameAccordingToID,
> originalNameUsageID, eventID, individualID, occurrenceID,
> geologicalContextID, namePublishedInID, measurementID,
> acceptedNameUsageID, scientificNameID, locationID, identificationID,
> datasetID, taxonConceptID, taxonID). All of these terms refine
> dcterms:identifier and many of them are examples illustrating Rod's
> recommendation to separate identifiers from string literal
> representations of the concepts. If it is deemed that further "ID"
> terms are needed, then a public discussion must take place on the
> tdwg-content list.
>
> Dropping dwc:accordingTo, and changing the refinements requires the
> procedure under section 3.3 of the Darwin Core Namespace Policy
> (http://rs.tdwg.org/dwc/terms/namespace/index.htm#classesofchanges).
> I'll follow the protocol for changes for items 1 and 2 above, and
> await further discussion on item 3.
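The "ID" naming convention in item 3 makes it easy to see which string-literal concepts lack an identifier counterpart. This minimal Python sketch copies the term names from the lists in the message above and checks only the naming pattern `<term>ID`; it does not consult the actual DwC term definitions or their refinements.

```python
# Term names copied from the lists in the message above.
LITERAL_TERMS = [
    "relationshipAccordingTo", "georeferencedBy", "measurementDeterminedBy",
    "recordedBy", "identifiedBy", "nameAccordingTo",
]
ID_TERMS = {
    "relatedResourceID", "resourceID", "resourceRelationshipID",
    "higherGeographyID", "parentNameUsageID", "nameAccordingToID",
    "originalNameUsageID", "eventID", "individualID", "occurrenceID",
    "geologicalContextID", "namePublishedInID", "measurementID",
    "acceptedNameUsageID", "scientificNameID", "locationID",
    "identificationID", "datasetID", "taxonConceptID", "taxonID",
}


def missing_id_terms(literal_terms, id_terms):
    """List literal terms with no '<term>ID' counterpart (naming pattern only)."""
    return [term for term in literal_terms if term + "ID" not in id_terms]


print(missing_id_terms(LITERAL_TERMS, ID_TERMS))
# ['relationshipAccordingTo', 'georeferencedBy', 'measurementDeterminedBy', 'recordedBy', 'identifiedBy']
```

Only nameAccordingTo already has its pair, which matches item 2.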
>
> Cheers,
>
> John
>
> On Wed, May 19, 2010 at 10:00 PM, Bob Morris <morris.bob at gmail.com> wrote:
>   
>> As Peter Ansell points out, another solution seems to be to use two
>> dwc:recordedBy elements:
>>
>> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>          xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
>> <dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
>> <dwc:recordedBy
>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
>> <dwc:recordedBy>Steve Baskauf</dwc:recordedBy>
>> </dwc:Occurrence>
>> </rdf:RDF>
>>
>>
>>
>> I don't think the parse error is about the use of rdf:resource per se.
>>  I must confess that I'm having trouble understanding how to conclude
>> from the RDF spec why the original is a parse error, but these things
>> usually can be understood by trying to show what the triples are, not
>> by staring at the spec for some rdf tag.  I'm finding RDF/XML
>> increasingly irksome and unpleasant in this kind of task.  If one
>> unwinds it here, I bet the problem is that your failing example has
>> subject "http://herbarium.org/hb123456" and predicate dwc:recordedBy
>> but is trying to have two objects.
>>
>> The fact that you want to associate the URI for you with the string
>> for you is probably a separate problem, and possibly not expressible
>> within dwc itself.
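"Unwinding" the two-element example into triples can be sketched in a few lines. The following minimal Python sketch handles only the small subset of RDF/XML used in this thread (node elements with rdf:about; property elements with either rdf:resource or text) and prints N-Triples-like lines without escaping. It is not a general RDF/XML parser. It shows that the example yields two separate dwc:recordedBy triples with the same subject, one with a URI object and one with a literal.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Bob's two-element example, with the closing </rdf:RDF> added.
TWO_RECORDED_BY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
<dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
<dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
<dwc:recordedBy>Steve Baskauf</dwc:recordedBy>
</dwc:Occurrence>
</rdf:RDF>"""


def q(ns, local):
    """ElementTree's '{namespace}local' form of a qualified name."""
    return "{" + ns + "}" + local


def full_uri(tag):
    """Turn ElementTree's '{ns}local' back into a single URI string."""
    ns, _, local = tag[1:].partition("}")
    return ns + local


def triples(xml_text):
    """Unwind a small subset of RDF/XML into N-Triples-like lines."""
    lines = []
    for node in ET.fromstring(xml_text):
        s = "<" + node.get(q(RDF, "about")) + ">"
        if node.tag != q(RDF, "Description"):
            # A typed node element such as dwc:Occurrence also asserts rdf:type.
            lines.append(f"{s} <{RDF}type> <{full_uri(node.tag)}> .")
        for prop in node:
            resource = prop.get(q(RDF, "resource"))
            o = f"<{resource}>" if resource is not None else f'"{prop.text}"'
            lines.append(f"{s} <{full_uri(prop.tag)}> {o} .")
    return lines


for line in triples(TWO_RECORDED_BY):
    print(line)
```

Nothing in these triples says that the URI and the string denote the same agent, which is the ambiguity raised later in the thread.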
>>
>> Bob
>>
>>
>> On Wed, May 19, 2010 at 8:59 PM, Steve Baskauf
>> <steve.baskauf at vanderbilt.edu> wrote:
>>     
>>> In the specific case of RDF, having your cake and eating it doesn't work.
>>> Paste this:
>>>
>>> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>                  xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
>>>                  >
>>> <dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
>>> <dwc:recordedBy
>>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve
>>> Baskauf</dwc:recordedBy>
>>> </dwc:Occurrence>
>>> </rdf:RDF>
>>>
>>> into the W3C RDF validator at:
>>> http://www.w3.org/RDF/Validator/
>>> and it will tell you "The attributes on this property element, are not
>>> permitted with any content; expecting end element tag.".  So in RDF elements
>>> having the rdf:resource attribute have to be empty elements.  I tried
>>> validating an example where the recordedBy property was included twice, once
>>> with a URI object and once with a string literal object.  It validated as
>>> "good" RDF, but I think it would be confusing to a linked data client that
>>> would really have no clue that both objects represented the same thing and
>>> would probably "assume" that the occurrence was recorded by two entities
>>> rather than one.
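The rule behind the validator message (an element carrying rdf:resource must be empty) can be checked mechanically. This minimal Python sketch tests that one rule only and is not a substitute for the W3C validator; the function name is hypothetical. It flags the failing example and accepts the same element written as an empty tag.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The failing example: rdf:resource *and* text content on one element.
INVALID = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
<dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
<dwc:recordedBy
rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve
Baskauf</dwc:recordedBy>
</dwc:Occurrence>
</rdf:RDF>"""

# The same property written as an empty element.
VALID = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
<dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
<dwc:recordedBy
rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
</dwc:Occurrence>
</rdf:RDF>"""


def q(ns, local):
    """ElementTree's '{namespace}local' form of a qualified name."""
    return "{" + ns + "}" + local


def resource_with_content(xml_text):
    """Return tags of property elements that carry rdf:resource but are not empty."""
    offenders = []
    for node in ET.fromstring(xml_text):
        for prop in node:
            has_resource = prop.get(q(RDF, "resource")) is not None
            has_content = bool((prop.text or "").strip()) or len(prop) > 0
            if has_resource and has_content:
                offenders.append(prop.tag)
    return offenders


print(resource_with_content(INVALID))  # ['{http://rs.tdwg.org/dwc/terms/}recordedBy']
print(resource_with_content(VALID))    # []
```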
>>>
>>> A possible solution would be to use dcterms:description as another
>>> attribute.  dcterms:description is defined as "An account of the resource.
>>> Description may include ...a free-text account of the resource."  I couldn't
>>> find a more appropriate Dublin Core term to use as an attribute.  So running this
>>> example:
>>>
>>> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>                 xmlns:dcterms="http://purl.org/dc/terms/"
>>>                  xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
>>>                  >
>>> <dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
>>> <dwc:recordedBy
>>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
>>> dcterms:description="Steve Baskauf" />
>>> </dwc:Occurrence>
>>> </rdf:RDF>
>>>
>>> through the validator shows that this RDF asserts the following triples:
>>> http://herbarium.org/hb123456  dwc:recordedBy
>>> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
>>> and that
>>> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
>>> dcterms:description  "Steve Baskauf"
>>>
>>> In other words, the occurrence was recorded by me (identified by my URI) and
>>> that the description of the thing represented by my URI is "Steve Baskauf".
>>> That is pretty much a correct representation of the situation, although the
>>> whole point of using a URI as the object of a property is for a client to
>>> dereference the URI to find out more about the object.  The FOAF file
>>> (pointed to by the URI) would provide that information without the
>>> dcterms:description attribute.
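The two triples reported by the validator follow from how property attributes on an empty property element are read: the attribute describes the rdf:resource URI, not the occurrence. This minimal Python sketch reproduces that expansion for the example above only. It skips literal-valued properties and the rdf:type triple from the typed node, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The dcterms:description example from the message above.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
<dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
<dwc:recordedBy
rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
dcterms:description="Steve Baskauf" />
</dwc:Occurrence>
</rdf:RDF>"""


def full_uri(qname):
    """Turn ElementTree's '{ns}local' back into a single URI string."""
    ns, _, local = qname[1:].partition("}")
    return ns + local


def resource_property_triples(xml_text):
    """Expand empty property elements that use rdf:resource.

    The element yields (subject, property, resource URI); each non-rdf
    attribute on it yields a triple whose subject is the resource URI.
    """
    out = []
    for node in ET.fromstring(xml_text):
        subject = node.get("{" + RDF + "}about")
        for prop in node:
            obj = prop.get("{" + RDF + "}resource")
            if obj is None:
                continue  # literal-valued properties are outside this sketch
            out.append((subject, full_uri(prop.tag), obj))
            for attr, value in prop.attrib.items():
                if not attr.startswith("{" + RDF + "}"):
                    out.append((obj, full_uri(attr), value))
    return out


for triple in resource_property_triples(DOC):
    print(triple)
```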
>>>
>>> Steve
>>>
>>>
>>> Jim Croft wrote:
>>>
>>> wondering if
>>> <dwc:recordedBy
>>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve
>>> Baskauf</dwc:recordedBy>
>>> is legit?
>>>
>>> just a have your cake and eat it kinda guy...
>>>
>>> jim
>>>
>>> On Thu, May 20, 2010 at 7:41 AM, Kevin Richards
>>> <RichardsK at landcareresearch.co.nz> wrote:
>>>
>>>
>>> From my understanding (and after reading the example Bob referred to), the
>>> difference is:
>>>
>>> [referring to external id]
>>> <dwc:recordedBy
>>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me" />
>>>
>>> [inline text]
>>> <dwc:recordedBy>Steve Baskauf</dwc:recordedBy>
>>>
>>> Look right?
>>>
>>> Kevin
>>>
>>> -----Original Message-----
>>> From: tdwg-tag-bounces at lists.tdwg.org
>>> [mailto:tdwg-tag-bounces at lists.tdwg.org] On Behalf Of Jim Croft
>>> Sent: Thursday, 20 May 2010 9:37 a.m.
>>> To: Bob Morris
>>> Cc: tdwg-tag at lists.tdwg.org
>>> Subject: Re: [tdwg-tag] string literals vs. uris for dwc:recordedBy,
>>> dwc:identifiedBy, and dwc:georeferencedBy in RDF
>>>
>>> Hi Bob - should the same term allow both types of content, or should
>>> there be a different term for each?  Does it matter?  Should
>>> applications be smart enough to tell the difference and know what to
>>> do with it?
>>>
>>> Not really asking what the specification says, but about purity and
>>> wholesomeness of design... :)
>>>
>>> jim
>>>
>>> On Thu, May 20, 2010 at 4:26 AM, Bob Morris <morris.bob at gmail.com> wrote:
>>>
>>>
>>> Exactly this example is given in
>>> http://web4.w3.org/TR/REC-rdf-syntax/#section-Syntax-property-attributes
>>> so I would find it regrettable if DwC does something somewhere that
>>> makes this substitution impossible or discouraged,  or encourages tool
>>> construction that does so, or encourages documentation to be interpreted in
>>> a way that does so.
>>>
>>> Indeed http://rs.tdwg.org/dwc/rdf/dwcterms.rdf defines its type to be
>>> rdf:Property and is silent on any semantics but that. My own
>>> conclusion is that neither the intent nor the outcome of the rdf
>>> version of dwcterms discourages what you want, though I suppose the
>>> intent part would be clearer if the documentation also said that a URI
>>> can always be used, but applications are responsible for interpreting
>>> it.
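The point that applications are responsible for interpreting the value can be sketched from the consumer side. This minimal Python sketch assumes a client that accepts either form on dwc:recordedBy. The function name and the rendering choice (a link for a URI, plain text for a literal) are hypothetical. Because the term is only declared an rdf:Property, the client decides per value: rdf:resource means a URI, element text means a literal, even if that text happens to look like a URI.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
<dwc:Occurrence rdf:about="http://herbarium.org/hb123456">
<dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
<dwc:recordedBy>Steve Baskauf</dwc:recordedBy>
</dwc:Occurrence>
</rdf:RDF>"""


def recorded_by_values(xml_text):
    """Sort every dwc:recordedBy value into ('uri', ...) or ('literal', ...)."""
    values = []
    for prop in ET.fromstring(xml_text).iter("{" + DWC + "}recordedBy"):
        resource = prop.get("{" + RDF + "}resource")
        if resource is not None:
            values.append(("uri", resource))
        else:
            values.append(("literal", (prop.text or "").strip()))
    return values


for kind, value in recorded_by_values(DOC):
    if kind == "uri":
        print(f'<a href="{value}">{value}</a>')  # dereferenceable: render a link
    else:
        print(value)  # plain name: render as text
```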
>>>
>>>
>>> On Wed, May 19, 2010 at 11:09 AM, Steve Baskauf
>>> <steve.baskauf at vanderbilt.edu> wrote:
>>>
>>>
>>> The definition for the Darwin Core term recordedBy
>>> http://rs.tdwg.org/dwc/terms/index.htm#recordedBy
>>> says "A list (concatenated and separated) of names ...".  The examples
>>> given are string literals.  However, when using this term as a predicate
>>> in RDF, it would seem preferable to use a URI to an RDF representation
>>> of the entity (if one exists) rather than a string literal.  For
>>> example, can I use:
>>> <dwc:recordedBy
>>> rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
>>> rather than
>>> <dwc:recordedBy>Steven J. Baskauf</dwc:recordedBy>
>>> ?
>>>
>>> Steve Baskauf
>>> --
>>>
>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>> Vanderbilt University Dept. of Biological Sciences
>>>
>>> postal mail address:
>>> VU Station B 351634
>>> Nashville, TN  37235-1634,  U.S.A.
>>>
>>> delivery address:
>>> 2125 Stevenson Center
>>> 1161 21st Ave., S.
>>> Nashville, TN 37235
>>>
>>> office: 2128 Stevenson Center
>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>> http://bioimages.vanderbilt.edu
>>>
>>> _______________________________________________
>>> tdwg-tag mailing list
>>> tdwg-tag at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
>>>
>>>
>>>
>>>
>>> --
>>> Robert A. Morris
>>> Emeritus Professor  of Computer Science
>>> UMASS-Boston
>>> 100 Morrissey Blvd
>>> Boston, MA 02125-3390
>>> Associate, Harvard University Herbaria
>>> email: ram at cs.umb.edu
>>> web: http://bdei.cs.umb.edu/
>>> web: http://etaxonomy.org/FilteredPush
>>> http://www.cs.umb.edu/~ram
>>> phone (+1)617 287 6466
>>>
>>>
>>>
>>> --
>>> _________________
>>> Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~
>>> http://www.google.com/profiles/jim.croft
>>> 'A civilized society is one which tolerates eccentricity to the point
>>> of doubtful sanity.'
>>>  - Robert Frost, poet (1874-1963)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>       
>>
>>
>>     
> .
>
>   



