[tdwg-tag] Darwin Core generic XML with attributes

Steve Baskauf steve.baskauf at vanderbilt.edu
Fri May 21 15:28:23 CEST 2010


Stefano,
 From the standpoint of the files as RDF and the Linked Data world, it 
would be a waste of time to ever define a label for a URI-identified 
resource more than once.  However, I am thinking about this problem from 
the standpoint of the GUID requirements for providing both human and 
machine-readable representations of a resource.  If one must create the 
RDF to meet the GUID requirement for making available a machine-readable 
representation, then it would be a relatively simple matter to create 
the required human-readable representation through AJAX or some other 
method that can use the RDF/XML file as a data source.  AJAX is "stupid"
in the sense that it doesn't "understand" RDF; it just uses an XML file
as a source of data.  So an HTML file using the RDF/XML file as an AJAX
data source would need to get the label information from within the
particular RDF/XML file containing the representation of the resource
identified by the GUID, and not get it by dereferencing a link to
another file, like the FOAF file that's the object of the
dwc:recordedBy triple.  That would be the reason for providing the
rdfs:label property within multiple files.
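
To make the idea concrete, here is a minimal Python sketch (standing in
for the AJAX script, so an assumption on my part rather than the
browser-side code itself).  It treats the RDF/XML purely as XML, with no
RDF awareness, and finds the label inside the same file.  The document
mirrors the 1.xml example quoted below; the helper name local_label is
made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DWC = "http://rs.tdwg.org/dwc/terms/"
PERSON = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
OCCURRENCE = "http://herbarium.org/hb123456"

# Same content as the 1.xml example, held in a string for the sketch.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:dwc="{DWC}">
  <rdf:Description rdf:about="{OCCURRENCE}">
    <dwc:recordedBy rdf:resource="{PERSON}"/>
  </rdf:Description>
  <rdf:Description rdf:about="{PERSON}">
    <rdfs:label>Steve Baskauf</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def local_label(root, occurrence_uri):
    """Return (person_uri, label) using only elements in this one file."""
    descriptions = root.findall(f"{{{RDF}}}Description")
    for desc in descriptions:
        if desc.get(f"{{{RDF}}}about") != occurrence_uri:
            continue
        recorded_by = desc.find(f"{{{DWC}}}recordedBy")
        if recorded_by is None:
            return None
        person_uri = recorded_by.get(f"{{{RDF}}}resource")
        # No dereferencing: look for a description of the person locally.
        for other in descriptions:
            if other.get(f"{{{RDF}}}about") == person_uri:
                return person_uri, other.findtext(f"{{{RDFS}}}label")
        return person_uri, None  # URI known, but no label in this file
    return None


root = ET.fromstring(DOC.encode("utf-8"))
result = local_label(root, OCCURRENCE)
print(result)
```

If the second rdf:Description is left out of the file, the sketch still
finds the URI but returns None for the label, which is exactly the
situation the HTML page could not recover from without fetching the
FOAF file.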

I'm working on a functional example of this.  When I get it working, 
I'll post it to the list.
Steve

Stefano Bocconi wrote:
> Hi Steve,
>
> Maybe I am missing something here, but RDF is a knowledge representation
> language, so stating something (a triple) more than once would
> not make sense, and duplicates are not kept as far as I know. The
> exception, as you say, would be if a provenance mechanism were in place,
> but that issue is being studied in the context of named graphs, which
> are not yet in widespread use. With named graphs, triples coming from
> different sources (basically RDF files) would have a sort of fourth
> attribute that could distinguish them from the same statements made
> elsewhere. Even so, within a single file this would not lead to multiple
> equal statements.
>
> Regards,
>
>     Stefano Bocconi
>
> Steve Baskauf wrote:
>   
>> For those who might be interested, I have answered my own question (at
>> least for one instance).  I created an rdf/xml file just like
>> http://dl.dropbox.com/u/639486/tdwg/1.xml
>> except that I changed the URI of the resource described by the about
>> attribute to "http://herbarium.org/hb123457".  I then had the OpenLink
>> RDF browser
>> http://demo.openlinksw.com/rdfbrowser/
>> query both files.  It only listed a single triple for
>> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me --> label -->
>> "Steve Baskauf"
>> If I changed the label in the hb123457 file to "Steven J Baskauf",
>> OpenLink showed
>> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me as having two
>> labels, "Steve Baskauf" and "Steven J Baskauf".  When I had OpenLink
>> query the http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf file
>> itself, it merged the label property with the other properties of the
>> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me URI.
>>
>> So at least in the case of one Linked Data client (OpenLink),
>> redundant information is not recorded as additional triples.  For
>> Linked Data clients similar to OpenLink, then, labeling a URI used as
>> the object of a Darwin Core term in RDF seems to be
>> a good way of providing both types of information (URI and text) for
>> "local consumption" (within a single RDF file for an occurrence).
>>
>> Steve
>>
>> Steve Baskauf wrote:
>>     
>>> OK, I just had another thought/question about the approach suggested
>>> here.  If there were only one occurrence record in one RDF/XML file,
>>> then this approach would be great.  I could create an XSLT that would
>>> make a human-readable view of the RDF file that made use of the
>>> <rdfs:label> information to display the person's name as a string.
>>> However, if I have a database containing 10000 occurrence records and
>>> each record is represented by an RDF/XML file (containing that label
>>> statement) that is provided when the HTTP URI GUID for that
>>> occurrence is dereferenced, then I am making the assertion that
>>> PERSON <http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me> -->
>>> has name --> "Steve Baskauf"
>>> 10000 times.  Would a linked data client that was collecting metadata
>>> about my collection record 10000 triples, counting each label
>>> description as a separate assertion because it was made by a separate
>>> statement in a separate file, or would it be "smart" enough to
>>> realize that it was really the same thing being said 10000 times and
>>> just record one triple?  What this really boils down to is "trust" of
>>> sources making assertions: does a linked data client "trust" that one
>>> assertion of the label property of
>>> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me is as good as
>>> another, or will it feel compelled to keep track of all of the
>>> assertions so that a user of the triple store can draw their own
>>> conclusions about the validity of all of the individual assertions?
>>>
>>> It would be very simple to make the assertion that "Steve Baskauf" is
>>> a label for http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
>>> in the FOAF file itself, but then the XSLT method wouldn't work
>>> because the XSL wouldn't be dereferencing the foaf.rdf file.
>>>
>>> Steve
>>>
>>>
>>>
>>>
>>>
>>> Roderic Page wrote:
>>>       
>>>> I think part of the problem here results from trying to satisfy
>>>> modelling the data and having something that is easy to read (i.e.,
>>>> having both a URI and a literal for the same tag). The result is messy
>>>> and inconsistent.
>>>>
>>>> I think Peter Ansell's first option is a good RDF solution, namely:
>>>>
>>>> http://dl.dropbox.com/u/639486/tdwg/1.xml
>>>>
>>>> =================
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <rdf:RDF
>>>>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>>>     xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
>>>>
>>>>     <!-- occurrence -->
>>>>     <rdf:Description rdf:about="http://herbarium.org/hb123456">
>>>>        <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
>>>>     </rdf:Description>
>>>>
>>>>     <!-- person -->
>>>>     <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
>>>>        <rdfs:label>Steve Baskauf</rdfs:label>
>>>>     </rdf:Description>
>>>>
>>>> </rdf:RDF>
>>>> =================
>>>>
>>>> This document says:
>>>>
>>>> OCCURRENCE <http://herbarium.org/hb123456> --> recorded by --> PERSON
>>>> <http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me> --> who has name --> "Steve Baskauf"
>>>>
>>>> which I assume is what we want. You can see this graph in the W3C RDF
>>>> validator here http://tinyurl.com/39nqho2
>>>>
>>>> The RDF has all the information a linked data client needs in order to
>>>> say this, and we could also write an XSLT style sheet to render this in
>>>> HTML for people to read.
>>>>
>>>> Adding <dwc:recordedBy>Steve Baskauf</dwc:recordedBy> is a hack that
>>>> breaks the model.
>>>>
>>>> Note also that if the person doesn't have a URI we still shouldn't use
>>>> dwc:recordedBy as a literal. Instead we can do this:
>>>>
>>>> http://dl.dropbox.com/u/639486/tdwg/2.xml
>>>>
>>>> =================
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <rdf:RDF
>>>>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>>>     xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
>>>>
>>>>     <!-- occurrence -->
>>>>     <rdf:Description rdf:about="http://herbarium.org/hb123456">
>>>>        <!-- person -->
>>>>        <dwc:recordedBy rdf:parseType="Resource">
>>>>             <rdfs:label>Steve Baskauf</rdfs:label>
>>>>        </dwc:recordedBy>
>>>>     </rdf:Description>
>>>>
>>>> </rdf:RDF>
>>>> =================
>>>>
>>>> Here we are saying that the occurrence was recorded by a person called
>>>> Steve Baskauf. If you paste this into the W3C validator you get the
>>>> same model
>>>>
>>>> OCCURRENCE <http://herbarium.org/hb123456> --> recorded by --> PERSON
>>>> <xxx> --> who has name --> "Steve Baskauf"
>>>>
>>>> Since "Steve Baskauf" in this example doesn't have a URI we get a
>>>> "bnode" with a local identifier. See http://tinyurl.com/392qzb3 .
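
As a hedged check of the "same model" claim, here is a toy Python
extractor.  It handles only the two constructs used in these examples
(rdf:resource and rdf:parseType="Resource") and is in no way a
conforming RDF/XML parser; the blank node names (_:b1 and so on) and the
helper names are arbitrary choices for the sketch.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DWC = "http://rs.tdwg.org/dwc/terms/"
PERSON = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
OCCURRENCE = "http://herbarium.org/hb123456"
HEAD = f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:dwc="{DWC}">'

WITH_URI = f"""{HEAD}
  <rdf:Description rdf:about="{OCCURRENCE}">
    <dwc:recordedBy rdf:resource="{PERSON}"/>
  </rdf:Description>
  <rdf:Description rdf:about="{PERSON}">
    <rdfs:label>Steve Baskauf</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""

WITHOUT_URI = f"""{HEAD}
  <rdf:Description rdf:about="{OCCURRENCE}">
    <dwc:recordedBy rdf:parseType="Resource">
      <rdfs:label>Steve Baskauf</rdfs:label>
    </dwc:recordedBy>
  </rdf:Description>
</rdf:RDF>"""


def to_uri(tag):
    # ElementTree writes names as "{namespace}local"; join the two parts.
    return tag[1:].replace("}", "", 1)


def extract(xml_text):
    """Toy extractor for rdf:about, rdf:resource and parseType=Resource."""
    bnode_ids = itertools.count(1)
    found = []

    def walk(subject, node):
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                found.append((subject, to_uri(prop.tag), resource))
            elif prop.get(f"{{{RDF}}}parseType") == "Resource":
                bnode = f"_:b{next(bnode_ids)}"  # node without a URI
                found.append((subject, to_uri(prop.tag), bnode))
                walk(bnode, prop)
            else:
                literal = (prop.text or "").strip()
                found.append((subject, to_uri(prop.tag), literal))

    for desc in ET.fromstring(xml_text).findall(f"{{{RDF}}}Description"):
        walk(desc.get(f"{{{RDF}}}about"), desc)
    return found


def shape(triples, person):
    """Swap the person node for a placeholder so graphs can be compared."""
    return [tuple("?person" if term == person else term for term in t)
            for t in triples]


with_uri = extract(WITH_URI)
without_uri = extract(WITHOUT_URI)
bnode = without_uri[0][2]
print(bnode)
print(shape(with_uri, PERSON) == shape(without_uri, bnode))
```

Once the person node is replaced by a placeholder, both documents give
the same two statements; the only difference is whether that node is
named by a URI or by a local blank node identifier.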
>>>>
>>>> In both cases (person with or without a URI) we are saying the same
>>>> thing. If you want a literal for <dwc:recordedBy> (say for ease of
>>>> display) then I think you want a different tag that is expressly
>>>> defined to do just that. For example,
>>>> http://rs.tdwg.org/ontology/voc/TaxonOccurrence has <identifiedTo>
>>>> to point to a URI for a taxon, and <identifiedToString> if you want
>>>> the literal. I don't know if dwc has anything equivalent for
>>>> recordedBy (and can somebody please tell me why we now have so many
>>>> vocabularies for the same things?)
>>>>
>>>> Personally I think that starting with XML and trying to generate RDF
>>>> and HTML from that is going to lead to a world of hurt. I suspect it
>>>> makes more sense to:
>>>>
>>>> a) model what we want to say
>>>> b) say it in RDF
>>>> c) write an XSLT to convert it to HTML for humans
>>>>
>>>> Regards
>>>>
>>>> Rod
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 20 May 2010, at 06:35, Bob Morris wrote:
>>>>
>>>>
>>>>         
>>>>> Per my discussion in answer to the original problem, I think what you
>>>>> are tripping on is that the way you want to do this is effectively
>>>>> trying to make a triple with two objects.  I believe it is not really
>>>>> a modeling question, but rather a question of how RDF/XML is
>>>>> translated into triples.
>>>>>
>>>>> Bob Morris
>>>>>
>>>>> On Wed, May 19, 2010 at 9:52 PM, Steve Baskauf
>>>>> <steve.baskauf at vanderbilt.edu> wrote:
>>>>>
>>>>>           
>>>>>> This recent discussion reminds me of a question that I have been
>>>>>> wondering about for several months and hadn't gotten around to
>>>>>> bringing up: can you have a Darwin Core XML representation where an
>>>>>> element has a literal value and an attribute? If the XML is RDF, then
>>>>>> I think the answer is pretty much "no" as I just found out with the
>>>>>> W3C Validator.  However, in generic XML I don't think there is any
>>>>>> rule that says that one can't have any attribute that one wants.  The
>>>>>> only guidance I know of on the subject is:
>>>>>> http://rs.tdwg.org/dwc/terms/guides/xml/index.htm#implement
>>>>>> It states that the value of a Darwin Core property should be the
>>>>>> content of the element rather than stating the value as an attribute.
>>>>>> However, I have the situation where I want to store or transfer two
>>>>>> somewhat equivalent representations of the value of a property: a
>>>>>> string literal form and a URI form.  In the example we've been
>>>>>> talking about, I would like my generic (non-RDF) XML to do something
>>>>>> like this:
>>>>>>
>>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>>> <dwr:SimpleDarwinRecordSet
>>>>>>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>>>     xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
>>>>>>     xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
>>>>>> <dwr:SimpleDarwinRecord>
>>>>>>     <dwc:occurrenceID rdf:resource="http://herbarium.org/hb123456">http://herbarium.org/hb123456</dwc:occurrenceID>
>>>>>>     <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve Baskauf</dwc:recordedBy>
>>>>>>     <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen">PreservedSpecimen</dwc:basisOfRecord>
>>>>>>     ... more elements, mostly with string literal values...
>>>>>> </dwr:SimpleDarwinRecord>
>>>>>> </dwr:SimpleDarwinRecordSet>
>>>>>>
>>>>>> This would meet the basic guidelines of the Darwin Core XML Guide in
>>>>>> that the literal values would be the contents of the elements.  What
>>>>>> I don't know is if the inclusion of the rdf:resource attributes would
>>>>>> invalidate the XML if it were validated against someone's schema that
>>>>>> was silent about attributes or if the schema would have to explicitly
>>>>>> say that having an rdf:resource attribute was a valid option.  I
>>>>>> think I don't know enough about XML schemas ...
>>>>>>
>>>>>> The reason why I would like to maintain/transfer both types of values
>>>>>> (literal and URI) is so that I could use the XML data to generate
>>>>>> both HTML and RDF if I wanted.  The HTML would tell humans that the
>>>>>> occurrence was a PreservedSpecimen, but the RDF would tell a linked
>>>>>> data client that the occurrence was a
>>>>>> http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen .  I realize that
>>>>>> for my own internal use, the XML can have any format I want, but if I
>>>>>> were exporting XML for general public use, would it be bad to use the
>>>>>> approach above?
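
A small Python sketch of how a consumer might read such a record,
assuming the layout shown above.  It only demonstrates that a plain XML
parser can pull out both values; it says nothing about whether a given
schema would accept the rdf:resource attribute.  The function name
read_record and the two dictionaries are illustrative only.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
DWR = "http://rs.tdwg.org/dwc/xsd/simpledarwincore/"

RECORD_SET = f"""<dwr:SimpleDarwinRecordSet
    xmlns:rdf="{RDF}" xmlns:dwc="{DWC}" xmlns:dwr="{DWR}">
<dwr:SimpleDarwinRecord>
    <dwc:occurrenceID rdf:resource="http://herbarium.org/hb123456">http://herbarium.org/hb123456</dwc:occurrenceID>
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve Baskauf</dwc:recordedBy>
    <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen">PreservedSpecimen</dwc:basisOfRecord>
</dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>"""


def read_record(record):
    """Map each Darwin Core term to (literal, uri); uri may be None."""
    fields = {}
    for element in record:
        if not element.tag.startswith(f"{{{DWC}}}"):
            continue  # skip anything that is not a dwc: term
        term = element.tag.split("}", 1)[1]
        literal = (element.text or "").strip()
        fields[term] = (literal, element.get(f"{{{RDF}}}resource"))
    return fields


root = ET.fromstring(RECORD_SET)
fields = read_record(root.find(f"{{{DWR}}}SimpleDarwinRecord"))

# The literal feeds the HTML view; the URI (when present) feeds the RDF.
for_humans = {term: literal for term, (literal, uri) in fields.items()}
for_machines = {term: uri or literal for term, (literal, uri) in fields.items()}
print(for_humans["basisOfRecord"])
print(for_machines["basisOfRecord"])
```

An element with no rdf:resource attribute falls back to its literal in
for_machines, which matches the "mostly with string literal values"
case.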
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> As an aside, I wanted to see exactly what the definition was for
>>>>>> rdf:resource.  However, the usual namespace for rdf:
>>>>>> (http://www.w3.org/1999/02/22-rdf-syntax-ns#) doesn't seem to include
>>>>>> "resource" in the defined properties.  Very odd!  Maybe I'm just
>>>>>> missing something...
>>>>>>
>>>>>> --
>>>>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>>>>> Vanderbilt University Dept. of Biological Sciences
>>>>>>
>>>>>> postal mail address:
>>>>>> VU Station B 351634
>>>>>> Nashville, TN  37235-1634,  U.S.A.
>>>>>>
>>>>>> delivery address:
>>>>>> 2125 Stevenson Center
>>>>>> 1161 21st Ave., S.
>>>>>> Nashville, TN 37235
>>>>>>
>>>>>> office: 2128 Stevenson Center
>>>>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>>>>> http://bioimages.vanderbilt.edu
>>>>>>
>>>>>> _______________________________________________
>>>>>> tdwg-tag mailing list
>>>>>> tdwg-tag at lists.tdwg.org
>>>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> --
>>>>> Robert A. Morris
>>>>> Emeritus Professor  of Computer Science
>>>>> UMASS-Boston
>>>>> 100 Morrissey Blvd
>>>>> Boston, MA 02125-3390
>>>>> Associate, Harvard University Herbaria
>>>>> email: ram at cs.umb.edu
>>>>> web: http://bdei.cs.umb.edu/
>>>>> web: http://etaxonomy.org/FilteredPush
>>>>> http://www.cs.umb.edu/~ram
>>>>> phone (+1)617 287 6466
>>>> ---------------------------------------------------------
>>>> Roderic Page
>>>> Professor of Taxonomy
>>>> DEEB, FBLS
>>>> Graham Kerr Building
>>>> University of Glasgow
>>>> Glasgow G12 8QQ, UK
>>>>
>>>> Email: r.page at bio.gla.ac.uk
>>>> Tel: +44 141 330 4778
>>>> Fax: +44 141 330 2792
>>>> AIM: rodpage1962 at aim.com
>>>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>>>> Twitter: http://twitter.com/rdmpage
>>>> Blog: http://iphylo.blogspot.com
>>>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
