[tdwg-tag] Darwin Core generic XML with attributes

Steve Baskauf steve.baskauf at vanderbilt.edu
Fri May 21 13:00:46 CEST 2010


For those who might be interested, I have answered my own question (at 
least for one instance).  I created an rdf/xml file just like
http://dl.dropbox.com/u/639486/tdwg/1.xml
except that I changed the URI of the resource described by the about 
attribute to "http://herbarium.org/hb123457".  I then had the OpenLink 
RDF browser
http://demo.openlinksw.com/rdfbrowser/
query both files.  It only listed a single triple for
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me --> label --> 
"Steve Baskauf"
If I changed the label in the hb123457 file to "Steven J Baskauf", 
OpenLink showed http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me 
as having two labels, "Steve Baskauf" and "Steven J Baskauf".  When I 
had OpenLink query the 
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf file itself, it 
merged the label property with the other properties of the 
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me URI.
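
One way to picture this result: an RDF graph is a set of triples, so a
statement loaded from two documents is still one triple once the documents
are merged. The sketch below uses plain Python tuples as stand-in triples.
It is only an illustration of set semantics, not a description of how
OpenLink is implemented, and the helper names (occurrence_triples, labels)
are made up for the example.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DWC_RECORDED_BY = "http://rs.tdwg.org/dwc/terms/recordedBy"
ME = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"


def occurrence_triples(occurrence_uri, label):
    """Triples asserted by one occurrence file (hypothetical helper)."""
    return {
        (occurrence_uri, DWC_RECORDED_BY, ME),
        (ME, RDFS_LABEL, label),
    }


def labels(graph, subject):
    """All rdfs:label values recorded for a subject, sorted for display."""
    return sorted(o for s, p, o in graph if s == subject and p == RDFS_LABEL)


file_1 = occurrence_triples("http://herbarium.org/hb123456", "Steve Baskauf")
file_2 = occurrence_triples("http://herbarium.org/hb123457", "Steve Baskauf")
file_2_changed = occurrence_triples(
    "http://herbarium.org/hb123457", "Steven J Baskauf"
)

# Merging graphs is a set union, so the repeated label collapses.
merged_same = file_1 | file_2
merged_changed = file_1 | file_2_changed

print(len(merged_same))            # 3: two recordedBy triples, one label
print(labels(merged_same, ME))     # ['Steve Baskauf']
print(labels(merged_changed, ME))  # ['Steve Baskauf', 'Steven J Baskauf']
```

A client that also tracks which document each statement came from (for
example with named graphs) would keep more than this; the sketch assumes a
simple merge.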

So at least in the case of one Linked Data client (OpenLink), redundant 
information is not recorded as additional triples.  For Linked Data 
clients that behave like OpenLink, labeling a URI that is used as the 
object of a Darwin Core term therefore seems to be a good way of 
providing both types of information (URI and text) for "local 
consumption" (within a single RDF file for an occurrence).
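
As a rough sketch of "local consumption", the following Python reads a
single file shaped like 1.xml and resolves the dwc:recordedBy URI to the
rdfs:label given in the same file. It uses only the standard library and
handles just this one layout (rdf:Description elements with rdf:about and
rdf:resource); it is not a general RDF/XML parser, and the function name
recorded_by is made up for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DWC = "http://rs.tdwg.org/dwc/terms/"

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
    <rdf:Description rdf:about="http://herbarium.org/hb123456">
       <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
    </rdf:Description>
    <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
       <rdfs:label>Steve Baskauf</rdfs:label>
    </rdf:Description>
</rdf:RDF>"""


def recorded_by(doc):
    """Return (occurrence URI, person URI, label or None) for each recordedBy."""
    root = ET.fromstring(doc)
    descriptions = root.findall(f"{{{RDF}}}Description")

    # First pass: collect the labels stated in this file, keyed by subject URI.
    local_labels = {}
    for desc in descriptions:
        label = desc.find(f"{{{RDFS}}}label")
        if label is not None:
            local_labels[desc.get(f"{{{RDF}}}about")] = label.text

    # Second pass: pair each recordedBy URI with its local label, if any.
    results = []
    for desc in descriptions:
        for rb in desc.findall(f"{{{DWC}}}recordedBy"):
            uri = rb.get(f"{{{RDF}}}resource")
            results.append(
                (desc.get(f"{{{RDF}}}about"), uri, local_labels.get(uri))
            )
    return results


for occurrence, person, name in recorded_by(DOC):
    print(occurrence, "recorded by", name or person)
```

If the label were stated only in the FOAF file, local_labels.get(uri) would
return None here and the display would fall back to the bare URI.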

Steve

Steve Baskauf wrote:
> OK, I just had another thought/question about the approach suggested 
> here.  If there were only one occurrence record in one RDF XML file, 
> then this approach would be great.  I could create an XSLT that would 
> make a human readable view of the RDF file that made use of the 
> <rdfs:label> information to display the person's name as a string.  
> However, if I have a database containing 10000 occurrence records and 
> each record is represented by an RDF XML file (containing that label 
> statement) that is provided when the HTTP URI guid for that occurrence 
> is dereferenced, then I am making the assertion that
> PERSON <http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me> --> 
> has name --> "Steve Baskauf"
> 10000 times.  Would a linked data client that was collecting metadata 
> about my collection record 10000 triples, counting each label 
> description as a separate assertion because it was made by a separate 
> statement in a separate file, or would it be "smart" enough to realize 
> that it was really the same thing being said 10000 times and just 
> record one triple?  What this really boils down to is "trust" of 
> sources making assertions: does a linked data client "trust" that one 
> assertion of the label property of 
> http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me is as good as 
> another, or will it feel compelled to keep track of all of the 
> assertions so that a user of the triple store can draw their own 
> conclusions about the validity of all of the individual assertions?
>
> It would be very simple to make the assertion that "Steve Baskauf" is 
> a label for http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me in 
> the FOAF file itself, but then the XSLT method wouldn't work because 
> the XSL wouldn't be dereferencing the foaf.rdf file.
>
> Steve
>
>
>
>
>
> Roderic Page wrote:
>> I think part of the problem here results from trying both to model
>> the data and to have something that is easy to read (i.e., having
>> both a URI and a literal for the same tag). The result is messy
>> and inconsistent.
>>
>> I think Peter Ansell's first option is a good RDF solution, namely:
>>
>> http://dl.dropbox.com/u/639486/tdwg/1.xml
>>
>> =================
>> <?xml version="1.0" encoding="UTF-8"?>
>> <rdf:RDF
>>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>     xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
>>
>>     <!-- occurrence -->
>>     <rdf:Description rdf:about="http://herbarium.org/hb123456">
>>        <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
>>     </rdf:Description>
>> 	
>>     <!-- person -->
>>     <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
>>        <rdfs:label>Steve Baskauf</rdfs:label>
>>     </rdf:Description>
>>
>> </rdf:RDF>
>> =================
>>
>> This document says:
>>
>> OCCURRENCE <http://herbarium.org/hb123456> --> recorded by --> PERSON
>> <http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me> --> who has name --> "Steve Baskauf"
>>
>> which I assume is what we want. You can see this graph in the W3C RDF  
>> validator here http://tinyurl.com/39nqho2
>>
>> The RDF has all the information a linked data client needs in order to  
>> say this, and we could also write an XSLT style sheet to render this in  
>> HTML for people to read.
>>
>> Adding <dwc:recordedBy>Steve Baskauf</dwc:recordedBy> is a hack that  
>> breaks the model.
>>
>> Note also that if the person doesn't have a URI we still shouldn't use  
>> dwc:recordedBy as a literal. Instead we can do this:
>>
>> http://dl.dropbox.com/u/639486/tdwg/2.xml
>>
>> =================
>> <?xml version="1.0" encoding="UTF-8"?>
>> <rdf:RDF
>>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>>     xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
>>
>>     <!-- occurrence -->
>>     <rdf:Description rdf:about="http://herbarium.org/hb123456">
>>        <!-- person -->
>>        <dwc:recordedBy rdf:parseType="Resource">
>>        	<rdfs:label>Steve Baskauf</rdfs:label>
>>        </dwc:recordedBy>
>>     </rdf:Description>
>> 	
>> </rdf:RDF>
>> =================
>>
>> Here we are saying that the occurrence was recorded by a person called  
>> Steve Baskauf. If you paste this into the W3C validator you get the  
>> same model
>>
>> OCCURRENCE <http://herbarium.org/hb123456> --> recorded by --> PERSON  
>> <xxx> --> who has name --> "Steve Baskauf"
>>
>> Since "Steve Baskauf" in this example doesn't have a URI we get a  
>> "bnode" with a local identifier. See http://tinyurl.com/392qzb3 .
>>
>> In both cases (person with or without a URI) we are saying the same  
>> thing. If you want a literal for <dwc:recordedBy> (say for ease of  
>> display) then I think you want a different tag that is expressly  
>> defined to do just that. For example,
>> http://rs.tdwg.org/ontology/voc/TaxonOccurrence has <identifiedTo>
>> to point to a URI for a taxon, and <identifiedToString> if you want
>> the literal. I don't know if dwc has  
>> anything equivalent for recordedBy (and can somebody please tell me  
>> why we now have so many vocabularies for the same things?)
>>
>> Personally I think that starting with XML and trying to generate RDF  
>> and HTML from that is going to lead to a world of hurt. I suspect it  
>> makes more sense to:
>>
>> a) model what we want to say
>> b) say it in RDF
>> c) write an XSLT to convert it to HTML for humans
>>
>> Regards
>>
>> Rod
>>
>>
>>
>>
>>
>> On 20 May 2010, at 06:35, Bob Morris wrote:
>>
>>   
>>> Per my discussion in answer to the original problem, I think what you
>>> are tripping on is that the way you want to do this is effectively
>>> trying to make a triple with two objects.  I believe it is not really a
>>> modeling question, but rather a question of how RDF/XML is translated
>>> into triples.
>>>
>>> Bob Morris
>>>
>>> On Wed, May 19, 2010 at 9:52 PM, Steve Baskauf
>>> <steve.baskauf at vanderbilt.edu> wrote:
>>>     
>>>> This recent discussion reminds me of a question that I have been
>>>> wondering about for several months and hadn't gotten around to
>>>> bringing up: can you have a Darwin Core XML representation where an
>>>> element has a literal value and an attribute? If the XML is RDF, then
>>>> I think the answer is pretty much "no", as I just found out with the
>>>> W3C Validator. However, in generic XML I don't think there is any rule
>>>> that says that one can't have any attribute that one wants.  The only
>>>> guidance I know of on the subject is:
>>>> http://rs.tdwg.org/dwc/terms/guides/xml/index.htm#implement
>>>> It states that the value of a Darwin Core property should be the
>>>> content of the element rather than stating the value as an attribute.
>>>> However, I have the situation where I want to store or transfer two
>>>> somewhat equivalent representations of the value of a property: a
>>>> string literal form and a URI form.  In the example we've been talking
>>>> about, I would like my generic (non-RDF) XML to do something like
>>>> this:
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <dwr:SimpleDarwinRecordSet
>>>>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>>>     xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
>>>>     xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
>>>> <dwr:SimpleDarwinRecord>
>>>>     <dwc:occurrenceID rdf:resource="http://herbarium.org/hb123456">http://herbarium.org/hb123456</dwc:occurrenceID>
>>>>     <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve Baskauf</dwc:recordedBy>
>>>>     <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen">PreservedSpecimen</dwc:basisOfRecord>
>>>>     ... more elements, mostly with string literal values...
>>>> </dwr:SimpleDarwinRecord>
>>>> </dwr:SimpleDarwinRecordSet>
>>>>
>>>> This would meet the basic guidelines of the Darwin Core XML Guide in
>>>> that the literal values would be the contents of the elements.  What I
>>>> don't know is if the inclusion of the rdf:resource attributes would
>>>> invalidate the XML if it were validated against someone's schema that
>>>> was silent about attributes or if the schema would have to explicitly
>>>> say that having an rdf:resource attribute was a valid option.  I think
>>>> I don't know enough about XML schemas ...
>>>>
>>>> The reason why I would like to maintain/transfer both types of values
>>>> (literal and URI) is so that I could use the XML data to generate both
>>>> HTML and RDF if I wanted.  The HTML would tell humans that the
>>>> occurrence was a PreservedSpecimen, but the RDF would tell a linked
>>>> data client that the occurrence was a
>>>> http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen .  I realize that for
>>>> my own internal use, the XML can have any format I want, but if I were
>>>> exporting XML for general public use, would it be bad to use the
>>>> approach above?
>>>>
>>>> Steve
>>>>
>>>> As an aside, I wanted to see exactly what the definition was for
>>>> rdf:resource.  However, the usual namespace for rdf:
>>>> (http://www.w3.org/1999/02/22-rdf-syntax-ns#) doesn't seem to include
>>>> "resource" in the defined properties.  Very odd!  Maybe I'm just
>>>> missing something...
>>>>
>>>> --
>>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>>> Vanderbilt University Dept. of Biological Sciences
>>>>
>>>> postal mail address:
>>>> VU Station B 351634
>>>> Nashville, TN  37235-1634,  U.S.A.
>>>>
>>>> delivery address:
>>>> 2125 Stevenson Center
>>>> 1161 21st Ave., S.
>>>> Nashville, TN 37235
>>>>
>>>> office: 2128 Stevenson Center
>>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>>> http://bioimages.vanderbilt.edu
>>>>
>>>> _______________________________________________
>>>> tdwg-tag mailing list
>>>> tdwg-tag at lists.tdwg.org
>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-tag
>>>>
>>>>
>>>>       
>>> -- 
>>> Robert A. Morris
>>> Emeritus Professor  of Computer Science
>>> UMASS-Boston
>>> 100 Morrissey Blvd
>>> Boston, MA 02125-3390
>>> Associate, Harvard University Herbaria
>>> email: ram at cs.umb.edu
>>> web: http://bdei.cs.umb.edu/
>>> web: http://etaxonomy.org/FilteredPush
>>> http://www.cs.umb.edu/~ram
>>> phone (+1)617 287 6466
>>>
>>>     
>>
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> DEEB, FBLS
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>>
>> Email: r.page at bio.gla.ac.uk
>> Tel: +44 141 330 4778
>> Fax: +44 141 330 2792
>> AIM: rodpage1962 at aim.com
>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>> Twitter: http://twitter.com/rdmpage
>> Blog: http://iphylo.blogspot.com
>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>
