Darwin Core generic XML with attributes
This recent discussion reminds me of a question I have been wondering about for several months and hadn't gotten around to raising: can a Darwin Core XML representation have an element with both a literal value and an attribute? If the XML is RDF, then I think the answer is pretty much "no", as I just found out with the W3C Validator. However, in generic XML I don't think there is any rule saying that one can't have any attribute one wants. The only guidance I know of on the subject is http://rs.tdwg.org/dwc/terms/guides/xml/index.htm#implement. It states that the value of a Darwin Core property should be given as the content of the element rather than as an attribute. However, I have a situation where I want to store or transfer two roughly equivalent representations of a property's value: a string literal form and a URI form. In the example we've been talking about, I would like my generic (non-RDF) XML to do something like this:
<?xml version="1.0" encoding="UTF-8"?>
<dwr:SimpleDarwinRecordSet
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
  <dwr:SimpleDarwinRecord>
    <dwc:occurrenceID rdf:resource="http://herbarium.org/hb123456">http://herbarium.org/hb123456</dwc:occurrenceID>
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve Baskauf</dwc:recordedBy>
    <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen">PreservedSpecimen</dwc:basisOfRecord>
    <!-- ... more elements, mostly with string literal values ... -->
  </dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>
This would meet the basic guidelines of the Darwin Core XML Guide, in that the literal values would be the contents of the elements. What I don't know is whether including the rdf:resource attributes would make the XML invalid against someone's schema that was silent about attributes, or whether the schema would have to say explicitly that an rdf:resource attribute was a valid option. I think I don't know enough about XML schemas ...
The reason I would like to maintain and transfer both types of values (literal and URI) is so that I could use the XML data to generate both HTML and RDF if I wanted. The HTML would tell humans that the occurrence was a PreservedSpecimen, but the RDF would tell a linked data client that the occurrence was an http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen. I realize that for my own internal use the XML can have any format I want, but if I were exporting XML for general public use, would it be bad to use the approach above?
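To make the idea concrete, here is a minimal Python sketch (standard library only) of a consumer that reads the proposed record and produces both outputs: the element content goes into an HTML table for humans, and the rdf:resource URI, where present, becomes the object of an N-Triples line. The function names are hypothetical, the XML declaration is omitted, and using the URI as the object of the same dwc: property is an assumption that mirrors the message, not a Darwin Core rule.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
DWR = "http://rs.tdwg.org/dwc/xsd/simpledarwincore/"

# The proposed generic record: literal as element content, URI as an attribute.
RECORD = f"""<dwr:SimpleDarwinRecordSet xmlns:rdf="{RDF}" xmlns:dwc="{DWC}" xmlns:dwr="{DWR}">
  <dwr:SimpleDarwinRecord>
    <dwc:occurrenceID rdf:resource="http://herbarium.org/hb123456">http://herbarium.org/hb123456</dwc:occurrenceID>
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve Baskauf</dwc:recordedBy>
    <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen">PreservedSpecimen</dwc:basisOfRecord>
  </dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>"""


def read_record(xml_text):
    """Map each dwc:* element of the first record to (literal, uri or None)."""
    record = ET.fromstring(xml_text).find(f"{{{DWR}}}SimpleDarwinRecord")
    prefix = f"{{{DWC}}}"
    return {
        el.tag[len(prefix):]: ((el.text or "").strip(), el.get(f"{{{RDF}}}resource"))
        for el in record
        if el.tag.startswith(prefix)
    }


def to_html(fields):
    """Humans get the literal form."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(literal)}</td></tr>"
        for name, (literal, _uri) in fields.items()
    )
    return f"<table>{rows}</table>"


def to_ntriples(fields):
    """Linked data clients get the URI form where one exists, else the literal."""
    occ_literal, occ_uri = fields["occurrenceID"]
    subject = occ_uri or occ_literal
    lines = []
    for name, (literal, uri) in fields.items():
        if name == "occurrenceID":
            continue
        if uri:
            obj = f"<{uri}>"
        else:
            # Minimal escaping only; a real serializer applies the full N-Triples rules.
            obj = '"' + literal.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"<{subject}> <{DWC}{name}> {obj} .")
    return lines


fields = read_record(RECORD)
print(to_html(fields))
print("\n".join(to_ntriples(fields)))
```

A plain XML consumer has no trouble with the extra attribute; the open questions in this thread are whether a schema allows it and how it maps to RDF.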
Steve
As an aside, I wanted to see exactly how rdf:resource is defined. However, the usual namespace for rdf: (http://www.w3.org/1999/02/22-rdf-syntax-ns#) doesn't seem to include "resource" among the defined properties. Very odd! Maybe I'm just missing something...
Per my discussion in answer to the original problem, I think what you are tripping on is that the way you want to do this is effectively trying to make a triple with two objects. I believe it is not really a modeling question, but rather a question of how RDF/XML is translated into triples.
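The "two objects" problem can be checked mechanically. The Python sketch below (standard library; the function name and the wrapper `<record>` element are hypothetical) flags any element that carries both an rdf:resource attribute and text content. RDF/XML's grammar lets a property element be either empty with rdf:resource (a URI object) or hold text (a literal object), but not both, so each hit is one predicate being handed two competing objects. It is not a full RDF/XML validator.

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

SAMPLE = (
    '<record xmlns:dwc="http://rs.tdwg.org/dwc/terms/" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">'
    "Steve Baskauf</dwc:recordedBy>"
    "<dwc:country>United States</dwc:country>"
    "</record>"
)


def two_object_elements(xml_text):
    """Return (property, uri, literal) for elements holding both a URI and text."""
    hits = []
    for el in ET.fromstring(xml_text).iter():
        uri = el.get(RDF_RESOURCE)
        text = (el.text or "").strip()
        if uri is not None and text:
            hits.append((el.tag, uri, text))
    return hits


for prop, uri, literal in two_object_elements(SAMPLE):
    print(f'{prop}: <{uri}> vs "{literal}"')
```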
Bob Morris
On Wed, May 19, 2010 at 9:52 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
I think part of the problem here results from trying both to model the data and to have something that is easy to read (i.e., having both a URI and a literal for the same tag). The result is messy and inconsistent.
I think Peter Ansell's first option is a good RDF solution, namely:
http://dl.dropbox.com/u/639486/tdwg/1.xml
=================
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">

  <!-- occurrence -->
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
  </rdf:Description>

  <!-- person -->
  <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
    <rdfs:label>Steve Baskauf</rdfs:label>
  </rdf:Description>

</rdf:RDF>
=================
This document says:
OCCURRENCE <http://herbarium.org/hb123456>
  --> recorded by --> PERSON <http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me>
  --> who has name --> "Steve Baskauf"
which I assume is what we want. You can see this graph in the W3C RDF validator here http://tinyurl.com/39nqho2
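To show how the two rdf:Description blocks turn into exactly those two statements, here is a minimal Python sketch (standard library only) that reads Rod's 1.xml document, with the XML declaration omitted. It handles only the subset used here: rdf:Description with rdf:about, and property elements that are either empty with rdf:resource or contain plain text. The function names are hypothetical; for real data an RDF library's parser is the right tool.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT, RESOURCE = f"{{{RDF}}}about", f"{{{RDF}}}resource"

DOC_1 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <!-- occurrence -->
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
  </rdf:Description>
  <!-- person -->
  <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
    <rdfs:label>Steve Baskauf</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def predicate_uri(tag):
    """ElementTree spells names '{ns}local'; the RDF predicate is ns + local."""
    ns, _, local = tag[1:].partition("}")
    return ns + local


def triples(xml_text):
    """Subset reader: each property is a URI object (rdf:resource) or a literal."""
    graph = set()
    for desc in ET.fromstring(xml_text).findall(f"{{{RDF}}}Description"):
        subject = desc.get(ABOUT)
        for prop in desc:
            uri = prop.get(RESOURCE)
            obj = ("uri", uri) if uri is not None else ("literal", (prop.text or "").strip())
            graph.add((subject, predicate_uri(prop.tag), obj))
    return graph


for t in sorted(triples(DOC_1)):
    print(t)
```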
The RDF has all the information a linked data client needs in order to say this, and we could also write an XSLT stylesheet to render it as HTML for people to read.
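The lookup such a stylesheet would perform is short: follow dwc:recordedBy to the person node, then display its rdfs:label, falling back to the bare URI when no label is present. The sketch below uses Python rather than XSLT, purely to keep the examples in one language; the triples are typed in by hand from the document above, and the function names are hypothetical.

```python
from html import escape

DWC_RECORDED_BY = "http://rs.tdwg.org/dwc/terms/recordedBy"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OCCURRENCE = "http://herbarium.org/hb123456"
PERSON = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"

# The two triples of 1.xml as (subject, predicate, object).
# Objects are plain strings here; a real store keeps URIs and literals apart.
TRIPLES = {
    (OCCURRENCE, DWC_RECORDED_BY, PERSON),
    (PERSON, RDFS_LABEL, "Steve Baskauf"),
}


def objects(triples, subject, predicate):
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def recorded_by_html(triples, occurrence):
    """Show each recorder by rdfs:label, linked to the recorder's URI."""
    links = []
    for person in objects(triples, occurrence, DWC_RECORDED_BY):
        names = objects(triples, person, RDFS_LABEL)
        name = names[0] if names else person  # no label: fall back to the bare URI
        links.append(f'<a href="{escape(person)}">{escape(name)}</a>')
    return "Recorded by: " + ", ".join(links)


print(recorded_by_html(TRIPLES, OCCURRENCE))
```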
Adding <dwc:recordedBy>Steve Baskauf</dwc:recordedBy> is a hack that breaks the model.
Note also that if the person doesn't have a URI we still shouldn't use dwc:recordedBy as a literal. Instead we can do this:
http://dl.dropbox.com/u/639486/tdwg/2.xml
=================
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">

  <!-- occurrence -->
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <!-- person -->
    <dwc:recordedBy rdf:parseType="Resource">
      <rdfs:label>Steve Baskauf</rdfs:label>
    </dwc:recordedBy>
  </rdf:Description>

</rdf:RDF>
=================
Here we are saying that the occurrence was recorded by a person called Steve Baskauf. If you paste this into the W3C validator, you get the same model:
OCCURRENCE <http://herbarium.org/hb123456>
  --> recorded by --> PERSON <xxx>
  --> who has name --> "Steve Baskauf"
Since "Steve Baskauf" in this example doesn't have a URI, we get a "bnode" (blank node) with a local identifier. See http://tinyurl.com/392qzb3.
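What rdf:parseType="Resource" does can be sketched in a few lines: the property element gets a freshly minted blank node as its object, and the element's children become properties of that node. The Python sketch below (standard library only, hypothetical function names) handles just this case plus the simple ones from 1.xml, and emits N-Triples-style terms. The label `_:b0` is arbitrary; the W3C validator or any real parser chooses its own local identifier.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"
PARSE_TYPE = f"{{{RDF}}}parseType"

# Rod's 2.xml: the person has no URI (XML declaration omitted).
DOC_2 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <!-- person -->
    <dwc:recordedBy rdf:parseType="Resource">
      <rdfs:label>Steve Baskauf</rdfs:label>
    </dwc:recordedBy>
  </rdf:Description>
</rdf:RDF>"""


def iri(tag):
    ns, _, local = tag[1:].partition("}")
    return f"<{ns}{local}>"


def read_properties(node, subject, out, bnode_ids):
    """Append (s, p, o) terms for each property element under node."""
    for prop in node:
        if prop.get(PARSE_TYPE) == "Resource":
            # No URI for the object: mint a document-local blank node and
            # treat this element's children as that node's properties.
            bnode = f"_:b{next(bnode_ids)}"
            out.append((subject, iri(prop.tag), bnode))
            read_properties(prop, bnode, out, bnode_ids)
        elif prop.get(RESOURCE) is not None:
            out.append((subject, iri(prop.tag), f"<{prop.get(RESOURCE)}>"))
        else:
            text = (prop.text or "").strip()
            out.append((subject, iri(prop.tag), f'"{text}"'))


def triples(xml_text):
    out, bnode_ids = [], itertools.count()
    for desc in ET.fromstring(xml_text).findall(f"{{{RDF}}}Description"):
        read_properties(desc, f"<{desc.get(ABOUT)}>", out, bnode_ids)
    return out


for s, p, o in triples(DOC_2):
    print(s, p, o, ".")
```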
In both cases (person with or without a URI) we are saying the same thing. If you want a literal for dwc:recordedBy (say for ease of display) then I think you want a different tag that is expressly defined to do just that. For example, http://rs.tdwg.org/ontology/voc/TaxonOccurrence has <identifiedTo> to point to a URI for a taxon, and <identifiedToString> if you want the literal. I don't know if dwc has anything equivalent for recordedBy (and can somebody please tell me why we now have so many vocabularies for the same things?)
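For a converter fed with records like the generic XML earlier in the thread, which carry both forms, one way to follow this advice is to split each "URI + literal" value across two predicates: the property points at a node (the agent's URI if known, otherwise a fresh blank node), and the display string hangs off that node. The helper below is a hypothetical Python sketch; it uses rdfs:label as in the examples above rather than a dedicated Darwin Core string term, since the thread leaves open whether one exists for recordedBy.

```python
import itertools

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
_blank_ids = itertools.count()


def nt_literal(text):
    """Quote a plain literal, escaping backslash, double quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def agent_triples(subject, predicate, literal, uri=None):
    """Split one 'URI + literal' value into two triples on two predicates,
    so no single predicate is asked to carry both a URI and a literal."""
    node = f"<{uri}>" if uri else f"_:agent{next(_blank_ids)}"
    return [
        (f"<{subject}>", f"<{predicate}>", node),
        (node, RDFS_LABEL, nt_literal(literal)),
    ]


OCC = "http://herbarium.org/hb123456"
BY = "http://rs.tdwg.org/dwc/terms/recordedBy"
ME = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"

# With a URI (like 1.xml) and without one (like 2.xml): the same shape of graph.
for triple in agent_triples(OCC, BY, "Steve Baskauf", ME) + agent_triples(OCC, BY, "Steve Baskauf"):
    print(*triple, ".")
```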
Personally I think that starting with XML and trying to generate RDF and HTML from that is going to lead to a world of hurt. I suspect it makes more sense to:
a) model what we want to say
b) say it in RDF
c) write an XSLT stylesheet to convert it to HTML for humans
Regards
Rod
On 20 May 2010, at 06:35, Bob Morris wrote:
-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: ram@cs.umb.edu web: http://bdei.cs.umb.edu/ web: http://etaxonomy.org/FilteredPush http://www.cs.umb.edu/~ram phone (+1)617 287 6466
--------------------------------------------------------- Roderic Page Professor of Taxonomy DEEB, FBLS Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Thanks for the suggestions. I think the rdf:Description ... rdfs:label ... method says exactly what I want to say (with regard to my first question).
My second question was about the use of XML in general and was not related to RDF, so issues of translating from RDF/XML to triples don't apply. It is true that I could model what I want in RDF, but what I really want to know is whether it is "legal" to "piggyback" information (for my own use) in an attribute if I'm trying to follow somebody else's XML schema that doesn't specifically mention that attribute. In other words, if someone has established an XML schema (not RDF) intended for the standardized transfer of metadata, can I include literals as the element contents (as the schema intends) and also include a URI as an attribute (not mentioned in the schema)? The attribute name wouldn't have to be rdf:resource; it could be anything I made up. I just used rdf:resource in the example because it seemed to describe what the URI was, and I think that is why RDF got injected into the question.
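On the schema question: under W3C XML Schema, an element declared with a simple type such as xs:string, or with a complex type that does not mention the extra attribute, rejects it (xsi: attributes aside), unless the type includes an xs:anyAttribute wildcard. An element declared with no type at all defaults to xs:anyType, which does accept arbitrary attributes. So whether an add-on URI attribute validates depends on what the schema author wrote, and a wildcard would need lax or skip processing, since strict processing would also demand a declaration for the attribute itself. The Python sketch below models only that rule for a "##other" wildcard; it is not a validator, and the inputs describing the schema are hypothetical rather than taken from any Darwin Core XSD.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

SAMPLE = (
    f'<dwc:recordedBy xmlns:dwc="{DWC}" xmlns:rdf="{RDF}" '
    'rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">'
    "Steve Baskauf</dwc:recordedBy>"
)


def namespace(name):
    """'{ns}local' -> 'ns'; unqualified names -> None."""
    return name[1:].partition("}")[0] if name.startswith("{") else None


def rejected_attributes(xml_text, declared, other_ns_wildcard, target_ns):
    """Rough model of XSD attribute checking (not a real validator).

    declared:          element tag -> set of attribute names its type declares
    other_ns_wildcard: element tags whose type has
                       <xs:anyAttribute namespace="##other" processContents="lax"/>
    """
    rejected = []
    for el in ET.fromstring(xml_text).iter():
        for attr in el.attrib:
            ns = namespace(attr)
            if ns == XSI or attr in declared.get(el.tag, set()):
                continue  # xsi:* and declared attributes are accepted
            if el.tag in other_ns_wildcard and ns not in (None, target_ns):
                continue  # foreign-namespace attribute matched by the wildcard
            rejected.append((el.tag, attr))
    return rejected


RECORDED_BY = f"{{{DWC}}}recordedBy"
print(rejected_attributes(SAMPLE, {}, set(), DWC))          # schema silent about attributes
print(rejected_attributes(SAMPLE, {}, {RECORDED_BY}, DWC))  # schema with a wildcard
```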
Steve
OK, I just had another thought/question about the approach suggested here. If there were only one occurrence record in one RDF/XML file, this approach would be great. I could create an XSLT that made a human-readable view of the RDF file, using the rdfs:label information to display the person's name as a string. However, suppose I have a database containing 10000 occurrence records, and each record is represented by an RDF/XML file (containing that label statement) that is served when the HTTP URI GUID for that occurrence is dereferenced. Then I am making the assertion PERSON http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me --> has name --> "Steve Baskauf" 10000 times. Would a linked data client collecting metadata about my collection record 10000 triples, counting each label description as a separate assertion because it was made by a separate statement in a separate file? Or would it be "smart" enough to realize that the same thing was being said 10000 times and just record one triple? What this really boils down to is "trust" of sources making assertions: does a linked data client "trust" that one assertion of the label property of http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me is as good as another, or will it feel compelled to keep track of all of the assertions so that a user of the triple store can draw their own conclusions about the validity of each individual assertion?
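In RDF's abstract model a graph is a set of triples, so merging identical triples from many files leaves one triple; a store that also records where each triple came from (for example with named graphs, i.e. quads) can still count every assertion. Which behaviour a given linked data client shows is up to its implementation. The Python sketch below contrasts the two; the per-record document generator and the 10000-record range are illustrative assumptions.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RECORDED_BY = "http://rs.tdwg.org/dwc/terms/recordedBy"
ME = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"


def occurrence_document(n, label="Steve Baskauf"):
    """Hypothetical: the (source, triples) served for one occurrence URI."""
    occurrence = f"http://herbarium.org/hb{n}"
    return occurrence, {(occurrence, RECORDED_BY, ME), (ME, RDFS_LABEL, label)}


def merge(documents):
    """Plain triple store: a graph is a set, so a repeated triple is kept once."""
    graph = set()
    for _source, triples in documents:
        graph |= triples
    return graph


def merge_with_provenance(documents):
    """Quad-style store: keep (s, p, o, source), so every assertion is counted."""
    return {(s, p, o, source) for source, triples in documents for s, p, o in triples}


def labels(graph):
    return {o for s, p, o in graph if s == ME and p == RDFS_LABEL}


docs = [occurrence_document(n) for n in range(123456, 123456 + 10000)]
print(len(labels(merge(docs))))  # distinct label triples after merging
print(sum(1 for q in merge_with_provenance(docs) if q[1] == RDFS_LABEL))  # assertions by source

# One file with a different spelling gives the person two labels.
print(sorted(labels(merge([occurrence_document(123456),
                           occurrence_document(123457, "Steven J Baskauf")]))))
```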
It would be very simple to make the assertion that "Steve Baskauf" is a label for http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me in the FOAF file itself, but then the XSLT method wouldn't work because the XSL wouldn't be dereferencing the foaf.rdf file.
Steve
Roderic Page wrote:
I think part of the problem here results from trying to satisfy modelling the data and having something that is easy to read (i.e., having both a URI and a literal for the same tag). The result is messy and inconsistent.
I think Peter Ansell's first option is a good RDF solution, namely:
http://dl.dropbox.com/u/639486/tdwg/1.xml
=================
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:dwc="http://rs.tdwg.org/dwc/terms/%22%3E
<!-- occurrence --> <rdf:Description rdf:about="http://herbarium.org/hb123456"> <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
"/> </rdf:Description>
<!-- person --> <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
"> rdfs:labelSteve Baskauf</rdfs:label> </rdf:Description>
</rdf:RDF>
This document says:
OCCURRENCE http://herbarium.org/hb123456 --> recorded by --> PERSON <http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
--> who has name --> "Steve Baskauf"
which I assume is what we want. You can see this graph in the W3C RDF validator here http://tinyurl.com/39nqho2
The RDF has all the information a linked data client needs in order to say this, and we could also write a XSLT style sheet to render this in HTML for people to read.
Adding dwc:recordedBySteve Baskauf</dwc:recordedBy is a hack that breaks the model.
Note also that if the person doesn't have a URI we still shouldn't use dwc:recordedBy as a literal. Instead we can do this:
http://dl.dropbox.com/u/639486/tdwg/2.xml
=================
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:dwc="http://rs.tdwg.org/dwc/terms/%22%3E
<!-- occurrence --> <rdf:Description rdf:about="http://herbarium.org/hb123456"> <!-- person --> <dwc:recordedBy rdf:parseType="Resource"> <rdfs:label>Steve Baskauf</rdfs:label> </dwc:recordedBy> </rdf:Description>
</rdf:RDF>
Here we are saying that the occurrence was recorded by a person called Steve Baskauf. If you paste this into the W3C validator you get the same model
OCCURRENCE http://herbarium.org/hb123456 --> recorded by --> PERSON <xxx> --> who has name --> "Steve Baskauf"
Since "Steve Baskauf" in this example doesn't have a URI we get a "bnode" with a local identifier. See http://tinyurl.com/392qzb3 .
In both cases (person with or without a URI) we are saying the same thing. If you want a literal for dwc:recordedBy (say for ease of display) then I think you want a different tag that is expressly defined to do just that. For example, http://rs.tdwg.org/ontology/voc/TaxonOccurrence has <identifiedTo> to point to a URI for a taxon, and <identifiedToString> if you want the literal. I don't know if dwc has anything equivalent for recordedBy (and can somebody please tell me why we now have so many vocabularies for the same things?)
Personally I think that starting with XML and trying to generate RDF and HTML from that is going to lead to a world of hurt. I suspect it makes more sense to:
a) model what we want to say b) say it in RDF c) write a XSLT to convert it to HTML for humans
Regards
Rod
On 20 May 2010, at 06:35, Bob Morris wrote:
Per my discussion in answer to the original problem, I think what you are tripping on is that the way you want to do this effectively trying to make a triple with two objects. I believe it is not really a modeling question, but rather a question of how RDF/XML is translated into triples.
Bob Morris
On Wed, May 19, 2010 at 9:52 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
This recent discussion reminds me of a question that I have been wondering about for several months and hadn't gotten around to bringing up: can you have a Darwin Core XML representation where an element has a literal value and an attribute? If the XML is RDF, then I think the answer is pretty much "no" as I just found out with the W3C Validator. However, in generic XML I don't think there is any rule that says that one can't have any attribute that one wants. The only guidance I know of on the subject is: http://rs.tdwg.org/dwc/terms/guides/xml/index.htm#implement It states that the value of a Darwin Core property should be the content of the element rather than stating the value as an attribute. However, I have the situation where I want to store or transfer two somewhat equivalent representations of the value of a property: a string literal form and a URI form. In the example we've been talking about, I would like my generic (non-RDF) XML to do something like this:
<?xml version="1.0" encoding="UTF-8"?>
<dwr:SimpleDarwinRecordSet xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dwc="http://rs.tdwg.org/dwc/terms/" xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/ " > dwr:SimpleDarwinRecord dwc:occurrenceID rdf:resource="http://herbarium.org/hb123456"http://herbarium.org/hb123456 </dwc:occurrenceID> <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/ foaf.rdf#me">Steve Baskauf</dwc:recordedBy> <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/ PreservedSpecimen">PreservedSpecimen</dwc:basisOfRecord> ... more elements, mostly with string literal values... </dwr:SimpleDarwinRecord> </dwr:SimpleDarwinRecordSet>
This would meet the basic guidelines of the Darwin Core XML Guide in that the literal values would be the contents of the elements. What I don't know is if the inclusion of the rdf:resource attributes would invalidate the XML if it were validated against someone's schema that was silent about attributes or if the schema would have to explicitly say that having an rdf:resource attribute was a valid option. I think I don't know enough about XML schemas ...
The reason why I would like to maintain/transfer both types of values (literal and URI) is so that I could use the XML data to generate both HTML and RDF if I wanted. The HTML would tell humans that the occurrence was a PreservedSpecimen, but the RDF would tell a linked data client that the occurrence was a http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen . I realize that for my own internal use, the XML can have any format I want, but if I were exporting XML for general public use, would it be bad to use the approach above?
Steve
As an aside, I wanted to see exactly what the definition was for rdf:resource However, the usual namespace for rdf: (http://www.w3.org/1999/02/22-rdf-syntax-ns#) doesn't seem to include "resource" in the defined properties. Very odd! Maybe I'm just missing something...
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: ram@cs.umb.edu web: http://bdei.cs.umb.edu/ web: http://etaxonomy.org/FilteredPush http://www.cs.umb.edu/~ram phone (+1)617 287 6466
Roderic Page Professor of Taxonomy DEEB, FBLS Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
For those who might be interested, I have answered my own question (at least for one instance). I created an RDF/XML file just like http://dl.dropbox.com/u/639486/tdwg/1.xml except that I changed the URI of the resource described by the about attribute to "http://herbarium.org/hb123457". I then had the OpenLink RDF browser (http://demo.openlinksw.com/rdfbrowser/) query both files. It listed only a single triple: http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me --> label --> "Steve Baskauf". If I changed the label in the hb123457 file to "Steven J Baskauf", OpenLink showed http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me as having two labels, "Steve Baskauf" and "Steven J Baskauf". When I had OpenLink query the http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf file itself, it merged the label property with the other properties of the http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me URI.
So, at least for one Linked Data client (OpenLink), redundant information is not recorded as additional triples. For clients that behave like OpenLink, then, labeling a URI that is the object of a Darwin Core term seems to be a good way in RDF to provide both kinds of information (URI and text) for "local consumption" (within a single RDF file for an occurrence).
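This matches RDF's abstract model, in which a graph is a set of triples: merging two graphs that contain no blank nodes is a set union, so a repeated statement collapses into one while a differing label survives as a second triple. A minimal Python sketch using plain tuples rather than any RDF library; the helper names `occurrence_graph`, `merge`, and `labels_of` are hypothetical, and nothing here describes how OpenLink stores data internally.

```python
ME = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RECORDED_BY = "http://rs.tdwg.org/dwc/terms/recordedBy"


def occurrence_graph(occurrence_uri, label):
    """Triples an occurrence file shaped like 1.xml would contribute."""
    return {
        (occurrence_uri, RECORDED_BY, ME),
        (ME, LABEL, label),
    }


def merge(*graphs):
    """For graphs without blank nodes, an RDF merge is a plain set union."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def labels_of(graph, subject):
    return sorted(o for s, p, o in graph if s == subject and p == LABEL)


same = merge(occurrence_graph("http://herbarium.org/hb123456", "Steve Baskauf"),
             occurrence_graph("http://herbarium.org/hb123457", "Steve Baskauf"))
print(labels_of(same, ME))  # ['Steve Baskauf']
print(len(same))            # 3

different = merge(occurrence_graph("http://herbarium.org/hb123456", "Steve Baskauf"),
                  occurrence_graph("http://herbarium.org/hb123457", "Steven J Baskauf"))
print(labels_of(different, ME))  # ['Steve Baskauf', 'Steven J Baskauf']
```

Graphs with blank nodes are different: their blank nodes must be kept apart when merging, so a simple union is not enough there.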
Steve
Steve Baskauf wrote:
OK, I just had another thought/question about the approach suggested here. If there were only one occurrence record in one RDF/XML file, then this approach would be great. I could create an XSLT that would make a human-readable view of the RDF file that used the rdfs:label information to display the person's name as a string. However, if I have a database containing 10000 occurrence records and each record is represented by an RDF/XML file (containing that label statement) that is provided when the HTTP URI GUID for that occurrence is dereferenced, then I am making the assertion PERSON http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me --> has name --> "Steve Baskauf" 10000 times. Would a linked data client that was collecting metadata about my collection record 10000 triples, counting each label description as a separate assertion because it was made by a separate statement in a separate file, or would it be "smart" enough to realize that the same thing was being said 10000 times and record just one triple? What this really boils down to is "trust" of sources making assertions: does a linked data client "trust" that one assertion of the label property of http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me is as good as another, or will it feel compelled to keep track of all of the assertions so that a user of the triple store can draw their own conclusions about the validity of the individual assertions?
It would be very simple to make the assertion that "Steve Baskauf" is a label for http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me in the FOAF file itself, but then the XSLT method wouldn't work because the XSL wouldn't be dereferencing the foaf.rdf file.
Steve
Roderic Page wrote:
I think part of the problem here results from trying both to model the data and to have something that is easy to read (i.e., having both a URI and a literal for the same tag). The result is messy and inconsistent.
I think Peter Ansell's first option is a good RDF solution, namely:
http://dl.dropbox.com/u/639486/tdwg/1.xml
=================
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <!-- occurrence -->
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
  </rdf:Description>
  <!-- person -->
  <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
    <rdfs:label>Steve Baskauf</rdfs:label>
  </rdf:Description>
</rdf:RDF>
This document says:
OCCURRENCE http://herbarium.org/hb123456 --> recorded by --> PERSON <http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me> --> who has name --> "Steve Baskauf"
which I assume is what we want. You can see this graph in the W3C RDF validator here http://tinyurl.com/39nqho2
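As a rough check that 1.xml really carries just those two statements, here is a minimal Python sketch that reads back the triples. It handles only the RDF/XML subset used in that file (rdf:Description with rdf:about, and property elements carrying either rdf:resource or text content); it is not a conforming RDF/XML parser, and the names `subset_triples` and `expand` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC_1 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
    <rdfs:label>Steve Baskauf</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' tag into the full property URI."""
    ns, local = tag[1:].split("}")
    return ns + local


def subset_triples(xml_text):
    """(subject, predicate, (kind, value)) triples from rdf:Description blocks.

    Ignores rdf:parseType, rdf:nodeID, typed nodes, xml:lang, datatypes, etc.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            uri = prop.get(f"{{{RDF}}}resource")
            if uri is not None:
                obj = ("uri", uri)
            else:
                obj = ("literal", (prop.text or "").strip())
            triples.append((subject, expand(prop.tag), obj))
    return triples


for s, p, (kind, value) in subset_triples(DOC_1):
    obj = f"<{value}>" if kind == "uri" else f'"{value}"'
    print(f"<{s}> <{p}> {obj} .")
```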
The RDF has all the information a linked data client needs in order to say this, and we could also write an XSLT stylesheet to render it in HTML for people to read.
Adding <dwc:recordedBy>Steve Baskauf</dwc:recordedBy> is a hack that breaks the model.
Note also that if the person doesn't have a URI we still shouldn't use dwc:recordedBy as a literal. Instead we can do this:
http://dl.dropbox.com/u/639486/tdwg/2.xml
=================
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <!-- occurrence -->
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <!-- person -->
    <dwc:recordedBy rdf:parseType="Resource">
      <rdfs:label>Steve Baskauf</rdfs:label>
    </dwc:recordedBy>
  </rdf:Description>
</rdf:RDF>
Here we are saying that the occurrence was recorded by a person called Steve Baskauf. If you paste this into the W3C validator you get the same model
OCCURRENCE http://herbarium.org/hb123456 --> recorded by --> PERSON <xxx> --> who has name --> "Steve Baskauf"
Since the person in this example doesn't have a URI, we get a blank node ("bnode") with a local identifier. See http://tinyurl.com/392qzb3 .
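The two cases can be compared directly: the only difference is whether the person node is a URI or a document-local blank node (written `_:label` in N-Triples). A minimal Python sketch under that framing; `recorded_by_triples` and the `_:bN` labelling scheme are hypothetical, and `shape` is a toy comparison that swaps out a single node, not a general graph-isomorphism test.

```python
import itertools

RECORDED_BY = "http://rs.tdwg.org/dwc/terms/recordedBy"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Counter for minting document-local blank node labels (_:b0, _:b1, ...).
_bnode_ids = itertools.count()


def recorded_by_triples(occurrence, name, person_uri=None):
    """N-Triples-style terms for 'occurrence recorded by a person who has this name'."""
    person = f"<{person_uri}>" if person_uri else f"_:b{next(_bnode_ids)}"
    return [
        (f"<{occurrence}>", f"<{RECORDED_BY}>", person),
        (person, f"<{LABEL}>", f'"{name}"'),
    ]


def shape(triples):
    """Swap the person node for a placeholder so the two graphs can be compared."""
    person = triples[0][2]
    return [tuple("PERSON" if term == person else term for term in t) for t in triples]


with_uri = recorded_by_triples("http://herbarium.org/hb123456", "Steve Baskauf",
                               "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me")
without_uri = recorded_by_triples("http://herbarium.org/hb123456", "Steve Baskauf")
for t in without_uri:
    print(" ".join(t), ".")
print(shape(with_uri) == shape(without_uri))  # True
```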
In both cases (person with or without a URI) we are saying the same thing. If you want a literal for dwc:recordedBy (say for ease of display) then I think you want a different tag that is expressly defined to do just that. For example, http://rs.tdwg.org/ontology/voc/TaxonOccurrence has <identifiedTo> to point to a URI for a taxon, and <identifiedToString> if you want the literal. I don't know if dwc has anything equivalent for recordedBy (and can somebody please tell me why we now have so many vocabularies for the same things?)
Personally I think that starting with XML and trying to generate RDF and HTML from that is going to lead to a world of hurt. I suspect it makes more sense to:
a) model what we want to say
b) say it in RDF
c) write an XSLT to convert it to HTML for humans
Regards
Rod
On 20 May 2010, at 06:35, Bob Morris wrote:
Per my discussion in answer to the original problem, I think what you are tripping on is that the way you want to do this is effectively an attempt to make a triple with two objects. I believe it is not really a modeling question, but rather a question of how RDF/XML is translated into triples.
Bob Morris
On Wed, May 19, 2010 at 9:52 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote the question that opens this thread.
Hi Steve,
Maybe I am missing something here, but RDF is a knowledge representation language, so stating something (a triple) more than once would not make sense, and as far as I know duplicates are not kept. The exception would be, as you say, if a provenance mechanism were in place; that issue is being studied in the context of named graphs, which are not yet in widespread use. With named graphs, triples coming from different sources (basically, RDF files) would carry a sort of fourth element that could distinguish them from the same statements made elsewhere. Still, within a single file this would not lead to multiple identical statements.
Regards,
Stefano Bocconi
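Stefano's named-graph point can be illustrated with quads: each statement carries the name of the graph (source) it came from as a fourth element. A minimal Python sketch; using the two occurrence URIs as graph names is a hypothetical stand-in for the URLs of the two files, and real quad stores and formats (such as N-Quads or TriG) differ in detail.

```python
ME = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical graph names, one per source file.
SOURCES = ["http://herbarium.org/hb123456", "http://herbarium.org/hb123457"]

# The same label statement asserted once in each source: (s, p, o, graph).
quads = {(ME, LABEL, "Steve Baskauf", source) for source in SOURCES}

# Plain triple view: drop the graph name and the duplicates collapse.
triples = {(s, p, o) for s, p, o, _graph in quads}

# Provenance view: which sources assert this particular triple?
asserted_by = sorted(g for s, p, o, g in quads
                     if (s, p, o) == (ME, LABEL, "Steve Baskauf"))

print(len(quads), len(triples))  # 2 1
print(asserted_by)
```

The triple view gives the "one triple" behaviour discussed above, while the quad view keeps enough information to answer Steve's trust question about who made each assertion.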
Stefano,
From the standpoint of the files as RDF and the Linked Data world, it would be a waste of time ever to define a label for a URI-identified resource more than once. However, I am thinking about this problem from the standpoint of the GUID requirements for providing both human- and machine-readable representations of a resource. If one must create the RDF to meet the GUID requirement for a machine-readable representation, then it would be a relatively simple matter to create the required human-readable representation through AJAX or some other method that uses the RDF/XML file as a data source. AJAX is "stupid" in the sense that it doesn't "understand" RDF; it just uses an XML file as a source of data. So an HTML page using the RDF/XML file as an AJAX data source would need to get the label information from within the particular RDF/XML file containing the representation of the GUID, not by dereferencing a link to another file, such as the FOAF file that is the object of the dwc:recordedBy triple. That is the reason for providing the rdfs:label property within multiple files.
I'm working on a functional example of this. When I get it working, I'll post it to the list. Steve
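The constraint described above, that an XML-level consumer (XSLT or AJAX) can only use what is inside the file it was handed, can be sketched in a few lines. This is a minimal Python stand-in for that rendering step, not the functional example mentioned above; the function `recorded_by_html` and its output markup are hypothetical. It shows the label when the same file supplies one and otherwise falls back to the bare URI, because nothing is dereferenced.

```python
import html
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DWC = "http://rs.tdwg.org/dwc/terms/"

PERSON = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"
NS_DECLS = ('xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
            'xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" '
            'xmlns:dwc="http://rs.tdwg.org/dwc/terms/"')
OCCURRENCE = ('<rdf:Description rdf:about="http://herbarium.org/hb123456">'
              f'<dwc:recordedBy rdf:resource="{PERSON}"/></rdf:Description>')
LABEL_BLOCK = (f'<rdf:Description rdf:about="{PERSON}">'
               '<rdfs:label>Steve Baskauf</rdfs:label></rdf:Description>')

# Same shape as 1.xml, plus a variant with the person's label left out.
WITH_LABEL = f"<rdf:RDF {NS_DECLS}>{OCCURRENCE}{LABEL_BLOCK}</rdf:RDF>"
WITHOUT_LABEL = f"<rdf:RDF {NS_DECLS}>{OCCURRENCE}</rdf:RDF>"


def recorded_by_html(rdf_xml):
    """Render dwc:recordedBy using only the document in hand."""
    root = ET.fromstring(rdf_xml)
    person = root.find(f".//{{{DWC}}}recordedBy").get(f"{{{RDF}}}resource")
    text = person  # fallback: show the bare URI
    for desc in root.findall(f"{{{RDF}}}Description"):
        label = desc.find(f"{{{RDFS}}}label")
        if desc.get(f"{{{RDF}}}about") == person and label is not None and label.text:
            text = label.text.strip()
    return f'Recorded by: <a href="{html.escape(person)}">{html.escape(text)}</a>'


print(recorded_by_html(WITH_LABEL))
print(recorded_by_html(WITHOUT_LABEL))
```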
Stefano Bocconi wrote:
Hi Steve,
Maybe I am missing something here, but RDF is a knowledge representation language, therefore stating something (a triple) more than once would not make sense, and duplicates are not kept as far as I know. This unless as you say there would be a provenance mechanism in place, but this last issue is being studied in the context of named graph, which are not yet in widespread use. With named graph triples coming from different sources (basically RDF files) would have a sort of fourth attribute that could make them different from same statements stated elsewhere. Still within the same file this would not lead to multiple equal statements.
Regards,
Stefano Bocconi
Steve Baskauf wrote:
For those who might be interested, I have answered my own question (at least for one instance). I created an rdf/xml file just like http://dl.dropbox.com/u/639486/tdwg/1.xml except that I changed the URI of the resource described by the about attribute to "http://herbarium.org/hb123457". I then had the OpenLink RDF browser http://demo.openlinksw.com/rdfbrowser/ query both files. It only listed a single triple for http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me --> label --> "Steve Baskauf" If I changed the label in the hb123457 file to "Steven J Baskauf", OpenLink showed http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me as having two labels, "Steve Baskauf" and "Steven J Baskauf". When I had OpenLink query the http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf file itself, it merged the label property with the other properties of the http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me URI.
So at least in the case of one Linked Data client (OpenLink), redundant information is not recorded as additional triples. So at least for Linked Data clients similar to OpenLink, in RDF the method of labeling a URI used as the object of a Darwin Core term seems to be a good way of providing both types of information (URI and text) for "local consumption" (within a single RDF file for an occurrence).
Steve
Steve Baskauf wrote:
OK, I just had another thought/question about the approach suggested here. If there were only one occurrence record in one RDF XML file, then this approach would be great. I could create an XSLT that would make a human readable view of the RDF file that made use of the rdfs:label information to display the person's name as a string. However, if I have a database containing 10000 occurrence records and each record is represented by an RDF XML file (containing that label statement) that is provided when the HTTP URI guid for that occurrence is dereferenced, then I am making the assertion that PERSON http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me --> has name --> "Steve Baskauf" 10000 times. Would a linked data client that was collecting metadata about my collection record 10000 triples, counting each label description as a separate assertion because it was made by a separate statement in a separate file, or would it be "smart" enough to realize that it was really the same thing being said 10000 times and just record one triple? What this really boils down to is "trust" of sources making assertions-does a linked data client "trust" that one assertion of the label property of http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me is as good as another, or will it feel compelled to keep track of all of the assertions so that a user of the triple store can draw their own conclusions about the validity of all of the individual assertions?
It would be very simple to make the assertion that "Steve Baskauf" is a label for http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me in the FOAF file itself, but then the XSLT method wouldn't work because the XSL wouldn't be dereferencing the foaf.rdf file.
Steve
Roderic Page wrote:
I think part of the problem here results from trying to satisfy modelling the data and having something that is easy to read (i.e., having both a URI and a literal for the same tag). The result is messy and inconsistent.
I think Peter Ansell's first option is a good RDF solution, namely:
http://dl.dropbox.com/u/639486/tdwg/1.xml
=================
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:dwc="http://rs.tdwg.org/dwc/terms/%22%3E
<!-- occurrence --> <rdf:Description rdf:about="http://herbarium.org/hb123456"> <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
"/> </rdf:Description>
<!-- person --> <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
"> rdfs:labelSteve Baskauf</rdfs:label> </rdf:Description>
</rdf:RDF>
This document says:
OCCURRENCE http://herbarium.org/hb123456 --> recorded by --> PERSON <http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
--> who has name --> "Steve Baskauf"
which I assume is what we want. You can see this graph in the W3C RDF validator here http://tinyurl.com/39nqho2
The RDF has all the information a linked data client needs in order to say this, and we could also write a XSLT style sheet to render this in HTML for people to read.
Adding dwc:recordedBySteve Baskauf</dwc:recordedBy is a hack that breaks the model.
Note also that if the person doesn't have a URI we still shouldn't use dwc:recordedBy as a literal. Instead we can do this:
http://dl.dropbox.com/u/639486/tdwg/2.xml
=================
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <!-- occurrence -->
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <!-- person -->
    <dwc:recordedBy rdf:parseType="Resource">
      <rdfs:label>Steve Baskauf</rdfs:label>
    </dwc:recordedBy>
  </rdf:Description>
</rdf:RDF>
Here we are saying that the occurrence was recorded by a person called Steve Baskauf. If you paste this into the W3C validator, you get the same model:
OCCURRENCE http://herbarium.org/hb123456 --> recorded by --> PERSON <xxx> --> who has name --> "Steve Baskauf"
Since "Steve Baskauf" in this example doesn't have a URI we get a "bnode" with a local identifier. See http://tinyurl.com/392qzb3 .
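The blank-node version can be sketched the same way. The Python below (standard library only) treats `rdf:parseType="Resource"` as introducing a fresh blank node whose properties are the child elements of `dwc:recordedBy`. It is again a toy reader for this subset only; `DOC_2`, `walk_triples` and the `_:b0` naming scheme are hypothetical, and the W3C validator will assign its own local identifier.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# 2.xml with the archive garbling removed (XML declaration omitted)
DOC_2 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedBy rdf:parseType="Resource">
      <rdfs:label>Steve Baskauf</rdfs:label>
    </dwc:recordedBy>
  </rdf:Description>
</rdf:RDF>"""


def walk_triples(xml_text):
    """Return triples, minting a local blank-node id for each parseType="Resource"."""
    counter = itertools.count()
    triples = []

    def walk(subject, node):
        for prop in node:
            ns, local = prop.tag[1:].split("}")
            predicate = ns + local
            if prop.get(RDF + "parseType") == "Resource":
                bnode = f"_:b{next(counter)}"  # hypothetical naming; only meaningful locally
                triples.append((subject, predicate, bnode))
                walk(bnode, prop)  # the element's children describe the blank node
            elif prop.get(RDF + "resource") is not None:
                triples.append((subject, predicate, prop.get(RDF + "resource")))
            else:
                triples.append((subject, predicate, (prop.text or "").strip()))

    for desc in ET.fromstring(xml_text).findall(RDF + "Description"):
        walk(desc.get(RDF + "about"), desc)
    return triples


for triple in walk_triples(DOC_2):
    print(triple)
```

The result has the same shape as the 1.xml graph; only the person node differs (a blank node instead of the FOAF URI).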
In both cases (person with or without a URI) we are saying the same thing. If you want a literal for dwc:recordedBy (say for ease of display) then I think you want a different tag that is expressly defined to do just that. For example, http://rs.tdwg.org/ontology/voc/TaxonOccurrence has <identifiedTo> to point to a URI for a taxon, and <identifiedToString> if you want the literal. I don't know if dwc has anything equivalent for recordedBy (and can somebody please tell me why we now have so many vocabularies for the same things?)
Personally I think that starting with XML and trying to generate RDF and HTML from that is going to lead to a world of hurt. I suspect it makes more sense to:
a) model what we want to say
b) say it in RDF
c) write an XSLT to convert it to HTML for humans
Regards
Rod
On 20 May 2010, at 06:35, Bob Morris wrote:
Per my discussion in answer to the original problem, I think what you are tripping on is that the way you want to do this is effectively trying to make a triple with two objects. I believe it is not really a modeling question, but rather a question of how RDF/XML is translated into triples.
Bob Morris
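Bob's point can be checked mechanically. In RDF/XML, a property element that carries `rdf:resource` must be empty; once it also has text content, it offers two objects (a URI and a literal) for one triple, which is consistent with what Steve found with the W3C Validator. The hypothetical helper below (Python, standard library only; `SAMPLE` and `two_object_elements` are illustrative names) simply lists elements that carry both.

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# One element in the style of Steve's proposal: a URI attribute plus a literal
SAMPLE = """<dwc:recordedBy xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve Baskauf</dwc:recordedBy>"""


def two_object_elements(xml_text):
    """Return (local name, URI, literal) for every element with rdf:resource AND text."""
    found = []
    for el in ET.fromstring(xml_text).iter():
        uri = el.get(RDF_RESOURCE)
        text = (el.text or "").strip()
        if uri is not None and text:
            found.append((el.tag.rpartition("}")[2], uri, text))
    return found


print(two_object_elements(SAMPLE))
```

An empty `<dwc:recordedBy rdf:resource="..."/>` would not be flagged, since it yields a single URI object.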
On Wed, May 19, 2010 at 9:52 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
This recent discussion reminds me of a question that I have been wondering about for several months and hadn't gotten around to bringing up: can you have a Darwin Core XML representation where an element has a literal value and an attribute? If the XML is RDF, then I think the answer is pretty much "no" as I just found out with the W3C Validator. However, in generic XML I don't think there is any rule that says that one can't have any attribute that one wants. The only guidance I know of on the subject is: http://rs.tdwg.org/dwc/terms/guides/xml/index.htm#implement It states that the value of a Darwin Core property should be the content of the element rather than stating the value as an attribute. However, I have the situation where I want to store or transfer two somewhat equivalent representations of the value of a property: a string literal form and a URI form. In the example we've been talking about, I would like my generic (non-RDF) XML to do something like this:
<?xml version="1.0" encoding="UTF-8"?>
<dwr:SimpleDarwinRecordSet
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
  <dwr:SimpleDarwinRecord>
    <dwc:occurrenceID rdf:resource="http://herbarium.org/hb123456">http://herbarium.org/hb123456</dwc:occurrenceID>
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve Baskauf</dwc:recordedBy>
    <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen">PreservedSpecimen</dwc:basisOfRecord>
    ... more elements, mostly with string literal values...
  </dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>
This would meet the basic guidelines of the Darwin Core XML Guide in that the literal values would be the contents of the elements. What I don't know is if the inclusion of the rdf:resource attributes would invalidate the XML if it were validated against someone's schema that was silent about attributes or if the schema would have to explicitly say that having an rdf:resource attribute was a valid option. I think I don't know enough about XML schemas ...
The reason why I would like to maintain/transfer both types of values (literal and URI) is so that I could use the XML data to generate both HTML and RDF if I wanted. The HTML would tell humans that the occurrence was a PreservedSpecimen, but the RDF would tell a linked data client that the occurrence was a http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen . I realize that for my own internal use, the XML can have any format I want, but if I were exporting XML for general public use, would it be bad to use the approach above?
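The dual use described here can be sketched as one pass over the generic (non-RDF) record: the element text goes to an HTML view and the `rdf:resource` value goes to an RDF view, written here as N-Triples. This is a hypothetical Python sketch using only the standard library; the record is a trimmed copy of the example above (the "... more elements" line omitted), and using the `dwc:occurrenceID` URI as the subject of the triples is an assumption made for illustration. It does no schema validation.

```python
import html
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
DWC = "{http://rs.tdwg.org/dwc/terms/}"
DWR = "{http://rs.tdwg.org/dwc/xsd/simpledarwincore/}"

RECORD_SET = """<dwr:SimpleDarwinRecordSet
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:dwr="http://rs.tdwg.org/dwc/xsd/simpledarwincore/">
  <dwr:SimpleDarwinRecord>
    <dwc:occurrenceID rdf:resource="http://herbarium.org/hb123456">http://herbarium.org/hb123456</dwc:occurrenceID>
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">Steve Baskauf</dwc:recordedBy>
    <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen">PreservedSpecimen</dwc:basisOfRecord>
  </dwr:SimpleDarwinRecord>
</dwr:SimpleDarwinRecordSet>"""


def split_views(xml_text):
    """Return (html_rows, ntriples) built from each record's literal and URI values."""
    html_rows, ntriples = [], []
    for record in ET.fromstring(xml_text).findall(DWR + "SimpleDarwinRecord"):
        # Assumption: occurrenceID's URI identifies the occurrence (the subject)
        subject = record.find(DWC + "occurrenceID").get(RDF_RESOURCE)
        for el in record:
            term = el.tag.rpartition("}")[2]
            # Humans see the element text ...
            html_rows.append(f"<dt>{term}</dt><dd>{html.escape((el.text or '').strip())}</dd>")
            # ... while a linked-data client gets the URI, when one is supplied
            # (occurrenceID supplies the subject itself, so it is skipped here)
            uri = el.get(RDF_RESOURCE)
            if uri is not None and term != "occurrenceID":
                ntriples.append(f"<{subject}> <{DWC[1:-1]}{term}> <{uri}> .")
    return html_rows, ntriples


rows, nt = split_views(RECORD_SET)
print("\n".join(rows))
print("\n".join(nt))
```

Whether someone else's schema would accept the extra `rdf:resource` attributes is a separate question that this sketch does not address.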
Steve
As an aside, I wanted to see exactly what the definition was for rdf:resource. However, the usual namespace for rdf: (http://www.w3.org/1999/02/22-rdf-syntax-ns#) doesn't seem to include "resource" in the defined properties. Very odd! Maybe I'm just missing something...
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: ram@cs.umb.edu web: http://bdei.cs.umb.edu/ web: http://etaxonomy.org/FilteredPush http://www.cs.umb.edu/~ram phone (+1)617 287 6466
Roderic Page Professor of Taxonomy DEEB, FBLS Graham Kerr Building University of Glasgow Glasgow G12 8QQ, UK
Email: r.page@bio.gla.ac.uk Tel: +44 141 330 4778 Fax: +44 141 330 2792 AIM: rodpage1962@aim.com Facebook: http://www.facebook.com/profile.php?id=1112517192 Twitter: http://twitter.com/rdmpage Blog: http://iphylo.blogspot.com Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Hi Steve:
On May 21, 2010, at 9:28 AM, Steve Baskauf wrote:
AJAX is "stupid" in the sense that it doesn't "understand" RDF. It just uses an XML file as a source of data. So an HTML file using the RDF+XML file as an AJAX data source would need to get the label information from within the particular RDF+XML file containing the representation of the GUID and not get it by dereferencing a link to another file, like the FOAF file that's the object of the dwc:recordedBy triple.
I'm not really following your argument here - maybe I'm missing some detail. RDF can be serialized to XML (and in fact your example is precisely such a serialization), so if your AJAX code needs XML as source, there is one. There are also JavaScript libraries that can invoke XSLTs on an XML source and render the resulting HTML (if that's what you need), and finally, there are also RDF->JSON converters. So I'm not sure where you see the bottleneck or hurdle.
-hilmar
What I'm saying is that if the XML-RDF file used as the data source contains only the element <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>, there is no way for AJAX, XSLT, or anything else to know that the string for my name is "Steve Baskauf" without dereferencing the foaf.rdf file and interpreting it. Thus the rendered HTML can't display the text "Steve Baskauf" for a human user as the value of recordedBy. However, including the element <rdfs:label>Steve Baskauf</rdfs:label> somewhere in the RDF+XML file makes it possible for any of the rendering methods you mentioned to get that information without dereferencing another URI and without "understanding" RDF.
This was the issue that I think got lost somewhere earlier in the thread.
Steve
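Steve's point can be illustrated without any RDF machinery. The hypothetical Python function below (standard library only; `recorded_by_html` and the document fragments are illustrative names) reads an RDF/XML file as plain XML, much as an XSLT or AJAX consumer would, and looks for an `rdfs:label` attached to the recordedBy URI in the same file. It never fetches foaf.rdf, so when the label is missing it can only fall back to showing the bare URI.

```python
import html
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
DWC = "{http://rs.tdwg.org/dwc/terms/}"

HEAD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <rdf:Description rdf:about="http://herbarium.org/hb123456">
    <dwc:recordedBy rdf:resource="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"/>
  </rdf:Description>"""
LABEL = """
  <rdf:Description rdf:about="http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me">
    <rdfs:label>Steve Baskauf</rdfs:label>
  </rdf:Description>"""
TAIL = "\n</rdf:RDF>"

WITHOUT_LABEL = HEAD + TAIL
WITH_LABEL = HEAD + LABEL + TAIL


def recorded_by_html(xml_text):
    """Render dwc:recordedBy as HTML using only labels found in this same file."""
    root = ET.fromstring(xml_text)
    labels = {}
    for desc in root.iter(RDF + "Description"):
        label = desc.find(RDFS + "label")
        if label is not None:
            labels[desc.get(RDF + "about")] = (label.text or "").strip()
    items = []
    for prop in root.iter(DWC + "recordedBy"):
        uri = prop.get(RDF + "resource")
        if uri is None:
            continue  # literal or blank-node forms are not handled in this sketch
        text = labels.get(uri, uri)  # no dereferencing: fall back to the bare URI
        items.append(f'<li>Recorded by: <a href="{html.escape(uri)}">{html.escape(text)}</a></li>')
    return "<ul>" + "".join(items) + "</ul>"


print(recorded_by_html(WITHOUT_LABEL))
print(recorded_by_html(WITH_LABEL))
```

The only difference between the two outputs is whether the label travels in the same file, which is the distinction Steve is drawing.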
Hilmar Lapp wrote:
Hi Steve:
On May 21, 2010, at 9:28 AM, Steve Baskauf wrote:
AJAX is "stupid" in the sense that it doesn't "understand" RDF. It just uses an XML file as a source of data. So an HTML file using the RDF+XML file as an AJAX data source would need to get the label information from within the particular RDF+XML file containing the representation of the GUID and not get it by dereferencing a link to another file, like the FOAF file that's the object of the dwc:recordedBy triple.
I'm not really following your argument here - maybe I'm missing some detail. RDF can be serialized to XML (and in fact your example is precisely such a serialization), so if your AJAX code needs XML as source, there is one. There are also JavaScript libraries that can invoke XSLTs on an XML source and render the resulting HTML (if that's what you need), and finally, there are also RDF->JSON converters. So I'm not sure where you see the bottleneck or hurdle.
-hilmar
=========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : ===========================================================
Participants (5):
- Bob Morris
- Hilmar Lapp
- Roderic Page
- Stefano Bocconi
- Steve Baskauf