RFC 2119: Re: Public comment on the Darwin Core RDF Guide
Bob Morris noted that it may be appropriate for the DarwinCore RDF Guide to follow RFC 2119 (something for which at least the TDWG GUID applicability statement provides precedent).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Internet Engineering Task Force (IETF). RFC 2119. Key words for use in RFCs to Indicate Requirement Levels. http://www.ietf.org/rfc/rfc2119.txt
This RFC notes: "These words are often capitalized" and "Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm". There are 69 occurrences of "should" in the guide; many of these instances appear to have the intent of colloquial English, and other words should probably be substituted in those cases.
I've taken a stab here at a set of cases where it feels like it might be appropriate to use the RFC 2119 imperatives.
A question is whether it is the role of the guide to use MUST/MUST NOT/REQUIRED anywhere (except where that is forced by inheritance from elsewhere)? A potential place for asserting MUST/MUST NOT is the distinction between dwc and dwciri namespaces. I've put more discussion under the headings 2.4.1.2 and 2.5 below.
-Paul
----
1.3.2.1 Persistent Identifiers
s/the provider should take care to ensure/the provider SHOULD take care to ensure/
s/it must be converted to an IRI/it MUST be converted to an IRI/
----
1.3.2.2 HTTP IRIs as self-resolving GUIDs
s/through RDF should plan to implement GUIDs/through RDF MUST plan to implement GUIDs/
For consistency with "Must" in the GUID applicability statement.
----
1.4.1 Well-known vocabularies
s/the provider should assign the term an IRI,/the provider SHOULD assign the term an IRI,/
----
1.4.3 Use of Darwin Core terms in RDF
s/each value should be referenced/each value MUST be referenced/
By the nature of object references.
----
1.4.4 Limitations of this guide
s/Darwin Core property terms should be used as RDF predicates and specifies that Darwin Core class terms should be used in rdf:type/Darwin Core property terms SHOULD be used as RDF predicates and specifies that Darwin Core class terms SHOULD be used in rdf:type/
----
1.5.5 Implications for expressing Darwin Core string values as RDF
s/in which RDF should be structured/in which RDF SHOULD be structured/
----
Example 1:
s/Predicates must be identified by IRIs/Predicates MUST be identified by IRIs/
----
2.2 Subject resources
s/it must be referenced by an IRI/it MUST be referenced by an IRI/
----
2.2.2 Associating a string identifier with a subject resource
s/dcterms:identifier should be used to/dcterms:identifier SHOULD be used to/
s/it is acceptable to present it as a string literal value for dcterms:identifier/it MAY be presented as a string literal value for dcterms:identifier/
----
2.3.1.1 rdf:type statement
s/The class should be identified by an IRI reference/The class MUST be identified by an IRI reference/
----
2.3.1.3 Explicit vs. inferred type declarations
s/data providers should exercise caution in using any such term in a non-standard way/data providers SHOULD NOT use any such term in a non-standard way/
s/the provider should type the resource/the provider MAY type the resource/ ?? Or ?? s/the provider should type the resource/the provider SHOULD type the resource/
----
2.3.1.4 Other predicates used to indicate type
s/in an RDF description should be considered optional, while including rdf:type should be considered highly recommended/in an RDF description is OPTIONAL, while rdf:type SHOULD be included/
s/A dwciri: analogue (Section 2.5) of dwc:basisOfRecord should not be used/A dwciri: analogue (Section 2.5) of dwc:basisOfRecord MUST NOT be used/
----
2.3.1.5 Classes to be used for type declarations of resources described using Darwin Core
s/that should also be used/that SHOULD also be used/
----
Example 9:
s/a provider should include an xml:lang/a provider SHOULD include an xml:lang/
----
Example 10:
I'm not sure about this one:
/language tags should be interpreted by clients
s/may initially choose to expose literals without datatype attributes, they should/MAY initially choose to expose literals without datatype attributes, they SHOULD/
----
2.4.1.2 Terms intended for use with literal objects
I'll put a stake in the ground for discussion here: Is the guide sufficiently normative to assert MUST (other than where that property is inherited from elsewhere)? If so, then the distinction between dwc and dwciri is the place to make that assertion:
s/that terms in the dwc: namespace should be restricted to use with literal objects/that terms in the dwc: namespace MUST be restricted to use with literal objects/
----
2.4.2.1.1 Objects identified by LSIDs
This one needs to be checked for consistency with the LSID Applicability Statements:
s/version of the LSID should be used instead/version of the LSID SHOULD be used instead/
----
Example 17:
s/particular terms should be used with literal objects, or with IRI reference objects/particular terms SHOULD be used with literal objects, or SHOULD be used with IRI reference objects/
----
2.4.3.2 Literal values for non-literal resources in Darwin Core
s/existing Darwin Core term in the dwc: namespace should have the same structure/existing Darwin Core term in the dwc: namespace SHOULD have the same structure/
----
2.5 Terms in the dwciri: namespace
This is the companion case to 2.4.1.2: is the distinction between dwciri and dwc to be asserted as SHOULD or MUST? Using them incorrectly does have the "potential for causing harm" in the language of RFC 2119, but does the guide rise to the level of specifying an "absolute requirement of the specification"?
s/IRI reference objects and should NOT be used with literal/ IRI reference objects and MUST NOT be used with literal/
----
2.5.1 Definition of dwciri: terms
s/resource described by a dwciri: property should be the subject of a triple for each value on the list/resource described by a dwciri: property SHOULD be the subject of a triple for each value on the list/
A counter example here comes from the Harvard List of Botanists, where http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/305dcac3-a748-4f47-8a18-3... references the collector team "J. D. Hooker & A. Gray", that is, http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/cc3b7080-5fbb-4ea5-9655-4... and http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/3f8c70aa-1862-4784-8a53-f...
----
2.5.3 Expectation of clients encountering RDF containing dwc: and dwciri: terms
Instances of "should" here again cut to the heart of how strong the guidance is concerning dwc and dwciri.
----
Table 3
s/they should be used rather than using the Darwin Core ID/they SHOULD be used rather than using the Darwin Core ID/
----
Example 22:
s/The following points about the Example 22 should be noted:/Note the following points about Example 22:/
s/related non-literal resources should use/related non-literal resources SHOULD use/
/LSIDs are the objects of triples, they should be
As with 2.4.2.1.1, this guidance needs to be checked against the GUID and LSID applicability statements. I think I'm reading the GUID applicability statement as s/should/MUST/ here.
----
2.7.1 What purpose do convenience terms serve?
s/In general, it should not be necessary for a data provider/In general, a data provider does not need/
----
2.7.3 Ownership of a collection item
s/of the collection item should be indicated/of the collection item SHOULD be indicated/
----
2.7.4 Description of a taxonomic entity
s/It is considered to be out of the scope of this document to specify how taxon concepts should be rendered as RDF/It is out of the scope of this document to specify how to render taxon concepts as RDF/
s/terms for taxonomic entities should be properties of dwc:Identification/terms for taxonomic entities MAY be properties of dwc:Identification/
s/The task of describing taxonomic entities using RDF must be an effort outside of Darwin Core/The task of describing taxonomic entities using RDF is out of scope of this document/
----
2.8.3 Expressing Darwin Core association terms as RDF with URI references
s/it should be used to declare the type/rdf:type SHOULD be used to declare the type/
----
Tables 3.4 to 3.7.
s/should/SHOULD/
--------
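For anyone wanting to try these proposals against a draft, here is a rough Python sketch of applying s/old/new/ substitutions literally and then listing the lowercase keywords that remain, which are the candidates for rewording as colloquial English. The sample text is a hypothetical stand-in, not the guide itself; only the two quoted fragments come from the list above, and the helper names are made up for illustration.

```python
import re

# Hypothetical sample sentences standing in for the guide text.
GUIDE_TEXT = (
    "If a GUID is not an IRI, it must be converted to an IRI. "
    "To keep links stable, the provider should take care to ensure "
    "that the IRI does not change. "
    "The following points should be noted."
)

# Proposals in the same s/old/new/ form used in the list above.
PROPOSALS = [
    "s/the provider should take care to ensure/the provider SHOULD take care to ensure/",
    "s/it must be converted to an IRI/it MUST be converted to an IRI/",
]


def parse_proposal(proposal):
    """Split 's/old/new/' into (old, new); assumes no '/' inside either part."""
    marker, old, new, _tail = proposal.split("/")
    if marker != "s":
        raise ValueError(f"not a substitution: {proposal!r}")
    return old, new


def apply_proposals(text, proposals):
    """Apply each substitution as a literal, case-sensitive, first-match replace."""
    for proposal in proposals:
        old, new = parse_proposal(proposal)
        text = text.replace(old, new, 1)
    return text


def colloquial_keywords(text):
    """Return the lowercase 'should'/'must'/'may' still present in the text."""
    return re.findall(r"\b(?:should|must|may)\b", text)


revised = apply_proposals(GUIDE_TEXT, PROPOSALS)
print(revised)
print(colloquial_keywords(revised))
```

Proposals whose text contains a "/" (none of the two shown here) would need a different delimiter or an escaping rule.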
[oops, sent from wrong address, re-sending... will reply to Paul's reply separately...]
I'm very skeptical of applying 2119 to vocabulary specifications. I don't think there's any clear agreement on what constitutes conformance to a vocabulary specification, or even what kind of thing might conform to one, or how you would test conformance. The examples you give are all over the place: conformance is required sometimes of a document, sometimes a curation process, sometimes a person. And what constitutes conformance - is it truth (the taxon MUST be a genus ???), or just intent (the intended claim MUST be that the taxon is a genus ???), or what? How do you test something that uses a vocabulary?
I think the talk of "vendors" in 2119 is telling. It gives the intended context of application: you are trying to come to an agreement with someone regarding whether to share some artifact, so you negotiate objective conformance criteria that the provider/seller can verifiably meet and the consumer/buyer can expect and verify. I bet very few of the requirements of this or any other vocabulary are objective enough to meet the kind of standard that engineers and vendors would expect from a specification. Seller/buyer scenarios don't sound much like the use of vocabularies in the TDWG community. But even if they did, I think there's too much dissonance between 2119-land and vocabulary-land to risk use of 2119 in a vocabulary spec.
Do you know of any precedent for 2119 language to be used in a vocabulary specification? I'd be very interested to see that... TDWG should not be a pioneer when it comes to this kind of thing.
Also note that in 2119, if an artifact fails to meet a SHOULD, then a justification MUST be provided; this is an escalation of the spec that will need scrutiny and consensus in each case.
Jonathan
On Wed, Nov 26, 2014 at 4:10 PM, Paul J. Morris mole@morris.net wrote:
[...]
--
Paul J. Morris
Biodiversity Informatics Manager
Harvard University Herbaria/Museum of Comparative Zoölogy
mole@morris.net AA3SD PGP public key available
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Well, this is just my opinion, so take it for what it is. I think specifications are like contracts, and should (a) talk about realizable, detectable states of affairs, (b) assign responsibility, and (c) give relatively unambiguous conformance criteria, at least in their normative passages (informative is a different matter). 2119 language escalates the tone of any document from collegial to adversarial. Syntactic constraints like "MUST be a string literal in ISO8601 format" are actionable, so they're not too bad, but "the provider SHOULD take care to ensure that the URL does not change over time" is extremely vague and subjective and would be very hard to assess in a dispute. Similarly "MUST plan to implement" - how is that assessed? As good-faith advice these are fine; as a term of a contract they're an invitation to a dispute.
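To make the contrast concrete, here is a small Python sketch of why a syntactic constraint counts as actionable: it can be tested mechanically. It assumes the constraint is narrowed to a YYYY-MM-DD calendar date, which is only a small part of ISO 8601, and the function name is made up for illustration. No comparable check can be written for "take care to ensure that the URL does not change over time".

```python
import re
from datetime import date


def is_iso_calendar_date(value):
    """Return True if value is a string holding a valid YYYY-MM-DD date."""
    if not isinstance(value, str):
        return False
    # Shape check first, so only the extended calendar form is accepted.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None:
        return False
    try:
        # Range check: rejects month 13, day 32, and so on.
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


for candidate in ["2014-12-13", "13/12/2014", "2014-13-01"]:
    print(candidate, is_iso_calendar_date(candidate))
```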
I looked through several of the documents you listed and they will be great fodder for the critique of vocabulary specifications I'm working on in my spare time. Generally they support my belief that use of 2119 language in vocabulary specifications is gratuitous, distracting, and often wrong.
Jonathan
On Sun, 30 Nov 2014 07:47:46 -0500 Jonathan A Rees rees@mumble.net wrote:
I'm very skeptical of applying 2119 to vocabulary specifications. I don't think there's any clear agreement on what constitutes conformance to a vocabulary specification
On the level of specifying the meaning of a vocabulary, I think I agree with you. It doesn't feel like specifying that dwc:scientificName MUST/SHOULD/MAY carry something that is actually a scientific name is particularly helpful.
However, the DarwinCore RDF guide feels like an implementer's guide on how to write applications that produce and consume RDF using the TDWG DarwinCore vocabulary: a specification under which, in order to be understood and not cause problems for consuming applications, producers of DarwinCore in RDF should, e.g., assert dwc:recordedBy "Asa Gray" or dwciri:recordedBy http://viaf.org/viaf/7504476, but not dwc:recordedBy http://viaf.org/viaf/7504476 or dwciri:recordedBy "Asa Gray".
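As a sketch of how a producing or consuming application might test this, the following Python checks the four statements above. It is a minimal illustration, not anything taken from the guide: the Literal and IRI wrappers and the check_object helper are hypothetical, and predicates are compared by prefix only.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    text: str


class IRI(NamedTuple):
    ref: str


def check_object(predicate: str, obj) -> Optional[str]:
    """Return a problem description, or None if the pairing looks fine."""
    if predicate.startswith("dwc:") and isinstance(obj, IRI):
        return f"{predicate} used with IRI object {obj.ref}"
    if predicate.startswith("dwciri:") and isinstance(obj, Literal):
        return f"{predicate} used with literal object {obj.text!r}"
    return None


VIAF = IRI("http://viaf.org/viaf/7504476")

# The two acceptable and two problematic statements from the paragraph above.
statements = [
    ("dwc:recordedBy", Literal("Asa Gray")),
    ("dwciri:recordedBy", VIAF),
    ("dwc:recordedBy", VIAF),
    ("dwciri:recordedBy", Literal("Asa Gray")),
]

problems = [p for p in (check_object(pred, obj) for pred, obj in statements) if p]
for problem in problems:
    print(problem)
```

A check like this is objective in the way a conformance criterion needs to be, whichever of SHOULD or MUST the guide ends up using.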
http://www.w3.org/TR/EARL10-Guide/ feels particularly relevant: it is a developer's guide that accompanies the EARL vocabulary specification http://www.w3.org/TR/EARL10/. An earlier draft of EARL explicitly invoked the language of RFC 2119 (http://www.w3.org/TR/2009/WD-EARL10-Schema-20091029/); in the latest version, the RFC 2119 language was removed from the vocabulary specification and relegated to the developer's guide. This feels very parallel to TDWG DarwinCore and the TDWG DarwinCore RDF guide.
-Paul
On Sun, 30 Nov 2014 07:47:46 -0500 Jonathan A Rees rees@mumble.net wrote:
The examples you give are all over the place: conformance is required sometimes of a document, sometimes a curation process, sometimes a person
I concur. If RFC 2119 language is to be included in the DarwinCore RDF Guide, it needs more careful analysis than the initial set of proposals that I made.
-Paul
As I've been working through implementing RDF generation in a few applications and seeking to conform to the guide, I've found myself spending a good bit of time hunting through the document looking for guidance on particular situations. This leads me to a suggestion for the guide: include, at the end of the guide, a single comprehensive example of an Occurrence record, annotated to point to relevant sections in the guide. This could serve both to quickly answer questions and as a visual index to the rest of the guide.
-Paul
Paul, That's exciting that you are trying to generate RDF using real data!
I think we initially considered including something in the guide like what you have suggested, but the problem is that what constitutes "an Occurrence record" varies depending on the model one has in mind when serializing the record as RDF. Historically, "occurrences" were considered to be a superclass that included specimens, and any property remotely related to a specimen could be included as part of an occurrence record. A provider exposing an occurrence record might give it properties such as dwc:eventDate, dwc:preparations, and dwc:locality. However, a different provider might consider dwc:eventDate to be the property of a dwc:Event instance, dwc:preparations to be the property of a dwc:PreservedSpecimen, and dwc:locality to be the property of a dcterms:Location instance and link those instances to a separate Occurrence instance via object properties.
Which of these is correct? At this point there is no consensus as to whether one of these approaches is better than the other. We avoided putting extensive examples within the guide document itself, since the guide will become part of the standard and will probably not be changed frequently, whereas best practices for deciding the types of resources with which properties should be associated are likely to develop over time and with the experience of usage. For that reason, we have included examples in the ancillary documents that are associated with the guide, but which do not form part of the standard. The "examples using 'pure' Darwin Core" [1] and "Examples using Darwin-SW object properties" [2] illustrate the extremes that I've described above.
Steve
[1] https://code.google.com/p/tdwg-rdf/wiki/DwcRdfOccurrences [2] https://code.google.com/p/tdwg-rdf/wiki/DwcRdfExamplesDarwinSW
Ah, Steve, your examples well illustrate the reason to avoid assigning rdfs:domain, as well as why both are perfectly good illustrations, neither of which should be deprecated. Communities of practice can exploit either or both, and the only communities that are nailed are those that labor under an rdfs:domain for such things as dwc:eventDate.
Bob
On Fri, Dec 12, 2014 at 10:59 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
[...]
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: PMB 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 322-4942 If you fax, please phone or email so that I will know to look for it. http://bioimages.vanderbilt.edu http://vanderbilt.edu/trees
Umm. I don't understand why what you said is relevant. Nobody that I know of has assigned domains to any of the existing Darwin Core terms. If you have Darwin-SW in mind, it only assigns domains to object properties that it mints and I don't see how that would prevent supporting either or both kinds of use. The problem in my mind is figuring out how to do queries that would catch both kinds of uses, e.g.
    SELECT ?Occurrence
    WHERE {
      ?Occurrence dwc:eventDate "2014-12-13"^^xsd:date .
      ?Occurrence dwc:locality "Smith Pond" .
    }

which would work for the simple version, but not Darwin-SW. Obviously, one could easily create a more complex query that would work in simple cases like this example, but the complexity would expand greatly if one wanted to require matches with 3 or more patterns.
Steve
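One way to picture a query that catches both kinds of use is sketched below in Python over a toy in-memory set of triples. Everything here is an assumption for illustration: the ex: subjects, the linking predicates ex:atEvent and ex:locatedAt (placeholders, not the actual Darwin-SW terms), the plain-string dates in place of typed literals, and the rule that any node which is not the target of a link counts as a candidate occurrence.

```python
# Triples as (subject, predicate, object) tuples; all ex: names are hypothetical.
FLAT = {
    ("ex:occ1", "dwc:eventDate", "2014-12-13"),
    ("ex:occ1", "dwc:locality", "Smith Pond"),
}
LINKED = {
    ("ex:occ2", "ex:atEvent", "ex:event2"),
    ("ex:event2", "dwc:eventDate", "2014-12-13"),
    ("ex:event2", "ex:locatedAt", "ex:loc2"),
    ("ex:loc2", "dwc:locality", "Smith Pond"),
}
# Placeholder object properties that connect an occurrence to related nodes.
LINKS = ("ex:atEvent", "ex:locatedAt")


def reachable(graph, start):
    """Return start plus every node reachable from it through LINKS."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in graph:
            if s == node and p in LINKS and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def find_occurrences(graph, patterns):
    """Roots for which every (predicate, value) holds on the root or a linked node."""
    subjects = {s for s, _, _ in graph}
    link_targets = {o for _, p, o in graph if p in LINKS}
    hits = set()
    for root in subjects - link_targets:
        nodes = reachable(graph, root)
        if all(any((n, p, v) in graph for n in nodes) for p, v in patterns):
            hits.add(root)
    return hits


PATTERNS = [("dwc:eventDate", "2014-12-13"), ("dwc:locality", "Smith Pond")]
print(find_occurrences(FLAT | LINKED, PATTERNS))
```

In this sketch a third pattern is just one more entry in PATTERNS. In SPARQL 1.1, property paths over the linking predicates might play a similar role, though whether that keeps real queries manageable is not something this toy example shows.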
At some point we should consider integrating examples in the Darwin Core repository (its new home) on Github (https://github.com/tdwg/dwc). If you agree, we should use the new reference in the RDF Guide. I have created an issue for this (https://github.com/tdwg/dwc/issues/52).
On Sat, Dec 13, 2014 at 1:49 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
Umm. I don't understand why what you said is relevant. Nobody that I know of has assigned domains to any of the existing Darwin Core terms. If you have Darwin-SW in mind, it only assigns domains to object properties that it mints and I don't see how that would prevent supporting either or both kinds of use. The problem in my mind is figuring out how to do queries that would catch both kinds of uses, e.g.
SELECT ?Occurrence WHERE {
  ?Occurrence dwc:eventDate "2014-12-13"^^xsd:date .
  ?Occurrence dwc:locality "Smith Pond" .
}
which would work for the simple version, but not Darwin-SW. Obviously, one could easily create a more complex query that would work in simple cases like this example, but the complexity would expand greatly if one wanted to require matches with 3 or more patterns. Steve
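To make the difficulty concrete, here is a minimal Python sketch (an editorial illustration, not something posted in the thread). It models triples as plain tuples rather than using a real triplestore. The `ex:` identifiers, the linking properties `ex:atEvent` and `ex:locatedAt`, and all function names are hypothetical stand-ins for Darwin-SW-style object properties, and the typed date literal is reduced to a plain string for brevity.

```python
# Triples are (subject, predicate, object) tuples; every ex: name is made up.
DATE, PLACE = "2014-12-13", "Smith Pond"

# "Simple" model: all properties hang directly off the occurrence.
FLAT = {
    ("ex:occ1", "rdf:type", "dwc:Occurrence"),
    ("ex:occ1", "dwc:eventDate", DATE),
    ("ex:occ1", "dwc:locality", PLACE),
}

# Linked model: occurrence -> event -> location via hypothetical object properties.
LINKED = {
    ("ex:occ2", "rdf:type", "dwc:Occurrence"),
    ("ex:occ2", "ex:atEvent", "ex:event2"),
    ("ex:event2", "dwc:eventDate", DATE),
    ("ex:event2", "ex:locatedAt", "ex:loc2"),
    ("ex:loc2", "dwc:locality", PLACE),
}


def subjects(graph, predicate, obj):
    """Return all subjects that have the given predicate and object."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


def reached_from(graph, nodes, predicate):
    """Return nodes plus every subject linked to one of them by predicate."""
    return nodes | {s for (s, p, o) in graph if p == predicate and o in nodes}


def simple_query(graph):
    """Equivalent of the two-pattern query: both properties on one subject."""
    return subjects(graph, "dwc:eventDate", DATE) & subjects(graph, "dwc:locality", PLACE)


def tolerant_query(graph):
    """Match occurrences whether the properties are direct or one/two links away."""
    dated = reached_from(graph, subjects(graph, "dwc:eventDate", DATE), "ex:atEvent")
    located = subjects(graph, "dwc:locality", PLACE)
    located = reached_from(graph, located, "ex:locatedAt")  # location -> event
    located = reached_from(graph, located, "ex:atEvent")    # event -> occurrence
    # Without the type filter, intermediate events would also be returned.
    return dated & located & subjects(graph, "rdf:type", "dwc:Occurrence")
```

Here `simple_query(LINKED)` returns an empty set, while `tolerant_query(FLAT | LINKED)` finds both occurrences. Note that each property needs its own chain of link-following steps plus a type filter, which is the growth in complexity described above.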
Bob Morris wrote:
Ah, Steve, your examples illustrate well the reason to avoid assigning rdfs:domain, as well as why both are perfectly good approaches, neither of which should be deprecated. Communities of practice can exploit either or both, and the only communities that are nailed are those that labor under an rdfs:domain for such things as dwc:eventDate.
Bob
If the Darwin Core repository is now at Github, then I think that would probably be the best home for all of the ancillary documents, including examples. Their current location on the RDF Task Group's site is only a temporary place for them. The other alternative is the TDWG website, but we have yet to see whether it will become functional again or not.
Perhaps a good strategy would be to create a stable landing page for all of the ancillary RDF Guide documents at Github. Then references in the guide can point to that page, rather than to a number of individual pages whose URLs might be more likely to change. Does that sound like a good idea?
Steve
John Wieczorek wrote:
At some point we should consider integrating examples in the Darwin Core repository (its new home) on Github (https://github.com/tdwg/dwc). If you agree, we should use the new reference in the RDF Guide. I have created an issue for this (https://github.com/tdwg/dwc/issues/52).
Yes, it sounds like a good idea, especially if it is convenient and stable for examples to point back to fragments in the Guide. When I read examples of something compliant with a spec or guidance, my most frequent head scratching starts in the example at a point where I ask myself "Why the heck are they doing it that way?" That's when I have to go back to the authority document, praying that I've landed in the right place.
Sorry if what I wrote was either irrelevant or unclear, or both. My intent was to say this: You gave examples where there have been ongoing arguments in favor of two answers to a question of the form "to what class should a predicate P be applied?" But in my view there is often no reason to settle that question in a guidance document or spec, perhaps unless the intent is to settle the arguments. To me, your discussion of dwc:eventDate seems like an example of such, at least because there have been arguments on the table that have not(?) been successfully contradicted. The point I too cryptically meant to raise is this: it is a common problem that such arguments are, in the case of RDFS reasoning, often settled de facto when ontologies impose rdfs:domain. Settling an extensive argument in an ontology should not be taken lightly, if for no other reason than that forestalling arguments is one of the main accomplishments of a well-crafted ontology.
Hope that is clearer. If so, it doesn't bother me if it is declared irrelevant. Bob
OK, I think I get what you are saying.
Just to clarify, in general, the basic Darwin Core vocabulary includes very few machine-interpretable semantics (domain, range, subclass, subproperty, sameAs). I think that the subproperty declarations for the "ID" terms are just about all that is left. The RDF Guide similarly remains silent on these kinds of relationships, except where they exist in "adopted" terms from Dublin Core. The guide really just attempts to frame known best practices in terms of how they would apply to the DwC vocabulary, and to deal with the quirks that were introduced because DwC was originally designed to facilitate spreadsheets and flat database tables.
My feeling was that there was a consensus that once the basic layer (the vocabulary itself) was cleaned up with clearer definitions and the guide (a second layer) was in place to apply well-known best practices to that layer, there would be a subsequent attempt to determine how these two "layers" could be used by higher level layers (i.e. ontologies) to generate actual usable RDF graphs. This determination would be based primarily on use cases and how successful various possible approaches would be at facilitating those use cases. So the question of whether there are valid reasons why ontologies should impose rdfs:domain properties should be part of that next discussion, and on the level of the RDF Task Group rather than the whole tdwg-content list. We already have this as an open issue in the RDF Task Group issue tracker: https://code.google.com/p/tdwg-rdf/issues/detail?id=10
So I think this is an important issue, but I think it's veering out of scope for the discussion about the guide. Returning to the original question of examples, my feeling is that it just isn't practical to include what are likely to be highly mutable examples in what's going to become a non-normative part of the standard. As you suggested in a later email, I think having back links from the examples on GitHub to particular sections of the guide would be the way to go.
Steve
On Sat, 13 Dec 2014 11:52:39 -0600 Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
As you suggested in a later email, I think having back links from the examples on GitHub to particular sections of the guide would be the way to go.
I like this idea.
-Paul
I had intended to make a comment on this topic earlier, but got too busy. I wanted to provide an example where use of this kind of language might be good, but for which it is not clear to me which of the key words would be appropriate.
The section of the guide that discusses typed literals [1] notes that a literal has a different meaning depending on whether it has a datatype attribute or not. The literal:
"35.85"^^xsd:decimal
is a decimal number. The literal:
"35.85"
has the implied datatype attribute /xsd:string/. Thus a client that is capable of semantic processing of ingested literals would not interpret these two literals as being the same thing. One is an abstract mathematical entity and the other is a string of characters.
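The difference between the two literals can be sketched in a few lines of Python. This is only an illustration using the standard library, not an RDF implementation: the helper literal_value and the two datatype constants are hypothetical names introduced here, and the mapping of xsd:decimal to Python's Decimal is an assumption made for the sketch.

```python
from decimal import Decimal

# Datatype IRIs, written out in full (xsd: is the XML Schema namespace).
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def literal_value(lexical, datatype=XSD_STRING):
    """Hypothetical helper: map a (lexical form, datatype) pair to a Python value.

    A literal without a datatype attribute is treated as a plain string,
    as described in the text above.
    """
    if datatype == XSD_DECIMAL:
        return Decimal(lexical)
    return lexical


typed_a = literal_value("35.85", XSD_DECIMAL)
typed_b = literal_value("35.850", XSD_DECIMAL)
plain_a = literal_value("35.85")
plain_b = literal_value("35.850")

# As decimals, the two lexical forms denote the same number.
same_number = typed_a == typed_b
# As strings, they are different sequences of characters.
same_string = plain_a == plain_b
# A decimal and a string are never the same value.
typed_equals_plain = typed_a == plain_a

print(same_number, same_string, typed_equals_plain)
```

A client that only compares strings would treat "35.85" and "35.850" as different values, while a client that honors the datatype would treat them as the same number.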
So should the guidelines say something like "literals MUST be typed so that clients can properly interpret them"? That might be desirable if we assume that most clients will be doing serious semantic processing and will depend on receiving accurate information from providers. However, many providers probably won't (at least initially) bother to figure out what kind of datatype to assign to their literals; they will probably just spew out whatever strings are in their existing database and we have no way to enforce that they MUST do it. So should we just say "literals SHOULD be typed so that clients can properly interpret them"? On the other hand, we may be keen to get triples exposed as soon as possible and figure that it is the consuming client's job to figure out what literals "mean", rather than placing that burden on the provider. The clients will probably have to do that anyway since we have no way to enforce that providers include datatypes. So should the spec say "literals MAY be typed so that clients can properly interpret them"? Or maybe we could assume that some kind of cleanup would be done on triples from the wild before they are placed into some kind of community triplestore? In that case, we don't really care that much and could say that datatypes are OPTIONAL.
My point is that until we start seeing what the use cases are and how RDF ends up getting implemented in real life, it would be difficult to know which of the keywords should be used in the guide. I certainly would not know which keywords I should use in many cases.
Steve
[1] https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposalRevised#2.4.1.1_T...
Paul J. Morris wrote:
Bob Morris noted that it may be appropriate for the DarwinCore RDF Guide to follow RFC 2119 (something for which at least the TDWG GUID applicability statement provides precedent).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Internet Engineering Task Force (IETF). RFC 2119. Key words for use in RFCs to Indicate Requirement Levels. http://www.ietf.org/rfc/rfc2119.txt
This RFC notes: "These words are often capitalized" and "Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm". There are 69 occurrences of "should" in the guide; many of these instances appear to be meant in the colloquial English sense, and other words should probably be substituted in those cases.
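A count like the one above could be reproduced, and split into capitalized and colloquial uses, with a short script. The following is a minimal sketch: the function name count_keyword_uses and the sample text are made up for illustration, and the sample is not quoted from the guide.

```python
import re


def count_keyword_uses(text, keyword="should"):
    """Count whole-word uses of a keyword, split by capitalization.

    'upper' counts fully capitalized uses (RFC 2119 style);
    'other' counts every other casing (likely colloquial uses).
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    counts = {"upper": 0, "other": 0}
    for match in pattern.finditer(text):
        if match.group(0).isupper():
            counts["upper"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up sample text, not taken from the guide.
sample = (
    "The provider should take care to ensure stability. "
    "Predicates MUST be identified by IRIs. "
    "It SHOULD be typed, and clients should expect that."
)
result = count_keyword_uses(sample, "should")
print(result)
```

Each lowercase hit would still need a human decision: capitalize it as an RFC 2119 imperative, or reword it.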
I've taken a stab here at a set of cases where it feels like it might be appropriate to use the RFC 2119 imperatives.
A question is whether it is the role of the guide to use MUST/MUST NOT/REQUIRED anywhere (except where that is forced by inheritance from elsewhere)? A potential place for asserting MUST/MUST NOT is the distinction between dwc and dwciri namespaces. I've put more discussion under the headings 2.4.1.2 and 2.5 below.
-Paul
1.3.2.1 Persistent Identifiers
s/the provider should take care to ensure/the provider SHOULD take care to ensure/
s/it must be converted to an IRI/it MUST be converted to an IRI/
1.3.2.2 HTTP IRIs as self-resolving GUIDs
s/through RDF should plan to implement GUIDs/through RDF MUST plan to implement GUIDs/
For consistency with "Must" in the GUID applicability statement.
1.4.1 Well-known vocabularies
s/the provider should assign the term an IRI,/the provider SHOULD assign the term an IRI,/
1.4.3 Use of Darwin Core terms in RDF
s/each value should be referenced/each value MUST be referenced/
By the nature of object references.
1.4.4 Limitations of this guide
s/Darwin Core property terms should be used as RDF predicates and specifies that Darwin Core class terms should be used in rdf:type/Darwin Core property terms SHOULD be used as RDF predicates and specifies that Darwin Core class terms SHOULD be used in rdf:type/
1.5.5 Implications for expressing Darwin Core string values as RDF
s/in which RDF should be structured/in which RDF SHOULD be structured/
Example 1:
s/Predicates must be identified by IRIs/Predicates MUST be identified by IRIs/
2.2 Subject resources
s/it must be referenced by an IRI/it MUST be referenced by an IRI/
2.2.2 Associating a string identifier with a subject resource
s/dcterms:identifier should be used to/dcterms:identifier SHOULD be used to/
s/it is acceptable to present it as a string literal value for dcterms:identifier/it MAY be presented as a string literal value for dcterms:identifier/
2.3.1.1 rdf:type statement
s/The class should be identified by an IRI reference/The class MUST be identified by an IRI reference/
2.3.1.3 Explicit vs. inferred type declarations
s/data providers should exercise caution in using any such term in a non-standard way/data providers SHOULD NOT use any such term in a non-standard way/
s/the provider should type the resource/the provider MAY type the resource/ ?? Or ?? s/the provider should type the resource/the provider SHOULD type the resource/
2.3.1.4 Other predicates used to indicate type
s/in an RDF description should be considered optional, while including rdf:type should be considered highly recommended/in an RDF description is OPTIONAL, while rdf:type SHOULD be included/
s/A dwciri: analogue (Section 2.5) of dwc:basisOfRecord should not be used/A dwciri: analogue (Section 2.5) of dwc:basisOfRecord MUST NOT be used/
2.3.1.5 Classes to be used for type declarations of resources described using Darwin Core
s/that should also be used/that SHOULD also be used/
Example 9:
s/a provider should include an xml:lang/a provider SHOULD include an xml:lang/
Example 10:
I'm not sure about this one:
/language tags should be interpreted by clients
s/may initially choose to expose literals without datatype attributes, they should/MAY initially choose to expose literals without datatype attributes, they SHOULD/
2.4.1.2 Terms intended for use with literal objects
I'll put a stake in the ground for discussion here: Is the guide sufficiently normative to assert MUST (other than where that property is inherited from elsewhere)? If so, then the distinction between dwc and dwciri is the place to make that assertion:
s/that terms in the dwc: namespace should be restricted to use with literal objects/that terms in the dwc: namespace MUST be restricted to use with literal objects/
2.4.2.1.1 Objects identified by LSIDs
This one needs to be checked for consistency with the LSID Applicability Statements:
s/version of the LSID should be used instead/version of the LSID SHOULD be used instead/
Example 17:
s/particular terms should be used with literal objects, or with IRI reference objects/particular terms SHOULD be used with literal objects, or SHOULD be used with IRI reference objects/
2.4.3.2 Literal values for non-literal resources in Darwin Core
s/existing Darwin Core term in the dwc: namespace should have the same structure/existing Darwin Core term in the dwc: namespace SHOULD have the same structure/
2.5 Terms in the dwciri: namespace
This is the companion case to 2.4.1.2: is the distinction between dwciri and dwc to be asserted as SHOULD or MUST? Using them incorrectly does have the "potential for causing harm" in the language of RFC 2119, but does the guide rise to the level of specifying an "absolute requirement of the specification"?
s/IRI reference objects and should NOT be used with literal/IRI reference objects and MUST NOT be used with literal/
2.5.1 Definition of dwciri: terms
s/resource described by a dwciri: property should be the subject of a triple for each value on the list/resource described by a dwciri: property SHOULD be the subject of a triple for each value on the list/
A counter example here comes from the Harvard List of Botanists, where http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/305dcac3-a748-4f47-8a18-3... references the collector team "J. D. Hooker & A. Gray", that is, http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/cc3b7080-5fbb-4ea5-9655-4... and http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/3f8c70aa-1862-4784-8a53-f...
2.5.3 Expectation of clients encountering RDF containing dwc: and dwciri: terms
Instances of "should" here again cut to the heart of how strong the guidance is concerning dwc and dwciri.
Table 3
s/they should be used rather than using the Darwin Core ID/they SHOULD be used rather than using the Darwin Core ID/
Example 22:
s/The following points about the Example 22 should be noted:/Note the following points about Example 22:/
s/related non-literal resources should use/related non-literal resources SHOULD use/
/LSIDs are the objects of triples, they should be
As with 2.4.2.1.1, this guidance needs to be checked against the GUID and LSID applicability statements. I think I'm reading the GUID applicability statement as s/should/MUST/ here.
2.7.1 What purpose do convenience terms serve?
s/In general, it should not be necessary for a data provider/In general, a data provider does not need/
2.7.3 Ownership of a collection item
s/of the collection item should be indicated/of the collection item SHOULD be indicated/
2.7.4 Description of a taxonomic entity
s/It is considered to be out of the scope of this document to specify how taxon concepts should be rendered as RDF/It is out of the scope of this document to specify how to render taxon concepts as RDF/
s/terms for taxonomic entities should be properties of dwc:Identification/terms for taxonomic entities MAY be properties of dwc:Identification/
s/The task of describing taxonomic entities using RDF must be an effort outside of Darwin Core/The task of describing taxonomic entities using RDF is out of scope of this document/
2.8.3 Expressing Darwin Core association terms as RDF with URI references
s/it should be used to declare the type/rdf:type SHOULD be used to declare the type/
Tables 3.4 to 3.7.
s/should/SHOULD/
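The substitutions listed above are written in sed style, so they could be applied to the guide text mechanically and the ones that fail to match reported for manual review. The following is a minimal sketch under stated assumptions: parse_substitution and apply_rules are hypothetical helpers, each rule is assumed to have the simple form s/old/new/ with no "/" inside either part, and the sample sentence is made up around a phrase quoted in the list.

```python
def parse_substitution(rule):
    """Split 's/old/new/' into (old, new).

    Assumes neither part contains a '/' character.
    """
    parts = rule.split("/")
    if len(parts) != 4 or parts[0] != "s" or parts[3] != "":
        raise ValueError(f"not a simple s/old/new/ rule: {rule!r}")
    return parts[1], parts[2]


def apply_rules(text, rules):
    """Apply each rule as a literal replacement; collect rules that never matched."""
    unmatched = []
    for rule in rules:
        old, new = parse_substitution(rule)
        if old in text:
            text = text.replace(old, new)
        else:
            unmatched.append(rule)
    return text, unmatched


rules = [
    "s/it must be converted to an IRI/it MUST be converted to an IRI/",
    "s/Predicates must be identified by IRIs/Predicates MUST be identified by IRIs/",
]
# Made-up sample sentence; only the matched phrase comes from the list above.
guide_text = "If the identifier is not an IRI, it must be converted to an IRI."

new_text, unmatched = apply_rules(guide_text, rules)
print(new_text)
print(unmatched)
```

Rules that end up in the unmatched list would indicate either a wording mismatch with the current guide text or a passage that has since been revised.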
On Sun, 14 Dec 2014 15:51:45 -0600 Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
My point is that until we start seeing what the use cases are and how RDF ends up getting implemented in real life, it would be difficult to know which of the keywords should be used in the guide. I certainly would not know which keywords I should use in many cases.
This fits with Jonathan Rees' comment, and RFC 2119 may be a poor fit for this document.
-Paul
participants (5)
- Bob Morris
- John Wieczorek
- Jonathan A Rees
- Paul J. Morris
- Steve Baskauf