Re: [tdwg-content] RFC 2119: Re: Public comment on the Darwin Core RDF Guide

14 Dec 2014

I had intended to comment on this topic earlier, but got too busy.  I
wanted to provide an example where this kind of language might be
useful, but where it is not clear to me which of the key words would
be appropriate.

The section of the guide that discusses typed literals [1] notes that a 
literal has a different meaning depending on whether it has a datatype 
attribute or not.  The literal:

"35.85"^^xsd:decimal

is a decimal number.  The literal:

"35.85"

has the implied datatype attribute /xsd:string/.  Thus a client that is 
capable of semantic processing of ingested literals would not interpret 
these two literals as being the same thing.  One is an abstract 
mathematical entity and the other is a string of characters.
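To make the distinction concrete, here is a minimal Python sketch (standard library only) of how a client might compare these literals. It assumes RDF 1.1 semantics, under which a literal with no datatype is shorthand for xsd:string and a language-tagged literal has the datatype rdf:langString. The `Literal` class and its `value()` method are hypothetical illustrations, not the API of any RDF library.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_STRING = XSD + "string"
XSD_DECIMAL = XSD + "decimal"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical minimal model of an RDF 1.1 literal (illustration only)."""
    lexical: str
    datatype: str = XSD_STRING  # an untyped "35.85" is implicitly xsd:string
    lang: Optional[str] = None

    def __post_init__(self) -> None:
        # A language-tagged literal always has the datatype rdf:langString.
        if self.lang is not None:
            object.__setattr__(self, "datatype", RDF_LANGSTRING)

    def value(self) -> Union[Decimal, str]:
        """Map the lexical form into the value space of the datatypes handled here."""
        if self.datatype == XSD_DECIMAL:
            return Decimal(self.lexical)  # an abstract number
        return self.lexical  # a string of characters


typed = Literal("35.85", XSD_DECIMAL)
untyped = Literal("35.85")

print(typed == untyped)                         # False: different datatypes
print(untyped == Literal("35.85", XSD_STRING))  # True: same term
print(typed.value() == Literal("35.850", XSD_DECIMAL).value())  # True: same number
print(untyped.value() == Literal("35.850").value())  # False: different strings
print(Literal("35.85", lang="en") == untyped)   # False: rdf:langString vs xsd:string
```

The generated `==` compares lexical form, datatype and language tag, so the typed and untyped forms of "35.85" are different terms. Only the xsd:decimal form maps to a number, which is why "35.85"^^xsd:decimal and "35.850"^^xsd:decimal denote the same value while the two plain strings do not.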

So should the guidelines say something like "literals MUST be typed so 
that clients can properly interpret them"?  That might be desirable if 
we assume that most clients will be doing serious semantic processing 
and will depend on receiving accurate information from providers.  
However, many providers probably won't (at least initially) bother to 
figure out what kind of datatype to assign to their literals; they will 
probably just spew out whatever strings are in their existing database 
and we have no way to enforce that they MUST do it.  So should we just 
say "literals SHOULD be typed so that clients can properly interpret 
them"?  On the other hand, we may be keen to get triples exposed as soon 
as possible and figure that it is the consuming client's job to figure 
out what literals "mean", rather than placing that burden on the 
provider.  The clients will probably have to do that anyway since we 
have no way to enforce that providers include datatypes.  So should the 
spec say "literals MAY be typed so that clients can properly interpret 
them"?  Or maybe we could assume that some kind of cleanup would be done 
on triples from the wild before they are placed into some kind of 
community triplestore?  In that case, we don't really care that much and 
could say that datatypes are OPTIONAL.

My point is that until we start seeing what the use cases are and how 
RDF ends up getting implemented in real life, it would be difficult to 
know which of the keywords should be used in the guide.  I certainly 
would not know which keywords I should use in many cases.

Steve

[1] 
https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposalRevised#2.4.1.1_T...

Paul J. Morris wrote:
...
Bob Morris noted that it may be appropriate for the Darwin Core RDF
Guide to follow RFC 2119 (something for which at least the TDWG GUID
applicability statement provides precedent).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
Internet Engineering Task Force (IETF). RFC 2119. Key words for use in
RFCs to Indicate Requirement Levels. http://www.ietf.org/rfc/rfc2119.txt
This RFC notes: "These words are often capitalized" and "Imperatives of
the type defined in this memo must be used with care and sparingly.  In
particular, they MUST only be used where it is actually required for
interoperation or to limit behavior which has potential for causing
harm".  There are 69 occurrences of "should" in the guide; many of
these appear to be colloquial English, and other wording should
probably be substituted in those cases.
I've taken a stab here at a set of cases where it feels like it might
be appropriate to use the RFC 2119 imperatives.
One question is whether it is the role of the guide to use MUST/MUST
NOT/REQUIRED anywhere (except where that is forced by inheritance from
elsewhere).  A potential place for asserting MUST/MUST NOT is the
distinction between the dwc and dwciri namespaces.  I've put more
discussion under the headings 2.4.1.2 and 2.5 below.
-Paul
----
1.3.2.1 Persistent Identifiers
s/the provider should take care to ensure/the provider SHOULD take care
to ensure/
s/it must be converted to an IRI/it MUST be converted to an IRI/
----
1.3.2.2 HTTP IRIs as self-resolving GUIDs
s/through RDF should plan to implement GUIDs/through RDF MUST plan to
implement GUIDs/
For consistency with "Must" in the GUID applicability statement.
----
1.4.1 Well-known vocabularies
s/the provider should assign the term an IRI,/the provider SHOULD
assign the term an IRI,/
----
1.4.3 Use of Darwin Core terms in RDF
s/each value should be referenced/each value MUST be referenced/
By the nature of object references.
----
1.4.4 Limitations of this guide
s/Darwin Core property terms should be used as RDF predicates and
specifies that Darwin Core class terms should be used in
rdf:type/Darwin Core property terms SHOULD be used as RDF predicates
and specifies that Darwin Core class terms SHOULD be used in rdf:type/
----
1.5.5 Implications for expressing Darwin Core string values as RDF
s/in which RDF should be structured/in which RDF SHOULD be structured/
----
Example 1:
s/Predicates must be identified by IRIs/Predicates MUST be identified
by IRIs/
----
2.2 Subject resources
s/it must be referenced by an IRI/it MUST be referenced by an IRI/
----
2.2.2 Associating a string identifier with a subject resource
s/dcterms:identifier should be used to/dcterms:identifier SHOULD be
used to/
s/it is acceptable to present it as a string literal value for
dcterms:identifier/it MAY be presented as a string literal value for
dcterms:identifier/
----
2.3.1.1 rdf:type statement
s/The class should be identified by an IRI reference/The class MUST be
identified by an IRI reference/
----
2.3.1.3 Explicit vs. inferred type declarations
s/data providers should exercise caution in using any such term in a
non-standard way/data providers SHOULD NOT use any such term in a
non-standard way/
s/the provider should type the resource/the provider MAY type the
resource/ ?? Or ??
s/the provider should type the resource/the provider SHOULD type the
resource/
----
2.3.1.4 Other predicates used to indicate type
s/in an RDF description should be considered optional, while including
rdf:type should be considered highly recommended/in an RDF description
is OPTIONAL, while rdf:type SHOULD be included/
s/A dwciri: analogue (Section 2.5) of dwc:basisOfRecord should not be
used/A dwciri: analogue (Section 2.5) of dwc:basisOfRecord MUST NOT be
used/
----
2.3.1.5 Classes to be used for type declarations of resources described
using Darwin Core
s/that should also be used/that SHOULD also be used/
----
Example 9:
s/a provider should include an xml:lang/a provider SHOULD include an
xml:lang/
----
Example 10:
I'm not sure about this one:
/language tags should be interpreted by clients
s/may initially choose to expose literals without datatype attributes,
they should/MAY initially choose to expose literals without datatype
attributes, they SHOULD/
----
2.4.1.2 Terms intended for use with literal objects
I'll put a stake in the ground for discussion here:  Is the guide
sufficiently normative to assert MUST (other than where that property
is inherited from elsewhere)?  If so, then the distinction between dwc
and dwciri is the place to make that assertion:
s/that terms in the dwc: namespace should be restricted to use with
literal objects/that terms in the dwc: namespace MUST be restricted to
use with literal objects/
----
2.4.2.1.1 Objects identified by LSIDs
This one needs to be checked for consistency with the LSID
Applicability Statements:
s/version of the LSID should be used instead/version of the LSID SHOULD
be used instead/
----
Example 17:
s/particular terms should be used with literal objects, or with IRI
reference objects/particular terms SHOULD be used with literal objects,
or SHOULD be used with IRI reference objects/
----
2.4.3.2 Literal values for non-literal resources in Darwin Core
s/existing Darwin Core term in the dwc: namespace should have the same
structure/existing Darwin Core term in the dwc: namespace SHOULD have
the same structure/
----
2.5 Terms in the dwciri: namespace
This is the companion case to 2.4.1.2: should the distinction between
dwciri and dwc be asserted as SHOULD or as MUST?  Using them
incorrectly does have the "potential for causing harm" in the language
of RFC 2119, but does the guide rise to the level of specifying an
"absolute requirement of the specification"?
s/IRI reference objects and should NOT be used with literal/IRI
reference objects and MUST NOT be used with literal/
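One way to see what is at stake in choosing between SHOULD and MUST is to look at what a consuming client might do with the dwc:/dwciri: distinction.  The Python sketch below (standard library only) checks a list of triples.  Only the two namespace IRIs come from Darwin Core; the `IRI`/`Literal` classes, the `check_namespace_usage` function and the example.org IRIs are hypothetical.  The `strict` flag stands in for the choice being discussed: under a MUST reading a violation could be treated as an error, under a SHOULD reading as a warning.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

DWC = "http://rs.tdwg.org/dwc/terms/"
DWCIRI = "http://rs.tdwg.org/dwc/iri/"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str


Term = Union[IRI, Literal]
Triple = Tuple[IRI, IRI, Term]


def check_namespace_usage(triples: List[Triple], strict: bool = True) -> List[Tuple[str, str]]:
    """Flag dwc: predicates with non-literal objects and dwciri: predicates
    with literal objects. strict=True reports errors (a MUST reading);
    strict=False reports warnings (a SHOULD reading)."""
    level = "ERROR" if strict else "WARNING"
    findings = []
    for _subject, predicate, obj in triples:
        if predicate.value.startswith(DWC) and not isinstance(obj, Literal):
            findings.append((level, f"{predicate.value} has non-literal object {obj.value}"))
        elif predicate.value.startswith(DWCIRI) and not isinstance(obj, IRI):
            findings.append((level, f"{predicate.value} has literal object {obj.lexical!r}"))
    return findings


occ = IRI("http://example.org/occurrence/1")  # hypothetical subject
triples = [
    (occ, IRI(DWC + "recordedBy"), Literal("J. D. Hooker")),                    # fine
    (occ, IRI(DWCIRI + "recordedBy"), IRI("http://example.org/agent/hooker")),  # fine
    (occ, IRI(DWCIRI + "recordedBy"), Literal("A. Gray")),                      # wrong object kind
    (occ, IRI(DWC + "recordedBy"), IRI("http://example.org/agent/gray")),       # wrong object kind
]

for level, message in check_namespace_usage(triples, strict=False):
    print(level, message)
```

The same two misuses are found either way; what the keyword changes is only how firmly a client is entitled to react to them.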
----
2.5.1 Definition of dwciri: terms
s/resource described by a dwciri: property should be the subject of a
triple for each value on the list/resource described by a dwciri:
property SHOULD be the subject of a triple for each value on the list/
A counterexample here comes from the Harvard List of Botanists, where
http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/305dcac3-a748-4f47-8a18-3...
references the collector team "J. D. Hooker & A. Gray", that is,
http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/cc3b7080-5fbb-4ea5-9655-4...
and
http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/3f8c70aa-1862-4784-8a53-f...
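The one-triple-per-value guidance, and how the Harvard team case differs from it, can be sketched in Python.  This assumes the Darwin Core recommended practice of separating list values with " | ", and uses a hypothetical name-to-IRI lookup with example.org IRIs in place of real agent identifiers; `expand_recorded_by` is an illustrative helper, not something defined in the guide.

```python
from typing import Dict, List, Tuple

DWCIRI_RECORDED_BY = "http://rs.tdwg.org/dwc/iri/recordedBy"

# Hypothetical lookup from agent names to IRIs (illustration only).
AGENT_IRIS: Dict[str, str] = {
    "J. D. Hooker": "http://example.org/agent/hooker",
    "A. Gray": "http://example.org/agent/gray",
    "J. D. Hooker & A. Gray": "http://example.org/team/hooker-gray",
}


def expand_recorded_by(subject: str, recorded_by: str,
                       lookup: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Emit one dwciri:recordedBy triple per ' | '-separated value with a known IRI."""
    triples = []
    for name in recorded_by.split("|"):
        name = name.strip()
        if name in lookup:  # names without an IRI are skipped in this sketch
            triples.append((subject, DWCIRI_RECORDED_BY, lookup[name]))
    return triples


occ = "http://example.org/occurrence/1"

# A list of two collectors becomes two triples with the same subject.
for triple in expand_recorded_by(occ, "J. D. Hooker | A. Gray", AGENT_IRIS):
    print(triple)

# A team identified by a single IRI (as in the Harvard example) yields one triple;
# it is the team resource, not the occurrence, that links to the individual members.
print(len(expand_recorded_by(occ, "J. D. Hooker & A. Gray", AGENT_IRIS)))  # 1
```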
----
2.5.3 Expectation of clients encountering RDF containing dwc: and
dwciri: terms
Instances of "should" here again cut to the heart of how strong the
guidance is concerning dwc and dwciri.
----
Table 3
s/they should be used rather than using the Darwin Core ID/they SHOULD
be used rather than using the Darwin Core ID/
----
Example 22:
s/The following points about the Example 22 should be noted:/Note the
following points about Example 22:/
s/related non-literal resources should use/related non-literal
resources SHOULD use/
/LSIDs are the objects of triples, they should be
As with 2.4.2.1.1, this guidance needs to be checked against the GUID
and LSID applicability statements.  I think I'm reading the GUID
applicability statement as s/should/MUST/ here.
----
2.7.1 What purpose do convenience terms serve?
s/In general, it should not be necessary for a data provider/In
general, a data provider does not need/
----
2.7.3 Ownership of a collection item
s/of the collection item should be indicated/of the collection item
SHOULD be indicated/
----
2.7.4 Description of a taxonomic entity
s/It is considered to be out of the scope of this document to specify
how taxon concepts should be rendered as RDF/It is out of the scope of
this document to specify how to render taxon concepts as RDF/
s/terms for taxonomic entities should be properties of
dwc:Identification/terms for taxonomic entities MAY be properties of
dwc:Identification/
s/The task of describing taxonomic entities using RDF must be an effort
outside of Darwin Core/The task of describing taxonomic entities using
RDF is out of scope of this document/
----
2.8.3 Expressing Darwin Core association terms as RDF with URI
references
s/it should be used to declare the type/rdf:type SHOULD be used to
declare the type/
----
Tables 3.4 to 3.7.
s/should/SHOULD/
--------
-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu
http://vanderbilt.edu/trees
