[tdwg-content] A less radical proposal for Darwin Core

joel sachs jsachs at csee.umbc.edu
Thu Jun 27 17:45:28 CEST 2013

This is an open letter to Steve Baskauf.

For those who don't know, Steve came to TDWG a few years ago for answers 
on how to share data about plants and data about images of plants. He 
was, at the time, supplying three aggregators with data, and had to format his data 
differently for each one. Each of the aggregators had, in theory, embraced 
the semantic web, as had TDWG. So in 2009 he started asking some 
innocuous questions on the tdwg-content list about how to use TDWG 
standards. These questions led to such a cacophony of answers that he 
realized TDWG needed help, and in 2011 he agreed to co-convene the 
TDWG RDF Task Group, under the sponsorship of the TDWG TAG.

He shares the vision, espoused by many, that it would be better to build a 
"semantic layer" on top of Darwin Core than to have to rebuild a 
"semantic Darwin Core" from scratch. The reason for this is that everyone 
has different semantic web needs, and, if forced to start from scratch, 
will build different semantic Darwin Cores. Then BCO, Darwin-SW, 
DWC-FP, and Pyle/Whiton will have no way of understanding each other; 
if they shared a basic underlying vocabulary [1], there would at least be 
a chance for data interoperability.

<email to Steve>

I proposed a moratorium on additions to Darwin Core until some fundamental 
things are cleaned up. I'm aware of only two pending additions: the 
addition of MaterialSample and MaterialSampleID to the "dwctype:" 
namespace; and the Darwin Core RDF guide, still receiving feedback from 
the task group. I do think that the MaterialSample proposal would be 
better framed atop a clearer semantics of existing terms and usage 
patterns, but who knows. I trust the proposers of the new terms to make 
good choices.

My current proposal (below) follows Guralnick's advice to narrow scope. It 
is partly motivated by the fact that you made a perfectly reasonable plea 
for clarification regarding the meaning of "occurrence", and no one has 
responded. (Yes, you only made the plea three days ago, but you've made 
many versions of it in the past.) In my opinion, it is unreasonable to 
expect the RDF Task Group to produce a guide for expressing Darwin Core in 
RDF unless some effort is made to clarify the semantics of a few Darwin 
Core terms.

So my current, less radical, proposal is this:

That the RDF Task Group not release the draft guide for public comment 
until we (i.e. TDWG) improve the documentation around dwc:Occurrence and 
dwctype:Occurrence. By "improve", I mean:

i. Configure the rs.tdwg.org server to serve an XML MIME type for 
http://rs.tdwg.org/dwc/rdf/human-dwctype.xsl and 
http://rs.tdwg.org/dwc/rdf/human.xsl . (Currently these files are served 
as "text/plain", which prevents users from viewing either the raw 
RDF or the human-readable RDF in a standard browser.)
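To make item i concrete, here is a minimal sketch, using only the Python standard library, of how one might check which Content-Type the server actually sends for the two stylesheets. The helper names are hypothetical, and the set of media types treated as "XML" is an assumption (browsers differ in which types they accept for XSLT stylesheets). The URLs are the ones given above and may have moved since.

```python
import urllib.request

# Assumption: media types accepted here as "XML". Browser behaviour for
# XSLT stylesheets varies, so treat this set as illustrative only.
XML_MEDIA_TYPES = {"application/xml", "text/xml", "application/xslt+xml", "text/xsl"}

# The two stylesheets named in the proposal (may no longer resolve).
STYLESHEET_URLS = [
    "http://rs.tdwg.org/dwc/rdf/human-dwctype.xsl",
    "http://rs.tdwg.org/dwc/rdf/human.xsl",
]


def media_type(content_type: str) -> str:
    """Hypothetical helper: strip parameters such as '; charset=UTF-8'."""
    return content_type.split(";", 1)[0].strip().lower()


def is_xml_content_type(content_type: str) -> bool:
    """Hypothetical helper: True for a listed XML type or any '+xml' type."""
    mt = media_type(content_type)
    return mt in XML_MEDIA_TYPES or mt.endswith("+xml")


def served_content_type(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: send a HEAD request and return the Content-Type header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

For each URL in STYLESHEET_URLS, passing served_content_type(url) to is_xml_content_type should give False while the server sends "text/plain", and True once the server is configured with an XML type.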

ii. In a single document, served by tdwg.org, list the valid 
usages/interpretations of dwc:Occurrence and dwctype:Occurrence. (I 
believe there are six.) Currently, to be aware of the various semantics of 
"Occurrence", a person needs to read half a dozen or so different 
documents. That's not necessarily bad, but there should at least be a 
"Guide to Standard 450" that links to and briefly explains the contents 
and purposes of each document.

Is that too much to ask of ourselves?

Now, you might say, "Fine, we should do that, but I'm not delaying release 
of the guide, which has already taken way too long to produce." And if you 
say that, I'll support you. But if we go ahead and release the guide 
without fixing some basic underlying things, I don't think it will have 
the impact that it should, for several reasons. Here's one of them:

<Example of how we're limiting ourselves>
The draft guide is excellent. For many examples of what people want to do, 
the guide essentially creates a matrix with the dimensions "technical 
approaches" and "conceptual approaches", and proceeds to fill in every 
cell of this matrix with a valid usage pattern. It is meticulous. Yet 
there is a significant omission. There is no example of representing a 
Darwin Core occurrence in RDF! In the guide's defence, there is no way to 
do this without making significant assumptions about just what a Darwin 
Core occurrence is.
</Example of how we're limiting ourselves>

Anyway, those are my thoughts, and some of my arguments. I will defer to 
your judgement on the best way to proceed.


</email to Steve>

1. Notice I said "vocabulary" and not "ontology". The less ontology there 
is in the shared Core, the easier it will be for people to build on it to 
suit their needs. But a lack of ontology does not imply a lack of 
semantics. For proof, look at a dictionary - it has the complete semantics 
of the language. Similarly, Darwin Core - a "bag of terms", i.e. a 
glossary - is a perfect candidate to be a shared base layer for the 
semantic web … once it clarifies a few basic things.

