[tdwg-content] A less radical proposal for Darwin Core

Steve Baskauf steve.baskauf at vanderbilt.edu
Thu Jun 27 19:37:22 CEST 2013


Joel,

As far as the release of the DwC RDF guide for public comment is 
concerned, I don't intend to release it until there is consensus support 
(to the extent that I can figure out what that means) from the Task 
Group.  So I 
definitely would not release it if you and other core members of the 
Task Group did not feel that it was ready.  I have been working toward a 
June 30 deadline primarily because if people don't have deadlines, they 
usually don't take action and because I won't have much time to work on 
the project after August 1. 

A technical point about your point (i.) below.  As far as I can tell, 
all of the RDF for Darwin Core is now coming from the Google Code site 
rather than directly from rs.tdwg.org . John Wieczorek could confirm 
that.   So although the xsl issue is somewhat annoying and should be 
fixed, I think the raw RDF can be viewed pretty easily by browsing in 
the http://code.google.com/p/darwincore/source/browse/#svn%2Ftrunk%2Frdf 
directory (if people know to look there).  I can never remember how to 
get there, but there are links to the raw DwC RDF in section 0.3.7 of 
the Beginner's Guide (http://code.google.com/p/tdwg-rdf/wiki/Beginners ) 
as well as the source RDF of many other vocabularies. 
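
For anyone who wants to check the mime type issue in item (i.) directly, 
here is a minimal sketch in Python.  The function names and the set of 
types counted as "XML" are assumptions made for illustration; they do not 
come from any TDWG tool or document.  The first function only inspects a 
Content-Type header value, and the second asks a server for that header.

```python
from email.message import Message
from urllib.request import Request, urlopen

# Assumption: the media types counted as "XML" for this check.
XML_MIME_TYPES = {"text/xml", "application/xml"}


def is_xml_mime_type(content_type: str) -> bool:
    """Return True if a Content-Type header value names an XML media type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lowercases the type and drops parameters
    # such as "; charset=utf-8".
    mime = msg.get_content_type()
    return mime in XML_MIME_TYPES or mime.endswith("+xml")


def fetch_content_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the server's Content-Type header."""
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

Passing the result of fetch_content_type() for either stylesheet URL to 
is_xml_mime_type() would show whether the server still reports 
"text/plain" for it.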

As far as item (ii) is concerned, I'm willing to go with the consensus 
of the group.  My main concern is "who will do it?".  There are many 
things that should be done to improve the navigation and readability of 
the general TDWG Website, but although that issue is often discussed, 
no action has thus far been taken.  The Darwin Core documentation is 
comparatively easy to navigate but it has the problem that it 
technically can't be changed without making a change to the standard 
itself.  So I'm not sure where something served by tdwg.org would go 
without it just getting lost among the obsolete material in the maze of 
the TDWG Website.  It might be logical to put a document such as you 
suggest on the DwC Google Code site 
(http://code.google.com/p/darwincore/), which I think is the default 
location for ancillary information about Darwin Core that isn't actually 
part of the standard.

You note that "There is no example of representing a Darwin Core 
occurrence in RDF!"  The DwC RDF guide is intentionally written to be 
devoid of detailed implementation examples because of the problem that 
you have identified: the lack of a consensus data model/ontology for 
expressing biodiversity class relationships as RDF.  The guide tries to 
follow the example of the non-RDF parts of Darwin Core, which is to 
provide terms and tell people how to use them without explicitly 
specifying their ranges or domains.  I also have tried to avoid 
including anything that is likely to have to change much over time 
because if the Guide becomes part of the DwC Standard (alongside the 
Text and XML Guides) it can't be easily changed.  However, I intend that 
the ancillary information (not part of the actual standard) will provide 
a variety of actual implementation examples.  Those haven't all been 
written yet, partly because I didn't want to have to redo them if 
approaches suggested by the Guide itself didn't fly with the Task 
Group.  However, I just finished a preserved specimen example using 
Darwin-SW this morning 
(http://code.google.com/p/tdwg-rdf/wiki/DwcRdfExamplesDarwinSW ) which 
hasn't been proofread carefully yet.  I will try to take a shot at 
marking up the same metadata using taxonconcept.org object properties if 
I can and hopefully Rob and John Deck will provide an example of how 
they would serve metadata using the BiSciCol object properties.  These 
examples can evolve and be added to as experimentation and experience 
guides us.  But by keeping them out of the standard itself, they can do 
that freely without requiring the invocation of an official change to 
the standard.  If at some point there is a consensus data model (or 
models), they could be formalized as a TDWG Technical Standard that could 
sit on top of the basic DwC RDF guidelines.  But my feeling is that it 
isn't necessary to wait for that model to complete the basic RDF Guide.  
I would be really happy to have a consensus or clarification about 
dwc:/dwctype:Occurrence.  But I don't think the RDF Guide depends on that.

Others may wish to weigh in on this.
Steve

joel sachs wrote:
> This is an open letter to Steve Baskauf.
>
> <context>
> For those who don't know, Steve came to TDWG a few years ago for 
> answers on how to share data about plants, and data about images 
> of plants. He was, at the time, supplying three aggregators with data, 
> and had to format his data differently for each one. Each of the 
> aggregators had, in theory, embraced the semantic web, as had TDWG. So 
> he started, in 2009, to ask some innocuous questions on the 
> TDWG-content list regarding how to use TDWG standards. These questions 
> led to such a cacophony of answers that he realized that TDWG needed 
> help, and he agreed, in 2011, to co-convene the TDWG RDF Task Group, 
> under the sponsorship of the TDWG TAG.
>
> He shares the vision, espoused by many, that it would be better to 
> build a "semantic layer" on top of Darwin Core, than to have to 
> rebuild a "semantic Darwin Core" from scratch. The reason for this is 
> that everyone has different semantic web needs, and, if forced to 
> start from scratch, will build different semantic Darwin Cores. Then 
> BCO and Darwin-SW and DWC-FP and Pyle/Whiton will have no way of 
> understanding each other, while if they shared a basic underlying 
> vocabulary [1], there would at least be a chance for data 
> interoperability.
> </context>
>
> <email to Steve>
> Steve,
>
> I proposed a moratorium on additions to Darwin Core until some 
> fundamental things are cleaned up. I'm only aware of two additions 
> pending: the addition of MaterialSample and MaterialSampleID to the 
> "dwctype:" namespace; and the Darwin Core RDF guide, still receiving 
> feedback from the task group. I do think that the MaterialSample 
> proposal would be better framed atop a clearer semantics of existing 
> terms and usage patterns, but who knows. I trust the proposers of the 
> new terms to make good choices.
>
> My current proposal (below) follows Guralnick's advice to narrow 
> scope. It is partly motivated by the fact that you made a perfectly 
> reasonable plea for clarification regarding the meaning of 
> "occurrence", and no one is responding. (Yes, you only made the plea 
> three days ago, but you've made many versions of it in the past.) In 
> my opinion, it is unreasonable to expect the RDF Task Group to produce 
> a guide for expressing Darwin Core in RDF unless some effort is made 
> to clarify the semantics of a few Darwin Core terms.
>
> So my current, less radical, proposal is this:
>
> <proposal>
> That the RDF Task Group not release the draft guide for public comment 
> until we (i.e. TDWG) improve the documentation around dwc:Occurrence 
> and dwctype:Occurrence. By "improve", I mean:
>
> i. Configure the rs.tdwg.org server to set an xml mime type for 
> http://rs.tdwg.org/dwc/rdf/human-dwctype.xsl and 
> http://rs.tdwg.org/dwc/rdf/human.xsl . (Currently the mime type is set 
> to "text/plain", which results in users being unable to view either 
> the raw RDF or the human-readable RDF in a standard browser.)
>
> ii. In a single document, served by tdwg.org, list the valid 
> usages/interpretations of dwc:Occurrence and dwctype:Occurrence. (I 
> believe there are six.) Currently, to be aware of the various 
> semantics of "Occurrence", a person needs to read half a dozen or so 
> different documents. That's not necessarily bad, but there should at 
> least be a "Guide to Standard 450", that links to and briefly explains 
> the contents and purposes of each document.
> </proposal>
>
> Is that too much to ask of ourselves?
>
> Now, you might say, "Fine, we should do that, but I'm not delaying 
> release of the guide, which has already taken way too long to 
> produce." And if you say that, I'll support you. But if we go ahead 
> and release the guide without fixing some basic underlying things, I 
> don't think it will have the impact that it should, for several 
> reasons. Here's one of them:
>
> <Example of how we're limiting ourselves>
> The draft guide is excellent. For many examples of what people want to 
> do, the guide essentially creates a matrix with the dimensions 
> "technical approaches" and "conceptual approaches", and proceeds to 
> fill in every cell of this matrix with a valid usage pattern. It is 
> meticulous. Yet there is a significant omission. There is no example 
> of representing a Darwin Core occurrence in RDF! In the guide's 
> defence, there is no way to do this without making significant 
> assumptions about just what a Darwin Core occurrence is.
> </Example of how we're limiting ourselves>
>
> Anyway, those are my thoughts, and some of my arguments. I will defer 
> to your judgement on the best way to proceed.
>
> Best,
> Joel.
>
> </email to Steve>
>
>
> 1. Notice I said "vocabulary" and not "ontology". The less ontology 
> there is in the shared Core, the easier it will be for people to build 
> on it to suit their needs. But a lack of ontology does not imply a 
> lack of semantics. For proof, look at a dictionary - it has the 
> complete semantics of the language. Similarly, Darwin Core - a "bag of 
> terms", i.e. a glossary - is a perfect candidate to be a shared base 
> layer for the semantic web … once it clarifies a few basic things.
>

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu



