A radical proposal for Darwin Core
Hi Everyone,
Darwin Core remains poorly documented, occasionally inconsistent, and frequently misunderstood. (Does anyone disagree with that characterization?) I believe this is one of the reasons we're seeing a proliferation of overlapping and sometimes incompatible ontologies building on Darwin Core terms.
One of the suggestions that came up on the TDWG-RDF mailing list is to have a clean-up-a-thon/document-a-thon for TDWG namespaces and terms. I suggest that, until such a clean up of Darwin Core occurs, TDWG accept no additions to the Darwin Core standard. There are several examples in support of my claim that we're building on a shaky foundation - an obvious one is that, as Steve is currently pointing out, there is no consensus on what constitutes a Darwin Core occurrence. (Can anyone name an instance of the class "http://rs.tdwg.org/dwc/terms/Occurrence"?)
The clean-up-a-thon proposal was enthusiastically endorsed within the RDF group, but no one volunteered to organize it. I propose that we self-organize, and find a way to carve out two days at the coming meeting to hash out as much as we can, with a follow-on workshop if necessary. But first, I'd be interested to know - am I the only one who feels this way?
Sincerely, Joel.
p.s. I've said this before, but it bears repeating - Darwin Core is almost an excellent standard, and almost ideally suited to be the foundation for a semantic web for biodiversity informatics. I have great respect for those who were involved in its creation and continued curation - for their hard work, and clear thinking, and patience for people like me struggling to understand. But all that work, thought, and patience will be for naught, if the gyre is allowed to widen much further.
Joel -- From an insider-outsider perspective, a couple of quick comments:
1) Do you mean Darwin Core is frequently misunderstood by standards developers? Or do you mean Darwin Core is frequently misunderstood by people without specialized skills to read and understand standards?
2) I see the point that some clean-up would be useful, but my view is that Darwin Core fulfills its intended purpose for most people who want to map their headers in a spreadsheet to a set of terms in the Core. This supports an ecosystem of data that has become available online over the last 15 years. I was talking to Tim Robertson, and I think the number is 3 records per second (on average) coming online via GBIF, the vast majority in Darwin Core format.
3) Is it enough to clean up Darwin Core somehow, wipe our hands and walk away? I guess maybe we could be sharper with term definitions. But is that the problem, or is the problem that what we want to do with Darwin Core doesn't fit its history and intended use as an exchange format?
4) I see the bigger challenge being how we grow more semantically meaningful representations that let us do new things (an example might be the Biocollections Ontology (BCO)) versus the more limited things we do with Darwin Core.
This is just my naive impression. I am not an expert in RDF or the semantic web. I'd like yet more clarity before we get into what might be a challenging task. Could we be even more focused? Can we surgically repair the key things in DwC rather than do a "clean up"?
Best, Rob
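To make the "map spreadsheet headers to terms in the Core" use case concrete, here is a minimal Python sketch. The spreadsheet headers on the left and the sample row are invented for illustration; only the Darwin Core term names and the terms namespace are taken from the standard. It is one possible way to do the mapping, not a description of any particular tool.

```python
# Darwin Core terms namespace, as used elsewhere in this thread.
DWC_TERMS_NS = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical spreadsheet headers mapped to Darwin Core term names.
HEADER_TO_TERM = {
    "Species": "scientificName",
    "Lat": "decimalLatitude",
    "Long": "decimalLongitude",
    "Date Collected": "eventDate",
    "Catalog #": "catalogNumber",
}


def map_row(row):
    """Return (mapped, unmapped).

    mapped holds the row's values keyed by Darwin Core term IRI;
    unmapped lists the headers that have no entry in HEADER_TO_TERM.
    """
    mapped, unmapped = {}, []
    for header, value in row.items():
        term = HEADER_TO_TERM.get(header)
        if term is None:
            unmapped.append(header)
        else:
            mapped[DWC_TERMS_NS + term] = value
    return mapped, unmapped


# Invented example row; "Notes" has no mapping and is reported back.
row = {"Species": "Puma concolor", "Lat": "36.1", "Long": "-112.1", "Notes": "tracks only"}
mapped, unmapped = map_row(row)
```

Note that a mapping like this only renames columns. It does not settle what kind of thing each row describes, which is the question the rest of the thread is about.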
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Rob,
Thanks - responses below.
On Mon, 24 Jun 2013, Robert Guralnick wrote:
Joel -- From an insider-outsider perspective, a couple quick comments:
- Do you mean Darwin Core is frequently misunderstood by standards developers? Or do you mean Darwin Core is frequently misunderstood by people without specialized skills to read and understand standards?
Good questions. What most concerns me is that, amongst those who are expressing occurrence data on the semantic web, there is a divergence of opinion regarding the semantics of core Darwin Core terms.
- I see the point that some clean-up would be useful but my view is that Darwin Core fulfills its intended purpose for most people who want to map their headers in a spreadsheet to a set of terms in the Core.
I think you're right. The question is, can we clarify the semantics of existing terms in a way that will enable Darwin Core to be used effectively on the semantic web, without breaking existing infrastructure, and without altering the interpretations of existing Darwin Core records? I think we should try.
This supports an ecosystem of data that has become available online over the last 15 years. I was talking to Tim Robertson, and I think the number is 3 records per second (on average) coming online via GBIF, the vast majority in Darwin Core format.
- Is it enough to clean up Darwin Core somehow, wipe our hands and walk away? I guess maybe we could be sharper with term definitions. But is that the problem, or is the problem that what we want to do with Darwin Core doesn't fit its history and intended use as an exchange format?
Most of the arguments and discussions I've seen on this list have involved doing pretty basic things with Darwin Core - mainly representing occurrences of organisms at particular times and places. We saw (or, at least, I saw) a shift away from a specimen-centric point of view when the rdfs:subClassOf backbone was removed from the dwctype vocabulary (previously called the type hierarchy) circa 2010/2011. But the documentation surrounding other parts of the standard has not caught up with this shift. (This is, IMO, a source of confusion.)
- I see the bigger challenge being how we grow more semantically meaningful representations that let us do new things
Yes - this is the bigger challenge! Clarifying whether an occurrence is a category of information, a superclass of preserved specimen, a record of an organism's appearance at a particular place and time, or all of the above should be the easy part.
(an example might be the Biocollections Ontology (BCO)) versus more limited things we do with Darwin Core.
This is just my naive impression. I am not an expert in RDF or the semantic web.
No one is. (Well, maybe this one guy at MIT ...)
I'd like yet more clarity before we get into what might be a challenging task. Could we be even more focused?
Yes, we must be more focussed than simply saying "we'll clean it all up".
Can we surgically repair the key things in DwC rather than do a "clean up"?
Possibly. Steve has been working tirelessly to do just that, and he recently asked for suggestions on how to repair our fractured understanding of dwc:occurrence. My suggestion is to tackle a few other key terms while we're at it, in the context of a clarification of the purposes of the two Darwin Core namespaces, and the status of other relevant standards.
Cheers, Joel.
While I agree that house cleaning is in order, I think it would be a big mistake to accept no new additions to the standard in the interim.
Perhaps a strategy is to shoot for a "major release" version (say with a target 1 year out) while continuing with current modifications to the existing standard.
John
On Mon, 24 Jun 2013, John Deck wrote:
While I agree that house cleaning is in order, I think it would be a big mistake to accept no new additions to the standard in the interim. Perhaps a strategy is to shoot for a "major release" version (say with a target 1 year out) while continuing with current modifications to the existing standard.
John, I'm not against that strategy, but I do think it's dangerous, since the proposed modifications make significant assumptions about how the existing standard is both interpreted and applied.
One example: Suppose the DwC RDF guide is adopted in its current form. Then, dwctype:occurrence will be the standard rdf:type for what we commonly call occurrences. But most readers of the current Darwin Core standard come to the conclusion that *dwc:occurrence* is the rdf:type of what we commonly call occurrences. You could argue that this isn't so bad, because then the next major release of Darwin Core can be informed by the RDF guide. But what if the release gets delayed a couple of years? Then the normative (Type 1) part of the standard will appear to be in conflict with the non-normative (Type 2) RDF examples. From a purely technical point of view, this isn't a problem, since Type 1 documents take precedence over Type 2 documents. But it's a situation we want to avoid.
Best, Joel.
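The practical cost of the two competing rdf:type choices can be sketched in a few lines of Python, with triples held as plain tuples. The record IRI and the two "publishers" are hypothetical, and the capitalised class IRIs are an assumption about how dwc:occurrence and dwctype:occurrence expand; this illustrates the concern rather than how any existing dataset is typed.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed expansions of dwc:occurrence and dwctype:occurrence.
DWC_OCCURRENCE = "http://rs.tdwg.org/dwc/terms/Occurrence"
DWCTYPE_OCCURRENCE = "http://rs.tdwg.org/dwc/dwctype/Occurrence"

# Hypothetical record described by two hypothetical publishers.
record = "http://example.org/occurrence/1"

# Publisher A follows a common reading of the current standard.
publisher_a = {(record, RDF_TYPE, DWC_OCCURRENCE)}
# Publisher B follows the typing discussed for the draft RDF guide.
publisher_b = {(record, RDF_TYPE, DWCTYPE_OCCURRENCE)}


def instances_of(triples, cls):
    """Return every subject that is explicitly typed as cls."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}


# A client that asks only for dwc:Occurrence instances:
found_a = instances_of(publisher_a, DWC_OCCURRENCE)  # contains the record
found_b = instances_of(publisher_b, DWC_OCCURRENCE)  # empty set
```

Without an agreed statement relating the two classes, a client has no way to know that both publishers mean the same kind of thing.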
I was filing emails and re-read this one. I just wanted to clarify one thing about the categories of standards documents. There is currently no standard that specifies how standards should be documented. There is a draft Standards Documentation Specification (http://www.tdwg.org/standards/147/ viewable online at http://bioimages.vanderbilt.edu/pages/tdwg-stds-spec.pdf ) but its ratification has been stalled for about five years. The Vocabulary Management Task Group (VoMaG) has recommended (http://community.gbif.org/pg/file/read/34059/ Recommendation 12; please read and comment as the public comment period is going on now!) that a new author team be tasked with writing an updated version of this draft standard. Nevertheless, it is the only guideline we have at the moment.
What the existing Standards Documentation Specification document says is that standards documents fall into three categories: Type 1 (normative) documents - which define the standard, Type 2 (non-normative) documents - which explain and justify the standard and which ARE part of the standard, and Type 3 (informative) documents - which provide helpful information but which are NOT actually part of the standard itself and therefore aren't governed by the TDWG Standards process (http://www.tdwg.org/about-tdwg/process/ ). To illustrate this with Darwin Core, the single normative (Type 1) RDF document is http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctermshistory.... . There are many (I think hundreds of) non-normative (Type 2) documents that are part of the standard, notably web pages like the term Quick Reference Guide http://rs.tdwg.org/dwc/terms/index.htm and the XML Guide http://rs.tdwg.org/dwc/terms/guides/xml/index.htm . The pages on the Darwin Core Google Code site (e.g. http://code.google.com/p/darwincore/wiki/Occurrence ) are Type 3 documents. It requires an official act in accordance with the DwC Namespace policy (http://rs.tdwg.org/dwc/terms/namespace/index.htm ) to change a Type 2 document. Type 3 documents can be changed at any time with no official action required.
As far as the Darwin Core RDF Guide is concerned, the Guide document itself would become a Type 2 document (non-normative) and part of the Darwin Core Standard. Its acceptance is governed by the Standards process and therefore requires public comment, etc. Any term additions and changes (such as the addition of the proposed dwcuri: namespace terms) which are accepted would be made to the Type 1 RDF document as well as to the Type 2 human-readable pages that serve as reference. The RDF examples (e.g. http://code.google.com/p/tdwg-rdf/wiki/DwcRdfExamplesTaxonConcept and http://code.google.com/p/tdwg-rdf/wiki/DwcRdfExamplesDarwinSW ) which are not included in the actual Guide would be Type 3 documents. They aren't governed by the Standards Process and could reside on either the RDF Task Group Google Code wiki, the Darwin Core Google Code site, or any other place where we might decide to put that kind of reference document (e.g. on the GBIF site if the VoMaG group gets it set up).
I hope this clarifies the situation. There has been a lot of confusion about this in the past. Steve
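The three document categories described above can be summarised as a small lookup table. This Python sketch is only a restatement of the message. The class and function names are invented, and treating Type 1 documents as needing the same official action as Type 2 is an assumption, since the message states the requirement explicitly only for Type 2.

```python
from dataclasses import dataclass
from enum import Enum


class DocType(Enum):
    TYPE_1 = "normative"
    TYPE_2 = "non-normative"
    TYPE_3 = "informative"


@dataclass(frozen=True)
class Category:
    purpose: str
    part_of_standard: bool


CATEGORIES = {
    DocType.TYPE_1: Category("defines the standard", True),
    DocType.TYPE_2: Category("explains and justifies the standard", True),
    DocType.TYPE_3: Category("provides helpful information", False),
}


def needs_official_action(doc_type):
    """True if changing a document of this type goes through the standards process.

    Assumption: anything that is part of the standard needs official action.
    The message says so for Type 2; Type 1 is inferred.
    """
    return CATEGORIES[doc_type].part_of_standard


# Two examples named in the message, classified as the message classifies them.
examples = {
    "term Quick Reference Guide": DocType.TYPE_2,
    "Google Code wiki page": DocType.TYPE_3,
}
free_to_edit = [name for name, t in examples.items() if not needs_official_action(t)]
```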
joel sachs wrote:
guide. But what if the release gets delayed a couple of years? Then the normative (Type 1) part of the standard will appear to be in conflict with the non-normative (Type 2) RDF examples. From a purely technical point of view, this isn't a problem, since Type 1 documents take precedence over Type 2 documents. But it's a situation we want to avoid.
On Mon, 1 Jul 2013, Steve Baskauf wrote:
<snip>
To illustrate this with Darwin Core, the single normative (Type 1) RDF document is http://code.google.com/p/darwincore/source/browse/trunk/rdf/dwctermshistory....
</snip>
Steve,
Are you sure that that document is *the* normative Darwin Core? Consider:
i. It is not included in the Download from the Darwin Core Cover Page, http://www.tdwg.org/standards/450/ (I admit that this is weak evidence, since every document that is included in the download is outdated. So whatever the normative standard is, it's not included in the Download from the Darwin Core Cover Page - strange but true.)
ii. It does not define any Darwin Core terms. For example, the document defines http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen-2008-11-19 and http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen-2011-10-16 but not http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen
I always assumed that the normative standard was defined by http://rs.tdwg.org/dwc/rdf/dwcterms.rdf and http://rs.tdwg.org/dwc/rdf/dwctype.rdf, and (perhaps) http://rs.tdwg.org/dwc/ and http://rs.tdwg.org/dwc/dwctype/
Could someone please clarify?
Many thanks, Joel.
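Point ii can be checked mechanically. The Python sketch below assumes that versioned term IRIs always end in a "-YYYY-MM-DD" date suffix, as the two quoted examples do; that pattern and the helper names are assumptions for illustration, not something the standard is known to guarantee.

```python
import re

# Assumed pattern: a versioned IRI is the current IRI plus "-YYYY-MM-DD".
VERSION_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def is_versioned(iri):
    """True if the IRI ends in a date suffix."""
    return VERSION_SUFFIX.search(iri) is not None


def unversioned(iri):
    """Strip the date suffix, if any, to get the IRI people actually use."""
    return VERSION_SUFFIX.sub("", iri)


# The two IRIs the history document is said to define.
history_terms = [
    "http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen-2008-11-19",
    "http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen-2011-10-16",
]
current = "http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"

# The current IRI is not itself among the defined terms ...
defines_current = current in history_terms
# ... although both versioned IRIs reduce to it once the suffix is stripped.
resolves_to = {unversioned(t) for t in history_terms}
```

So the link from a versioned definition back to the term in everyday use depends on a naming convention unless the document states it explicitly.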
Joel,
Within the last year I did confirm with John Wieczorek that the dwctermshistory.rdf is THE one normative document of Darwin Core. One would have no way to know that other than personally asking John since I have never found anything in writing which states that. The connection between dwctermshistory.rdf and the RDF served when the terms are dereferenced is a bit tenuous, but if you drill down into the term definitions that get served via the dwcterms.rdf document, they are linked to the historical terms via dcterms:hasVersion and dcterms:replaces properties although I'm not sure I can explain how a semantic client would follow its nose to the dwctermshistory.rdf document. I don't know what went into the decision to set DwC up this way since it was before my time. John W. may have further comments.
In a previous email which I'm not going to attempt to look up in the archives, I asked (begged?) that the Darwin Core RDF documents be clearly marked as to whether they were normative or not because I've been confused about this exact thing for several years. I was thinking that the recommendations of the VoMaG draft report on vocabulary management included clearly demarcating which document is normative in a standard, but I just looked at the report again (http://community.gbif.org/pg/file/read/34059/ ) and didn't see it. The section 4.3 Recommendation 5 says "As part of its documentation, a vocabulary must include machine readable metadata expressed, e.g., in RDF, that describe the main characteristics of the vocabulary." Perhaps this recommendation should be amended to say that one particular characteristic to be described is the identity of the type 1 (normative) document for the vocabulary if it is a standard. Since the VoMaG report is in the middle of its public comment period, this would be an excellent comment to make. I'm not going to make it since I'm one of the authors, but anybody else could.
Steve
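The "drill down" via dcterms:hasVersion and dcterms:replaces can be sketched as a small graph walk in Python. The two triples below are an assumed arrangement built from the version dates quoted earlier in the thread; the actual link structure in dwcterms.rdf may differ, and the function names are invented.

```python
DCT_HAS_VERSION = "http://purl.org/dc/terms/hasVersion"
DCT_REPLACES = "http://purl.org/dc/terms/replaces"
BASE = "http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"

# Assumed links: the current term points at its latest version,
# and each version points at the one it replaced.
triples = [
    (BASE, DCT_HAS_VERSION, BASE + "-2011-10-16"),
    (BASE + "-2011-10-16", DCT_REPLACES, BASE + "-2008-11-19"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def version_history(term):
    """List versions reachable from term, newest first under the assumed links."""
    history, seen = [], set()
    frontier = objects(term, DCT_HAS_VERSION)
    while frontier:
        version = frontier.pop(0)
        if version in seen:
            continue  # guard against cycles in the link structure
        seen.add(version)
        history.append(version)
        frontier.extend(objects(version, DCT_REPLACES))
    return history


history = version_history(BASE)
```

The walk only finds versions that are explicitly linked from the current term. It says nothing about which document is normative, which is the gap described above.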
Ha ha! I can see that there is something defective about my memory. I searched my "sent" emails and guess what I found: http://code.google.com/p/darwincore/issues/detail?id=136 I had forgotten about that. So identifying the normative document is an accepted high-priority issue for the next version of DwC.
Steve
participants (4)
- joel sachs
- John Deck
- Robert Guralnick
- Steve Baskauf