[tdwg-content] Darwin Core vs. Simple Darwin Core

joel sachs jsachs at csee.umbc.edu
Tue Jul 26 17:47:41 CEST 2011

Darwin Core is one of my favourite things. It's simple, elegant, and 
flexible. I wasn't there at design time, so I don't know if it was 
designed with the semantic web in mind, but it looks like it. It is, as 
John put it, primarily a collection of terms [and their definitions]. So 
if two people/agents use the same terms, they will share the same 
semantics. (This is why I think that a "more semantic Darwin Core" is not 
the appropriate goal for a Darwin Core/rdf working group.)

I'm concerned that there's so much confusion concerning DwC, since 
confusion is (typically) a barrier to adoption.

One source of confusion is Simple Darwin Core. A huge fraction of DwC 
records can be expressed as spreadsheets. Since *all* Simple DwC records 
can be expressed as spreadsheets, many people think

Simple Darwin Core = spreadsheet-expressible Darwin Core

(which isn't true). This means that if they want to express their data as 
a spreadsheet, they think they need to conform to Simple Darwin Core.

The requirement of Simple Darwin Core is that there be no repeated 
elements. But the requirement for spreadsheet-expressible Darwin Core is 
that there be no repeated nested elements. I previously argued 
(http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002220.html) in 
favour of using subscripts to represent elements in repeated nests 
(thereby permitting their use in spreadsheets). Even if we don't permit 
that, I'm not sure that the benefits of maintaining a separate Simple Darwin 
Core standard, in addition to the regular Darwin Core standard, are 
greater than the costs in terms of giving people wrong ideas. (I prefer 
the presentation at http://rs.tdwg.org/dwc/terms/guides/xml/index.htm, 
where Simple DwC is presented as simply one of several XML schemas for 
Darwin Core.)
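
To make the subscript idea concrete, here is a minimal Python sketch of how a record with a repeated nest could be turned into a single spreadsheet row. The column naming scheme (`identification[1].identifiedBy`), the `flatten` helper, the `identification` grouping key, and the sample values are all assumptions made for illustration; they are not taken from the Darwin Core standard or from the earlier post. Only `scientificName`, `identifiedBy`, and `dateIdentified` are actual Darwin Core terms.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested record into one row of subscripted column names.

    Repeated elements (lists) get a 1-based subscript, e.g. "identification[2]".
    Nested elements (dicts) are joined to their parent with a dot.
    This naming convention is hypothetical, chosen only for this sketch.
    """
    row: dict[str, Any] = {}
    for term, value in record.items():
        name = f"{prefix}{term}"
        if isinstance(value, list):
            # A repeated element: one subscripted column (or group) per item.
            for i, item in enumerate(value, start=1):
                sub = f"{name}[{i}]"
                if isinstance(item, dict):
                    row.update(flatten(item, prefix=f"{sub}."))
                else:
                    row[sub] = item
        elif isinstance(value, dict):
            # A nested but non-repeated element.
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


# Not Simple DwC (the nest repeats), yet it still fits in one row.
record = {
    "scientificName": "Puma concolor",
    "identification": [
        {"identifiedBy": "A. Smith", "dateIdentified": "2009-05-01"},
        {"identifiedBy": "B. Jones", "dateIdentified": "2010-11-12"},
    ],
}

row = flatten(record)
for column, value in row.items():
    print(f"{column}\t{value}")
```

Each key of `row` becomes a column header, so the record occupies one spreadsheet row even though it contains a repeated nest. The cost is that the number of columns grows with the largest number of repetitions in any record.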

I *think* I see the motivation for Simple DwC. Suppose X wants to use 
Darwin Core, but doesn't know much about databases, and just wants to put 
all his data in a spreadsheet. He might not know what a repeated, nested 
data structure is. So it's easiest to just say to him "don't repeat any 
elements, and you'll be fine - your records will be 
spreadsheet-expressible". I agree that that's a benefit. Are there others?

Thanks -
