[tdwg-content] Darwin Core vs. Simple Darwin Core
joel sachs
jsachs at csee.umbc.edu
Tue Jul 26 17:47:41 CEST 2011
Darwin Core is one of my favourite things. It's simple, elegant, and
flexible. I wasn't there at design time, so I don't know if it was
designed with the semantic web in mind, but it looks like it. It is, as
John put it, primarily a collection of terms [and their definitions]. So
if two people/agents use the same terms, they will share the same
semantics. (This is why I think that a "more semantic Darwin Core" is not
the appropriate goal for a Darwin Core/RDF working group.)
I'm concerned that there's so much confusion concerning DwC, since
confusion is (typically) a barrier to adoption.
One source of confusion is Simple Darwin Core. A huge fraction of DwC
records can be expressed as spreadsheets. Since *all* Simple DwC records
can be expressed as spreadsheets, many people think
Simple Darwin Core = spreadsheet-expressible Darwin Core
(which isn't true). This means that if they want to express their data as
a spreadsheet, they think they need to conform to Simple Darwin Core.
The requirement of Simple Darwin Core is that there be no repeated
elements. But the requirement for spreadsheet-expressible Darwin Core is
that there be no repeated nested elements. I previously argued
(http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002220.html) in
favour of using subscripts to represent elements in repeated nests
(thereby permitting their use in spreadsheets). Even if we don't permit
that, I'm not sure that the benefits of maintaining a separate Simple Darwin
Core standard, in addition to the regular Darwin Core standard, are
greater than the costs in terms of giving people wrong ideas. (I prefer
the presentation at http://rs.tdwg.org/dwc/terms/guides/xml/index.htm,
where Simple DwC is presented as simply one of several XML schemas for
Darwin Core.)
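
To make the distinction concrete, here is a rough sketch in Python. Everything
about it is an assumption for illustration: the record shape (a list of
term/value pairs, where a value may itself be a nested list), the
"identification" wrapper element, and the "[n]" subscript notation are mine
here, not part of Darwin Core. It only shows that a record with a repeated
nest fails the "no repeated elements" test but can still be flattened into a
single spreadsheet row once subscripts are allowed.

```python
from collections import Counter

# Assumed record shape (illustrative only, not a Darwin Core format):
# a list of (term, value) pairs; a value is a string or a nested list of pairs.

def has_repeated_elements(record):
    """Return True if any term occurs more than once at the same level, at any depth."""
    counts = Counter(term for term, _ in record)
    if any(n > 1 for n in counts.values()):
        return True
    return any(isinstance(v, list) and has_repeated_elements(v) for _, v in record)

def flatten(record, prefix=""):
    """Flatten a record into one row (column name -> value).

    Terms repeated at the same level get a hypothetical [n] subscript;
    nested terms are joined to their parent with a dot.
    """
    counts = Counter(term for term, _ in record)
    seen = Counter()
    row = {}
    for term, value in record:
        seen[term] += 1
        name = prefix + term
        if counts[term] > 1:
            name += "[%d]" % seen[term]
        if isinstance(value, list):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row

# No repeated elements: fine as Simple DwC, and trivially one spreadsheet row.
simple = [("scientificName", "Puma concolor"), ("country", "Argentina")]

# A repeated nest ("identification" is a made-up wrapper element).
nested = [
    ("scientificName", "Puma concolor"),
    ("identification", [("identifiedBy", "A. Smith"), ("dateIdentified", "2009")]),
    ("identification", [("identifiedBy", "B. Jones"), ("dateIdentified", "2010")]),
]

row = flatten(nested)
```

With subscripts, the repeated nest becomes ordinary columns such as
identification[1].identifiedBy and identification[2].identifiedBy, so the
record fits in one row even though it is not Simple DwC.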
I *think* I see the motivation for Simple DwC. Suppose X wants to use
Darwin Core, but doesn't know much about databases, and just wants to put
all his data in a spreadsheet. He might not know what a repeated, nested
data structure is. So it's easiest to just say to him "don't repeat any
elements, and you'll be fine - your records will be
spreadsheet-expressible". I agree that that's a benefit. Are there others?
Thanks -
Joel.