[tdwg-content] Darwin Core vs. Simple Darwin Core

Wed Aug 3 16:10:29 CEST 2011

Hi Donald,

Sorry for the delayed response (we had a holiday in Canada). A few 
responses ...

>> I am however a little uneasy about Peter's suggested solution.  It 
>> works perfectly when we use RDF/OWL, but the semantics get lost in 
>> other contexts.  DwC is intended to be as neutral as possible on 
>> encodings.

Yes, although if we could do everything we wanted without the semantic 
web, then we wouldn't need the semantic web. The need to distinguish 
between DwC and DwC/RDF came up last week both in the exchange between 
Pete and Markus, and also in the one between Steve and Bob. I'm very 
interested in squeezing semantics out of non-RDF data, and so I 
see value in distinguishing between use cases for RDF and non-RDF 
representations. On the other hand, I sometimes find the distinction 
difficult to maintain. For example, any time you introduce the notion 
of a "class" (as Darwin Core does), the notion of "subclass" is 
pretty natural.

>> There is some advantage to semantics being a secondary layer built on 
>> top of naked terms.

I agree. I like the idea of layered semantics, with not just 
secondary layers, but tertiary and quaternary. Users start with the base 
layer, and then import the further layers that their use case demands.

>> Using dwc:vernacularName_en, etc. would compromise that.

I'm not sure why. Would it be wrong to add a bunch of vernacularName_x 
terms to Darwin Core? (As well as adding, at one of the higher semantic 
layers, a bunch of "vernacularName_x subPropertyOf vernacularName" 
statements.)
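
To make the layering idea concrete, here is a minimal sketch, assuming
the vernacularName_x terms existed (they are hypothetical, as are the
function name and the placeholder values). A consumer that imports the
subPropertyOf layer can roll the language-specific terms up to
vernacularName; one that does not simply sees flat, non-repeating terms.

```python
# Hypothetical higher-layer statements: "vernacularName_x subPropertyOf vernacularName".
# The suffixed term names are an assumption for illustration, not Darwin Core terms.
SUB_PROPERTY_OF = {
    "vernacularName_en": "vernacularName",
    "vernacularName_es": "vernacularName",
}


def roll_up(record, sub_property_of=SUB_PROPERTY_OF):
    """Group each value under its parent property when one is declared.

    Terms with no declared parent are kept under their own name, so a
    record that uses only base-layer terms passes through unchanged
    (apart from values being wrapped in lists).
    """
    rolled = {}
    for term, value in record.items():
        parent = sub_property_of.get(term, term)
        rolled.setdefault(parent, []).append(value)
    return rolled


# A flat record: no term repeats, yet two vernacular names are carried.
record = {
    "scientificName": "Ctenomys sociabilis",
    "vernacularName_en": "<English name>",  # placeholder value
    "vernacularName_es": "<Spanish name>",  # placeholder value
}

rolled = roll_up(record)
print(rolled["vernacularName"])
```

Without the mapping, the record is still a plain set of term/value
pairs; the language information lives in the term name rather than in
an encoding-specific attribute.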

>> 1 - Ctenomys sp. by Richard Sage in 2000
>> 2 - Ctenomys sociabilis by James L Patton on 14 September 2001

>> There is no indication whether one of these is preferred (ABCD used an
>> attribute to indicate this).  How should a consumer needing Simple DwC 
>> (e.g. GBIF) interpret this?

The proposed "identificationVerificationStatus" term is necessary, but 
not sufficient to address this, right?
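
One way to read "not sufficient" is sketched below. It assumes a
consumer ranks identifications by the proposed
identificationVerificationStatus term; the status values, the ranking,
and the function name are all invented for illustration. When two
identifications share the top status (or carry none), the consumer
still cannot pick one.

```python
# Hypothetical ranking of status values; the proposal in the thread does not
# define a vocabulary, so these strings are assumptions.
RANK = {"unverified": 0, "verified": 1}


def pick_identification(identifications, rank=RANK):
    """Return the single best-ranked identification, or None if there is a tie.

    Identifications with a missing or unknown status are ranked lowest.
    """
    if not identifications:
        return None

    def score(ident):
        return rank.get(ident.get("identificationVerificationStatus"), -1)

    best = max(score(i) for i in identifications)
    top = [i for i in identifications if score(i) == best]
    return top[0] if len(top) == 1 else None


# The two identifications from the XML guide example, with no status supplied.
bare = [
    {"scientificName": "Ctenomys sp.", "identifiedBy": "Richard Sage"},
    {"scientificName": "Ctenomys sociabilis", "identifiedBy": "James L Patton"},
]

# The same pair with statuses assigned purely for the sake of the example.
with_status = [
    dict(bare[0], identificationVerificationStatus="unverified"),
    dict(bare[1], identificationVerificationStatus="verified"),
]

print(pick_identification(bare))         # tie: no preferred identification
print(pick_identification(with_status))  # a single top-ranked identification
```

The status helps only when it separates the candidates, which is why
an explicit "preferred" marker (as in ABCD) may still be needed.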

Regards,
Joel.

On Wed, 27 Jul 2011, Donald.Hobern at csiro.au wrote:

> Joel,
>
> Not sure if you saw my reply over the weekend on the vernacularName thread (http://lists.tdwg.org/pipermail/tdwg-content/2011-July/002686.html).  As we expand beyond Simple DwC (interpreted as completely non-repeating, flat DwC), we need to ensure that consumers can reliably and consistently derive the best Simple DwC record for any Occurrence.
>
> Darwin Core is addressing a range of use cases.  We have the interests of taxonomists and collection managers to be able to retrieve as much information as possible about each specimen (or observation).  Class-based DwC, like ABCD, will allow publication of very rich specimen data, with a complete history of identifications, collectors, etc.  On the other hand, we have also many users (including software systems) which really need to know as reliably and efficiently as possible 1) to what species the specimen is currently assigned, 2) where it was collected, 3) when it was collected, and 4) how much evidence there is for these assertions.  In a sense this may be a serious simplification, but this precise level of detail is important for ecologists, planning agencies, software indexes, etc.
>
> That means that we should rigorously define how repeating elements can be included in DwC while allowing users unambiguously to derive this core subset.  My belief is that repeating vernacularName poses no problem in this case.  A consumer can choose to take all, one or none of the supplied vernacularName values without serious harm.  A much bigger problem is that addressed in Peter DeVries' message.  I really want to know the language associated with a vernacularName.  In cases where there are multiple vernacular names in the same language, I'd also like to know if one of them is considered by the provider to be the "preferred" name.  This implies that naked vernacularNames without further metadata may not be as useful as they should be.  I am however a little uneasy about Peter's suggested solution.  It works perfectly when we use RDF/OWL, but the semantics get lost in other contexts.  DwC is intended to be as neutral as possible on encodings.  There is some advantage to semantics being a secondary layer built on top of naked terms.  Using dwc:vernacularName_en, etc. would compromise that.  It may be that the benefits outweigh the disadvantages but it should be considered.
>
> The bigger problem for consumers is the more general issue of cases where more complex DwC does not clearly indicate which values would be the best to select for the Simple DwC what-species-occurred-when-and-where question.  Multiple identifications without a preferred identification is the real problem case.  Take the first example under "Classes and Containment" at http://rs.tdwg.org/dwc/terms/guides/xml/index.htm - this shows a specimen with the following identifications:
>
> 1 - Ctenomys sp. by Richard Sage in 2000
> 2 - Ctenomys sociabilis by James L Patton on 14 September 2001
>
> There is no indication whether one of these is preferred (ABCD used an attribute to indicate this).  How should a consumer needing Simple DwC (e.g. GBIF) interpret this?  Is it safe to assume that the most recent identification is preferred?  That may normally be correct but there are good reasons why it could be a mistaken inference.  In the absence of further detail, should the consumer simply treat this as a Ctenomys of unknown species (in other words select the narrowest taxon including all taxa referenced by identifications).  This seems really unfortunate.
>
> There are various ways to solve the problem, but I believe the value of DwC will best be maintained and enhanced by our ensuring this issue is handled in the specification.
>
> By the way, I find something else really puzzling about this example from the XML Guide.  Why, oh why, does the Taxon object link back to the Identification object rather than the other way around????  This seems to me seriously to compromise the idea that we can reuse a DwC Taxon class in a semantically consistent fashion across collection data and species checklists.
>
> Thanks,
>
> Donald
>
>
>
> Donald Hobern, Director, Atlas of Living Australia
> CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
> Phone: (02) 62464352 Mobile: 0437990208
> Email: Donald.Hobern at csiro.au
> Web: http://www.ala.org.au/
>
> -----Original Message-----
> From: tdwg-content-bounces at lists.tdwg.org [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of joel sachs
> Sent: Wednesday, 27 July 2011 1:48 AM
> To: tdwg-content at lists.tdwg.org
> Subject: [tdwg-content] Darwin Core vs. Simple Darwin Core
>
>
> Darwin Core is one of my favourite things. It's simple, elegant, and flexible. I wasn't there at design time, so I don't know if it was designed with the semantic web in mind, but it looks like it. It is, as John put it, primarily a collection of terms [and their definitions]. So if two people/agents use the same terms, they will share the same semantics. (This is why I think that a "more semantic Darwin Core" is not the appropriate goal for a Darwin Core/rdf working group.)
>
> I'm concerned that there's so much confusion concerning DwC, since confusion is (typically) a barrier to adoption.
>
> One source of confusion is Simple Darwin Core. A huge fraction of DwC records can be expressed as spreadsheets. Since *all* Simple DwC records can be expressed as spreadsheets, many people think
>
> Simple Darwin Core = spreadsheet-expressible Darwin Core
>
> (which isn't true). This means that if they want to express their data as a spreadsheet, they think they need to conform to Simple Darwin Core.
>
> The requirement of Simple Darwin Core is that there be no repeated elements. But the requirement for spreadsheet-expressible Darwin Core is that there be no repeated nested elements. I previously argued
> (http://lists.tdwg.org/pipermail/tdwg-content/2011-January/002220.html) in favour of using subscripts to represent elements in repeated nests (thereby permitting their use in spreadsheets). Even if we don't permit that, I'm not sure that the benefits of maintaining a separate Simple Darwin Core standard, in addition to the regular Darwin Core standard, are greater than the costs in terms of giving people wrong ideas. (I prefer the presentation at http://rs.tdwg.org/dwc/terms/guides/xml/index.htm,
> where Simple DwC is presented as simply one of several XML schemas for Darwin Core.)
>
> I *think* I see the motivation for Simple DwC. Suppose X wants to use Darwin Core, but doesn't know much about databases, and just wants to put all his data in a spreadsheet. He might not know what a repeated, nested data structure is. So it's easiest to just say to him "don't repeat any elements, and you'll be fine - your records will be spreadsheet-expressible". I agree that that's a benefit. Are there others?
>
> Thanks -
> Joel.