[tdwg-content] Darwin Core XML data using class terms

Donald.Hobern at csiro.au Donald.Hobern at csiro.au
Thu Aug 4 06:09:32 CEST 2011

Thanks, Joel.

I didn't get any other responses.  You are correct that my concern is the absence of well-defined heuristics for moving from a complex data record to the preferred simple record.  I'd still be interested in any projects which had sought to address this issue.  It will certainly arise as the use of class-based DwC expands beyond projects focussed on exploring semantic technology within biodiversity informatics to reach those who need much more black-and-white records (or the ability to reject records as insufficiently black-and-white).

The GBIF Data Portal certainly faced the same issue with consumption of ABCD data alongside DwC and I was never totally comfortable with the results.  Part of the problem was that the semantics of an ABCD Unit record with multiple Identifications was not tightly defined.  Some databases used this as a way to present information on a specimen or sample which in fact contains more than one species.  Other databases used it to detail a history of identifications for a specimen, with or without a clear indication of which identification is currently accepted.  

The issue closely parallels that of tagging images with multiple scientific names in the EOL Flickr group.  Tagging with multiple names may indicate multiple taxa or an attempt to be useful to consumers using different classifications.  With images for EOL, the problem may not be severe - the implication is that the photo could appropriately be displayed on any page associated with any of the supplied taxonomy:* tags.  For a specimen identification, determining the intent behind multiple identifications is much more central to understanding how the data can be used.


Donald Hobern, Director, Atlas of Living Australia
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
Phone: (02) 62464352 Mobile: 0437990208 
Email: Donald.Hobern at csiro.au
Web: http://www.ala.org.au/ 

-----Original Message-----
From: joel sachs [mailto:jsachs at csee.umbc.edu] 
Sent: Thursday, 4 August 2011 12:17 AM
To: Hobern, Donald (CES, Black Mountain)
Cc: tdwg-content at lists.tdwg.org
Subject: Re: [tdwg-content] Darwin Core XML data using class terms

On Mon, 25 Jul 2011, Donald.Hobern at csiro.au wrote:

> I've been giving some thought to the class terms and the examples of 
> class-based DwC data in the Darwin Core XML Guide 
> (http://rs.tdwg.org/dwc/terms/guides/xml/index.htm - see the Classes 
> and Containment section).  As this list has recently been discussing 
> Simple DwC and the restrictions it imposes, I'd be interested in 
> knowing of the experience of any projects that have sought to use the 
> class terms as a mechanism for streaming richer data using Darwin Core.
> Can anyone provide examples of projects sharing data in this way?

Did anyone respond to you off list, Donald? DeVries and Baskauf/Webb are both exploring class-based approaches, although they found it necessary to go outside Darwin Core to do so. For the bioblitz, we moved from (almost) Simple DwC in the Google Spreadsheet to class-based DwC in the RDF. I'm curious to see other examples, whether rdf or xml-schema based.

> Are there any findings on the success of exporting and consuming 
> class-based DwC data, in particular on the consistency of how such 
> data are interpreted?
> Has anyone given thought to how a consumer would derive the equivalent 
> of a set of best-fit Simple DwC records from the kind of class-based 
> DwC presented in the guide's examples?

One can always denormalize (i.e. do a join on occurrenceID) to get a 
Simple DwC representation. But there will be multiple records for some 
occurrences, one for each identification. I assume that what you're really 
asking is either "what heuristics should be used to determine the correct 
identification when multiple identifications exist?" or "what technical 
solution (e.g. an appropriate annotation framework) can we deploy to 
ensure that identificationVerificationStatus fields exist where possible?" 
Is that correct?
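
To make the join concrete, here is a minimal sketch in Python. The sample 
records, the identifiers "occ-1" and "occ-2", and the function name 
denormalize are all hypothetical, made up for illustration; only the term 
names (occurrenceID, recordedBy, eventDate, scientificName, dateIdentified) 
come from Darwin Core. It flattens the data and reports which occurrences 
end up with more than one row, but makes no attempt to choose among them.

```python
from collections import Counter

# Hypothetical class-based data: occurrences and identifications held as
# separate record sets, linked by occurrenceID.
occurrences = [
    {"occurrenceID": "occ-1", "recordedBy": "A. Collector", "eventDate": "2011-06-18"},
    {"occurrenceID": "occ-2", "recordedBy": "B. Collector", "eventDate": "2011-06-19"},
]
identifications = [
    {"occurrenceID": "occ-1", "scientificName": "Quercus alba", "dateIdentified": "2011-06-18"},
    {"occurrenceID": "occ-1", "scientificName": "Quercus rubra", "dateIdentified": "2011-07-02"},
    {"occurrenceID": "occ-2", "scientificName": "Acer rubrum", "dateIdentified": "2011-06-19"},
]


def denormalize(occurrences, identifications):
    """Join on occurrenceID: one flat record per (occurrence, identification) pair."""
    idents_by_occurrence = {}
    for ident in identifications:
        idents_by_occurrence.setdefault(ident["occurrenceID"], []).append(ident)

    flat = []
    for occ in occurrences:
        idents = idents_by_occurrence.get(occ["occurrenceID"], [])
        if not idents:
            # Keep occurrences that have no identification at all.
            flat.append(dict(occ))
        for ident in idents:
            flat.append({**occ, **ident})
    return flat


flat_records = denormalize(occurrences, identifications)

# Occurrences that produced more than one flat record are the ones for
# which a consumer would still have to decide which identification to use.
row_counts = Counter(rec["occurrenceID"] for rec in flat_records)
ambiguous = sorted(occ_id for occ_id, n in row_counts.items() if n > 1)

for rec in flat_records:
    print(rec)
print("needs a decision:", ambiguous)
```

The sketch deliberately stops at flagging: whether the two rows for one 
occurrence mean two taxa in one sample or a history of determinations is 
exactly the information the flat records do not carry.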


> Thanks,
> Donald
> Donald Hobern, Director, Atlas of Living Australia
> CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601
> Phone: (02) 62464352 Mobile: 0437990208
> Email: Donald.Hobern at csiro.au
> Web: http://www.ala.org.au/
