Darwin Core XML data using class terms
I've been giving some thought to the class terms and the examples of class-based DwC data in the Darwin Core XML Guide (http://rs.tdwg.org/dwc/terms/guides/xml/index.htm - see the Classes and Containment section). As this list has recently been discussing Simple DwC and the restrictions it imposes, I'd be interested in knowing of the experience of any projects that have sought to use the class terms as a mechanism for streaming richer data using Darwin Core.
Can anyone provide examples of projects sharing data in this way?
Are there any findings on the success of exporting and consuming class-based DwC data, in particular on the consistency of how such data are interpreted?
Has anyone given thought to how a consumer would derive the equivalent of a set of best-fit Simple DwC records from the kind of class-based DwC presented in the guide's examples?
Thanks,
Donald
Donald Hobern, Director, Atlas of Living Australia CSIRO Ecosystem Sciences, GPO Box 1700, Canberra, ACT 2601 Phone: (02) 62464352 Mobile: 0437990208 Email: Donald.Hobern@csiro.au Web: http://www.ala.org.au/
On Mon, 25 Jul 2011, Donald.Hobern@csiro.au wrote:
> I've been giving some thought to the class terms and the examples of class-based DwC data in the Darwin Core XML Guide (http://rs.tdwg.org/dwc/terms/guides/xml/index.htm - see the Classes and Containment section). As this list has recently been discussing Simple DwC and the restrictions it imposes, I'd be interested in knowing of the experience of any projects that have sought to use the class terms as a mechanism for streaming richer data using Darwin Core.
> Can anyone provide examples of projects sharing data in this way?
Did anyone respond to you off list, Donald? DeVries and Baskauf/Webb are both exploring class-based approaches, although they found it necessary to go outside Darwin Core to do so. For the bioblitz, we moved from (almost) Simple DwC in the Google Spreadsheet to class-based DwC in the RDF. I'm curious to see other examples, whether RDF-based or XML Schema-based.
> Are there any findings on the success of exporting and consuming class-based DwC data, in particular on the consistency of how such data are interpreted? Has anyone given thought to how a consumer would derive the equivalent of a set of best-fit Simple DwC records from the kind of class-based DwC presented in the guide's examples?
One can always denormalize (i.e. do a join on occurrenceID) to get a Simple DwC representation. But there will be multiple records for some occurrences, one for each identification. I assume that what you're really asking is either "what heuristics should be used to determine the correct identification when multiple identifications exist?" or "what technical solution (e.g. an appropriate annotation framework) can we deploy to ensure that identificationVerificationStatus fields exist where possible?" Is that correct?
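The join mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two input lists, the function name `denormalize`, and all sample values are invented here; only the term names occurrenceID, recordedBy, scientificName and dateIdentified are taken from Darwin Core. Each (occurrence, identification) pair becomes one flat row, so an occurrence with two identifications produces two Simple-DwC-style records.

```python
def denormalize(occurrences, identifications):
    """Join identifications to occurrences on occurrenceID.

    Returns one flat dict per (occurrence, identification) pair.
    An occurrence without any identification is kept as a single row.
    """
    # Group identifications by the occurrence they refer to.
    by_occurrence = {}
    for ident in identifications:
        by_occurrence.setdefault(ident["occurrenceID"], []).append(ident)

    flat = []
    for occ in occurrences:
        idents = by_occurrence.get(occ["occurrenceID"], [])
        if not idents:
            flat.append(dict(occ))
            continue
        for ident in idents:
            row = dict(occ)      # copy the occurrence-level fields
            row.update(ident)    # add the identification-level fields
            flat.append(row)
    return flat


# Hypothetical sample data, invented for illustration only.
occurrences = [
    {"occurrenceID": "occ-1", "recordedBy": "A. Collector"},
    {"occurrenceID": "occ-2", "recordedBy": "B. Collector"},
]
identifications = [
    {"occurrenceID": "occ-1", "scientificName": "Aus bus", "dateIdentified": "2009-05-01"},
    {"occurrenceID": "occ-1", "scientificName": "Aus cus", "dateIdentified": "2011-02-14"},
    {"occurrenceID": "occ-2", "scientificName": "Xus yus", "dateIdentified": "2010-07-30"},
]

rows = denormalize(occurrences, identifications)
for row in rows:
    print(row["occurrenceID"], row["scientificName"])
```

Note that occ-1 appears twice in the result; nothing in the join itself says which of its two rows a consumer should prefer.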
Thanks, Joel.
Thanks, Joel.
I didn't get any other responses. You are correct that my concern is the absence of well-defined heuristics for moving from a complex data record to the preferred simple record. I'd still be interested in any projects which had sought to address this issue. It will certainly arise as the use of class-based DwC expands beyond projects focussed on exploring semantic technology within biodiversity informatics to reach those who need much more black-and-white records (or the ability to reject records as insufficiently black-and-white).
The GBIF Data Portal certainly faced the same issue with consumption of ABCD data alongside DwC and I was never totally comfortable with the results. Part of the problem was that the semantics of an ABCD Unit record with multiple Identifications were not tightly defined. Some databases used this as a way to present information on a specimen or sample which in fact contains more than one species. Other databases used it to detail a history of identifications for a specimen, with or without a clear indication of which identification is currently accepted.
The issue closely parallels the one that arises when tagging images with multiple scientific names in the EOL Flickr group. Tagging with multiple names may indicate multiple taxa or an attempt to be useful to consumers using different classifications. With images for EOL, the problem may not be severe - the implication is that the photo could appropriately be displayed on any page associated with any of the supplied taxonomy:* tags. For a specimen identification, determining the intent behind multiple identifications is much more central to understanding how the data can be used.
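As a concrete illustration of the "reject records as insufficiently black-and-white" option, a consumer-side rule might look like the Python sketch below. The rule is purely hypothetical and is not an established DwC, ABCD or GBIF heuristic: accept the identification if there is only one, or if exactly one carries a verified status, and otherwise treat the record as ambiguous. The function name `best_fit_identification`, the status value "verified" and the sample data are all assumptions made for the example.

```python
def best_fit_identification(identifications, verified_value="verified"):
    """Return the single preferred identification, or None if ambiguous.

    Hypothetical rule: a lone identification is accepted; otherwise exactly
    one identification must be flagged with the assumed verified status.
    """
    if len(identifications) == 1:
        return identifications[0]
    verified = [
        ident for ident in identifications
        if ident.get("identificationVerificationStatus") == verified_value
    ]
    if len(verified) == 1:
        return verified[0]
    return None  # intent unclear: history without a preferred name, or a mixed sample


# Hypothetical data: an identification history with one verified entry ...
history = [
    {"scientificName": "Aus bus", "identificationVerificationStatus": "unverified"},
    {"scientificName": "Aus cus", "identificationVerificationStatus": "verified"},
]
# ... and a record whose multiple names carry no indication of intent.
mixed_sample = [
    {"scientificName": "Aus bus"},
    {"scientificName": "Xus yus"},
]

chosen = best_fit_identification(history)
ambiguous = best_fit_identification(mixed_sample)
print(chosen["scientificName"] if chosen else "rejected")
print(ambiguous["scientificName"] if ambiguous else "rejected")
```

A rule like this does not recover the provider's intent; it only makes explicit which records a consumer cannot flatten to one Simple DwC record without guessing.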
Donald
-----Original Message-----
From: joel sachs [mailto:jsachs@csee.umbc.edu]
Sent: Thursday, 4 August 2011 12:17 AM
To: Hobern, Donald (CES, Black Mountain)
Cc: tdwg-content@lists.tdwg.org
Subject: Re: [tdwg-content] Darwin Core XML data using class terms