[tdwg-content] Darwin Core extension on Freshwater Biodiversity Resources

Dag Endresen (GBIF) dendresen at gbif.org
Fri Jan 27 12:05:52 CET 2012

Dear Dr Michael Haft,

I noticed your Tweet yesterday [1] and Laura's reply about developing a 
Darwin Core extension [2]. Assuming that you are interested in 
developing an extension for publishing data in the Darwin Core Archive 
format [3][4], I see this as a two-step process.

First, you will need to identify the individual terms (data properties). 
The Darwin Core standard provides a vocabulary that includes the most 
important terms for describing primary biodiversity data records. You 
are strongly encouraged to reuse terms from the Darwin Core standard 
(and other standards) as far as possible. New terms for the Darwin Core 
standard can be proposed using the link provided by Laura [2]. Only 
terms of general utility for a broad range of biodiversity datasets are 
intended to be included in the Darwin Core standard. Terms of more 
limited interest, relevant to a thematic community, can be defined in a 
separate vocabulary that extends the list of Darwin Core terms. As an 
example, in the plant genetic resources (germplasm) community we have 
developed such an "extension" [5], following the Darwin Core format and 
based on terms already established in that community [6][7]. If your 
community already publishes a vocabulary of terms in a format similar to 
that of Darwin Core, with individual term definitions accessible through 
a URI or another persistent identifier, then you may of course use your 
own community vocabulary in the next step, without creating a new 
"extension" to the Darwin Core terms.
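
As a rough illustration of this first step, the sketch below lists a few 
candidate terms and separates those reused from Darwin Core from those 
that would need a community vocabulary. The Darwin Core term names are 
real; the freshwater namespace (http://example.org/freshwater/terms/) and 
the two terms under it are purely hypothetical placeholders, not an 
existing vocabulary.

```python
from dataclasses import dataclass

# Namespace of the Darwin Core terms.
DWC_NS = "http://rs.tdwg.org/dwc/terms/"
# Hypothetical namespace for a freshwater community vocabulary (assumption).
FW_NS = "http://example.org/freshwater/terms/"


@dataclass(frozen=True)
class Term:
    namespace: str
    name: str

    @property
    def uri(self) -> str:
        # Each term is identified by namespace + local name.
        return self.namespace + self.name


CANDIDATE_TERMS = [
    Term(DWC_NS, "scientificName"),   # existing Darwin Core term
    Term(DWC_NS, "eventDate"),        # existing Darwin Core term
    Term(DWC_NS, "waterBody"),        # existing Darwin Core term
    Term(FW_NS, "waterTemperature"),  # hypothetical community term
    Term(FW_NS, "flowVelocity"),      # hypothetical community term
]


def partition_terms(terms):
    """Split terms into those reused from Darwin Core and all others."""
    reused = [t for t in terms if t.namespace == DWC_NS]
    new = [t for t in terms if t.namespace != DWC_NS]
    return reused, new


if __name__ == "__main__":
    reused, new = partition_terms(CANDIDATE_TERMS)
    for term in reused:
        print("reuse:  ", term.uri)
    for term in new:
        print("define: ", term.uri)
```

Only the terms in the second group would need definitions published 
under the community's own persistent identifiers.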

The next step in sharing your data using the Darwin Core Archive format 
is to develop an extension to the Darwin Core Archive schema. (Notice 
that an "extension" to the Darwin Core terms is different from an 
"extension" to the Darwin Core Archive schema.) A Darwin Core Archive is 
built around one of two core entities: the Darwin Core Occurrence [8][9] 
or the Darwin Core Taxon [10][11]. The Darwin Core Archive format is 
only intended for sharing datasets with these unit types (rowType). 
Further work is required if you wish to share other data types using the 
Darwin Core Archive format. Would your data type perhaps be the 
"Occurrence" [8][9] of organisms in freshwater systems?

The Global Biodiversity Information Facility (GBIF) provides a Sandbox 
[12] for testing new schema-extensions for the Darwin Core Archive 
format. Schema-extensions included there can be tested using the GBIF 
Integrated Data Publishing Toolkit (IPT) [13]. Notice that you need to 
select the "Sandbox mode" when installing the IPT, and that you will 
need to reinstall the IPT if you later wish to use it in production 
mode. Remember also that you can use the Darwin Core Archive format 
without using the GBIF IPT.
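
To make the archive structure more concrete, here is a minimal sketch 
that generates a meta.xml descriptor with an Occurrence core and one 
extension table linked to it through a coreid column. The element and 
attribute names follow the Darwin Core Text Guide [3]; the extension 
rowType, its two terms, and the file names occurrence.txt and 
freshwater.txt are hypothetical and only stand in for whatever a 
freshwater extension would actually define.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"   # namespace of meta.xml
DWC = "http://rs.tdwg.org/dwc/terms/"          # Darwin Core terms
FW = "http://example.org/freshwater/terms/"    # hypothetical namespace


def add_table(parent, tag, row_type, location, id_tag, terms):
    """Append a <core> or <extension> table description to the archive."""
    table = ET.SubElement(parent, tag, {
        "rowType": row_type,
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in the XML
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = location
    # Column 0 holds the record id (core) or the id of the core record
    # that an extension row belongs to (coreid).
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(table, "field", {"index": str(index), "term": term})
    return table


def build_meta_xml():
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    add_table(archive, "core", DWC + "Occurrence", "occurrence.txt", "id",
              [DWC + "scientificName", DWC + "eventDate", DWC + "waterBody"])
    # Hypothetical extension rowType and terms (assumptions).
    add_table(archive, "extension", FW + "WaterSample", "freshwater.txt",
              "coreid", [FW + "waterTemperature", FW + "flowVelocity"])
    return ET.tostring(archive, encoding="unicode")


if __name__ == "__main__":
    print(build_meta_xml())
```

The descriptor would be zipped together with the two data files to form 
the archive; each row in freshwater.txt then points back to an 
occurrence record through its first column.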

GBIF provides a Vocabulary Server [14] that may assist you with the 
development of your schema-extension (for the Darwin Core Archive). The 
GBIF Vocabulary Server can help you build the list of terms and 
generate the XML for your extension, which can then be added to the 
Sandbox registry [12]. Notice that the Vocabulary Server does not 
automatically update the GBIF Resources Registry [12], so you will need 
to contact the GBIF Secretariat (for example, me) to have your new 
extension added to the Sandbox.

I hope this is of help, and please don't hesitate to make further 
contact if you have questions or comments.

I have copied this message to the TDWG mailing list to invite further 
comments on the development of Darwin Core extensions.

Best regards
Dag Endresen

[1] https://twitter.com/#!/mikehaft/status/162552503228055553
[2] http://code.google.com/p/darwincore/wiki/SubmittingIssues
[3] http://rs.tdwg.org/dwc/terms/guides/text/index.htm
[4] http://tools.gbif.org/dwca-assistant/
[5] http://code.google.com/p/darwincore-germplasm/
[7] http://apps3.fao.org/wiews/mcpd/MCPD_Dec2001_EN.pdf
[8] http://rs.tdwg.org/dwc/terms/Occurrence
[9] http://rs.gbif.org/core/dwc_occurrence.xml
[10] http://rs.tdwg.org/dwc/terms/Taxon
[11] http://rs.gbif.org/core/dwc_taxon.xml
[12] http://rs.gbif.org/sandbox/extension/
[13] http://code.google.com/p/gbif-providertoolkit/
[14] http://vocabularies.gbif.org/extensions

Dag Endresen, PhD
Knowledge Systems Engineer
Global Biodiversity Information Facility (GBIF)
Universitetsparken 15, DK-2100 Copenhagen, Denmark
