[tdwg-content] Darwin Core extension on Freshwater Biodiversity Resources
Dag Endresen (GBIF)
dendresen at gbif.org
Fri Jan 27 12:05:52 CET 2012
Dear Dr Michael Haft,
I noticed your Tweet yesterday [1] and the reply from Laura regarding
developing a Darwin Core extension [2]. Assuming that you are
interested in developing an extension for publishing data in the Darwin
Core Archive format [3][4], I see this as a two-step process. First you
will need to identify the individual terms (data properties). The Darwin
Core standard provides a vocabulary including the most important terms
for describing primary biodiversity data records. You are strongly
encouraged to reuse terms from the Darwin Core standard (and other
standards) as far as possible. New terms for the Darwin Core standard
can be proposed using the link provided by Laura [2]. Only terms of
general utility for a broad range of biodiversity datasets are intended
to be included in the Darwin Core standard. Terms of more limited
interest, relevant mainly to a thematic community, can be defined in a
separate vocabulary that extends the list of Darwin Core terms. As an
example, in the community for plant genetic resources (germplasm) we
have developed such an "extension" [5] following the Darwin Core format
and based on terms already established in the plant genetic resources
community [6][7]. If your community already publishes a vocabulary of
terms in a format similar to Darwin Core, with individual term
definitions accessible through a URI or another persistent identifier,
then you may of course use your own community vocabulary in the next
step without creating a new "extension" to the Darwin Core terms.
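As a rough illustration of this first step, the Python sketch below
separates the terms a dataset would reuse from Darwin Core from those
that would need a community vocabulary. The namespace
http://example.org/freshwater/terms/ and the term waterTemperature are
hypothetical placeholders, not an existing vocabulary; the Darwin Core
term names are taken from the standard.

```python
# Sketch only: FW_NS and "waterTemperature" are hypothetical examples.
DWC_NS = "http://rs.tdwg.org/dwc/terms/"
FW_NS = "http://example.org/freshwater/terms/"  # assumed community namespace


def term_uri(namespace, name):
    """Build the full URI that identifies a term."""
    return namespace + name


def split_by_origin(uris):
    """Return (terms reused from Darwin Core, community-specific terms)."""
    reused = [u for u in uris if u.startswith(DWC_NS)]
    community = [u for u in uris if not u.startswith(DWC_NS)]
    return reused, community


terms = [
    term_uri(DWC_NS, "scientificName"),
    term_uri(DWC_NS, "eventDate"),
    term_uri(DWC_NS, "waterBody"),
    term_uri(FW_NS, "waterTemperature"),  # hypothetical new term
]

reused, community = split_by_origin(terms)
print("Reused from Darwin Core:", len(reused))
print("Needs a community definition:", community)
```

Only the terms in the second list would need their own definitions and
persistent identifiers.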
The next step in the process for sharing your data using the Darwin
Core Archive format is to develop an extension to the Darwin Core
Archive schema. (Note that there is a difference between an
"extension" to the Darwin Core terms and an "extension" to the Darwin
Core Archive schema.) A Darwin Core Archive is built around one of two
core entities: the Darwin Core Occurrence [8][9] or the Darwin Core
Taxon [10][11]. The format is intended only for sharing datasets with
these unit types (rowType). Further work is required if you wish to
share other data types using the Darwin Core Archive format. Would your
data perhaps describe the "Occurrence" [8][9] of organisms in
freshwater systems? The Global Biodiversity
Information Facility (GBIF) provides a Sandbox [12] for testing new
schema-extensions for the Darwin Core Archive format. Schema-extensions
registered there can be tested using the GBIF Integrated Data Publishing
Toolkit (IPT) [13]. Note that you need to select the "Sandbox mode"
when installing the IPT, and that you will need to reinstall the IPT if
you later wish to use it in production mode. Remember also that you can
use the Darwin Core Archive format without using the GBIF IPT.
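To make the archive structure more concrete, here is a minimal Python
sketch that writes a meta.xml descriptor of the kind described in the
Darwin Core text guide [3]: an Occurrence core file plus one extension
file linked to it through the core record id. The extension rowType,
the file names and the freshwater term are hypothetical; a real
descriptor should be checked against the guide.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"
FW_TERMS = "http://example.org/freshwater/terms/"  # hypothetical namespace


def add_table(archive, tag, row_type, filename, id_tag, term_uris):
    """Describe one tab-delimited data file inside the archive."""
    table = ET.SubElement(archive, tag, {
        "rowType": row_type,
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # written literally as \t in the XML
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = filename
    # Column 0 holds the record id (core) or the link to it (extension).
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, uri in enumerate(term_uris, start=1):
        ET.SubElement(table, "field", {"index": str(index), "term": uri})
    return table


def build_meta_xml():
    # The default namespace is set as a plain attribute for simplicity.
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    add_table(archive, "core", DWC_TERMS + "Occurrence",
              "occurrence.txt", "id",
              [DWC_TERMS + "scientificName", DWC_TERMS + "eventDate"])
    add_table(archive, "extension",
              FW_TERMS + "WaterQualityMeasurement",  # hypothetical rowType
              "waterquality.txt", "coreid",
              [FW_TERMS + "waterTemperature"])
    return ET.tostring(archive, encoding="unicode")


meta_xml = build_meta_xml()
print(meta_xml)
```

The resulting meta.xml would sit next to the two data files, and the
whole folder would normally be zipped to form the archive.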
GBIF provides a Vocabulary Server [14] that may assist you with the
development of your schema-extension (for the Darwin Core Archive). The
GBIF Vocabulary Server can assist you in building the list of terms and
providing the XML format for your extension, which can then be included
in the Sandbox registry [12]. Note that the Vocabulary Server does not
automatically update the GBIF Resources Registry [12], so you will need
to contact the GBIF Secretariat (for example, me) to have your new
extension added to the Sandbox.
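For orientation, the Python sketch below produces an extension
definition loosely modelled on the layout of the published definitions
such as [9]: an "extension" element carrying a name, namespace and
rowType, with one "property" element per term. The freshwater
namespace, extension name and terms are hypothetical, and the attribute
set shown is an assumption that should be checked against the XML that
the Vocabulary Server actually generates.

```python
import xml.etree.ElementTree as ET

GBIF_EXT_NS = "http://rs.gbif.org/extension/"
DC_NS = "http://purl.org/dc/terms/"
FW_NS = "http://example.org/freshwater/terms/"  # hypothetical namespace


def build_extension_definition(title, name, namespace, properties):
    """Return an extension definition as an XML string.

    properties is a list of (term name, description) pairs.
    """
    # Namespace declarations are written as plain attributes for brevity.
    root = ET.Element("extension", {
        "xmlns": GBIF_EXT_NS,
        "xmlns:dc": DC_NS,
        "dc:title": title,
        "name": name,
        "namespace": namespace,
        "rowType": namespace + name,
    })
    for prop_name, description in properties:
        ET.SubElement(root, "property", {
            "name": prop_name,
            "namespace": namespace,
            "qualName": namespace + prop_name,
            "dc:description": description,
            "required": "false",
        })
    return ET.tostring(root, encoding="unicode")


definition_xml = build_extension_definition(
    "Freshwater Measurement (example)",  # hypothetical title
    "WaterQualityMeasurement",           # hypothetical extension name
    FW_NS,
    [
        ("waterTemperature", "Water temperature at the sampling site."),
        ("flowVelocity", "Flow velocity at the sampling site."),
    ],
)
print(definition_xml)
```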
I hope this is of help, and please don't hesitate to contact me again
if you have questions or comments.
I have copied this message to the TDWG mailing list to invite further
comments on the development of Darwin Core extensions.
Best regards
Dag Endresen
[1] https://twitter.com/#!/mikehaft/status/162552503228055553
[2] http://code.google.com/p/darwincore/wiki/SubmittingIssues
[3] http://rs.tdwg.org/dwc/terms/guides/text/index.htm
[4] http://tools.gbif.org/dwca-assistant/
[5] http://code.google.com/p/darwincore-germplasm/
[6] http://www.bioversityinternational.org/index.php?id=19&user_bioversitypublications_pi1[showUid]=2192
[7] http://apps3.fao.org/wiews/mcpd/MCPD_Dec2001_EN.pdf
[8] http://rs.tdwg.org/dwc/terms/Occurrence
[9] http://rs.gbif.org/core/dwc_occurrence.xml
[10] http://rs.tdwg.org/dwc/terms/Taxon
[11] http://rs.gbif.org/core/dwc_taxon.xml
[12] http://rs.gbif.org/sandbox/extension/
[13] http://code.google.com/p/gbif-providertoolkit/
[14] http://vocabularies.gbif.org/extensions
--
Dag Endresen, PhD
Knowledge Systems Engineer
Global Biodiversity Information Facility (GBIF)
Universitetsparken 15, DK-2100 Copenhagen, Denmark
http://community.gbif.org/pg/profile/dag.endresen