Darwin Core extension on Freshwater Biodiversity Resources

Dear Dr Michael Haft,

I noticed your Tweet yesterday [1] and the reply from Laura regarding developing a Darwin Core extension [2]. Assuming that you are interested in developing an extension for publishing data using the Darwin Core Archive format [3][4], I see this as a two-step process.

First, you will need to identify the individual terms (data properties). The Darwin Core standard provides a vocabulary that includes the most important terms for describing primary biodiversity data records. You are strongly encouraged to reuse terms from the Darwin Core standard (and other standards) as far as possible. New terms for the Darwin Core standard can be proposed using the link provided by Laura [2]. Only terms of general utility for a broad range of biodiversity datasets are intended to be included in the Darwin Core standard. Terms of more limited interest to a thematic community can be defined in a separate vocabulary that extends the list of Darwin Core terms. As an example, in the plant genetic resources (germplasm) community we have developed such an "extension" [5], following the Darwin Core format and based on terms already established in that community [6][7]. If your community already publishes a vocabulary of terms in a format similar to Darwin Core, with individual term definitions accessible through a URI or another persistent identifier, then you may of course use your own community vocabulary in the next step, without creating a new "extension" to the Darwin Core terms.

The next step in sharing your data using the Darwin Core Archive format is to develop an extension to the Darwin Core Archive schema. (Notice that there is a difference between an "extension" to the Darwin Core terms and an "extension" to the Darwin Core Archive schema.) The Darwin Core Archive is based on one of two core entities, namely the Darwin Core Occurrence [8][9] or the Darwin Core Taxon [10][11].
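
To make the relation between a core and a schema extension more concrete, here is a minimal sketch in Python of how the archive descriptor (meta.xml) described in the text guide [3] could be generated. The Occurrence rowType and the Darwin Core terms scientificName and waterBody are real; the example.org namespace, the "WaterSample" rowType, the "waterTemperature" term and the file names are purely hypothetical placeholders, not an existing freshwater vocabulary.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"   # namespace of the meta.xml descriptor
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"    # Darwin Core term namespace
# ASSUMPTION: hypothetical namespace standing in for a community vocabulary.
FW_TERMS = "http://example.org/freshwater/terms/"


def add_table(archive, tag, row_type, location, id_tag, terms):
    """Append a <core> or <extension> element describing one data file."""
    table = ET.SubElement(archive, tag, {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in meta.xml
        "linesTerminatedBy": "\\n",    # written literally as \n in meta.xml
        "ignoreHeaderLines": "1",
        "rowType": row_type,
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = location
    # Column 0 holds the record id (core) or the link back to it (extension).
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(table, "field", {"index": str(index), "term": term})
    return table


def build_meta_xml():
    """Return a meta.xml string with an Occurrence core and one extension."""
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    add_table(archive, "core", DWC_TERMS + "Occurrence", "occurrence.txt", "id",
              [DWC_TERMS + "scientificName", DWC_TERMS + "waterBody"])
    # Hypothetical extension rows, linked to core records through <coreid>.
    add_table(archive, "extension", FW_TERMS + "WaterSample", "watersample.txt",
              "coreid", [FW_TERMS + "waterTemperature"])
    return ET.tostring(archive, encoding="unicode")


if __name__ == "__main__":
    print(build_meta_xml())
```

Each row in the extension file points back to one core record through the coreid column, which is what ties the extra, community-specific properties to an Occurrence (or Taxon) record.
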
The Darwin Core Archive format is only intended for sharing datasets with these unit types (rowType). Further work is required if you wish to share other data types using the Darwin Core Archive format. Would your data type perhaps be the "Occurrence" [8][9] of organisms in freshwater systems?

The Global Biodiversity Information Facility (GBIF) provides a Sandbox [12] for testing new schema extensions for the Darwin Core Archive format. Schema extensions included there can be tested using the GBIF Integrated Publishing Toolkit (IPT) [13]. Notice that you need to select the "Sandbox mode" when installing the IPT, and that you will need to reinstall the IPT if you later wish to use it in production mode. Remember also that you can use the Darwin Core Archive format without using the GBIF IPT.

GBIF provides a Vocabulary Server [14] that may assist you with the development of your schema extension (for the Darwin Core Archive). It can help you build the list of terms and produce the XML for your extension, which can then be included in the Sandbox registry [12]. Notice that the Vocabulary Server does not automatically update the GBIF Resources Registry [12], and that you will need to contact the GBIF Secretariat (for example, me) to add your new extension to the Sandbox.

I hope this is of help, and please don't hesitate to get in touch again if you have questions or comments. I have copied this message to the TDWG mailing list to invite further comments on the development of Darwin Core extensions.

Best regards
Dag Endresen

[1] https://twitter.com/#!/mikehaft/status/162552503228055553
[2] http://code.google.com/p/darwincore/wiki/SubmittingIssues
[3] http://rs.tdwg.org/dwc/terms/guides/text/index.htm
[4] http://tools.gbif.org/dwca-assistant/
[5] http://code.google.com/p/darwincore-germplasm/
[6] http://www.bioversityinternational.org/index.php?id=19&user_bioversitypublications_pi1[showUid]=2192
[7] http://apps3.fao.org/wiews/mcpd/MCPD_Dec2001_EN.pdf
[8] http://rs.tdwg.org/dwc/terms/Occurrence
[9] http://rs.gbif.org/core/dwc_occurrence.xml
[10] http://rs.tdwg.org/dwc/terms/Taxon
[11] http://rs.gbif.org/core/dwc_taxon.xml
[12] http://rs.gbif.org/sandbox/extension/
[13] http://code.google.com/p/gbif-providertoolkit/
[14] http://vocabularies.gbif.org/extensions

--
Dag Endresen, PhD
Knowledge Systems Engineer
Global Biodiversity Information Facility (GBIF)
Universitetsparken 15, DK-2100 Copenhagen, Denmark
http://community.gbif.org/pg/profile/dag.endresen