[tdwg-content] Darwin Core vernacularName field

Sat Jul 23 01:31:21 CEST 2011

Dear all,

I recommend reading all of the documents pertaining to the Darwin Core
(http://rs.tdwg.org/dwc/), as there are many misconceptions surfacing
here. I'll try to summarize, but this shouldn't be taken as a
substitute for the "facts", which are the standard as published.

The Darwin Core is first and foremost a set of terms - the common
ground through which we seek to convey biodiversity information, so
that when we label something with one of these terms, we all have the
potential to understand what it means. The canonical form of this set
of terms is defined in RDF. The rest is about implementation, for
which there are many documents and reference specifications as part of
the standard, and many software tools that make use of them.

The Simple Darwin Core is just one of the many ways of using Darwin
Core terms. Simple Darwin Core has its uses and its limitations. It is
easy to produce, reflects a lot of the data we have, and must be
"flat" by design. The Simple Darwin Core can be used as a "core" to
structured data as well, as implemented in the GBIF Integrated
Publishing Toolkit (IPT) software, which accepts a "core" record as a
Taxon or Occurrence and allows that to be extended in a structure
that can be expressed as a star schema - basically one level of remove
from "flat", where all extensions are related only to the core, but
each of which can express a one-to-many relationship to that core
record. This isn't "The Darwin Core", it is one example of what
someone has done with the Darwin Core in software to make it useful.
This text-based use of the Darwin Core is supported by the
documentation in the Darwin Core Text Guide
(http://rs.tdwg.org/dwc/terms/guides/text/index.htm), which describes
the specifications of a Darwin Core Archive.
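
To make the star-schema idea concrete, here is a minimal sketch in Python.
It is not IPT code and it does not read a real Darwin Core Archive; the
sample rows, the helper names (read_rows, join_star) and the choice of
taxonID as the linking column are assumptions made for illustration only.
It shows a flat "core" table of taxa and one extension table of vernacular
names, where each extension row points back at exactly one core record and
a core record may have any number of extension rows.

```python
import csv
import io
from collections import defaultdict

# Core file: one flat row per taxon (invented sample data).
CORE_CSV = """taxonID,scientificName
t1,Acer rubrum
t2,Picea rubens
"""

# Extension file: zero or more rows per core record, linked by taxonID.
VERNACULAR_CSV = """taxonID,vernacularName,language
t1,red maple,en
t1,swamp maple,en
t2,red spruce,en
"""


def read_rows(text):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_star(core_rows, extension_rows, key="taxonID"):
    """Attach each extension row to the single core record it refers to."""
    by_key = defaultdict(list)
    for row in extension_rows:
        by_key[row[key]].append(row)
    return {
        row[key]: {"core": row, "vernacularNames": by_key.get(row[key], [])}
        for row in core_rows
    }


joined = join_star(read_rows(CORE_CSV), read_rows(VERNACULAR_CSV))
names_t1 = [v["vernacularName"] for v in joined["t1"]["vernacularNames"]]
print(names_t1)
```

Note that the extension rows relate only to the core; nothing in this
structure lets one extension row refer to another, which is the "one level
of remove from flat" limitation described above.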

That doesn't mean Darwin Core can't support highly relational data.
Reference schemas for XML
(http://darwincore.googlecode.com/svn/trunk/xsd/), and documents on
how to use XML for Darwin Core
(http://rs.tdwg.org/dwc/terms/guides/xml/index.htm) are a part of the
standard, and have been both implemented and extended on at least two
occasions (Germplasm Extension for genetic resources -
http://code.google.com/p/darwincore-germplasm/downloads/detail?name=ipt_germplasm_0_1.xml&can=2&q=;
 and Apiary Extension for Herbarium specimen labels -
http://www.apiaryproject.org/about-apiary-project). These schemas can
be used to share documents in XML using, for example, the TapirLink
software (http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink), which
implements another of TDWG's standards - TAPIR (the TDWG Access
Protocol for Information Retrieval -
http://www.tdwg.org/standards/449/). This XML-based use of the Darwin
Core is supported by the Darwin Core XML Guide
(http://rs.tdwg.org/dwc/terms/guides/xml/index.htm), which describes
how to use and construct Darwin Core schemas.

People will naturally ask about Darwin Core in RDF. The canonical form
of the Darwin Core is an RDF document, which contains all of the
attributes of every term, including the RDF attributes that relate all
of the terms to each other, and to terms in other standards such as
Dublin Core. There is no RDF Guide in the body of Darwin Core
documents published with the standard. This was intentional. It
reflects our level of competence as a community in semantic-web
technologies at the time the standard was published. Many excellent
discussions around that topic have taken place here on this list in an
effort to fill the gap for those who would like to link biodiversity
and other data in new ways. That subject begs for dedicated attention
from those who have the skills and resources to lead it forward.

In summary, the Darwin Core is a living standard (in the sense of
being active), having mechanisms to expand and adapt around its core
competency, which is the definition of the meaning of the common terms
through which we would like to promote the sharing of biodiversity
information.

On Fri, Jul 22, 2011 at 10:17 AM, Geoffrey Allen <gsallen at unb.ca> wrote:
> I really appreciate all of the feedback on this point. Lots of interesting
> ideas to think about.
> Looking at the various responses, it seems to me that by not allowing the
> repetition of fields, DwC limits its usefulness to information managers such
> as myself. Some of the work-arounds, such as the GBIF Vernacular Name
> extension that Peter Desmet pointed me towards, look useful in the
> particular example that I gave, but won't work in others. It is also a
> fairly complex process, belying the "simple" part of DwC.
> Such a process definitely wouldn't work with some of the other fields that I
> would like to repeat for our data. I quite dislike the idea of concatenating
> all the sample collectors from one specimen into a single field since
> that will make the process of finding individuals more challenging. It would
> not be possible to create a relational table for our collectors such as the
> one for vernacular names.
> The other field(s) that I need to repeat pertain to location data. Our
> dataset currently lists location information in at least five different
> systems (decimal Lat/Long; deg. min. sec.; UTM; NTS; and verbatim
> descriptions), and often up to four are used on a given sample. At times the
> UTM data is generated from degrees Lat/Long, but at other times the reverse
> is true (and, of course, there is no way of telling from the database alone
> which is the original). Further, small errors abound in the data that could
> have crept in during conversions, or possibly even reside with the original
> data. The data from over 40,000 specimens have already been entered into the
> database in this manner, and no one is going to go back to double check them
> all. I desperately want to keep ALL the location data out of fear that we
> might not present the one accurate measure, and creating a relational table
> for every geographic point in New Brunswick (let alone the rest of Canada!)
> is out of the question. (This description of the locational data has been
> significantly simplified from the actual reality, so please don't start
> nailing me on technicalities here)
> It seems to me, then, that we will have to maintain the data in our own
> metadata system, and use that to generate DwC (along the lines of what Bob
> Morris recommended in his first response). That's fine by us, but should,
> perhaps, be of some concern to a metadata standards working group. Since
> Darwin Core will not be our de facto standard, its generation and accuracy
> will be of less relevance to us. I fear the DwC records will become out of
> date, or start to reflect errors as their maintenance becomes less important to
> us. Furthermore, it suggests that we may have to create duplicate sets of
> data, rather than one set that can be easily harvested for use by other
> collections.
> From my perspective, it would be nice if we could mark this biological data
> up in one well designed, flexible metadata standard. The Dublin Core group,
> of course, recognised the importance of flexibility, allowing for Qualified
> DC along with their simple set, and XML is as popular as it is today because
> of its extensibility. I would worry that DwC might be painting itself into a
> corner if it tries to adhere to too narrow a set of rules.
> Perplexing stuff, indeed.
> Thanks again for all your advice,
> Geoffrey
> --------------------------------------------
> Geoffrey Allen
> Digital Projects Librarian
> Electronic Text Centre
> Harriet Irving Library
> University of New Brunswick
> Fredericton, NB  E3B 5H5
> Tel: (506) 447-3250
> Fax: (506) 453-4595
> gsallen at unb.ca
>
>