[tdwg-content] Darwin Core vernacularName field

John Wieczorek tuco at berkeley.edu
Sat Jul 23 01:37:58 CEST 2011


On Fri, Jul 22, 2011 at 11:03 AM, Bob Morris <morris.bob at gmail.com> wrote:
> Wait. You weren't talking about using Simple DwC for data in the
> backend were you? That's not the primary purpose of Simple DwC, which
> is, rather, an exchange standard. It could be used in the backend if
> the backend is a single flat table living within its restrictions, but
> you already know that you can't live with that.

Exactly. This illustrates very well why Darwin Core in general, and
Simple Darwin Core in particular, should not be misconstrued as a model
for database design. There was never any such intention.

> I don't think DwC per se is in danger of painting itself into a
> corner.  The GBIF Data Portal serves 293M taxon occurrence records
> ingested from 339 different data providers and served in, among other
> forms, DwC.  Seems like a pretty big corner to me. (Though I may be
> the only regular reader of this list who doesn't know whether it
> ingests and/or serves Simple DwC...).

Hopefully my last post shed some light on that. GBIF can harvest
Darwin Core Archives, among other formats and transport methods for
Darwin Core and other data sets. In a Darwin Core Archive there is
always a core record, which can be represented completely in Simple
Darwin Core.
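
To make that concrete, here is a minimal sketch (in Python, with invented
local field names) of flattening one record from a richer backend into the
kind of flat, single-row Simple Darwin Core representation that a Darwin
Core Archive core file carries. The Darwin Core term names are standard;
joining several collectors with a separator is just one common flattening
convention, and a real archive would also ship a meta.xml describing the
file.

    import csv

    # A hypothetical record as it might live in a richer local database
    # (these field names are invented for illustration, not part of any standard).
    record = {
        "catalog_number": "UNB-12345",
        "scientific_name": "Castor canadensis",
        "collectors": ["A. Example", "B. Example"],
        "decimal_latitude": 45.95,
        "decimal_longitude": -66.64,
    }

    # Flatten into a single Simple Darwin Core row. The term names are standard
    # Darwin Core; the pipe-separated recordedBy value is one common convention,
    # not a requirement of the standard.
    core_row = {
        "occurrenceID": record["catalog_number"],
        "scientificName": record["scientific_name"],
        "recordedBy": " | ".join(record["collectors"]),
        "decimalLatitude": record["decimal_latitude"],
        "decimalLongitude": record["decimal_longitude"],
        "basisOfRecord": "PreservedSpecimen",
    }

    # Write the one-row core file of a would-be Darwin Core Archive.
    with open("occurrence.txt", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(core_row), delimiter="\t")
        writer.writeheader()
        writer.writerow(core_row)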

[snip]

> Bob Morris
>
> On Fri, Jul 22, 2011 at 1:17 PM, Geoffrey Allen <gsallen at unb.ca> wrote:
>> I really appreciate all of the feedback on this point. Lots of interesting
>> ideas to think about.
>>
>> Looking at the various responses, it seems to me that by not allowing the
>> repetition of fields, DwC limits its usefulness to information managers such
>> as myself. Some of the work-arounds, such as the GBIF Vernacular Name
>> extension that Peter Desmet pointed me towards, look useful in the
>> particular example that I gave, but won't work in others. It is also a
>> fairly complex process, belying the "simple" part of DwC.
>>
>> Such a process definitely wouldn't work with some of the other fields that I
>> would like to repeat for our data. I quite dislike the idea of concatenating
>> all the sample collectors from one specimen into a single field since
>> that will make the process of finding individuals more challenging. It would
>> not be possible to create a relational table for our collectors such as the
>> one for vernacular names.
>>
>> The other field(s) that I need to repeat pertain to location data. Our
>> dataset currently lists location information in at least five different
>> systems (decimal Lat/Long; deg. min. sec.; UTM; NTS; and verbatim
>> descriptions), and often up to four are used on a given sample. At times the
>> UTM data is generated from degrees Lat/Long, but at other times the reverse
>> is true (and, of course, there is no way of telling from the database alone
>> which is the original). Further, small errors abound in the data that could
>> have crept in during conversions, or possibly even reside with the original
>> data. The data from over 40,000 specimens have already been entered into the
>> database in this manner, and no one is going to go back to double check them
>> all. I desperately want to keep ALL the location data out of fear that we
>> might not present the one accurate measure, and creating a relational table
>> for every geographic point in New Brunswick (let alone the rest of Canada!)
>> is out of the question. (This description of the locational data has been
>> significantly simplified from the actual reality, so please don't start
>> nailing me on technicalities here.)
>>
>> It seems to me, then, that we will have to maintain the data in our own
>> metadata system, and use that to generate DwC (along the lines of what Bob
>> Morris recommended in his first response). That's fine by us, but should,
>> perhaps, be of some concern to a metadata standards working group. Since
>> Darwin Core will not be our de facto standard, its generation and accuracy
>> will be of less relevance to us. I fear the DwC records will become out of
>> date, or start to reflect errors as its maintenance becomes less important to
>> us. Furthermore, it suggests that we may have to create duplicate sets of
>> data, rather than one set that can be easily harvested for use by other
>> collections.
>>
>> From my perspective, it would be nice if we could mark this biological data
>> up in one well designed, flexible metadata standard. The Dublin Core group,
>> of course, recognised the importance of flexibility, allowing for Qualified
>> DC along with their simple set, and XML is as popular as it is today because
>> of its extensibility. I would worry that DwC might be painting itself into a
>> corner if it tries to adhere to too narrow a set of rules.
>>
>> Perplexing stuff, indeed.
>>
>> Thanks again for all your advice,
>> Geoffrey
>> --------------------------------------------
>> Geoffrey Allen
>> Digital Projects Librarian
>> Electronic Text Centre
>> Harriet Irving Library
>> University of New Brunswick
>> Fredericton, NB  E3B 5H5
>> Tel: (506) 447-3250
>> Fax: (506) 453-4595
>> gsallen at unb.ca
>>
>>
>
>
>
> --
> Robert A. Morris
>
> Emeritus Professor  of Computer Science
> UMASS-Boston
> 100 Morrissey Blvd
> Boston, MA 02125-3390
> IT Staff
> Filtered Push Project
> Department of Organismal and Evolutionary Biology
> Harvard University
>
>
> email: morris.bob at gmail.com
> web: http://efg.cs.umb.edu/
> web: http://etaxonomy.org/mw/FilteredPush
> http://www.cs.umb.edu/~ram
> phone (+1) 857 222 7992 (mobile)
>
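
Regarding the vernacular name question in the quoted message: the GBIF
extension Peter Desmet pointed to works on a star schema, where a flat core
file holds one row per record and an extension file holds any number of rows
that link back to the core through a shared identifier, so repeated values
never have to be concatenated. A minimal sketch in Python follows (the column
names follow Darwin Core and the vernacular name extension; the local field
names are invented, and a real archive additionally needs a meta.xml that
declares the core, the extensions, and the linking column):

    import csv

    # Hypothetical taxon record with more than one common name; the local
    # field names here are invented for illustration.
    taxon = {
        "taxon_id": "unb:taxon:1",
        "scientific_name": "Castor canadensis",
        "vernacular_names": [
            {"name": "American beaver", "language": "en"},
            {"name": "castor du Canada", "language": "fr"},
        ],
    }

    # Core file: one flat row per taxon, identified by taxonID.
    with open("taxon.txt", "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f, delimiter="\t")
        w.writerow(["taxonID", "scientificName"])
        w.writerow([taxon["taxon_id"], taxon["scientific_name"]])

    # Extension file: any number of rows per taxon, each pointing back to
    # the core row through the shared identifier, so nothing is concatenated.
    with open("vernacularname.txt", "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f, delimiter="\t")
        w.writerow(["taxonID", "vernacularName", "language"])
        for vn in taxon["vernacular_names"]:
            w.writerow([taxon["taxon_id"], vn["name"], vn["language"]])

Nothing about the core row changes; the extension file simply carries as many
vernacular-name rows as exist for it.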

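On keeping all of the location representations: Darwin Core deliberately
separates the verbatim location terms from the interpreted decimal ones, so
an original UTM or degrees-minutes-seconds string can travel alongside the
derived coordinates instead of being discarded. One possible mapping,
sketched in Python with an invented specimen record (only one value fits
verbatimCoordinates, so a further representation is parked in
georeferenceRemarks here):

    # Hypothetical specimen whose locality was recorded in several systems;
    # the local field names are invented for illustration.
    specimen = {
        "verbatim_locality": "2 km NW of Fredericton, NB",
        "utm": "19T 682000 5092000",
        "dms": "45 57 N, 66 38 W",
        "decimal": (45.95, -66.64),
    }

    lat, lon = specimen["decimal"]

    # One possible mapping onto Darwin Core location terms: interpreted
    # decimal coordinates go into decimalLatitude/decimalLongitude, while the
    # original representations are kept verbatim rather than discarded.
    dwc_location = {
        "verbatimLocality": specimen["verbatim_locality"],
        "verbatimCoordinates": specimen["utm"],
        "verbatimCoordinateSystem": "UTM",
        "decimalLatitude": lat,
        "decimalLongitude": lon,
        "geodeticDatum": "WGS84",  # assumed; record the real datum if known
        "georeferenceRemarks": "Decimal degrees derived from UTM; also on label: "
        + specimen["dms"],
    }

    for term, value in dwc_location.items():
        print(term, value, sep="\t")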
