Re: [tdwg-content] Darwin Core vernacularName field

22 Jul 2011

Wait. You weren't talking about using Simple DwC for data in the
backend, were you? That's not the primary purpose of Simple DwC, which
is, rather, an exchange standard. It could be used in the backend if
the backend is a single flat table living within its restrictions, but
you already know that you can't live with that.

I don't think DwC per se is in danger of painting itself into a
corner.  The GBIF Data Portal serves 293M taxon occurrence records
ingested from 339 different data providers and served in, among other
forms, DwC.  Seems like a pretty big corner to me. (Though I may be
the only regular reader of this list who doesn't know whether it
ingests and/or serves Simple DwC.)

I'd dare say that you'd get a lot of disagreement from TDWG members
who have designed complex XML-Schemas about how wonderful the
extension mechanisms for XML are, if one cares about structure
constrained by an XML-Schema.  One is pretty much limited to use of
xs:any (which somewhat defeats the purpose of a schema language), or
something dynamic like WSDL and SOAP wherein the client discovers the
schema at query time, or runtime-applied rule languages like
Schematron.

Bob Morris

On Fri, Jul 22, 2011 at 1:17 PM, Geoffrey Allen <gsallen@unb.ca> wrote:
...
I really appreciate all of the feedback on this point. Lots of interesting
ideas to think about.
Looking at the various responses, it seems to me that by not allowing the
repetition of fields, DwC limits its usefulness to information managers such
as myself. Some of the work-arounds, such as the GBIF Vernacular Name
extension that Peter Desmet pointed me towards, look useful in the
particular example that I gave, but won't work in others. It is also a
fairly complex process, belying the "simple" part of DwC.
Such a process definitely wouldn't work with some of the other fields that I
would like to repeat for our data. I quite dislike the idea of concatenating
all the sample collectors from one specimen into a single field since
that will make the process of finding individuals more challenging. It would
not be possible to create a relational table for our collectors such as the
one for vernacular names.
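
To make the concern about concatenated collectors concrete, here is a minimal
Python sketch. The " | " separator, the field layout and the collector names
are assumptions made up for illustration; none of them come from the dataset
described here. It shows why a plain substring search on a concatenated field
can match the wrong person, and why the field has to be split again before an
individual can be found reliably.

```python
# Hypothetical separator and names, for illustration only.
SEPARATOR = " | "

def join_collectors(names):
    """Concatenate collector names into one flat field value."""
    return SEPARATOR.join(names)

def split_collectors(field):
    """Recover the individual names from a concatenated field value."""
    parts = field.split(SEPARATOR.strip())
    return [name.strip() for name in parts if name.strip()]

def has_collector(field, name):
    """Exact-match lookup; the field must be split first."""
    return name in split_collectors(field)

record = {"recordedBy": join_collectors(["A. Smith", "B. Smithers"])}

# A naive substring search for "Smith" also matches "B. Smithers".
naive_hit = "Smith" in record["recordedBy"]

# Splitting first lets us ask for one specific person.
exact_hit = has_collector(record["recordedBy"], "A. Smith")
```

The splitting step only works if no collector name ever contains the
separator, which is one more thing the data manager has to guarantee.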
The other field(s) that I need to repeat pertain to location data. Our
dataset currently lists location information in at least five different
systems (decimal Lat/Long; deg. min. sec.; UTM; NTS; and verbatim
descriptions), and often up to four are used on a given sample. At times the
UTM data is generated from degrees Lat/Long, but at other times the reverse
is true (and, of course, there is no way of telling from the database alone
which is the original). Further, small errors abound in the data that could
have crept in during conversions, or possibly even reside with the original
data. The data from over 40,000 specimens have already been entered into the
database in this manner, and no one is going to go back to double check them
all. I desperately want to keep ALL the location data out of fear that we
might not present the one accurate measure, and creating a relational table
for every geographic point in New Brunswick (let alone the rest of Canada!)
is out of the question. (This description of the locational data has been
significantly simplified from the actual reality, so please don't start
nailing me on technicalities here.)
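
As a rough illustration of how conversion errors between two of these systems
might at least be flagged, here is a small Python sketch. It assumes each
sample stores both a decimal value and a degrees/minutes/seconds value for the
same coordinate; the function names, the tolerance and the sample values are
all hypothetical. It cannot say which value is the original, only that the two
disagree.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value

def agrees(decimal_value, dms, tolerance=1e-4):
    """True if a stored decimal value matches its DMS counterpart.

    The tolerance (in degrees) is an arbitrary assumption.
    """
    return abs(decimal_value - dms_to_decimal(*dms)) <= tolerance

# Hypothetical stored values for one specimen (not real data).
lat_ok = agrees(45.96, (45, 57, 36, "N"))
# 66 deg 39 min W is -66.65, so a stored -66.64 is flagged as a mismatch.
lon_ok = agrees(-66.64, (66, 39, 0, "W"))
```

A check like this would only cover the decimal and deg. min. sec. pair; UTM
and NTS would each need their own conversion, which is not attempted here.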
It seems to me, then, that we will have to maintain the data in our own
metadata system, and use that to generate DwC (along the lines of what Bob
Morris recommended in his first response). That's fine by us, but should,
perhaps, be of some concern to a metadata standards working group. Since
Darwin Core will not be our de facto standard, its generation and accuracy
will be of less relevance to us. I fear the DwC records will become out of
date, or start to reflect errors as their maintenance becomes less important to
us. Furthermore, it suggests that we may have to create duplicate sets of
data, rather than one set that can be easily harvested for use by other
collections.
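
One way to picture the "generate DwC from our own system" approach is the
Python sketch below. The internal record layout, the catalogue number, the
names and the " | " separator are all invented for illustration;
catalogNumber, recordedBy and vernacularName are Darwin Core terms. The point
is that the flat export is derived from the richer source on demand rather
than maintained as a second copy.

```python
import csv
import io

# Hypothetical internal record that allows repeated values.
internal = {
    "catalogNumber": "UNB-0001",
    "collectors": ["A. Smith", "B. Jones"],
    "vernacularNames": ["red maple", "swamp maple"],
}

FIELDS = ["catalogNumber", "recordedBy", "vernacularName"]

def to_flat_row(record, separator=" | "):
    """Flatten one internal record into a single flat DwC-style row."""
    names = record["vernacularNames"]
    return {
        "catalogNumber": record["catalogNumber"],
        "recordedBy": separator.join(record["collectors"]),
        # Each term appears once per row, so this sketch keeps only the
        # first vernacular name; the others stay in the source system.
        "vernacularName": names[0] if names else "",
    }

def export_csv(records):
    """Regenerate the flat CSV text from the source records."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(to_flat_row(record))
    return buffer.getvalue()

output = export_csv([internal])
```

If the export is rerun whenever the source changes, the DwC records cannot
drift out of date on their own, though they still lose whatever the flat
form cannot hold.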
From my perspective, it would be nice if we could mark this biological data
up in one well designed, flexible metadata standard. The Dublin Core group,
of course, recognised the importance of flexibility, allowing for Qualified
DC along with their simple set, and XML is as popular as it is today because
of its extensibility. I would worry that DwC might be painting itself into a
corner if it tries to adhere to too narrow a set of rules.
Perplexing stuff, indeed.
Thanks again for all your advice,
Geoffrey
--------------------------------------------
Geoffrey Allen
Digital Projects Librarian
Electronic Text Centre
Harriet Irving Library
University of New Brunswick
Fredericton, NB  E3B 5H5
Tel: (506) 447-3250
Fax: (506) 453-4595
gsallen@unb.ca
-- 
Robert A. Morris

Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
IT Staff
Filtered Push Project
Department of Organismal and Evolutionary Biology
Harvard University

email: morris.bob@gmail.com
web: http://efg.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)