[tdwg-content] Darwin Core vernacularName field

joel sachs jsachs at csee.umbc.edu
Fri Jul 22 17:42:15 CEST 2011


I'm sorry - I still don't understand. How does telling a data 
provider that he can't re-use a field make his life easier?


On Fri, 22 Jul 2011, John Wieczorek wrote:

> It's a storage issue, a generation issue, a transportation issue, a
> processing issue, a consumption issue - it affects all aspects of a
> workflow. It is meant to help those whose lives are not steeped in
> informatics, and who have no desire to tread there. In fact, that is the
> majority of those providing data, who would not be able to do so under
> current conditions without tools such as the GBIF Integrated
> Publishing Toolkit (IPT) or without assistance.
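To make the "generation issue" concrete, here is a minimal Python sketch. It assumes a hypothetical record and column set and is not taken from any DwC tool. Because Simple Darwin Core allows each term at most once per record, the term names can serve directly as CSV column headers with the standard csv module. A second vernacular name then has no column of its own and has to be concatenated into the single vernacularName value. The " | " separator used here is an assumption. This is the situation Geoffrey Allen describes further down in the thread.

```python
import csv
import io

# Hypothetical column set: in Simple Darwin Core each term appears at most
# once per record, so a term name can be used directly as a column header.
TERMS = ["occurrenceID", "scientificName", "vernacularName"]


def to_simple_dwc_csv(records):
    """Serialize flat dicts (one value per term) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TERMS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()


record = {
    "occurrenceID": "UNB-example-1",  # hypothetical identifier
    "scientificName": "Allium schoenoprasum",
    # A second language has no column of its own, so it is concatenated into
    # the single vernacularName value (separator is an assumption).
    "vernacularName": "Chives | Ciboulette, brulotte",
}

print(to_simple_dwc_csv([record]), end="")
```

The flat layout is what makes such a file easy to produce from a spreadsheet or a single database table. The cost is that the English and French names can no longer be told apart without parsing the value.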
> On Fri, Jul 22, 2011 at 8:16 AM, joel sachs <jsachs at csee.umbc.edu> wrote:
>> Hi John,
>> The description of Simple Darwin Core justifies the restriction by saying
>> that it's just like the restriction in relational databases. But that's a
>> storage issue, not a representation issue. Maybe my real question is: Whose
>> life is Simple Darwin Core supposed to simplify, the data provider's, or the
>> aggregator's?
>> Joel.
>> On Fri, 22 Jul 2011, John Wieczorek wrote:
>>> Joel, is the description of the Simple Darwin Core
>>> (http://rs.tdwg.org/dwc/terms/simple/index.htm) insufficient to
>>> explain the restriction?
>>> I would say that the goal of many of "us" is to encourage everyone to
>>> share biodiversity information. I would even go so far as to say that
>>> our success as biodiversity informaticians will be to make sure that
>>> most people never have to think in rdf. Like any good infrastructure,
>>> it should disappear from everyday concern.
>>> On Fri, Jul 22, 2011 at 6:42 AM, joel sachs <jsachs at csee.umbc.edu> wrote:
>>>> I'd love it if someone could explain the reason for this restriction on
>>>> Simple Darwin Core. It seems somewhat anachronistic, given that we're
>>>> encouraging everyone to think in rdf. On the representation side,
>>>> repetition of a field poses no problems for spreadsheets, xml, or rdf. On
>>>> the storage side, it is an issue for RDBMS systems, but consuming
>>>> applications can address this by creating the kinds of records Bob
>>>> describes below. Am I missing something?
>>>> Many thanks,
>>>> Joel.
>>>> On Thu, 21 Jul 2011, Bob Morris wrote:
>>>>> There's a general issue with repeated attributes in a metadata record
>>>>> of any kind.  Depending on the representation language, when there is
>>>>> more than one such thing in the record, it can be difficult to specify
>>>>> any linkages between them when they are semantically related.
>>>>> One general solution is to have multiple metadata records for the same
>>>>> resource. This can be costly if there is a powerful reason that every
>>>>> such record should carry the complete set of attributes except for the
>>>>> repeated ones, but in the case you put on the table, I think the only
>>>>> powerful reason would take the form "There are a lot of stupid DwC
>>>>> applications out there that might discover a record that has nothing
>>>>> in it but, say, the French vernacular name and a resourceID, and stop
>>>>> there without ever looking for/at another record with the same
>>>>> resourceID and more comprehensive metadata, and integrating the
>>>>> results at the application level."
>>>>> A response might be "But the point of simple DwC is to support simple
>>>>> applications." But "simple application" is not the same thing as
>>>>> "simple minded application", and my guess is that addressing the issue
>>>>> of multiple metadata records at the application side is, for many
>>>>> applications, less programming effort than other workarounds.
>>>>> Bob Morris
>>>>> On Thu, Jul 21, 2011 at 11:23 AM, Geoffrey Allen <gsallen at unb.ca> wrote:
>>>>>> Greetings,
>>>>>> I have recently begun the process of digitising the 60,000 specimen
>>>>>> vouchers from the UNB herbarium. The textual data for 40,000+ of those
>>>>>> has already been entered into a database, and I am now trying to map
>>>>>> those values to DwC so that we may share the data with other
>>>>>> collections.
>>>>>> I have some concern over the fact that simple DwC does not allow the
>>>>>> repetition or extension of certain fields. The vernacularName field is
>>>>>> a particular problem. New Brunswick is Canada's only officially
>>>>>> bilingual province; as such, our specimens are all identified with both
>>>>>> their English and French common names in the database. It would be very
>>>>>> useful if we could extend DwC, creating something along the lines of
>>>>>> <vernacularName lang=en>, or allow nesting of elements, perhaps in the
>>>>>> form:
>>>>>> <vernacularName>
>>>>>> <English>Chives</English>
>>>>>> <French>Ciboulette, brulotte</French>
>>>>>> </vernacularName>
>>>>>> The other option, as I see it, is that we store the English and French
>>>>>> common names in our own fields, and then concatenate the two to create
>>>>>> the DwC:vernacularName field. I see this option as less than ideal since
>>>>>> it may hinder search/browsability. It may also cause a host of other
>>>>>> problems, from interpreting to storing the data. The herbarium with which
>>>>>> we first intend to share the data has already expressed a concern that
>>>>>> their system cannot handle the diacritics found in many of the French
>>>>>> names (!). They would like the English common names, but not the French.
>>>>>> This is more difficult to achieve if we concatenate the values.
>>>>>> One additional thought is that the herbarium's imprint, _Flora of New
>>>>>> Brunswick_, also includes common names in Maliseet and Mi'kmaq wherever
>>>>>> possible. Although these two aboriginal languages do not currently exist
>>>>>> in the dataset we are using, there is the potential that they may be
>>>>>> added at some point in the future.
>>>>>> It seems to me that the repetition of fields may be necessary in other
>>>>>> instances too. I am having some difficulty figuring out how to record all
>>>>>> the location data we have for the specimens, which are indicated using
>>>>>> verbal descriptions, Lat/Long, UTM, and NTS coordinates, in many cases
>>>>>> all four for a single sample. I will save the details for another
>>>>>> posting.
>>>>>> I will watch for the group's thoughts on this problem.
>>>>>> Many thanks,
>>>>>> Geoffrey
>>>>>> --------------------------------------------
>>>>>> Geoffrey Allen
>>>>>> Digital Projects Librarian
>>>>>> Electronic Text Centre
>>>>>> Harriet Irving Library
>>>>>> University of New Brunswick
>>>>>> Fredericton, NB  E3B 5H5
>>>>>> Tel: (506) 447-3250
>>>>>> Fax: (506) 453-4595
>>>>>> gsallen at unb.ca
>>>>>> _______________________________________________
>>>>>> tdwg-content mailing list
>>>>>> tdwg-content at lists.tdwg.org
>>>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>>> --
>>>>> Robert A. Morris
>>>>> Emeritus Professor of Computer Science
>>>>> UMASS-Boston
>>>>> 100 Morrissey Blvd
>>>>> Boston, MA 02125-3390
>>>>> IT Staff
>>>>> Filtered Push Project
>>>>> Department of Organismal and Evolutionary Biology
>>>>> Harvard University
>>>>> email: morris.bob at gmail.com
>>>>> web: http://efg.cs.umb.edu/
>>>>> web: http://etaxonomy.org/mw/FilteredPush
>>>>> http://www.cs.umb.edu/~ram
>>>>> phone (+1) 857 222 7992 (mobile)
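Bob Morris's suggestion above can be sketched in a few lines of Python. His suggestion is to use several records for the same resource and let the consuming application integrate them. This is a minimal illustration under stated assumptions. The rows are hypothetical, and so are the shared identifier column (called resourceID here, following Bob's wording) and the language column. None of them is a published Darwin Core layout. Each row carries exactly one vernacular name, so no term repeats within a record. The consumer regroups rows by identifier and can drop languages it does not want, as the partner herbarium in Geoffrey Allen's message asked.

```python
from collections import defaultdict

# Hypothetical rows: one flat record per vernacular name, all sharing the same
# identifier. The "resourceID" and "language" columns are assumptions.
rows = [
    {"resourceID": "r1", "vernacularName": "Chives", "language": "en"},
    {"resourceID": "r1", "vernacularName": "Ciboulette", "language": "fr"},
    {"resourceID": "r1", "vernacularName": "brulotte", "language": "fr"},
]


def group_names(rows, languages=None):
    """Regroup per-name rows by identifier, optionally keeping only some languages."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if languages is not None and row["language"] not in languages:
            continue  # e.g. a recipient that wants English names only
        grouped[row["resourceID"]][row["language"]].append(row["vernacularName"])
    # Convert the nested defaultdicts to plain dicts for a readable result.
    return {rid: dict(by_lang) for rid, by_lang in grouped.items()}


print(group_names(rows))
print(group_names(rows, languages={"en"}))
```

Adding Maliseet or Mi'kmaq names later would only mean adding rows with new language values. The record layout itself would not change.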
