[tdwg-content] Darwin Core vernacularName field
joel sachs
jsachs at csee.umbc.edu
Fri Jul 22 17:16:32 CEST 2011
Hi John,
The description of Simple Darwin Core justifies the restriction by
saying that it's just like the restriction in relational databases. But
that's a storage issue, not a representation issue. Maybe my real
question is: Whose life is Simple Darwin Core supposed to simplify, the
data provider's, or the aggregator's?
Joel.
On Fri, 22 Jul 2011, John Wieczorek wrote:
> Joel, is the description of the Simple Darwin Core
> (http://rs.tdwg.org/dwc/terms/simple/index.htm) insufficient to
> explain the restriction?
>
> I would say that the goal of many of "us" is to encourage everyone to
> share biodiversity information. I would even go so far as to say that
> our success as biodiversity informaticians will be to make sure that
> most people never have to think in rdf. Like any good infrastructure,
> it should disappear from everyday concern.
>
> On Fri, Jul 22, 2011 at 6:42 AM, joel sachs <jsachs at csee.umbc.edu> wrote:
>> I'd love it if someone could explain the reason for this restriction on
>> Simple Darwin Core. It seems somewhat anachronistic, given that we're
>> encouraging everyone to think in rdf. On the representation side, repetition
>> of a field poses no problems for spreadsheets, xml, or rdf. On the storage
>> side, it is an issue for RDBMS systems; but, consuming applications can
>> address this by creating the kinds of records Bob describes below. Am I
>> missing something?
>>
>> Many thanks,
>> Joel.
>>
>>
>> On Thu, 21 Jul 2011, Bob Morris wrote:
>>
>>> There's a general issue with repeated attributes in a metadata record
>>> of any kind. Depending on the representation language, when there is
>>> more than one such thing in the record, it can be difficult to specify
>>> any linkages between them when they are semantically related.
>>>
>>> One general solution is to have multiple metadata records for the same
>>> resource. This can be costly if there is a powerful reason that every
>>> such record should carry the complete set of attributes except for the
>>> repeated ones, but in the case you put on the table, I think the only
>>> powerful reason would take the form "There are a lot of stupid DwC
>>> applications out there that might discover a record that has nothing
>>> in it but, say, the French vernacular name and a resourceID, and stop
>>> there without ever looking for/at another record with the same
>>> resourceID and more comprehensive metadata, and integrating the
>>> results at the application level."
>>>
>>> A response might be "But the point of simple DwC is to support simple
>>> applications." But "simple application" is not the same thing as
>>> "simple minded application", and my guess is that addressing the issue
>>> of multiple metadata records at the application side is, for many
>>> applications, less programming effort than other workarounds.
>>>
>>>
>>> Bob Morris
>>>
>>>
>>> On Thu, Jul 21, 2011 at 11:23 AM, Geoffrey Allen <gsallen at unb.ca> wrote:
>>>>
>>>> Greetings,
>>>> I have recently begun the process of digitising the 60,000 specimen
>>>> vouchers
>>>> from the UNB herbarium. The textual data for 40,000+ of those has already
>>>> been entered into a database, and I am now trying to map those values to
>>>> DwC
>>>> so that we may share the data with other collections.
>>>> I have some concern over the fact that simple DwC does not allow the
>>>> repetition or extension of certain fields. The vernacularName field is a
>>>> particular problem. New Brunswick is Canada's only officially bilingual
>>>> province; as such, our specimens are all identified with both their
>>>> English
>>>> and French common names in the database. It would be very useful if we
>>>> could
>>>> extend DwC, creating something along the lines of <vernacularName
>>>> lang="en">,
>>>> or allow nesting of elements, perhaps in the form:
>>>> <vernacularName>
>>>> <English>Chives</English>
>>>> <French>Ciboulette, brulotte</French>
>>>> </vernacularName>
>>>> The other option, as I see it, is that we store the English and French
>>>> common names in our own fields, and then concatenate the two to create
>>>> the
>>>> DwC:vernacularName field. I see this option as less than ideal since it
>>>> may
>>>> hinder search/browsability. It may also cause a host of other problems
>>>> from
>>>> interpreting to storing the data. The herbarium with whom we first intend
>>>> to
>>>> share the data has already expressed a concern that their system cannot
>>>> handle the diacritics found in many of the French names (!). They would
>>>> like
>>>> the Eng. common names, but not the French. This is more difficult to
>>>> achieve
>>>> if we concatenate the values.
>>>> One additional thought is that the herbarium's imprint, _Flora of New
>>>> Brunswick_, also includes common names in Maliseet and Mi'kmaq wherever
>>>> possible. Although these two aboriginal languages do not currently exist
>>>> in
>>>> the dataset we are using, there is the potential that they may be added
>>>> at
>>>> some point in the future.
>>>> It seems to me that the repetition of fields may be necessary in other
>>>> instances too. I am having some difficulty figuring out how to record all
>>>> the location data we have for the specimens, which are indicated using
>>>> verbal descriptions, Lat/Long, UTM, and NTS coordinates - in many cases
>>>> using all 4 for a single sample, but I will save the details for another
>>>> posting.
>>>> I will watch for the group's thoughts on this problem.
>>>> Many thanks,
>>>> Geoffrey
>>>> --------------------------------------------
>>>> Geoffrey Allen
>>>> Digital Projects Librarian
>>>> Electronic Text Centre
>>>> Harriet Irving Library
>>>> University of New Brunswick
>>>> Fredericton, NB E3B 5H5
>>>> Tel: (506) 447-3250
>>>> Fax: (506) 453-4595
>>>> gsallen at unb.ca
>>>>
>>>> _______________________________________________
>>>> tdwg-content mailing list
>>>> tdwg-content at lists.tdwg.org
>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Robert A. Morris
>>>
>>> Emeritus Professor of Computer Science
>>> UMASS-Boston
>>> 100 Morrissey Blvd
>>> Boston, MA 02125-3390
>>> IT Staff
>>> Filtered Push Project
>>> Department of Organismal and Evolutionary Biology
>>> Harvard University
>>>
>>>
>>> email: morris.bob at gmail.com
>>> web: http://efg.cs.umb.edu/
>>> web: http://etaxonomy.org/mw/FilteredPush
>>> http://www.cs.umb.edu/~ram
>>> phone (+1) 857 222 7992 (mobile)
>>
>>
>>
>
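For readers following the thread, the application-side integration Bob Morris describes (several flat records sharing one resourceID, merged by the consumer) might look roughly like the sketch below. It is only an illustration under stated assumptions: the "language" column, the record identifier "UNB-0001", and the helper names merge_records and names_in are hypothetical and not part of Simple Darwin Core or of anyone's actual system. The vernacular names are the ones from Geoffrey's example.

```python
# Hypothetical flat records: one row per (resourceID, language) pair, so no
# field is repeated within a row. The "language" column and the identifier
# "UNB-0001" are assumptions made for this sketch.
records = [
    {
        "resourceID": "UNB-0001",
        "scientificName": "Allium schoenoprasum",
        "vernacularName": "Chives",
        "language": "en",
    },
    {
        "resourceID": "UNB-0001",
        "vernacularName": "Ciboulette, brulotte",
        "language": "fr",
    },
]


def merge_records(rows):
    """Combine rows that share a resourceID into one record per resource.

    Vernacular names are collected in a dict keyed by language code; every
    other field keeps the first value seen for that resource.
    """
    merged = {}
    for row in rows:
        rid = row["resourceID"]
        entry = merged.setdefault(rid, {"resourceID": rid, "vernacularNames": {}})
        for key, value in row.items():
            if key in ("resourceID", "language"):
                continue
            if key == "vernacularName":
                # "und" (undetermined) is used when a row carries no language.
                entry["vernacularNames"][row.get("language", "und")] = value
            else:
                entry.setdefault(key, value)
    return merged


def names_in(merged, lang):
    """Return {resourceID: vernacular name} for one language only."""
    return {
        rid: entry["vernacularNames"][lang]
        for rid, entry in merged.items()
        if lang in entry["vernacularNames"]
    }


merged = merge_records(records)
english_only = names_in(merged, "en")
```

Because the names stay separate until the consumer merges them, a partner who wants only the English names can ask for names_in(merged, "en") without having to split a concatenated string.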