Re: [tdwg-content] Darwin Core vernacularName field

22 Jul 2011


Your point is fair enough, and living with Simple DwC is a Good Thing
for people with not much experience. But by and large the people who
write and support tools like IPT and similar aids are experienced
software engineers who would have little trouble implementing, e.g.,
serving multiple records against the same ResourceID.  The issue would
then become what problems this presents to existing or future
consuming applications, and how the cost of solving those problems
compares to the cost of solving those that arise from some other
solution, such as having to include an atomizer to parse a
concatenation-based string. (Probably the ability to do that carries
a somewhat lower experience barrier to entry than integrating
records.)
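
To make the atomizer option concrete, here is a minimal sketch in
Python. It assumes a purely hypothetical convention that is not part of
Darwin Core: language-tagged groups separated by "|", with names inside
a group separated by commas. The function name and the "und" fallback
tag are likewise illustrative choices, not anything defined by DwC or
IPT.

```python
# Hypothetical convention (NOT part of Darwin Core): groups separated by "|",
# each optionally prefixed with a language tag and ":", names separated by ",".
# Example value: "en: Chives | fr: Ciboulette, brulotte"
GROUP_SEP = "|"
NAME_SEP = ","


def atomize_vernacular_names(value):
    """Split a concatenated vernacularName string into {language: [names]}."""
    names_by_lang = {}
    for group in value.split(GROUP_SEP):
        group = group.strip()
        if not group:
            continue
        lang, sep, names = group.partition(":")
        if not sep:
            # No language tag present: file the names under "und" (undetermined).
            lang, names = "und", group
        cleaned = [n.strip() for n in names.split(NAME_SEP) if n.strip()]
        names_by_lang.setdefault(lang.strip().lower(), []).extend(cleaned)
    return names_by_lang


if __name__ == "__main__":
    print(atomize_vernacular_names("en: Chives | fr: Ciboulette, brulotte"))
```

The parsing itself is short, but it only works if provider and consumer
agree on the separators, and it breaks if a name ever contains one of
them. That agreement is the real cost of the concatenation approach.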

Bob

On Fri, Jul 22, 2011 at 11:31 AM, John Wieczorek <tuco@berkeley.edu> wrote:
...
It's a storage issue, a generation issue, a transportation issue, a
processing issue, a consumption issue - it affects all aspects of a
workflow. It is meant to help those whose lives are not steeped in
informatics, and who have no desire to tread there - in fact, the
majority of those providing data, who would not be able to do so under
current conditions without tools such as the GBIF Integrated
Publishing Toolkit (IPT) or without assistance.
On Fri, Jul 22, 2011 at 8:16 AM, joel sachs <jsachs@csee.umbc.edu> wrote:
...
Hi John,
The description of Simple Darwin Core justifies the restriction by saying
that it's just like the restriction in relational databases. But that's a
storage issue, not a representation issue. Maybe my real question is: Whose
life is Simple Darwin Core supposed to simplify, the data provider's, or the
aggregator's?
Joel.
On Fri, 22 Jul 2011, John Wieczorek wrote:
...
Joel, is the description of the Simple Darwin Core
(http://rs.tdwg.org/dwc/terms/simple/index.htm) insufficient to
explain the restriction?
I would say that the goal of many of "us" is to encourage everyone to
share biodiversity information. I would even go so far as to say that
our success as biodiversity informaticians will be to make sure that
most people never have to think in rdf. Like any good infrastructure,
it should disappear from everyday concern.
On Fri, Jul 22, 2011 at 6:42 AM, joel sachs <jsachs@csee.umbc.edu> wrote:
...
I'd love it if someone could explain the reason for this restriction on
Simple Darwin Core. It seems somewhat anachronistic, given that we're
encouraging everyone to think in rdf. On the representation side,
repetition of a field poses no problems for spreadsheets, xml, or rdf.
On the storage side, it is an issue for RDBMS systems; but, consuming
applications can address this by creating the kinds of records Bob
describes below. Am I missing something?
Many thanks,
Joel.
On Thu, 21 Jul 2011, Bob Morris wrote:
...
There's a general issue with repeated attributes in a metadata record
of any kind.  Depending on the representation language, when there is
more than one such thing in the record, it can be difficult to specify
any linkages between them when they are semantically related.
One general solution is to have multiple metadata records for the same
resource. This can be costly if there is a powerful reason that every
such record should carry the complete set of attributes except for the
repeated ones, but in the case you put on the table, I think the only
powerful reason would take the form "There are a lot of stupid DwC
applications out there that might discover a record that has nothing
in it but, say, the French vernacular name and a resourceID, and stop
there without ever looking for/at another record with the same
resourceID and more comprehensive metadata, and integrating the
results at the application level."
A response might be "But the point of simple DwC is to support simple
applications." But "simple application" is not the same thing as
"simple minded application", and my guess is that addressing the issue
of multiple metadata records at the application side is, for many
applications, less programming effort than other workarounds.
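
As a rough illustration of the application-side integration described
above, the following Python sketch merges several partial records that
share an identifier. It is written under assumptions: the field name
"resourceID", the function name, and the sample identifier and values
are made up for the example and do not come from any particular DwC
tool.

```python
from collections import defaultdict


def integrate_records(records, id_field="resourceID"):
    """Merge records sharing an ID; each field collects its distinct values."""
    merged = defaultdict(lambda: defaultdict(list))
    for record in records:
        rid = record[id_field]
        for field, value in record.items():
            if field == id_field:
                continue
            # Keep each distinct value once, in order of first appearance.
            if value not in merged[rid][field]:
                merged[rid][field].append(value)
    return {rid: dict(fields) for rid, fields in merged.items()}


if __name__ == "__main__":
    # Illustrative sample data: one fuller record plus one sparse record
    # carrying only an extra vernacular name for the same resource.
    sample = [
        {"resourceID": "example-0001",
         "scientificName": "Allium schoenoprasum",
         "vernacularName": "Chives"},
        {"resourceID": "example-0001", "vernacularName": "Ciboulette"},
    ]
    print(integrate_records(sample))
```

An application that stops at the first record it finds would see only
part of the picture; one that groups by identifier before use recovers
every repeated value without any string parsing.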
Bob Morris
On Thu, Jul 21, 2011 at 11:23 AM, Geoffrey Allen <gsallen@unb.ca> wrote:
...
Greetings,
I have recently begun the process of digitising the 60,000 specimen
vouchers from the UNB herbarium. The textual data for 40,000+ of those
has already been entered into a database, and I am now trying to map
those values to DwC so that we may share the data with other
collections.
I have some concern over the fact that simple DwC does not allow the
repetition or extension of certain fields. The vernacularName field is
a particular problem. New Brunswick is Canada's only officially
bilingual province; as such, our specimens are all identified with both
their English and French common names in the database. It would be very
useful if we could extend DwC, creating something along the lines of
<vernacularName lang=en>, or allow nesting of elements, perhaps in the
form:
<vernacularName>
<English>Chives</English>
<French>Ciboulette, brulotte</French>
</vernacularName>
The other option, as I see it, is that we store the English and French
common names in our own fields, and then concatenate the two to create
the DwC:vernacularName field. I see this option as less than ideal since
it may hinder search/browsability. It may also cause a host of other
problems, from interpreting to storing the data. The herbarium with whom
we first intend to share the data has already expressed a concern that
their system cannot handle the diacritics found in many of the French
names (!). They would like the English common names, but not the French.
This is more difficult to achieve if we concatenate the values.
One additional thought is that the herbarium's imprint, _Flora of New
Brunswick_, also includes common names in Maliseet and Mi'kmaq wherever
possible. Although these two aboriginal languages do not currently
exist in the dataset we are using, there is the potential that they may
be added at some point in the future.
It seems to me that the repetition of fields may be necessary in other
instances too. I am having some difficulty figuring out how to record
all the location data we have for the specimens, which are indicated
using verbal descriptions, Lat/Long, UTM, and NTS coordinates - in many
cases using all 4 for a single sample, but I will save the details for
another posting.
I will watch for the group's thoughts on this problem.
Many thanks,
Geoffrey
--------------------------------------------
Geoffrey Allen
Digital Projects Librarian
Electronic Text Centre
Harriet Irving Library
University of New Brunswick
Fredericton, NB  E3B 5H5
Tel: (506) 447-3250
Fax: (506) 453-4595
gsallen@unb.ca
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Robert A. Morris
Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
IT Staff
Filtered Push Project
Department of Organismal and Evolutionary Biology
Harvard University
email: morris.bob@gmail.com
web: http://efg.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)