Darwin Core vernacularName field


Geoffrey Allen

21 Jul 2011

15:23

Greetings,

I have recently begun digitising the 60,000 specimen vouchers from the UNB herbarium. The textual data for 40,000+ of those has already been entered into a database, and I am now trying to map those values to DwC so that we may share the data with other collections.

I have some concern over the fact that simple DwC does not allow the repetition or extension of certain fields. The vernacularName field is a particular problem. New Brunswick is Canada's only officially bilingual province; as such, our specimens are all identified with both their English and French common names in the database. It would be very useful if we could extend DwC, creating something along the lines of <vernacularName lang=en>, or allow nesting of elements, perhaps in the form:

<vernacularName>
  <English>Chives</English>
  <French>Ciboulette, brulotte</French>
</vernacularName>

The other option, as I see it, is that we store the English and French common names in our own fields, and then concatenate the two to create the DwC:vernacularName field. I see this option as less than ideal since it may hinder search/browsability. It may also cause a host of other problems, from interpreting to storing the data. The herbarium with which we first intend to share the data has already expressed a concern that their system cannot handle the diacritics found in many of the French names (!). They would like the English common names, but not the French. This is more difficult to achieve if we concatenate the values.

One additional thought is that the herbarium's imprint, _Flora of New Brunswick_, also includes common names in Maliseet and Mi'kmaq wherever possible. Although these two aboriginal languages do not currently exist in the dataset we are using, there is the potential that they may be added at some point in the future.

It seems to me that the repetition of fields may be necessary in other instances too. I am having some difficulty figuring out how to record all the location data we have for the specimens, which is indicated using verbal descriptions, Lat/Long, UTM, and NTS coordinates, in many cases using all four for a single sample, but I will save the details for another posting.

I will watch for the group's thoughts on this problem.

Many thanks,
Geoffrey

--------------------------------------------
Geoffrey Allen
Digital Projects Librarian
Electronic Text Centre
Harriet Irving Library
University of New Brunswick
Fredericton, NB E3B 5H5
Tel: (506) 447-3250
Fax: (506) 453-4595
gsallen@unb.ca
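
As a rough illustration of the trade-off described in this message, the sketch below keeps common names in per-language fields locally and derives a single concatenated value only on export. The record layout, the `UNB-0001` identifier, the ` | ` separator, and both helper names are assumptions made for the example; none of them is defined by Darwin Core.

```python
# Hypothetical local record: common names stored per language code.
record = {
    "id": "UNB-0001",  # made-up identifier, for illustration only
    "vernacularNames": {
        "en": ["Chives"],
        "fr": ["Ciboulette", "brulotte"],
    },
}


def concatenated_vernacular_name(names_by_lang, separator=" | "):
    """Flatten every name into one string, as a single unrepeatable field would need."""
    parts = []
    for lang in sorted(names_by_lang):  # stable order: "en" before "fr"
        parts.extend(names_by_lang[lang])
    return separator.join(parts)


def names_for_language(names_by_lang, lang):
    """Return only the names recorded for one language (empty list if none)."""
    return list(names_by_lang.get(lang, []))


flat = concatenated_vernacular_name(record["vernacularNames"])
english_only = names_for_language(record["vernacularNames"], "en")
```

Selecting the English names is trivial while the languages stay in separate fields. Once only `flat` is shared, a recipient has no reliable way to tell which parts are English, which is the difficulty raised above.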


Peter Desmet

21 Jul

15:59

Hi Geoffrey,

There is (currently) no elegant way to publish multiple vernacular names in simple DwC; it is one of the limitations of simple DwC. It is, however, possible to publish multiple vernacular names (and their language and some other information) if you use extensions. There is even an official GBIF Vernacular Name extension: http://rs.gbif.org/extension/gbif/1.0/vernacularname.xml. This extension is a text file where every line is a vernacular name, with a link (via an ID) back to the core file, which contains the specimen information. More information here: http://code.google.com/p/gbif-ecat/wiki/DwCArchive

However, this extension was intended for species checklists, with a TAXON core file. I don't think it has ever been used to link to an occurrence/specimen core file. I do know that some herbaria record vernacular names on a specimen level, but that is not really best practice, since vernacular names are in fact properties of taxa, and only by relation of specimens: Alpine alumroot and heuchère glabre are vernacular names for Heuchera glabra, and thus of every living or preserved specimen of that species (http://data.canadensys.net/vascan/name/alpine%20alumroot).

If you want to see an example of a checklist Darwin Core file using a vernacular name extension, we created one for all the vascular plants in Canada (might be useful for your herbarium): http://data.canadensys.net/vascan/dataset

You can also search and create your own Darwin Core files for Canadian plants using our checklist builder: http://data.canadensys.net/vascan/checklist

Hope this helps,

Peter

_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content

--
Peter Desmet
Biodiversity Informatics Manager
Canadensys - www.canadensys.net
Université de Montréal Biodiversity Centre
4101 rue Sherbrooke est
Montreal, QC, H1X2B2
Canada
Phone: 514-343-6111 #82354
Fax: 514-343-2288
Email: peter.desmet@umontreal.ca / peter.desmet.cubc@gmail.com
Skype: anderhalv
Public profile: http://www.linkedin.com/in/peterdesmet
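
The core-plus-extension layout Peter describes (one extension line per vernacular name, each pointing back to a core row via an ID) might be read along the following lines. This is only a sketch: the tab-separated contents, the `id`/`coreid` column headers, and the ID value `1` are assumptions for the example, and a real archive would declare its actual columns in its own descriptor file.

```python
import csv
import io
from collections import defaultdict

# Illustrative file contents; headers and the ID value are assumptions.
CORE_TSV = "id\tscientificName\n1\tHeuchera glabra\n"
EXTENSION_TSV = (
    "coreid\tvernacularName\tlanguage\n"
    "1\talpine alumroot\ten\n"
    "1\theuchère glabre\tfr\n"
)


def read_rows(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def attach_vernacular_names(core_rows, extension_rows):
    """Group extension rows by core ID and attach them to the matching core row."""
    by_core_id = defaultdict(list)
    for row in extension_rows:
        by_core_id[row["coreid"]].append((row["language"], row["vernacularName"]))
    return [
        {**core, "vernacularNames": by_core_id.get(core["id"], [])}
        for core in core_rows
    ]


taxa = attach_vernacular_names(read_rows(CORE_TSV), read_rows(EXTENSION_TSV))
```

Because each name carries its own language value, a consumer that wants only English names can filter the extension rows without touching the core file.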

Paul J. Morris

16:12

On Thu, 21 Jul 2011 11:59:19 -0400 Peter Desmet <peter.desmet@umontreal.ca> wrote:

...

I do know that some herbaria record vernacular names on a specimen level, but that is not really best practice, since vernacular names are in fact properties of taxa, and only by relation of specimens:

The place where an association of vernacular names with specimens is appropriate is in ethnobotany, where a specimen may be a voucher for the use of a particular vernacular name by a particular person or group for a particular plant. Other than that, I'd concur, vernacular names are best treated as related to taxa.

-Paul

--
Paul J. Morris
Biodiversity Informatics Manager
Harvard University Herbaria/Museum of Comparative Zoölogy
mole@morris.net AA3SD PGP public key available

Bob Morris

16:13

There's a general issue with repeated attributes in a metadata record of any kind. Depending on the representation language, when there is more than one such thing in the record, it can be difficult to specify any linkages between them when they are semantically related.

One general solution is to have multiple metadata records for the same resource. This can be costly if there is a powerful reason that every such record should carry the complete set of attributes except for the repeated ones, but in the case you put on the table, I think the only powerful reason would take the form "There are a lot of stupid DwC applications out there that might discover a record that has nothing in it but, say, the French vernacular name and a resourceID, and stop there without ever looking for/at another record with the same resourceID and more comprehensive metadata, and integrating the results at the application level."

A response might be "But the point of simple DwC is to support simple applications." But "simple application" is not the same thing as "simple-minded application", and my guess is that addressing the issue of multiple metadata records on the application side is, for many applications, less programming effort than other workarounds.

Bob Morris

--
Robert A. Morris
Emeritus Professor of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
IT Staff
Filtered Push Project
Department of Organismal and Evolutionary Biology
Harvard University
email: morris.bob@gmail.com
web: http://efg.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)
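
To make the "integrating the results at the application level" idea concrete, here is a small sketch of an application that collects every record sharing a resourceID before using any of them. The record contents, the `urn:example:1` identifier, the `integrate` function, and the use of `und` for a missing language are all assumptions for illustration, not something specified in the thread or by Darwin Core.

```python
# Hypothetical input: one fuller record plus a sparse record for the same resource.
records = [
    {
        "resourceID": "urn:example:1",
        "scientificName": "Heuchera glabra",
        "vernacularName": "alpine alumroot",
        "language": "en",
    },
    {
        "resourceID": "urn:example:1",
        "vernacularName": "heuchère glabre",
        "language": "fr",
    },
]


def integrate(records):
    """Merge records by resourceID, collecting vernacular names per language."""
    merged = {}
    for rec in records:
        target = merged.setdefault(rec["resourceID"], {"vernacularNames": {}})
        for key, value in rec.items():
            if key in ("vernacularName", "language"):
                continue  # the repeated attribute is handled separately below
            target.setdefault(key, value)  # first value seen wins for other fields
        if "vernacularName" in rec:
            lang = rec.get("language", "und")  # "und": language not stated
            target["vernacularNames"].setdefault(lang, []).append(rec["vernacularName"])
    return merged


integrated = integrate(records)
```

The merge itself is a few lines, which is roughly the point being argued: an application that keeps reading past the first sparse record pays only a small cost.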

joel sachs

22 Jul