[tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

Tony.Rees at csiro.au
Mon Nov 22 03:50:33 CET 2010


Dear all,

Nico Franz just wrote:

> It's probably more accurate to say that, for better or worse, there are
> multiple discussions going on.

Correct - and returning to my original question, there appear to be 2 contrasting views:

(1) Include the authority and other strictly "non-canonical name" information in DwC:scientificName where available (the view of Markus and Rich Pyle, and also of the present DwC specification). However, the canonical elements must then either be obtained by re-parsing the supplied scientificName content, or be supplied separately in DwC:genus, DwC:specificEpithet, etc.

(2) Omit the authority and other strictly "non-canonical name" information from DwC:scientificName, since it can be supplied elsewhere (e.g. in DwC:scientificNameAuthorship). This makes the strictly canonical name available directly, rather than having to re-parse the DwC:scientificName element (the view of Rod, Hilmar, the Catalogue of Life format, Dmitry (?), and also my practice for the last 8+ years, although possibly not correct).
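To illustrate what re-parsing under (1) involves, here is a minimal Python sketch. The function name split_scientific_name and its regular expression are hypothetical and deliberately naive: they assume a zoological-style uninomial, binomial or trinomial followed by the authorship, and do not handle subgenera in parentheses, hybrid signs, rank markers or "ex" authorships.

```python
import re

# Hypothetical, deliberately naive pattern: a capitalised uninomial optionally
# followed by up to two lowercase epithets; anything after that is treated as
# authorship. Real names need a proper name parser.
_NAME_PATTERN = re.compile(
    r"^(?P<canonical>[A-Z][a-z]+(?: [a-z][a-z-]+){0,2})(?:\s+(?P<authorship>.*))?$"
)


def split_scientific_name(scientific_name: str) -> tuple[str, str]:
    """Split a DwC:scientificName string into (canonical name, authorship)."""
    match = _NAME_PATTERN.match(scientific_name.strip())
    if match is None:
        raise ValueError(f"unparseable name: {scientific_name!r}")
    # The authorship group is None when the name carries no authority.
    return match.group("canonical"), (match.group("authorship") or "")


print(split_scientific_name("Philander opossum Linnaeus, 1758"))
# ('Philander opossum', 'Linnaeus, 1758')
```

Even this simple case shows the fragility: an author string with a lowercase particle, e.g. the hypothetical "Aus bus de Candolle", is split as ("Aus bus de", "Candolle").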

In my initial email I thought that (1) would be an acceptable solution provided that the canonical information was also supplied separately, e.g. at species level in DwC:genus and DwC:specificEpithet. However, I now realise that this solution will not scale, as the following use case shows:

Currently I am preparing around 1.9 million records for export in DwCA format. For example, my record for Philander opossum Linnaeus, 1758 (the previously cited worked-example taxon from http://code.google.com/p/darwincore/wiki/Examples#Taxonomic_Treatment,_normalised) reads as follows (paraphrased from the relevant row in my csv file):

  DwC:taxonID=mam10000822
  DwC:scientificName=Philander opossum
  DwC:scientificNameAuthorship=Linnaeus, 1758
  DwC:taxonRank=species
  DwC:taxonomicStatus=accepted
  DwC:nomenclaturalStatus=available
  DwC:nameAccordingTo=CoL2006/ITS
  DwC:originalNameUsageID=
  DwC:namePublishedIn=
  DwC:acceptedNameUsageID=mam10000822
  DwC:parentNameUsage=Philander
  DwC:parentNameUsageID=mam1001153
  DwC:taxonRemarks=
  dc:modified=21-09-2006
  DwC:nomenclaturalCode=ICZN

This follows model (2) above.

Initially I thought that, to convert to model (1) as recommended, all I would have to do was add 2 elements,

  DwC:genus=Philander
  DwC:specificEpithet=opossum

and concatenate the authority into the DwC:scientificName element. However, this is not the whole solution, since my file also includes other ranks: genus (not an issue), family, order, class, phylum and kingdom. Each of these would need its own column, populated only for entries of that rank and essentially blank for entries of every other rank (since the hierarchy is already available by traversing DwC:parentNameUsageID upwards). This means that my "table" of currently 1.9m rows x 15 columns becomes 1.9m rows x 22 columns (genus and specificEpithet plus family, order, class, phylum and kingdom), quite an overhead for data transfer and for subsequent ingestion/parsing into another system. If I also had additional ranks, e.g. subgenus, subfamily, infraorder and the rest, the size would blow out even more; and in any case, with the exception of subgenus, there are no Darwin Core elements for other intermediate ranks as far as I can see.
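A minimal Python sketch of the conversion described above, to make the column arithmetic concrete. The helper to_model_1 and the column lists are hypothetical; the seven added columns follow the text (genus and specificEpithet plus family, order, class, phylum and kingdom), each filled only for rows of the matching rank. It assumes a species-rank scientificName is a plain binomial, and omits namespace prefixes.

```python
# The 15 columns of the model (2) record above (namespace prefixes omitted).
MODEL_2_COLUMNS = [
    "taxonID", "scientificName", "scientificNameAuthorship", "taxonRank",
    "taxonomicStatus", "nomenclaturalStatus", "nameAccordingTo",
    "originalNameUsageID", "namePublishedIn", "acceptedNameUsageID",
    "parentNameUsage", "parentNameUsageID", "taxonRemarks", "modified",
    "nomenclaturalCode",
]

# Columns model (1) would add so that canonical elements have their own fields.
RANK_COLUMNS = ["genus", "specificEpithet", "family", "order", "class", "phylum", "kingdom"]


def to_model_1(row: dict[str, str]) -> dict[str, str]:
    """Hypothetical converter from a model (2) row to a model (1) row."""
    out = dict(row)
    canonical = row["scientificName"]
    authorship = row.get("scientificNameAuthorship", "")
    # Model (1): the authority is concatenated into scientificName.
    out["scientificName"] = f"{canonical} {authorship}".strip()
    # Every row gets every rank column, blank unless it matches the row's rank.
    for column in RANK_COLUMNS:
        out[column] = ""
    rank = row["taxonRank"]
    if rank == "species":
        out["genus"], out["specificEpithet"] = canonical.split()[:2]
    elif rank in RANK_COLUMNS:
        out[rank] = canonical
    return out


row = dict.fromkeys(MODEL_2_COLUMNS, "")
row.update({
    "taxonID": "mam10000822",
    "scientificName": "Philander opossum",
    "scientificNameAuthorship": "Linnaeus, 1758",
    "taxonRank": "species",
    "parentNameUsageID": "mam1001153",
})
converted = to_model_1(row)
print(len(row), "->", len(converted))  # 15 -> 22
print([c for c in RANK_COLUMNS if not converted[c]])
# ['family', 'order', 'class', 'phylum', 'kingdom']
```

Only two of the seven added cells are non-empty for this species row, and a genus- or family-rank row would fill only one; that sparsity is the overhead described above.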

So, I am now beginning to think that the case for a new element DwC:canonicalName (or equivalent) is strengthened. All I would need to do is put the scientific name without authority into that element and the scientific name with authority into DwC:scientificName, and the problem is solved in the most efficient manner, while also serving the needs of both interpretation (1) and interpretation (2) above.
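For comparison, a minimal Python sketch of the proposed alternative. canonicalName here is simply the element name suggested above, not an existing Darwin Core term, and the helper add_canonical_name is hypothetical. Starting from a model (2) row, it adds one column rather than seven.

```python
def add_canonical_name(row: dict[str, str]) -> dict[str, str]:
    """Return a copy of a model (2) row carrying both name forms.

    'canonicalName' is the element proposed in this thread, not an
    established Darwin Core term.
    """
    out = dict(row)
    canonical = row["scientificName"]
    authorship = row.get("scientificNameAuthorship", "")
    # Readers following interpretation (2) use canonicalName without parsing.
    out["canonicalName"] = canonical
    # Readers following interpretation (1) get the name with its authority.
    out["scientificName"] = f"{canonical} {authorship}".strip()
    return out


row = {
    "taxonID": "mam10000822",
    "scientificName": "Philander opossum",
    "scientificNameAuthorship": "Linnaeus, 1758",
    "taxonRank": "species",
}
result = add_canonical_name(row)
print(result["canonicalName"])   # Philander opossum
print(result["scientificName"])  # Philander opossum Linnaeus, 1758
```

Applied to the 15-column file, this gives 16 columns instead of 22, regardless of how many ranks the hierarchy contains.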

If others agree, is there then a case for going this route, and adding the relevant additional element to DwC?

Regards - Tony
