Dear David & all,
As a side note from a data user who would like to help avoid errors in the interpretation of taxonomic names (not sure if it helps in planning your future strategies): I'm just wondering, why not explore a few gentle steps towards ... who knows ... something like elements of a future "BioCode" (being aware of the incompatible parts of the current domain-specific IC*N Codes, but making use of their uniting principles)?
The usefulness of automated name parsing is limited: it is not enough, or even misleading, in too many cases, e.g. with binomina combined with different homonymous genera, such as the following:
Tylonotus bimaculatus
Tylonotus rugicollis
Tylonotus fryi
The first genus is Tylonotus HALDEMAN 1847 (Coleoptera Cerambycidae), second: Tylonotus FIEBER 1858 (Heteroptera Miridae), third: Tylonotus SCHAUM 1863 (Coleoptera Carabidae). Now imagine if we had an 'expanded parsing' strategy (with the assistance of taxon experts!) resulting in unique namestrings that contain all basic information on the nomenclatural status of names. In the above example, unique ID-strings for the nomenclatural content might look like this:
ZS-Tylonotus_bimaculatus
ZS-Lygaeus_rugicollis/2Tylonotus_rugicollis/=Plesiocoris_rugicollis
ZS-3Tylonotus_fryi/=Nototylus_fryi
(prefix 'ZS-' for zoological species-group names; the second string shows we are dealing with a new generic combination, etc.) Such strings can be perfectly stable, unique, and both human- and machine-readable. They could serve as a solid basis for interpretation of actual name usages [= the GNUB task?] ... in my imagination.
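To make the idea a bit more concrete, here is a minimal Python sketch of how such ID-strings might be taken apart by a machine. The message only defines the 'ZS-' prefix, so everything else is my reading of the three examples and should be treated as an assumption: segments are separated by '/', a leading digit is taken as the number of the homonymous genus (no digit meaning the first one), and a leading '=' is taken as a marker on the combination the name is equated with. The names `parse_id_string`, `Combination` and `PREFIXES` are hypothetical.

```python
import re
from dataclasses import dataclass

# Only 'ZS' is defined in the message; other prefixes would have to be agreed on.
PREFIXES = {"ZS": "zoological species-group name"}


@dataclass
class Combination:
    genus: str
    epithet: str
    homonym_index: int    # assumption: digit before the genus; 1 when absent
    equals_marker: bool   # assumption: True when the segment starts with '='


# One segment: optional '=', optional digits, Genus, '_', epithet.
SEGMENT = re.compile(r"^(=?)(\d*)([A-Z][a-z]+)_([a-z-]+)$")


def parse_id_string(id_string):
    """Split an ID-string into its prefix and a list of Combination records."""
    prefix, _, body = id_string.partition("-")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    combinations = []
    for segment in body.split("/"):
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"cannot parse segment: {segment!r}")
        equals, digits, genus, epithet = match.groups()
        combinations.append(
            Combination(genus, epithet, int(digits) if digits else 1, equals == "=")
        )
    return prefix, combinations
```

Calling `parse_id_string` on the second example string yields three records, with the middle one carrying genus `Tylonotus` and homonym index 2. A real scheme would need expert-agreed rules for every marker; this only shows that the strings are regular enough to parse.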
Best regards, Wolfgang
----------------------------------
Wolfgang Lorenz, Tutzing, Germany
2010/11/25 David Remsen (GBIF) dremsen@gbif.org
Now that we've been talking about the variations, perhaps we can move toward doing something about it.
I mentioned we have been talking about having a service built that can read and unpack a DarwinCore Archive, evaluate specific things like we have been discussing that may be expressed inconsistently, set them right, and spit back out a new and more consistent archive file. In order to put out a call for a developer to do this, we need to capture some of those things it should do. Markus and I have started this but we have a lot of end-of-year business and less time than money. We still have some funds available to at least start this process.
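As a rough illustration of the "read and unpack" step only, here is a minimal Python sketch. It assumes the archive is a zip file whose `meta.xml` follows the Darwin Core text guidelines, that the core data file is tab-delimited, and that only the core file is of interest; the function name `read_core_rows` is hypothetical and not part of any existing tool.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by meta.xml in the Darwin Core text guidelines.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def read_core_rows(archive_bytes):
    """Return the core rows of a Darwin Core Archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        meta = ET.fromstring(zf.read("meta.xml"))
        core = meta.find(DWC_TEXT_NS + "core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text

        # Map each column index to a short name (last part of the term URI).
        columns = {}
        id_element = core.find(DWC_TEXT_NS + "id")
        if id_element is not None:
            columns[int(id_element.get("index"))] = "id"
        for field in core.findall(DWC_TEXT_NS + "field"):
            if field.get("index") is not None:
                term = field.get("term").rsplit("/", 1)[-1]
                columns[int(field.get("index"))] = term

        skip = int(core.get("ignoreHeaderLines", "0"))
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    # Assumption: tab-delimited data; a real service would honour the
    # fieldsTerminatedBy / fieldsEnclosedBy attributes of meta.xml.
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))[skip:]
    return [
        {name: row[i] for i, name in columns.items() if i < len(row)}
        for row in rows
    ]
```

The checks and corrections discussed in this thread would then operate on the returned rows before a new archive is written out.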
I'd like to know if anyone is interested in, and feels qualified to develop, a more complete set of requirements for such a service, which we would then try to find a developer to build. We aren't trying to deal with everything at once, mind you. Just some key things that might make ingesting a DarwinCore Archive for either Taxon data or Occurrence data a bit more consistent in regard to the taxonomic elements. I'd need two or three days to do this as completely as I think is needed, so I'm anticipating you smart people can do it in about the same time.
For example,
- Checking the integrity of normal and denormal classifications. What are the steps and conditions to check integrity in normal classifications and to transform a denormal classification to a normal one? In the latter, for example, you have to make sure the same Family value doesn't have two different parents. If it does, what then?
- Creating IDs for ID-less, normalised records (e.g. parentNameUsage).
- Mapping taxon ranks to our taxon rank vocabulary so that alternative forms (ssp, subspec, ss.) are replaced, when possible, by the standard form.
- Normalising taxon and nomenclatural status.
- Splitting a merged name into name and authorship parts.
- Checking the split version is consistent with the complete one if both are given.
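To give a feel for the level of detail such requirements would need, here is a minimal Python sketch of three of the checks above. Everything in it is an assumption made for illustration: the function names, the contents of `RANK_SYNONYMS` (including "subspecies" as the standard form), the use of "family" and "order" as dictionary keys, and above all the naive rule that the name part is a capitalised word followed by lower-case words. That rule fails on authors written in lower case (e.g. a leading "de" or "van"), which is exactly the kind of case the requirements would have to spell out.

```python
import re
from collections import defaultdict

# Hypothetical variant table; the real one would come from the rank vocabulary.
RANK_SYNONYMS = {
    "ssp": "subspecies", "ssp.": "subspecies",
    "subspec": "subspecies", "subspec.": "subspecies",
    "ss.": "subspecies",
}


def normalise_rank(raw):
    """Return the standard form when known, else the trimmed lower-case input."""
    key = raw.strip().lower()
    return RANK_SYNONYMS.get(key, key)


def conflicting_parents(records, child="family", parent="order"):
    """Find child values that occur with more than one distinct parent value."""
    parents = defaultdict(set)
    for record in records:
        if record.get(child) and record.get(parent):
            parents[record[child]].add(record[parent])
    return {name: found for name, found in parents.items() if len(found) > 1}


# Naive assumption: name = capitalised genus plus lower-case epithets.
NAME_PART = re.compile(r"^([A-Z][a-z]+(?: [a-z-]+)*)\s*(.*)$")


def split_name(full_name):
    """Split a merged name string into (name, authorship)."""
    match = NAME_PART.match(" ".join(full_name.split()))
    if match is None:
        return full_name, ""
    return match.group(1), match.group(2)


def is_consistent(full_name, name, authorship):
    """Check that name + authorship equals the full string, ignoring spacing."""
    rebuilt = " ".join(f"{name} {authorship}".split())
    return rebuilt == " ".join(full_name.split())
```

For a placeholder string such as "Aus bus Author, 1900", `split_name` returns the name "Aus bus" and the authorship "Author, 1900". What to do when `conflicting_parents` returns something (the "what then?" above) is a policy decision the sketch deliberately leaves open.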
I put the working doc Markus and I have in Dropbox and put it here: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srci...
Again, we are not looking for someone to do the programming, but for someone to provide enough detail that the developer knows the steps needed to do that work.
So anyone want to help out a data standard over the holidays? We are eager to start because I'd like to get a call for a developer out before the middle of December, when the little bag of gold gets taken away. If multiple people are interested we will have to draw straws or pistols or find some other means of making a good decision. But please contact me directly and we can follow up offline.
Best, David Remsen
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472 Fax: +45-35321480
Mobile +45 28751472
Skype: dremsen
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content