On Thu, Nov 4, 2010 at 1:14 AM, Peter DeVries <pete.devries@gmail.com> wrote:

Uncertainty: http://arctos.database.museum/guid/KWP:Ento:1703  => This is a Genus Erebia species undetermined.

No, it isn't. We know more than that. It's not Erebia embla, for example.

On Thu, Nov 4, 2010 at 2:10 AM, Richard Pyle <deepreef@bishopmuseum.org> wrote:
Hi Dusty,

> Collections contain things that do not map nicely to a
> single taxon name of any (or no) rank. It's not clear
> to me if this proposal will support those kinds of
> data or not. A few examples:
>
> Uncertainty: http://arctos.database.museum/guid/KWP:Ento:1703

This is an excellent example of something I have to deal with occasionally,
and was going to be part of my never-sent post on dealing with ambiguous
identifications.  In the context of DwC, my feeling is that this taxon
should be represented as "Erebia" in dwc:scientificName, and the two
possible species epithets included in dwc:identificationRemarks.

But that's not the data.
This one could be represented as "Bupleurum" for the Individual instance
representing the sheet, but then I would be inclined to establish two
"child" individuals (semantically related to the "parent" sheet), one each
identified to the two different taxa.

So I picked an easy example. Here's a slightly harder one: http://arctos.database.museum/guid/MVZ:Egg:2355.
I think a lot of data models (including GNUB) treat hybrid formulae as
though they are separate "taxa", with the hybrid formula as the name.
Although it doesn't seem to be addressed in the DwC documentation, I would
put "Canis latrans x Canis lupus familiaris" in dwc:scientificName.

Now....this may be one of those semantics-breaking pseudo-conventions that
the RDF'ers will pull their hair out over (along the lines of Bob's post
concerning different kinds of aggregations), in which case we should
probably have another thread on this topic.

> Things that aren't taxonomy at all:
> http://arctos.database.museum/guid/UAM:ES:3405

Outside the scope of DwC?

Maybe so, but there it is: http://data.gbif.org/occurrences/242032297/. Excluding that would, I think, force you to exclude things like http://arctos.database.museum/guid/UAM:ES:3359 as well - it's all from the same administrative unit. I don't have or want any control over what Curators enter - any scope-limiting filter will have to happen elsewhere.

The point is simply that these are real data. We won't change them to some approximation of themselves or stuff them into a remarks field somewhere. They'll get more complicated before we're done. Anything that's to be useful to us must acknowledge the realities of collections data.

If anyone is interested, we accomplish the above by separating Identifications and Taxonomy. Arctos has roots deep in the ASC model discussed recently, but the link between specimens and taxonomy was one of our early divergences from that model. Assigning TaxonIDs directly to specimens is a no-win game - you either end up with the really valuable data buried in a remarks field somewhere, or you end up with an infinite list of strings that you must pretend are taxon names. Neither is acceptable. A fairly recent ER diagram can be had from http://arctos.googlecode.com/files/arctos_erd_20100129_single.pdf. Taxonomy and Identifications are in dark purple.
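For anyone who prefers code to ER diagrams, here is a minimal Python sketch of the general idea of keeping Identifications separate from Taxonomy. It is not the Arctos schema: the class names, the "{0} x {1}" formula syntax, and the specimen GUID "EXAMPLE:1" are all assumptions made up for illustration. The only thing taken from this thread is the hybrid formula string.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taxon:
    """One entry in the taxonomy table: a single real taxon name."""
    scientific_name: str


@dataclass
class Identification:
    """Links one specimen to one or more taxa through a formula.

    Hypothetical convention: the formula is a str.format template whose
    positional fields are filled from `taxa` in order, for example
    "{0}" for a plain identification or "{0} x {1}" for a hybrid.
    """
    specimen_guid: str
    formula: str
    taxa: List[Taxon]

    def display_name(self) -> str:
        # Build the display string from the real taxon names, so the
        # taxonomy table never has to hold the combined string itself.
        return self.formula.format(*(t.scientific_name for t in self.taxa))


# The taxonomy table holds only real names.
coyote = Taxon("Canis latrans")
dog = Taxon("Canis lupus familiaris")

# "EXAMPLE:1" is a made-up GUID, not a real catalog record.
ident = Identification("EXAMPLE:1", "{0} x {1}", [coyote, dog])
print(ident.display_name())
```

In this sketch the hybrid never becomes a "taxon name". The specimen points at an identification, the identification points at two ordinary taxa, and the combined string is derived only when something needs to display it.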

--D