This is an excellent example of something I have to deal with occasionally,
and was going to be part of my never-sent post on dealing with ambiguous identifications. In the context of DwC, my feeling is that this taxon should be represented as "Erebia" in dwc:scientificName, and the two possible species epithets included in dwc:identificationRemarks.
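A minimal sketch of that "accurate but less precise" representation, assuming the candidate names are plain binomials that share a genus. The function name and the exact wording placed in the remarks are illustrative assumptions, not part of DwC:

```python
def genus_level_record(candidate_names):
    """Collapse alternative binomials that share a genus into one DwC-style record.

    Hypothetical helper: DwC does not define this function or the remark wording.
    """
    genera = {name.split()[0] for name in candidate_names}
    if len(genera) != 1:
        raise ValueError("candidate names do not share a single genus")
    epithets = [name.split()[1] for name in candidate_names]
    return {
        # Accurate, but less precise than the original identification.
        "scientificName": genera.pop(),
        # The precision that was lost is kept as free text.
        "identificationRemarks": "possible species: " + " or ".join(epithets),
    }


record = genus_level_record(["Erebia youngi", "Erebia lafontainei"])
print(record)
```

The trade-off is visible in the result: a consumer can search on the genus reliably, but has to parse free text to recover the two candidate species.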
But that's not the data.
I would argue that it's an *accurate* representation of the data, just not a completely *precise* representation. We all have data that cannot easily be represented in DwC (without resorting to some xxxxRemarks term) -- which is a necessary compromise of a practical data exchange system designed to work across highly heterogeneous datasets.
So I picked an easy example. Here's a slightly harder one: http://arctos.database.museum/guid/MVZ:Egg:2355.
Not harder at all. Two individuals (one identified as Pipilo aberti dumeticolus, and the other identified as Molothrus ater obscurus). Both are children of a parent Individual, which either doesn't have any taxon Identification associated with it (if the object consists of the nest itself, as well as the eggs), or has an Identification of "Passeriformes" associated with it (if the nest itself is considered extraneous material, and the eggs are the real object of interest).
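As a rough sketch of that parent/child arrangement, assuming each Individual carries at most one identification. The class, the field names, and the "#1"/"#2" child identifiers are hypothetical and do not come from Arctos or DwC:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    individual_id: str                    # hypothetical identifier
    identification: Optional[str] = None  # one taxon name, or None
    children: List["Individual"] = field(default_factory=list)


# Parent identified as "Passeriformes" (eggs are the object of interest);
# it would be None if the nest itself were considered part of the object.
clutch = Individual(
    individual_id="MVZ:Egg:2355",
    identification="Passeriformes",
    children=[
        Individual("MVZ:Egg:2355#1", "Pipilo aberti dumeticolus"),
        Individual("MVZ:Egg:2355#2", "Molothrus ater obscurus"),
    ],
)


def leaf_identifications(individual):
    """Return the identifications of the lowest-level Individuals."""
    if not individual.children:
        return [individual.identification]
    names = []
    for child in individual.children:
        names.extend(leaf_identifications(child))
    return names


print(leaf_identifications(clutch))
```

Each Individual still has exactly one identification; the two taxa live on separate child Individuals rather than on the same one.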
Maybe so, but there it is: http://data.gbif.org/occurrences/242032297/.
Well....I think this pushes (exceeds, really) the intended purpose of DwC. That it was picked up by GBIF is only a result of it having been presented by the content provider.
Excluding that would, I think, force you to exclude things like http://arctos.database.museum/guid/UAM:ES:3359 as well - it's all from the same administrative unit.
Just because it's from the same administrative unit doesn't mean that it has to be, or not be, considered within scope for DwC. I think a fossil is a legitimate within-scope record for DwC. The other information can, perhaps, be presented within the GeologicalContext class (or maybe not). But DwC is a data exchange system for information about organisms.
I don't have or want any control over what Curators enter - any scope-limiting filter will have to happen elsewhere.
That seems to me to be a question of database management within an institution -- not about what subset of that information gets exposed as DwC records. If the database is capable of filtering out the non-biologically-relevant stuff at the time the records are generated for packaging within DwC, then such a filter should be applied accordingly. If this is not possible, then consumers will have to deal with the occasional out-of-scope records. I suspect the ratio of in-scope to out-of-scope records is such that the value of the former vastly exceeds the cost of the latter.
The point is simply that these are real data. We won't change them to some approximation of themselves or stuff them into a remarks field somewhere. They'll get more complicated before we're done. Anything that's to be useful to us must acknowledge the realities of collections data.
Fair enough; but as a collection wishing to present data for sharing via the DwC standard, the content provider needs to decide the relative costs/benefits of either filtering out-of-scope records out of the exposed DwC datasets, or accepting some small fraction of out-of-scope records being misinterpreted by consumers/users as in-scope records.
If anyone is interested, we accomplish the above by separating Identifications and Taxonomy. Arctos has roots deep in the ASC model discussed recently, but the link between specimens and taxonomy was one of our early divergences from that model. Assigning TaxonIDs directly to specimens is a no-win game - you either end up with the really valuable data buried in a remarks field somewhere, or you end up with an infinite list of strings that you must pretend are taxon names. Neither is acceptable. A fairly recent ER diagram can be had from http://arctos.googlecode.com/files/arctos_erd_20100129_single.pdf. Taxonomy and Identifications are in dark purple.
This seems to be a very standard way of representing Identifications and taxon names. I'm not sure I understand the issue here. The only part that I'm not clear on is the meaning of the "VARIABLE" attribute of the IDENTIFICATION_TAXONOMY entity. Is this how you enable identifications such as "Erebia youngi or Erebia lafontainei"?
But am I to understand correctly that there is a record in the TAXONOMY table where FULL_TAXON_NAME is populated with "Dark grey shale", with an INFRASPECIFIC_RANK of "Subspecies"? Wouldn't it then be worthwhile to add a field for "IS_BIOLOGICAL" to this table, to allow filtering out such taxa? Or at least to make an effort to put some standard term like "Non-Biological" within the TAXON_REMARKS field?
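To make the suggestion concrete, here is a minimal sketch of such a filter applied when records are packaged for DwC. IS_BIOLOGICAL is the proposed field, not an existing Arctos column; the sample rows and the function name are assumptions for illustration:

```python
# Hypothetical rows from a TAXONOMY-like table; IS_BIOLOGICAL is the proposed flag.
taxonomy = [
    {"FULL_TAXON_NAME": "Erebia youngi", "IS_BIOLOGICAL": True},
    {"FULL_TAXON_NAME": "Dark grey shale", "IS_BIOLOGICAL": False},
    {"FULL_TAXON_NAME": "Passeriformes"},  # flag not yet populated
]


def biological_only(rows):
    """Keep rows flagged as biological.

    Rows without the flag are kept, on the assumption that most
    records in a collections database are organisms.
    """
    return [row for row in rows if row.get("IS_BIOLOGICAL", True)]


exported_names = [row["FULL_TAXON_NAME"] for row in biological_only(taxonomy)]
print(exported_names)
```

Defaulting an unpopulated flag to "biological" is a design choice: it errs on the side of exposing records, in line with accepting a small fraction of out-of-scope records rather than hiding in-scope ones.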
Getting back to your example identified as "Erebia youngi or Erebia lafontainei". I don't actually see this as breaking the rule I tried to articulate in a previous post, which asserted that a single Individual can have only one legitimate taxon identification. Here's what I wrote:
My proposed solution is to rigidly maintain that an instance of "Individual" cannot be partitioned to have multiple separate but concurrently legitimate Identifications associated with it. It can have multiple Identifications, but they would be considered to either be competing with each other (when different taxa are asserted) or reinforcing each other (when the same taxon is asserted).
So, although I maintain that my "accurate but less precise" method of presenting this record in DwC is still legitimate, perhaps a better way to represent identifications for your specimen http://arctos.database.museum/guid/KWP:Ento:1703 is as follows:
identificationID: 1
individualID: http://arctos.database.museum/guid/KWP:Ento:1703
taxonID: http://arctos.database.museum/name/Erebia%20youngi
identifiedBy: Kenelm W. Philip
dateIdentified: 1974-07-04
identificationQualifier: Alternative
identificationRemarks: Erebia youngi/lafontainei

identificationID: 2
individualID: http://arctos.database.museum/guid/KWP:Ento:1703
taxonID: http://arctos.database.museum/name/Erebia%20lafontainei
identifiedBy: Kenelm W. Philip
dateIdentified: 1974-07-04
identificationQualifier: Alternative
identificationRemarks: Erebia youngi/lafontainei
The only part I made up here is the dwc:identificationQualifier term of "Alternative". Perhaps when someone proposes a controlled vocabulary for dwc:identificationQualifier, something like "Alternative" could be included, with the meaning that it is one of multiple possible identifications.
The important point is that those multiple possible identifications are still mutually exclusive (and competitive), and hence conform to the rule I proposed of only one concurrent legitimate identification per Individual.
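A small sketch of how a consumer might apply that rule to the two records above, assuming identifications arrive as simple key/value records. The function name and the "single"/"reinforcing"/"competing" labels are my own illustration, and the qualifier value is the made-up one, not part of any DwC vocabulary:

```python
from collections import defaultdict

INDIVIDUAL = "http://arctos.database.museum/guid/KWP:Ento:1703"

identifications = [
    {
        "identificationID": "1",
        "individualID": INDIVIDUAL,
        "taxonID": "http://arctos.database.museum/name/Erebia%20youngi",
        "identificationQualifier": "Alternative",  # made-up qualifier value
    },
    {
        "identificationID": "2",
        "individualID": INDIVIDUAL,
        "taxonID": "http://arctos.database.museum/name/Erebia%20lafontainei",
        "identificationQualifier": "Alternative",  # made-up qualifier value
    },
]


def classify(records):
    """Label each Individual's identifications as single, reinforcing, or competing."""
    taxa = defaultdict(set)
    counts = defaultdict(int)
    for rec in records:
        taxa[rec["individualID"]].add(rec["taxonID"])
        counts[rec["individualID"]] += 1
    result = {}
    for individual, taxon_ids in taxa.items():
        if counts[individual] == 1:
            result[individual] = "single"
        elif len(taxon_ids) == 1:
            result[individual] = "reinforcing"  # same taxon asserted repeatedly
        else:
            result[individual] = "competing"    # different taxa asserted
    return result


print(classify(identifications))
```

Under this reading, the two alternative identifications are simply a competing pair: at most one of them can be the legitimate identification of the Individual.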
Aloha, Rich