[tdwg-content] taxonomy != identification

Richard Pyle deepreef at bishopmuseum.org
Thu Nov 4 20:07:25 CET 2010


> > This is an excellent example of something I have to deal with
> > occasionally,
> > and was going to be part of my never-sent post on dealing with ambiguous
> > identifications.  In the context of DwC, my feeling is that this taxon
> > should be represented as "Erebia" in dwc:scientificName, and the two
> > possible species epithets included in dwc:identificationRemarks.
>	
> But that's not the data.

I would argue that it's an *accurate* representation of the data, just not a
completely *precise* representation.  We all have data that cannot easily be
represented in DwC (without resorting to some xxxxRemarks term) -- which is
a necessary compromise of a practical data exchange system designed to work
across highly heterogeneous datasets.
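
To make the "accurate but less precise" compromise concrete, here is a
minimal Python sketch. The function name and the dict-based record are
illustrative assumptions, not anything DwC prescribes; only the term names
scientificName and identificationRemarks come from the standard.

```python
def genus_level_record(candidates):
    """Collapse alternative binomials that share a genus into one record.

    Hypothetical helper: the genus goes into scientificName and the
    competing names are kept verbatim in identificationRemarks.
    """
    genera = {name.split()[0] for name in candidates}
    if len(genera) != 1:
        raise ValueError("candidates do not share a single genus")
    return {
        "scientificName": genera.pop(),
        "identificationRemarks": " or ".join(candidates),
    }


record = genus_level_record(["Erebia youngi", "Erebia lafontainei"])
print(record)
```

Nothing is lost in this form, but a consumer has to parse the remarks to
recover the two candidate species.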
	
> So I picked an easy example. Here's a slightly 
> harder one: http://arctos.database.museum/guid/MVZ:Egg:2355.

Not harder at all.  Two individuals (one identified as Pipilo aberti
dumeticolus, and the other identified as Molothrus ater obscurus). Both are
children of a parent Individual, which either doesn't have any taxon
Identification associated with it (if the object consists of the nest
itself, as well as the eggs), or has an Identification of "Passeriformes"
associated with it (if the nest itself is considered extraneous material,
and the eggs are the real object of interest).
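
As a rough sketch of that parent/child arrangement, assuming a simple
in-memory model (the Individual class, its field names, and the child
labels are all hypothetical, not DwC terms or Arctos structures):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    label: str
    # At most one legitimate taxon per Individual; None means unidentified.
    identification: Optional[str] = None
    children: List["Individual"] = field(default_factory=list)


def nest_record(nest_is_part_of_object: bool) -> Individual:
    """Build the parent Individual with its two separately identified children."""
    # If the nest itself belongs to the object, the parent carries no taxon;
    # otherwise it gets the coarse identification "Passeriformes".
    parent_taxon = None if nest_is_part_of_object else "Passeriformes"
    return Individual(
        label="MVZ:Egg:2355",
        identification=parent_taxon,
        children=[
            Individual("child 1", "Pipilo aberti dumeticolus"),
            Individual("child 2", "Molothrus ater obscurus"),
        ],
    )


for child in nest_record(nest_is_part_of_object=False).children:
    print(child.label, "->", child.identification)
```

Each Individual still carries a single identification; the mixed content
is expressed through the hierarchy rather than through one record with
two taxa.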

> Maybe so, but there it is: http://data.gbif.org/occurrences/242032297/. 

Well....I think this pushes (exceeds, really) the intended purpose of DwC.
That it was picked up by GBIF is only a result of it having been presented
by the content provider.

> Excluding that would, I think, force you to exclude things 
> like http://arctos.database.museum/guid/UAM:ES:3359 as well 
> - it's all from the same administrative unit. 

Just because it's from the same administrative unit doesn't mean that it has
to be, or not be, considered within scope for DwC.  I think a fossil is a
legitimate within-scope record for DwC. The other information can, perhaps,
be presented within the GeologicalContext class (or maybe not).  But DwC is
a data exchange system for information about organisms.

> I don't have or want any control over what Curators 
> enter - any scope-limiting filter will have to happen 
> elsewhere.

That seems to me to be a question of database management within an
institution -- not about what subset of that information gets exposed as DwC
records.  If the database is capable of filtering out the
non-biological-relevant stuff at the time the records are generated for
packaging within DwC, then such a filter should be applied accordingly.  If
this is not possible, then consumers will have to deal with the occasional
out-of-scope records.  I suspect the ratio of in-scope to out-of-scope
records is such that the value of the former vastly exceeds the cost of the
latter.
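
A minimal sketch of what such an export-time filter might look like,
using Python's sqlite3 module. The table and column names, including the
is_biological flag, are hypothetical and are not taken from Arctos or
from DwC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE taxonomy (
        taxon_name_id   INTEGER PRIMARY KEY,         -- hypothetical key
        full_taxon_name TEXT NOT NULL,
        is_biological   INTEGER NOT NULL DEFAULT 1   -- hypothetical flag
    )
    """
)
conn.executemany(
    "INSERT INTO taxonomy (full_taxon_name, is_biological) VALUES (?, ?)",
    [("Erebia youngi", 1), ("Dark grey shale", 0)],
)


def exportable_names(connection):
    """Return only the names flagged as biological, for packaging as DwC."""
    rows = connection.execute(
        "SELECT full_taxon_name FROM taxonomy "
        "WHERE is_biological = 1 ORDER BY full_taxon_name"
    )
    return [name for (name,) in rows]


print(exportable_names(conn))
```

The curators' data stay untouched in the source database; the filter only
decides which rows are exposed to consumers.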
	
> The point is simply that these are real data. We won't 
> change them to some approximation of themselves or stuff 
> them into a remarks field somewhere. They'll get more 
> complicated before we're done. Anything that's to be 
> useful to us must acknowledge the realities of 
> collections data.

Fair enough; but as a collection wishing to present data for sharing via the
DwC standard, the content provider needs to decide the relative
costs/benefits of either filtering out-of-scope records out of the exposed
DwC datasets, or accepting some small fraction of out-of-scope records being
misinterpreted by consumers/users as in-scope records.
	
> If anyone is interested, we accomplish the above by 
> separating Identifications and Taxonomy. Arctos has 
> roots deep in the ASC model discussed recently, but 
> the link between specimens and taxonomy was one of 
> our early divergences from that model. Assigning 
> TaxonIDs directly to specimens is a no-win game - 
> you either end up with the really valuable data 
> buried in a remarks field somewhere, or you end up 
> with an infinite list of strings that you must 
> pretend are taxon names. Neither is acceptable. 
> A fairly recent ER diagram can be had from 
> http://arctos.googlecode.com/files/arctos_erd_20100129_single.pdf. 
> Taxonomy and Identifications are in dark purple.

This seems to be a very standard way of representing Identifications and
taxon names.  I'm not sure I understand the issue here.  The only part that
I'm not clear on is the meaning of the "VARIABLE" attribute of the
IDENTIFICATION_TAXONOMY entity. Is this how you enable identifications such
as "Erebia youngi or Erebia lafontainei"?

But am I to understand that there is a record in the TAXONOMY
table where FULL_TAXON_NAME is populated with "Dark grey shale", with an
INFRASPECIFIC_RANK of "Subspecies"?  Wouldn't it then be worthwhile to add a
field for "IS_BIOLOGICAL" to this table, to allow filtering out such taxa?
Or, at least, to make an effort to put some standard term like
"Non-Biological" within the TAXON_REMARKS field?

Getting back to your example identified as "Erebia youngi or Erebia
lafontainei".  I don't actually see this as breaking the rule I tried to
articulate in a previous post, which asserted that a single Individual can
have only one legitimate taxon identification.  Here's what I wrote:

> My proposed solution is to rigidly maintain that an 
> instance of "Individual" can not be partitioned to 
> have multiple separate but concurrently legitimate 
> Identifications associated with it. It can have 
> multiple Identifications, but they would be considered 
> to either be competing with each other (when different 
> taxa are asserted) or reinforcing each other (when the 
> same taxon is asserted).

So, although I maintain that my "accurate but less precise" method of
presenting this record in DwC is still legitimate, perhaps a better way to
represent identifications for your specimen
http://arctos.database.museum/guid/KWP:Ento:1703 is as follows:

identificationID: 1
individualID: http://arctos.database.museum/guid/KWP:Ento:1703
taxonID: http://arctos.database.museum/name/Erebia%20youngi
identifiedBy: Kenelm W. Philip
dateIdentified: 1974-07-04 
identificationQualifier: Alternative
identificationRemarks: Erebia youngi/lafontainei 

identificationID: 2
individualID: http://arctos.database.museum/guid/KWP:Ento:1703
taxonID: http://arctos.database.museum/name/Erebia%20lafontainei
identifiedBy: Kenelm W. Philip
dateIdentified: 1974-07-04 
identificationQualifier: Alternative
identificationRemarks: Erebia youngi/lafontainei 

The only part I made up here is the dwc:identificationQualifier term of
"Alternative".  Perhaps when someone proposes a controlled vocabulary for
dwc:identificationQualifier, something like "Alternative" could be included,
with the meaning that it is one of multiple possible identifications.

The important point is that those multiple possible identifications are
still mutually exclusive (and competitive), and hence conform to the rule I
proposed for only one concurrent legitimate identification per Individual.
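
The rule can be checked mechanically. The following is only a sketch,
assuming identifications are held as plain dicts keyed by DwC term names;
the classify_identifications function and its three labels are
hypothetical.

```python
from collections import defaultdict

INDIVIDUAL = "http://arctos.database.museum/guid/KWP:Ento:1703"

identifications = [
    {
        "identificationID": "1",
        "individualID": INDIVIDUAL,
        "taxonID": "http://arctos.database.museum/name/Erebia%20youngi",
        "identificationQualifier": "Alternative",
    },
    {
        "identificationID": "2",
        "individualID": INDIVIDUAL,
        "taxonID": "http://arctos.database.museum/name/Erebia%20lafontainei",
        "identificationQualifier": "Alternative",
    },
]


def classify_identifications(records):
    """Label each Individual's identifications: single, reinforcing, or competing."""
    taxa = defaultdict(set)
    counts = defaultdict(int)
    for rec in records:
        taxa[rec["individualID"]].add(rec["taxonID"])
        counts[rec["individualID"]] += 1
    labels = {}
    for individual, asserted in taxa.items():
        if counts[individual] == 1:
            labels[individual] = "single"
        elif len(asserted) == 1:
            labels[individual] = "reinforcing"  # same taxon asserted repeatedly
        else:
            labels[individual] = "competing"  # different taxa asserted
    return labels


labels = classify_identifications(identifications)
print(labels[INDIVIDUAL])
```

Under this reading there is no fourth state: an Individual never holds two
different identifications that are both considered correct at once.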

Aloha,
Rich



