[tdwg-content] practical details of recording a determination What is an Occurrence?

Mon Oct 18 23:50:54 CEST 2010

All,

I'm in Stockholm, and right now it's 10am in Hawaii, and I've effectively
been awake since 7pm Hawaii time -- so my brain is a bit mushy. But I'll take
a chance and comment anyway.

> I will leave it up to the taxonomy people how the 
> different things would be connected to the 
> species concept and how all of their lines 
> would be connected.

In my mind, the "fully-normalised" (sensu Döring) relationship graph is
something like this (notation is [One]--<[Many]; [One]--[One]; be sure to
view it in a fixed-width font, like Courier):

                                                      [identifiedBy]
                                                            |
[Location]--<[Event]--<[Occurrence]>--[Individual]--<[Identification]--[TaxonNameUsage]>--[nameAccordingTo]
                |                                           |                 |
           [eventTime]                               [dateIdentified]  [scientificName]

I'm following what I *think* Steve defined for [Individual], which is that
it can be either a single individual organism or a defined set of organisms
(e.g., up to at least a population).

So, an Occurrence is the intersection of an Individual and an Event.  An
Event is a Location+Time[+other metadata].  Each Event may have multiple
Occurrences (i.e., one for each distinct Individual at the same
Location+Time).  Also, an Individual may have multiple Occurrences (one for
each Event at which the same Individual was documented).
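To make the cardinalities concrete, here is a minimal Python sketch of the left half of the graph. The class and field names are illustrative assumptions (not official Darwin Core terms), and the identifiers in the usage lines are made up. Each Occurrence holds exactly one Individual and one Event, so "many Occurrences per Event" and "many Occurrences per Individual" both fall out of simple filtering.

```python
from dataclasses import dataclass

# Hypothetical minimal classes mirroring the diagram; field names are illustrative.

@dataclass(frozen=True)
class Location:
    location_id: str

@dataclass(frozen=True)
class Event:
    # An Event is a Location + Time (other metadata omitted here).
    event_id: str
    location: Location
    event_date: str

@dataclass(frozen=True)
class Individual:
    # A single organism, or a defined set of organisms.
    individual_id: str

@dataclass(frozen=True)
class Occurrence:
    # The intersection of exactly one Individual and exactly one Event.
    individual: Individual
    event: Event

def occurrences_at(event, occurrences):
    """Occurrences documented at one Event (one per distinct Individual)."""
    return [o for o in occurrences if o.event == event]

def occurrences_of(individual, occurrences):
    """Occurrences of one Individual (one per Event at which it was documented)."""
    return [o for o in occurrences if o.individual == individual]

# Usage with made-up identifiers: two Individuals at one Event,
# and one of them documented again at a second Event.
site = Location("loc-1")
ev1 = Event("ev-1", site, "2010-10-18")
ev2 = Event("ev-2", site, "2010-10-19")
ind_a, ind_b = Individual("ind-A"), Individual("ind-B")
occs = [Occurrence(ind_a, ev1), Occurrence(ind_b, ev1), Occurrence(ind_a, ev2)]
print(len(occurrences_at(ev1, occs)), len(occurrences_of(ind_a, occs)))  # 2 2
```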

An Individual may have multiple Identifications.  I make no distinction
between "Identification" and "Determination" (nor do I make a distinction
between the first identification and subsequent identifications).  I
slightly prefer "Identification", because "Determination" seems to imply
that there is a correct answer, whereas "Identification" (to me, anyway),
implies an opinion.  Steve, I didn't quite follow how you were
distinguishing these two terms -- so if you have a clear reason for
distinguishing them, I'd like to understand it better.

A single Identification should, in my mind, always join a single Individual
with a single "TaxonNameUsage" instance.  I'm not 100% sure how
TaxonNameUsage maps in DwC.  I *think* it's an instance of a dwc:Taxon, as
most of the core attributes of a TNU (acceptedNameUsage[ID],
parentNameUsage[ID], originalNameUsage[ID], scientificName, taxonRank) are
represented as terms in the Taxon Class.  But I'm a little fuzzy on whether
a "taxonID" maps directly to a TNUID, or if a TNUID more correctly maps to
taxonConceptID.
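A minimal sketch of the Identification link described above, assuming (as the paragraph tentatively does) that a TNU is modelled as a dwc:Taxon instance. The snake_case field names, the TNU ids, and the use of the specimen number as a stand-in Individual ID are illustrative assumptions. The usage lines record the two determinations discussed later in this thread for LSU-BR 39823: the original label with no known determiner, and the L. Urbatsch 2009 annotation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class TaxonNameUsage:
    # Assumption: a TNU is a dwc:Taxon instance; attributes follow the
    # dwc:Taxon terms named above, written in snake_case.
    taxon_id: str
    scientific_name: str
    taxon_rank: Optional[str] = None
    name_according_to: Optional[str] = None
    accepted_name_usage_id: Optional[str] = None
    parent_name_usage_id: Optional[str] = None
    original_name_usage_id: Optional[str] = None

@dataclass(frozen=True)
class Identification:
    # Joins exactly one Individual to exactly one TaxonNameUsage.
    individual_id: str
    taxon: TaxonNameUsage
    identified_by: Optional[str] = None
    date_identified: Optional[str] = None

def identifications_of(individual_id: str, idents: List[Identification]) -> List[Identification]:
    """All Identifications of one Individual; first and later ones are treated alike."""
    return [i for i in idents if i.individual_id == individual_id]

# Hypothetical ids; the specimen number stands in for the Individual ID.
original = TaxonNameUsage("tnu-1", "Juncus diffusissimus")
urbatsch = TaxonNameUsage("tnu-2", "Juncus diffusissimus Buckl.", "species", "L. Urbatsch 2009")
idents = [
    Identification("LSU-BR 39823", original, "Unspecified"),
    Identification("LSU-BR 39823", urbatsch, "L. Urbatsch", "2009"),
]
print(len(identifications_of("LSU-BR 39823", idents)))  # 2
```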

> The determination would have any of the properties that are 
> terms listed in the dwc:Identification class (identifiedBy,
> dateIdentified, identificationReferences, identificationRemarks,
> identificationQualifier, and typeStatus).  Some properties like 
> dateIdentified and identificationReferences would be string 
> literals and others (especially identifiedBy) should probably 
> be GUIDs but could be literals if they had to be.  

I agree with what Steve wrote above.  However, I'm uncomfortable with
Markus' suggestion of treating dwc:nameAccordingTo as a property of an
Identification -- even as a shortcut.  I think this is a bit dangerous.  If
there is no dwc:Taxon instance (aka "TaxonNameUsage" in my diagram above)
available to link the Identification to, then I would suggest using
identificationReferences as the shortcut.  But that would still force you to
attach scientificName directly to the Identification instance, which I
think is also unwise.  I'd rather the Best Practice be to "manufacture" a
place-holder dwc:Taxon instance (if a proper one doesn't already exist in
the content source), and apply the scientificName property to that Taxon
instance, rather than directly to an Identification.  I know it's common
shorthand to attach the scientificName directly to the Occurrence instance,
but I actually feel less uneasy about that, because it is much more
obviously a shortcut.  If you're going to the trouble to provide an
instantiated "Identification", though, you ought to anchor it to a Taxon
instance (manufactured or real).
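The "manufacture a place-holder" practice could look something like the sketch below. It is an illustration under stated assumptions, not a prescribed implementation: the registry dict, the function name get_or_make_taxon, and the urn:uuid identifier scheme are all hypothetical. The key point is that scientificName sits on the Taxon, and the Identification only points at it. Reusing the same place-holder for the same (name, source) pair keeps later mappings shareable.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Taxon:
    taxon_id: str
    scientific_name: str
    name_according_to: Optional[str] = None

@dataclass(frozen=True)
class Identification:
    individual_id: str
    taxon: Taxon  # scientificName lives on the Taxon, not on the Identification
    identified_by: Optional[str] = None

TaxonKey = Tuple[str, Optional[str]]

def get_or_make_taxon(scientific_name: str, name_according_to: Optional[str],
                      registry: Dict[TaxonKey, Taxon]) -> Taxon:
    """Return the existing Taxon for this (name, source) pair, or manufacture
    a place-holder one and remember it in the (hypothetical) registry."""
    key = (scientific_name, name_according_to)
    if key not in registry:
        # urn:uuid identifiers are an illustrative choice, not a requirement.
        registry[key] = Taxon(f"urn:uuid:{uuid.uuid4()}", scientific_name, name_according_to)
    return registry[key]

registry: Dict[TaxonKey, Taxon] = {}
placeholder = get_or_make_taxon("Juncus diffusissimus", None, registry)
ident = Identification("LSU-BR 39823", placeholder, "Unspecified")
```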

But, as Greg said in his post, I guess it may not really matter: in the
long run, we'll probably be able to make inferences about the proper
Individual<-->TaxonConcept mapping, even when it's not explicitly
documented.

> 1. The original label identifies the species as Juncus 
> diffusissimus.  However, there is no indicator as to who 
> originally identified it or when.  My assumption is that 
> it was the collector (Glen N. Montz) but I don't really 
> know that.  Do I assume that, or list the original 
> determiner as "unknown"?

I would make no assumptions about who the identifiedBy person was.  Instead,
I handle these cases either by going with "Unspecified" or, in some cases
(when I have confidence), with something like "Bishop Museum Staff
Member".  Often I can deduce the identifier with some degree of confidence,
but usually I don't have the time to do this.  The dateIdentified can either
be omitted, or set as some range (e.g., at the very worst, on or after
the eventDate/eventTime, and before today).
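The "worst case" range for an unrecorded dateIdentified can be computed mechanically. This sketch assumes the range is expressed in ISO 8601 start/end interval notation; whether a particular consumer of dateIdentified accepts an interval there is an assumption worth checking. The function names and the 1975 collecting date in the example are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

def date_identified_bounds(event_date: date, today: Optional[date] = None) -> Tuple[date, date]:
    """Worst-case bounds for an unrecorded dateIdentified: on or after the
    collecting event, and no later than today."""
    today = today or date.today()
    if event_date > today:
        raise ValueError("eventDate lies in the future")
    return event_date, today

def as_iso_interval(bounds: Tuple[date, date]) -> str:
    # ISO 8601 "start/end" interval notation.
    start, end = bounds
    return f"{start.isoformat()}/{end.isoformat()}"

# Hypothetical collecting date; 'today' pinned to this message's date.
print(as_iso_interval(date_identified_bounds(date(1975, 6, 1), today=date(2010, 10, 18))))
# 1975-06-01/2010-10-18
```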

This is why I think that identification tags ("annotations" sensu Baskauf)
can be "documentation sources" for TNUs.

In the web example given by Steve, we have an identification as follows:

Juncus diffusissimus Buckl.
Determined by: L. Urbatsch
Determination date: 2009

Completely independently of the specimen itself, we can infer from the tag
that:

- Sometime between 1 Jan 2009 and 31 Dec 2009, L. Urbatsch regarded the
genus "Juncus" as valid.
- Sometime between 1 Jan 2009 and 31 Dec 2009, L. Urbatsch regarded the
species epithet "diffusissimus" [of Buckl.] as a valid species, placed
within the genus "Juncus".

Thus, we have at least two implied TNUs from this identification, which was
documented on a piece of paper that happens to be fixed to LSU-BR 39823.
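The inference from the tag can be sketched as a small function that turns one determination label into the two implied TNUs: the genus, and the species epithet placed within that genus, each attributed to the determiner within a one-year window. This assumes a simple "Genus epithet Author" string; real names (infraspecific ranks, hybrids, compound authorships) would need a proper name parser. The function name and dictionary keys are hypothetical.

```python
from datetime import date

def implied_name_usages(scientific_name: str, identified_by: str, year: int):
    """Derive the TNUs implied by a determination label.

    Assumes a simple 'Genus epithet Author' string such as
    'Juncus diffusissimus Buckl.'; anything fancier needs a real parser."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError("expected at least a genus and a species epithet")
    genus, epithet = parts[0], parts[1]
    authorship = " ".join(parts[2:])
    # Only the year is recorded, so the usage is bounded to that calendar year.
    window = (date(year, 1, 1), date(year, 12, 31))
    return [
        {"name": genus, "rank": "genus",
         "according_to": identified_by, "window": window},
        {"name": f"{epithet} {authorship}".strip(), "rank": "species",
         "placed_in": genus, "according_to": identified_by, "window": window},
    ]

tnus = implied_name_usages("Juncus diffusissimus Buckl.", "L. Urbatsch", 2009)
for t in tnus:
    print(t["rank"], t["name"])
# genus Juncus
# species diffusissimus Buckl.
```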

The Identification instance would link the Individual (manifest as a
specimen, in this case) to the TNU of "[Juncus] diffusissimus Buckl. sec L.
Urbatsch 2009".  The nameAccordingTo would be "L. Urbatsch 2009".  It may
seem redundant to have "L. Urbatsch 2009" in both the nameAccordingTo
attribute of the Taxon instance, and in the identifiedBy & dateIdentified
attributes of the Identification instance -- but the fact remains they are
fundamentally different pieces of information.  One establishes an instance
of an (implied) taxon concept, and the other establishes the placement of
LSU-BR 39823 within that taxon concept circumscription.
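A small sketch of why the apparent redundancy is harmless: the same "L. Urbatsch 2009" text lands in two different records with two different jobs. The TNU's nameAccordingTo defines the (implied) concept; the Identification's identifiedBy and dateIdentified record who placed this specimen in it, and when. The class names, the label property, and its "[Genus] rest sec Source" format (copied from the discussion above) are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaxonNameUsage:
    scientific_name: str
    name_according_to: str  # establishes the (implied) taxon concept

    @property
    def label(self) -> str:
        # '[Genus] epithet Author sec Source' style label used in this thread.
        genus, rest = self.scientific_name.split(" ", 1)
        return f"[{genus}] {rest} sec {self.name_according_to}"

@dataclass(frozen=True)
class Identification:
    specimen: str
    taxon: TaxonNameUsage
    identified_by: str    # who placed this specimen within the concept
    date_identified: str  # when they did so

tnu = TaxonNameUsage("Juncus diffusissimus Buckl.", "L. Urbatsch 2009")
ident = Identification("LSU-BR 39823", tnu, "L. Urbatsch", "2009")
print(ident.taxon.label)  # [Juncus] diffusissimus Buckl. sec L. Urbatsch 2009
```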

Eventually, a third party may be able to deduce (perhaps through a suite of
other, external information) a RelationshipAssertion that maps the TNU
"[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" to some other, perhaps
published and well-defined taxon concept (of the same or different name).
Also, if there are 100 specimens in the collection that L. Urbatsch
identified as "Juncus diffusissimus Buckl." in 2009, then anchoring all 100
Identification instances to the one TNU allows all of those specimens to
inherit the mapping of the one "[Juncus] diffusissimus Buckl. sec L.
Urbatsch 2009" TNU instance to some other better-defined taxon concept.
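The payoff of anchoring many Identifications to one TNU can be shown with two lookup tables: specimens point at TNUs, and a single third-party mapping points the TNU at a better-defined concept. Every specimen then inherits the mapping without being touched individually. All identifiers below (the TNU id, the specimen ids, the target concept id) are hypothetical, and a real RelationshipAssertion would carry a relationship type and provenance rather than a bare string.

```python
from typing import Dict, Optional

def resolve_concepts(identifications: Dict[str, str],
                     tnu_to_concept: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Give each specimen the concept its TNU has been mapped to (None if unmapped).

    identifications: specimen id -> TNU id
    tnu_to_concept:  TNU id -> better-defined concept id (e.g. from a
                     third-party RelationshipAssertion)
    """
    return {spec: tnu_to_concept.get(tnu) for spec, tnu in identifications.items()}

# Hypothetical identifiers: 100 specimens all identified by L. Urbatsch in 2009.
URBATSCH_TNU = "tnu:juncus-diffusissimus-sec-urbatsch-2009"
identifications = {f"specimen-{n:03d}": URBATSCH_TNU for n in range(1, 101)}
# One mapping is enough for all 100 specimens to inherit it.
mappings = {URBATSCH_TNU: "concept:published-treatment"}
resolved = resolve_concepts(identifications, mappings)
print(len(resolved), set(resolved.values()))  # 100 {'concept:published-treatment'}
```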

I know this is a lot of stuff to keep in one's head at the same time -- but
as cumbersome as it seems, I am convinced it can be packaged into a
relatively straightforward and intuitive UI, and modelling it this way
improves the utility of the data (maybe dramatically) in the long run.

> 2. Do we draw a distinction between the initial identification and
> subsequent annotations?  
> I think the answer should be "no" and that's why I refer to both
> generically as "determinations".

I agree.

> 3. There is really no indication given on the annotation 
> labels as to many of the things that we would like to know, 
> such as the concept they had in mind, any source they used (if any), 
> or the reason why they did the annotation.  So how does one 
> connect the name that they applied to the determination when 
> there is no indication of the concept?  

As I said in an earlier post, the single most important way to reduce
taxonomic ambiguity is to try to capture (or confidently deduce) the source
(=mapping to taxon concept).  But if it can't be done, then it can't be done
-- so I'm inclined to establish a "place-holder" dwc:Taxon instance, with no
nameAccordingTo, and no other metadata besides the scientificName.

> Is this just something we can't do for old annotations 
> and just something that we try to do from this point forward?

Probably.

> 4. The last question is one that I really want to get some 
> opinions about.  It seems to me that there are a number 
> of reasons why one would apply a determination.  

Hmmm....I don't think this is really useful information.  I don't
understand how you would use this information in a machine-processing sort
of way.  An Identification is an Identification.  In some cases, the
Identifier may not even be aware of the previous identification, so we
can't necessarily infer that there was a particular "reason".  And even if
there is a reason, how do we use that information?  What if there is more
than one reason (i.e., if we are restricted to a controlled vocabulary)?

As far as I'm concerned, the Identifications should stand as they are.  If
needed, people can annotate the Identification instances; but I don't see the
value in machine-processing these things.

Also:

> Finally, a single determiner might apply 
> several determinations to one individual and indicate 
> in each determination the concept intended (i.e. if 
> you subscribe to Cronquist, you'd call it X; if you 
> like Radford's book, you'd call it Y; if you like 
> Weakley's treatment, you'd call it Z).  

YIKES!  I don't like the idea of loading all that information on an
Identification instance.  If the person wants to make this sort of
assertion, then they should establish the appropriate relationshipAssertion
instances among the various taxonConcepts cited.
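A rough sketch of moving that cross-treatment knowledge off the Identification and into RelationshipAssertion records: the Identification points at one concept, and the assertions link that concept to the others. The field names, the "congruent" relation, and the assumption that X, Y and Z are congruent are all illustrative; the quoted scenario only says the determiner would use different names under different treatments, and a real assertion might well use a different relationship type.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class RelationshipAssertion:
    from_concept: str
    relation: str     # illustrative vocabulary, e.g. "congruent", "includes"
    to_concept: str
    asserted_by: str

def names_in_other_treatments(concept: str, assertions: List[RelationshipAssertion]) -> List[str]:
    """Concepts asserted to be congruent with `concept`, in either direction."""
    out = set()
    for a in assertions:
        if a.relation != "congruent":
            continue
        if a.from_concept == concept:
            out.add(a.to_concept)
        elif a.to_concept == concept:
            out.add(a.from_concept)
    return sorted(out)

# The Identification points at ONE concept; cross-treatment links live here.
assertions = [
    RelationshipAssertion("X sec Cronquist", "congruent", "Y sec Radford", "the determiner"),
    RelationshipAssertion("X sec Cronquist", "congruent", "Z sec Weakley", "the determiner"),
]
print(names_in_other_treatments("X sec Cronquist", assertions))
# ['Y sec Radford', 'Z sec Weakley']
```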

Damn.  Now my head is really tired.  And so is the rest of me....

Aloha, and g'night..

Rich