[tdwg-content] practical details of recording a determination What is an Occurrence?

Tue Oct 19 18:15:01 CEST 2010

On Oct 19, 2010, at 11:35 AM, Steve Baskauf wrote:

>  I've tried to recreate your diagram at
> http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif

Note that the visible label gives the correct URL
(http://bioimages.vanderbilt.edu/pages/rich-diagram1.gif), but for some
reason it's linked to the wrong URL, so don't click it; just cut & paste it.

Arlin

> Please correct me if I didn't get it right.  My arrow-drawing  
> utility put the arrow heads on the other end of the lines, but I  
> think the arrows still maintain the "many to one" relationships you  
> were trying to represent.  I also replaced eventTime with eventDate  
> since the latter is a broader term that also can include the time.
>
> In principle, I agree with this diagram to the left of  
> taxonNameUsage completely.  (I still need clarification about a few  
> things on the right end.)  My main reason for using determination as  
> a term rather than identification is because it is not ambiguous to  
> refer to the person doing the identifying as the determiner, whereas  
> referring to that person as the "identifier" creates confusion  
> between that person and the identifying string for resources (as in  
> "persistent identifier").  So if we agree that determination,  
> annotation, and identification all mean the same thing (namely an  
> instance of the dwc:Identification class), I'm happy to just use the  
> term "identification".  For the person doing it, I guess  
> dwc:identifiedBy would be the best term although it's a bit awkward  
> in regular speech so I may slip and still say "determiner".
>
> Although I agree in principle that there can be many occurrences at  
> an Event and many events at a Location, I think there are two  
> practical reasons why it may be better to assign separate eventDate  
> and Location metadata to each Occurrence.  The first is that it  
> makes the database structure simpler. As Markus has already noted,  
> we really would prefer for the database to be as "flat" as  
> possible.  When I look at the terms listed in the DwC term page
> (http://rs.tdwg.org/dwc/terms/index.htm) under Event, the most
> important one that I see which everyone
> should be providing is eventDate.  The rest I would pretty much  
> consider optional and as a shortcut Rich's diagram could be  
> collapsed to make them direct properties of the Occurrence.  The  
> second reason involves the practical matter of defining a Location.   
> I will note that my thinking about this has been deeply influenced  
> by a previous discussion on the topic from 2008-2009 summarized at
> http://www.sernec.org/files/summary-of-discussion.pdf on p. 78-84.
> I don't think most people will want to wade through
> all of that text, so I'll just sum it up here.  Somebody (I think it  
> might have been Debbie Paul at Morphbank) suggested to me that we  
> really have an intrinsically globally unique identifier for  
> Location.  It's the combination of dwc:decimalLatitude and  
> dwc:decimalLongitude along with dwc:coordinateUncertaintyInMeters to  
> establish precision and dwc:geodeticDatum to establish the reference  
> system.  (If we like geo:lat and geo:long, then the reference system  
> is implied and we are down to three terms to unambiguously define a  
> Location and its uncertainty.  For the benefit of humans, a
> Locality description is probably also beneficial.  Also, elevation  
> and depth might be provided, although at least in theory elevation  
> could be calculated with a sufficiently good digital elevation  
> model).  I will grant that we don't have this information for a lot  
> of old records, but based on the massive efforts to geolocate  
> specimens, I would say it's pretty clear that this is what we would  
> like to have if we could get it.  I certainly hope that there aren't  
> any serious collectors, observers, and live organism photographers  
> who aren't by this point trying to record this information as they  
> establish new Occurrence records.  If you look at all of the  
> Location terms on the dwc list, most of the other terms are either  
> concessions to the fact that we don't have what we want (e.g. the
> "verbatim" terms), things we could generate using a computer program
> if we were clever (like stateProvince, county, etc. - I know at  
> least Mike Giddens has succeeded in doing this), ways of indicating  
> how we got lat and long from old records (e.g.  
> georeferenceSources), or methods to define larger-scale Locations
> that aren't points (e.g. footprintWKT).  I think it is safe to say  
> that in the future (if not now already), many or most Events  
> associated with Occurrences will have an associated button click (on  
> a GPS receiver, camera phone, or GPS enabled camera) that will  
> automatically generate dwc:eventDate, dwc:decimalLatitude,  
> dwc:decimalLongitude (with geodeticDatum=WGS84) and maybe  
> coordinateUncertaintyInMeters.  Thus designing a system that  
> requires that these time/space snapshots be grouped together into  
> artificial "Locations" is really counterproductive when those data  
> are now generated and can be associated with Occurrences  
> automatically.  I don't know if Greg Riccardi of Morphbank is  
> following this thread or not.  If so he may want to comment on this  
> issue based on practical experience at Morphbank.  When the  
> Morphbank system was set up, it required the creation of a separate  
> Location record which was assigned a unique Morphbank identifier.   
> Specimens were then linked to this Location.  What ended up  
> happening was that each Specimen having GPS metadata ended up being  
> assigned to its own separate Location even if it was 20 meters from  
> another specimen.  In effect, each Occurrence record ended up having  
> its own decimalLatitude/decimalLongitude record anyway.  So the  
> system ended up being more complicated than necessary.
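
A minimal Python sketch of the flat structure described above, where each Occurrence carries its own eventDate and coordinates, and the four Darwin Core terms together act as the Location key. The class names, field names, and sample values are illustrative assumptions, not part of Darwin Core or of Steve's diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationKey:
    """Hypothetical value object: the four terms that pin down a point Location."""
    decimal_latitude: float
    decimal_longitude: float
    coordinate_uncertainty_in_meters: float
    geodetic_datum: str = "WGS84"


@dataclass(frozen=True)
class FlatOccurrence:
    """An Occurrence with eventDate and Location attached directly (no Event table)."""
    occurrence_id: str
    event_date: str  # ISO 8601 string, as dwc:eventDate would hold
    location: LocationKey


# Made-up sample records; the coordinates are illustrative only.
occ_a = FlatOccurrence("occ-a", "2010-10-19T09:15:00", LocationKey(36.1447, -86.8027, 10.0))
occ_b = FlatOccurrence("occ-b", "2010-10-19T09:20:00", LocationKey(36.1447, -86.8027, 10.0))
occ_c = FlatOccurrence("occ-c", "2010-10-19T09:25:00", LocationKey(36.1449, -86.8027, 10.0))

# Equal keys mean the same Location; no separate Location identifier is needed.
distinct_locations = {occ.location for occ in (occ_a, occ_b, occ_c)}
print(len(distinct_locations))
```

Two records compare as the same Location only when all four values match, so GPS-derived points a short distance apart stay separate without a Location table, which is roughly what the Morphbank experience described above ended up producing anyway.
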
>
> As I said, I agree in principle with the left side of Rich's  
> diagram.  Taking the practical considerations I just mentioned into  
> account, I would simplify the diagram as
> http://bioimages.vanderbilt.edu/pages/rich-diagram2.gif
> Superficially, it looks more complicated, but I've gotten rid of  
> several "one to many" relationships and enthroned Occurrence at its  
> accustomed place in the center of the universe (or at least the  
> center of the left side of the diagram).  I don't have any  
> philosophical objections to people structuring their data according  
> to Rich's original diagram and the existing Darwin Core terms  
> certainly make it possible to do so (well except for the Individual  
> thing).  However, I submit that many people will find it simpler  
> (and easier to use tools like Darwin Core Archives) if they use the  
> flatter structure that I have in the revised diagram.
>
> I will save my questions about the right side of Rich's diagram for  
> later.
> Steve
>
> Richard Pyle wrote:
>>
>> All,
>>
>> I'm in Stockholm, and right now it's 10am in Hawaii, and I've  
>> effectively
>> been awake since 7pm Hawaii time -- so my brain is a bit mush. But  
>> I'll take
>> a chance and comment anyway.
>>
>>
>>> I will leave up to the taxonomy people the
>>> different things that would be connected to the
>>> species concept and how all of their lines
>>> would be connected.
>>>
>>
>>
>> In my mind the "fully-normalised" (sensu Döring) relationship graph  
>> is
>> something like this (notation is [One]--<[Many]; [One]--[One]) (Be  
>> sure to
>> view as a fixed-width font, like Courier):
>>
>>                                                       [identifiedBy]
>>                                                             |
>> [Location]--<[Event]--<[Occurrence]>--[Individual]--<[Identification]--[TaxonNameUsage]>--[nameAccordingTo]
>>                 |                                           |                 |
>>            [eventTime]                               [dateIdentified]  [scientificName]
>>
>>
>> I'm following what I *think* Steve defined for [Individual], which  
>> is that
>> it can be either a single individual organism or a defined set of  
>> organisms
>> (e.g., up to at least a population).
>>
>> So, an Occurrence is the intersection of an Individual and an  
>> Event.  An
>> Event is a Location+Time[+other metadata].  Each Event may have  
>> multiple
>> Occurrences (i.e., one for each distinct Individual at the same
>> Location+Time).  Also, an Individual may have multiple Occurrences  
>> (one for
>> each Event at which the same Individual was documented).
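
The relationships Rich describes here (an Occurrence as the intersection of an Individual and an Event) could be sketched in Python roughly as follows. This is only an illustration under assumed names; the identifiers, timestamps, and helper functions are hypothetical and not taken from Darwin Core.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Location:
    location_id: str


@dataclass(frozen=True)
class Event:
    event_id: str
    location: Location  # many Events may share one Location
    event_time: str


@dataclass(frozen=True)
class Individual:
    individual_id: str  # one organism or a defined set of organisms


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    event: Event            # many Occurrences per Event
    individual: Individual  # many Occurrences per Individual


# Made-up sample data.
site = Location("loc-1")
morning = Event("evt-1", site, "2010-10-19T08:00")
evening = Event("evt-2", site, "2010-10-19T18:00")
ind_a, ind_b = Individual("ind-a"), Individual("ind-b")

occurrences: List[Occurrence] = [
    Occurrence("occ-1", morning, ind_a),
    Occurrence("occ-2", morning, ind_b),
    Occurrence("occ-3", evening, ind_a),
]


def occurrences_at(event: Event) -> List[Occurrence]:
    """All Occurrences documented at one Event."""
    return [o for o in occurrences if o.event == event]


def occurrences_of(individual: Individual) -> List[Occurrence]:
    """All Occurrences in which one Individual was documented."""
    return [o for o in occurrences if o.individual == individual]
```

Here the morning Event has two Occurrences (one per Individual), and ind-a has two Occurrences (one per Event), matching the two "many" directions in the text.
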
>>
>> An Individual may have multiple Identifications.  I make no
>> distinction
>> between "Identification" and "Determination" (nor do I make a  
>> distinction
>> between the first identification and subsequent identifications).  I
>> slightly prefer "Identification", because "Determination" seems to  
>> imply
>> that there is a correct answer, whereas "Identification" (to me,  
>> anyway),
>> implies an opinion.  Steve, I didn't quite follow how you were
>> distinguishing these two terms -- so if you have a clear reason for
>> distinguishing them, I'd like to understand it better.
>>
>> A single Identification should, in my mind, always join a single  
>> individual
>> with a single "TaxonNameUsage" instance.  I'm not 100% sure how
>> TaxonNameUsage maps in DwC.  I *think* it's an instance of a  
>> dwc:Taxon, as
>> most of the core attributes of a TNU (acceptedNameUsage[ID],
>> parentNameUsage[ID], originalNameUsage[ID], scientificName,  
>> taxonRank) are
>> represented as terms in the Taxon Class.  But I'm a little fuzzy on
>> whether a "taxonID" maps directly to a TNUID, or if a TNUID more
>> correctly maps to taxonConceptID.
>>
>>
>>> The determination would have any of the properties that are
>>> terms listed in the dwc:Identification class (identifiedBy,
>>> dateIdentified, identificationReferences, identificationRemarks,
>>> identificationQualifier, and typeStatus).  Some properties like
>>> dateIdentified and identificationReferences would be string
>>> literals and others (especially identifiedBy) should probably
>>> be GUIDs but could be literals if they had to be.
>>>
>>
>> I agree with what Steve wrote above.  However, I'm uncomfortable with
>> Markus' suggestion of treating dwc:nameAccordingTo as a property of an
>> Identification -- even as a shortcut.  I think this is a bit dangerous.
>> If there is no TaxonID instance (aka "TaxonNameUsage" in my diagram
>> above) available to link the Identification to, then I would suggest
>> using identificationReferences as the shortcut.  But that would still
>> force you to attach scientificName directly to the Identification
>> instance, which I think is also unwise.  I'd rather the Best Practice
>> be to
>> "manufacture" a
>> place-holder dwc:Taxon instance (if a proper one doesn't already  
>> exist in
>> the content source), and apply the scientificName property to that  
>> Taxon
>> instance, rather than directly to an Identification.  I know it's  
>> often
>> short-hand to attach the scientificName directly to the Occurrence  
>> instance;
>> but I actually feel less uneasy about that, because it is much more
>> obviously a shortcut.  But if you're going to the trouble to  
>> provide an
>> instantiated "Identification", then you ought to anchor it to a Taxon
>> instance (manufactured or real).
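
One way to picture the "manufactured" place-holder Taxon idea is the sketch below. It assumes a simple in-memory index keyed by (scientificName, nameAccordingTo); the `is_placeholder` flag, the ID scheme, and the function name are all hypothetical and not Darwin Core terms.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Taxon:
    taxon_id: str
    scientific_name: str
    name_according_to: Optional[str] = None
    is_placeholder: bool = False  # hypothetical flag, not a Darwin Core term


@dataclass
class Identification:
    individual_id: str
    taxon_id: str  # link to a Taxon; scientificName is NOT stored here
    identified_by: str = "Unspecified"


TaxonIndex = Dict[Tuple[str, Optional[str]], Taxon]


def get_or_make_taxon(index: TaxonIndex, scientific_name: str,
                      name_according_to: Optional[str] = None) -> Taxon:
    """Reuse a matching Taxon, or manufacture one if none exists yet."""
    key = (scientific_name, name_according_to)
    if key not in index:
        index[key] = Taxon(
            taxon_id=f"taxon-{len(index) + 1}",
            scientific_name=scientific_name,
            name_according_to=name_according_to,
            is_placeholder=name_according_to is None,
        )
    return index[key]


index: TaxonIndex = {}
taxon = get_or_make_taxon(index, "Juncus diffusissimus")
ident = Identification(individual_id="specimen-1", taxon_id=taxon.taxon_id)
```

The Identification only ever points at a Taxon instance, so the scientificName lives in one place even when nothing else is known about the concept.
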
>>
>> But, I guess as Greg said in his post, it may not really matter, as  
>> in the
>> long run, we'll probably be able to make inferences about the proper
>> Individual<-->TaxonConcept mapping, even when it's not explicitly
>> documented.
>>
>>
>>> 1. The original label identifies the species as Juncus
>>> diffusissimus.  However, there is no indicator as to who
>>> originally identified it or when.  My assumption is that
>>> it was the collector (Glen N. Montz) but I don't really
>>> know that.  Do I assume that, or list the original
>>> determiner as "unknown"?
>>>
>>
>> I would make no assumptions about who was the identifiedBy person.
>> Instead, I handle these cases by either going with "Unspecified", or,
>> in some cases (when I have confidence), something like "Bishop  
>> Museum Staff
>> Member".  Often I can deduce the identifier with some degree of  
>> confidence,
>> but usually I don't have the time to do this.  The dateIdentified  
>> can either
>> not be provided, or set as some range (e.g., at the very worst, on  
>> or after
>> the eventDate/eventTime, and before today).
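
The "worst case" range for dateIdentified could be written as an ISO 8601 interval, as in this small sketch. It assumes that an interval string is acceptable in the dateIdentified field of your data store; the function name and the sample dates are made up for illustration.

```python
from datetime import date


def date_identified_bounds(event_date: date, today: date) -> str:
    """Return an ISO 8601 interval from the collecting date to today."""
    if today < event_date:
        raise ValueError("today must not be earlier than the event date")
    return f"{event_date.isoformat()}/{today.isoformat()}"


# Made-up dates for illustration.
print(date_identified_bounds(date(1975, 6, 3), date(2010, 10, 19)))
```
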
>>
>> This is why I think that identification tags ("annotations" sensu
>> Baskauf) can be "documentation sources" for TNUs.
>>
>> In the web example given by Steve, we have an identification as follows:
>>
>> Juncus diffusissimus Buckl.
>> Determined by: L. Urbatsch
>> Determination date: 2009
>>
>> Completely independently of the specimen itself, we can infer from the
>> tag that:
>>
>> - Sometime between 1 Jan 2009 and 31 Dec 2009, L. Urbatsch regarded  
>> the
>> genus "Juncus" as valid.
>> - Sometime between 1 Jan 2009 and 31 Dec 2009, L. Urbatsch regarded  
>> the
>> species epithet "diffusissimus" [of Buckl.] as a valid species,  
>> placed
>> within the genus "Juncus".
>>
>> Thus, we have at least two implied TNUs from this identification,  
>> which was
>> documented on a piece of paper that happens to be fixed to LSU-BR  
>> 39823.
>>
>> The Identification instance would link the Individual (manifest as a
>> specimen, in this case) to the TNU of "[Juncus] diffusissimus  
>> Buckl. sec L.
>> Urbatsch 2009".  The nameAccordingTo would be "L. Urbatsch 2009".   
>> This may
>> seem redundant to have "L. Urbatsch 2009" in both the nameAccordingTo
>> attribute of the Taxon instance, and in the identifiedBy &
>> dateIdentified
>> attributes of the Identification instance -- but the fact remains  
>> they are
>> fundamentally different pieces of information.  One establishes an  
>> instance
>> of an (implied) taxon concept, and the other establishes the  
>> placement of
>> LSU-BR 39823 within that taxon concept circumscription.
>>
>> Eventually, a third party may be able to deduce (perhaps through a  
>> suite of
>> other, external information) a RelationshipAssertion that maps the  
>> TNU
>> "[Juncus] diffusissimus Buckl. sec L. Urbatsch 2009" to some other,  
>> perhaps
>> published and well-defined taxon concept (of the same or different  
>> name).
>> Also, if there are 100 specimens in the collection that L. Urbatsch
>> identified as "Juncus diffusissimus Buckl." in 2009, then anchoring  
>> all 100
>> Identification instances to the one TNU, allows all of those  
>> specimens to
>> inherit the mapping of the one "[Juncus] diffusissimus Buckl. sec L.
>> Urbatsch 2009" TNU instance to some other better-defined taxon  
>> concept.
>>
>> I know this is a lot of stuff to keep in one's head at the same time --
>> but as cumbersome as it seems, I am convinced it can be packaged into a
>> relatively straightforward and intuitive UI, and modelling it this way
>> improves the utility of the data (maybe dramatically) in the long run.
>>
>>
>>> 2. Do we draw a distinction between the initial identification and
>>> subsequent annotations?
>>> I think the answer should be "no" and that's why I refer to both
>>> generically as "determinations".
>>
>> I agree.
>>
>>
>>> 3. There is really no indication given on the annotation
>>> labels as to many of the things that we would like to know,
>>> such as the concept they had in mind, any source they used (if any),
>>> or the reason why they did the annotation.  So how does one
>>> connect the name that they applied to the determination when
>>> there is no indication of the concept?
>>>
>>
>> As I said in an earlier post, the single most important way to reduce
>> taxonomic ambiguity is to try to capture (or confidently deduce)  
>> the source
>> (=mapping to taxon concept).  But if it can't be done, then it  
>> can't be done
>> -- so I'm inclined to establish a "place-holder" dwc:Taxon  
>> instance, with no
>> nameAccordingTo, and no other metadata besides the scientificName.
>>
>>
>>> Is this just something we can't do for old annotations
>>> and just something that we try to do from this point forward?
>>>
>>
>> Probably.
>>
>>
>>> 4. The last question is one that I really want some
>>> opinions about.  It seems to me that there are a number
>>> of reasons why one would apply a determination.
>>>
>>
>> Hmmm....I don't think this is really useful information.  I don't
>> understand how you would use this information in a machine-processing
>> sort of way.  An Identification is an Identification.  In some cases, the
>> Identifier may not even be aware of the previous identification, and so
>> we cannot necessarily infer there was a particular "reason".  And even if
>> there is a reason, how do we use that information? What if there is more
>> than one reason (i.e., if we are restricted to a controlled vocabulary)?
>>
>> As far as I'm concerned, the Identifications should stand as they  
>> are.  If
>> needed people can annotate the Identification instances; but I  
>> don't see the
>> value in machine-processing these things.
>>
>> Also:
>>
>>
>>> Finally, a single determiner might apply
>>> several determinations to one individual and indicate
>>> in each determination the concept intended (i.e. if
>>> you subscribe to Cronquist, you'd call it X; if you
>>> like Radford's book, you'd call it Y; if you like
>>> Weakley's treatment, you'd call it Z).
>>>
>>
>> YIKES!  I don't like the idea of loading all that information on an
>> Identification instance.  If the person wants to make this sort of
>> assertion, then they should establish the appropriate  
>> relationshipAssertion
>> instances among the various taxonConcepts cited.
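
A rough sketch of keeping the Identification simple and putting the cross-treatment statements into separate relationship assertions. The relationship label "congruent", the concept strings, and the IDs are hypothetical stand-ins; nothing here is an established vocabulary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Identification:
    individual_id: str
    taxon_concept: str  # exactly one concept per Identification
    identified_by: str


@dataclass(frozen=True)
class RelationshipAssertion:
    from_concept: str
    relationship: str  # hypothetical label, not a controlled term
    to_concept: str
    asserted_by: str


ident = Identification("individual-1", "X sec Cronquist", "determiner-1")

assertions: List[RelationshipAssertion] = [
    RelationshipAssertion("X sec Cronquist", "congruent", "Y sec Radford", "determiner-1"),
    RelationshipAssertion("X sec Cronquist", "congruent", "Z sec Weakley", "determiner-1"),
]


def related_concepts(concept: str, known: List[RelationshipAssertion]) -> List[str]:
    """Concepts that have been asserted to relate to the given concept."""
    return [a.to_concept for a in known if a.from_concept == concept]
```

The Individual gets one Identification, and anyone who wants the Radford or Weakley name can follow the assertions from the identified concept.
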
>>
>> Damn.  Now my head is really tired.  And so is the rest of me....
>>
>> Aloha, and g'night..
>>
>> Rich
>>
>>
>>
>>
>
> -- 
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
>
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
>
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
>
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707
> http://bioimages.vanderbilt.edu

-------
Arlin Stoltzfus (arlin at umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD
tel: 240 314 6208; web: www.molevol.org
