[tdwg-content] How to record "Nativeness"?

Richard Pyle deepreef at bishopmuseum.org
Tue Oct 12 22:11:24 CEST 2010

Damn!  I wish I'd read this before writing that massive reply to Roger
(note: I'm trying to move this into the new thread with the new subject).

I agree with most of what Steve wrote, but I still disagree (as I did with
Roger) that the distinction between Steve's two meanings of "Occurrence" is
so stark. I agree there is a fundamental distinction from the perspective of
data management between:




My contention, however, is that "TaxonConcept<-->Location" is often
(usually? always?) just a short-hand (scant metadata) way of representing
"TaxonConcept<-->Occurrence<-->Event<-->Location".  Our domain (biodiversity
information) is full of these overloaded short-hand terms, and they're often
not easy to detect as such (e.g., so many databases simply represent implied
taxon concepts as text-string scientific names).
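
To make the short-hand idea concrete, here is a minimal Python sketch.  The
class names mirror the terms used above, but the fields, the placeholder
values, and the expand_shorthand helper are assumptions made purely for
illustration; none of this is defined by Darwin Core.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    name: str


@dataclass(frozen=True)
class Event:
    location: Location
    date: Optional[str] = None  # usually unknown in a checklist-style record


@dataclass(frozen=True)
class Occurrence:
    taxon_concept: str  # often just a text-string scientific name
    event: Event
    organisms: Optional[str] = None  # which organism(s); often unrecorded


def expand_shorthand(taxon_concept: str, location_name: str) -> Occurrence:
    """Expand a TaxonConcept<-->Location pair into the full chain.

    Metadata the short-hand never captured (date, organisms) stays None.
    """
    return Occurrence(taxon_concept, Event(Location(location_name)))


# "Aus bus" and "Some Island" are placeholders, not real data.
record = expand_shorthand("Aus bus", "Some Island")
print(record.event.location.name)  # Some Island
print(record.event.date)           # None
```

The point of the sketch is only that the two-term record and the four-term
chain can share one structure; the short-hand version simply has empty slots.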

In my mind, the "essence" of an Occurrence is, ultimately, "organism(s) at
place and time".  The "place and time" part is represented as a dwc:Event
class linked to a dcterms:Location class.  The tricky part is what we
mean by "organism(s)".  I suspect most would agree that an individual bird
falls within scope of "organism(s)" in the case of dwc Occurrence.  I
further suspect that most would agree that a flock of birds also falls
within scope.

But what about a population of birds?  No?  What is a population, other than
a set of individual organisms?  How is this different from a "flock" (a
smaller set of individual organisms)?

And what about a taxon concept?  No?  What is a taxon concept, other than a
(larger) set of individual organisms?
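
One way to picture this is to treat each of these as nothing more than a set
of individual organisms, differing only in size.  A minimal Python sketch,
with invented identifiers and hypothetical variable names:

```python
# Each "organism(s)" grouping is modelled as a plain set of individual IDs.
# The IDs and group sizes are made up for illustration.
individual = frozenset({"bird-001"})
flock = frozenset({"bird-001", "bird-002", "bird-003"})
population = flock | frozenset({"bird-004", "bird-005"})
taxon_concept = population | frozenset({"bird-006", "bird-007", "bird-008"})

groupings = [individual, flock, population, taxon_concept]

# Each grouping is contained in the next; only the number of members changes.
nested = all(a <= b for a, b in zip(groupings, groupings[1:]))
print(nested)                       # True
print([len(g) for g in groupings])  # [1, 3, 5, 8]
```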

The fact is, there is a smooth continuum spanning:


Each of the four items above has overlapping scope with adjacent items in
the list (the overlap between the first two is evident in colonial

Steve, many thanks for sending the link to your paper.  I apologize that I
have not read it yet (I often don't have time to stay on top of this list,
so I missed your earlier reference to it), but I will.  Just be aware that
while my contributions to this thread are not in published/peer-reviewed
form, they are nevertheless the result of more than two decades of dealing
with biodiversity datasets, and very careful thinking and reasoning (i.e.,
as much as it may seem otherwise, these are much more than
spur-of-the-moment rants).


> -----Original Message-----
> From: tdwg-content-bounces at lists.tdwg.org 
> [mailto:tdwg-content-bounces at lists.tdwg.org] On Behalf Of 
> Steve Baskauf
> Sent: Tuesday, October 12, 2010 6:37 AM
> To: joel sachs
> Cc: tdwg-content at lists.tdwg.org; tdwg-bioblitz at googlegroups.com
> Subject: Re: [tdwg-content] What I learned at the TechnoBioBlitz
>
> This conversation about values for basisOfRecord, 
> establishmentMeans, and the nature of what actually 
> constitutes a dwc:Occurrence is very important.  We have 
> sitting on the table before us several official requests for 
> additions and modifications to Darwin Core:
> http://code.google.com/p/darwincore/issues/detail?id=68
> http://code.google.com/p/darwincore/issues/detail?id=69
> http://code.google.com/p/darwincore/issues/detail?id=80
> and
> http://code.google.com/p/darwincore/issues/detail?id=81
> that cannot and should not be decided until this discussion 
> occurs.  In particular, a discussion of what exactly a 
> dwc:Occurrence is lies at the heart of much of what we are 
> discussing in this thread and is critical to other processes 
> that are moving forward, such as guidelines for how we 
> represent things in RDF.  On this list I requested discussion 
> on this suite of topics when I proposed the Darwin Core 
> modifications, and I requested to members of the TAG that 
> this discussion happen at the TDWG meeting.  It didn't happen 
> either place, so I'm glad it's happening here now.
>
> Roger has correctly noted that we colloquially talk about 
> Occurrences in two ways that are fundamentally different.  We 
> use Occurrence (1) to mean that a species occurs generically 
> at a particular locality (the "checklist" use), and (2) we 
> talk about particular instances of particular individual 
> organisms being noticed at a particular place at a particular 
> time.  Based on the clarification that John Wieczorek gave in 
> the thread that surrounds 
> http://lists.tdwg.org/pipermail/tdwg-content/2009-October/000280.html,
> an Occurrence record simply asserts that an organism was 
> someplace at a certain time (and doesn't imply any fitness of 
> use such as for documenting distributions).  This is 
> consistent with meaning (2).  I think that the "checklist" 
> use (meaning 1) really should be called something else 
> because it is conceptually something very different.
>
> Assuming that when we talk about a dwc:Occurrence we intend 
> meaning (2), it is important to clarify what aspect of an 
> organism occurring somewhere at some time we intend for 
> dwc:Occurrence to mean.  When people talk about Occurrences, 
> the conversation often goes awry because people are 
> considering an Occurrence to include more or fewer conceptual 
> entities.  I don't know if images can be embedded in messages 
> sent to the list, so look at this image:
> http://bioimages.vanderbilt.edu/pages/resource-diagram.gif
> before reading further.  In that diagram, I'm trying to be as 
> generic as possible.  I think it is the intention of both 
> TDWG and GBIF to go beyond thinking that Occurrences can only 
> be specimens.  So consider that this generic Occurrence could 
> be a PreservedSpecimen, but could also be an image of an 
> organism, DNA sample, or any other token of the presence of 
> the Organism at a particular time and place (or a 
> HumanObservation that has no token at all).  I have heard 
> people say that an Occurrence is a dctype:Event.  That 
> recognizes the arrow on the left side of the diagram which 
> represents the time and place of the Occurrence.  I have 
> heard people say that if we photograph an organism, that is 
> an "observation" with associated media.  That recognizes the 
> collected metadata (i.e. the "observation") part and the 
> representation of the organism part (the photograph).  When 
> we talk about a PreservedSpecimen being an Occurrence, we 
> probably intend the metadata as well as the physical thing in 
> a jar or glued to a sheet of paper (the representation of the 
> organism) and may or may not include the arrow on the left.  
> I have taken the position that an Occurrence includes all of 
> the components shown in the diagram.  I'm not saying that 
> this is the correct or only view on this subject, but if 
> somebody intends for an Occurrence to mean something else, 
> then they need to be clear about which component(s) of the 
> diagram they are talking about.
>
> Being conceptually clear about these things is important 
> because that clarity informs the decision-making process 
> about the pending issues that I mentioned, such as whether 
> DigitalStillImage should be added as a DwC type (and hence 
> have a URI and be an accepted value for
> dwc:basisOfRecord) and how we should structure RDF when we 
> try to describe the properties of an Occurrence.  If by 
> "basisOfRecord" we mean a representation or token on which 
> the Occurrence is based (or lack of token in the case of 
> observations), then we should add as DwC types any type of 
> physical or digital artifact that will be used by several 
> people to document that an Occurrence existed at some point.  
> It would not make logical sense to say that sometimes the 
> basisOfRecord can be an artifact like a specimen, but other 
> supporting artifacts such as digital images cannot and must 
> be relegated to being associatedMedia.
>
> I am not going to say more on this topic right now, partly 
> because I have mid-semester progress reports to finish by the 
> end of the day, but mostly because I wrote a paper discussing 
> these issues and it lays out the conceptual framework I'm 
> talking about better than I can in an email.  I have cited 
> that paper both in my requests for the Darwin Core changes 
> and in previous emails to this list.  However, based on the 
> various emails that have been flying around, I don't think 
> many people on the list have read it.  That paper isn't a 
> spur of the moment rant.
> I spent over a year writing it, solicited and received 
> comments about it from a number of people including several 
> people on the TAG, and went through the peer review process 
> for several months before it was finally published this 
> spring.  It does not necessarily represent "the correct"
> view on the topics that we are discussing, but I believe that 
> it does represent a logically consistent way of 
> conceptualizing Occurrences and how a broad range of types of 
> Occurrences can be described and related to other resources.  
> If others can present clear and consistent alternatives to 
> the framework that I've suggested, I would like to hear what 
> they are.  The article, Biodiversity Informatics 7:14-44 can 
> be accessed at 
> https://journals.ku.edu/index.php/jbi/article/view/3664 .
> In particular, take note of the discussion on p.27-28 
> regarding the criterion for determining whether an Occurrence 
> documents a species'
> distribution, p. 28 where I discuss the difference between 
> the use of dwc:recordedBy and dcterms:created, and p. 29 
> where I suggest controlled values for dwc:establishmentMeans 
> that can be used for differentiating the extent to which an 
> individual documented by an Occurrence occurs "naturally" at 
> its location (native, naturalized, adventive, or cultivated - 
> intended to apply to either plants or animals; a farm or zoo 
> animal would be considered "cultivated"-I would be happy to 
> define and propose these as a controlled vocabulary).  These 
> are all things that have come up in this thread.  I also 
> should note that I have been successfully applying this 
> framework to live plant images at 
> http://bioimages.vanderbilt.edu where I serve RDF that is 
> consistent with the design discussed in the paper.
>
> I would like to say more about the relationship between 
> LivingSpecimens, Individuals, establishmentMeans, and 
> indicating whether an Occurrence documents a species' 
> distribution, but that will have to wait until later.
>
> Steve Baskauf
>
> joel sachs wrote:
> > One of the goals of the recent bioblitz was to think about the
> > suitability and appropriateness of TDWG standards for citizen science.
> > Robert Stevenson has volunteered to take the lead on preparing a
> > technobioblitz lessons learned document, and though the scope of this
> > document is not yet determined, I think the audience will include
> > bioblitz organizers, software developers, and TDWG as a whole. I hope
> > no one is shy about sharing lessons they think they learned, or
> > suggestions that they have. We can use the bioblitz google group for
> > this discussion, and copy in tdwg-content when our discussion is
> > standards-specific.
> >
> > Here are some of my immediate observations:
> >
> > 1. Darwin Core is almost exactly right for citizen science. However,
> > there is a desperate need for examples and templates of its use. To
> > illustrate this need: one of the developers spoke of the design choice
> > between "a simple csv file and a Darwin Core record". But a simple csv
> > file is a legitimate representation of Darwin Core! To be fair to the
> > developer, such a sentence might not have struck me as absurd a year
> > ago, before Remsen said "let's use DwC for the bioblitz".
> >
> > We provided a couple of example DwC records (text and rdf) in the
> > bioblitz data profile [1]. I think the lessons learned document
> > should include an on-line catalog of cut-and-pasteable examples
> > covering a variety of use cases, together with a dead simple
> > description of DwC, something like "Darwin Core is a collection of
> > terms, together with definitions."
> >
> > Here are areas where we augmented or diverged from DwC in the bioblitz:
> >
> > i. We added obs:observedBy [2], since there is no equivalent property
> > in DwC, and it's important in Citizen Science (though often not
> > available).
> >
> > ii. We used geo:lat and geo:long [3] instead of DwC terms for latitude
> > and longitude. The geo namespace is a well used and supported
> > standard, and records with geo coordinates are automatically mapped by
> > several applications. Since everyone was using GPS to retrieve their
> > coordinates, we were able to assume WGS-84 as the datum.
> >
> > If someone had used another Datum, say XYZ, we would have added 
> > columns to the Fusion table so that they could have expressed their 
> > coordinates in DwC, as, e.g.:
> > DwC:decimalLatitude=41.5
> > DwC:decimalLongitude=-70.7
> > DwC:geodeticDatum=XYZ
> >
> > (I would argue that it should be kosher DwC to express the above as
> > simply XYZ:lat and XYZ:long. DwC already incorporates terms from other
> > namespaces, such as Dublin Core, so there is precedent for this.)
> >
> > 2. DwC:scientificName might be more user friendly than
> > taxonomy:binomial and the other taxonomy machine tags EOL uses for
> > flickr images.  If DwC:scientificName isn't self-explanatory enough, a
> > user can look it up, and see that any scientific name is acceptable,
> > at any taxonomic rank, or not having any rank. And once we have a
> > scientific name, higher ranks can be inferred.
> >
> > 3. Catalogue of Life was an important part of the workflow, but we had
> > some problems with it. Future bioblitzes might consider using
> > something like a CoL fork, as recently described by Rod Page [4].
> >
> > 4. We didn't include "basisOfRecord" in the original data profile, and
> > so it wasn't a column in the Fusion Table [5]. But when a transcriber
> > felt it was necessary to include in order to capture data in a
> > particular field sheet, she just added the column to the table. This
> > flexibility of schema is important, and is in harmony with the
> > semantic web.
> >
> > 5. There seemed to be enthusiasm for another field event at next
> > year's TDWG. This could be an opportunity to gather other types of
> > data (e.g. character data) and thereby
> > i) expose meeting participants to another set of everyday problems from
> > the world of biodiversity workflows, and ii) try other TDWG technology
> > on for size, e.g. the observation exchange format, annotation
> > framework, etc.
> >
> >
> > Happy Thanksgiving to all in Canada -
> > Joel.
> > ----
> >
> >
> > 1. http://groups.google.com/group/tdwg-bioblitz/web/tdwg-bioblitz-profile-v1-1
> > 2. Slightly bastardizing our old observation ontology -
> >    http://spire.umbc.edu/ontologies/Observation.owl
> > 3. http://www.w3.org/2003/01/geo/
> > 4. http://iphylo.blogspot.com/2010/10/replicating-and-forking-data-in-2010.html
> > 5. http://tables.googlelabs.com/DataSource?dsrcid=248798
> >
> > _______________________________________________
> > tdwg-content mailing list
> > tdwg-content at lists.tdwg.org
> > http://lists.tdwg.org/mailman/listinfo/tdwg-content
> --
> Steven J. Baskauf, Ph.D., Senior Lecturer
> Vanderbilt University Dept. of Biological Sciences
> postal mail address:
> VU Station B 351634
> Nashville, TN  37235-1634,  U.S.A.
> delivery address:
> 2125 Stevenson Center
> 1161 21st Ave., S.
> Nashville, TN 37235
> office: 2128 Stevenson Center
> phone: (615) 343-4582,  fax: (615) 343-6707 
> http://bioimages.vanderbilt.edu