[tdwg-content] How to record "Nativeness"?

Richard Pyle deepreef at bishopmuseum.org
Tue Oct 12 21:20:14 CEST 2010

I had *promised* myself (and others) that I would not get dragged into a
debate of this sort.  But Roger just gave me too many opportunities to
comment (I changed the subject line to protect the innocent). 

(Tim -- now would be a good time to get yourself a cup of tea/coffee...)

> The vocabulary I briefly presented at TDWG was aimed at occurrence 
> of taxa in regions but the general thrust of my talk was intended 
> to pose the questions: Why should we score taxa to regions at all? 
> Shouldn't this always be the results of a query on occurrence records? 

In an ideal world, yes it should. There are two answers to the question "Why
should we score taxa to regions at all?".  At one level, the reason is
because many, many people conceptualize space in terms of named regions (in
much the same way that we conceptualize the diversity of organisms as named
taxon concepts), and it turns out that many end-users are interested in
answering the question, "What lives here?".  But I think Roger's point in
asking that question was more along the lines of, "Why do we want to model
our data in such a way that taxa are linked directly to regions, rather than
derive such distributional information from occurrence records?"  My answer
to that is: "We don't!" (a conclusion I came to over a decade ago).  For a
long time I have been a firm believer that all Taxon-Locality statements
should pass through (be derived from) Occurrence records.
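
As a rough illustration of deriving Taxon-Locality statements from Occurrence
records instead of storing them directly, here is a minimal Python sketch. The
keys borrow Darwin Core term names (scientificName, stateProvince); the sample
records, the name 'Aus cus', and the helper taxa_by_region are hypothetical.

```python
from collections import defaultdict

# Hypothetical occurrence records; keys borrow Darwin Core term names.
occurrences = [
    {"scientificName": "Aus bus", "stateProvince": "Hawaii"},
    {"scientificName": "Aus bus", "stateProvince": "Hawaii"},
    {"scientificName": "Aus cus", "stateProvince": "Hawaii"},
    {"scientificName": "Aus bus", "stateProvince": "Massachusetts"},
]

def taxa_by_region(records):
    """Group the distinct taxon names under each region they were recorded in."""
    regions = defaultdict(set)
    for rec in records:
        regions[rec["stateProvince"]].add(rec["scientificName"])
    return {region: sorted(names) for region, names in regions.items()}

# The "What lives here?" answer is a query result, not a stored link.
checklist = taxa_by_region(occurrences)
print(checklist["Hawaii"])
```

Nothing in this sketch links a taxon to a region directly; the checklist only
exists as the result of a query over the occurrence records.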

But now we come back to the real world.  If I were to compile the list of
organisms that are known to have occurred in Hawaii, then my list would
probably only be about 70% complete if I relied only on documented
observations and collected specimens.  The other 30% of the list would come
from statements published in historical literature, which often do not
record specific details about individual collected specimens or
observations.  All we have in such cases are statements along the lines of
"Jones (1950) reports that the organism he calls 'Aus bus' occurs in Hawaii".

How does such information get recorded in our databases and shared via DwC
terms?  The easy answer is to establish the link directly between a taxon
and a location, as is done in Dave's DwCA Species Distribution extension
(http://rs.gbif.org/extension/gbif/1.0/distribution.xml).  This is probably
fine for "abstracting" distribution information, and is perhaps appropriate
for a DwCA extension.  But in my opinion, it's a suboptimal approach to
structuring the original information.  Another approach is to re-frame the
statement above as:

"We infer from Jones (1950) that at least one organism that he called 'Aus
bus' was observed or collected in Hawaii"

... which allows us to represent this information in the form of an
Occurrence record (albeit a somewhat skeletal one).
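
To make "skeletal" concrete, here is a minimal Python sketch of such a record.
The keys are Darwin Core term names, but the choice of terms, the
basisOfRecord value "Literature", and the helper occurrence_from_literature
are assumptions made for illustration, not an agreed convention.

```python
def occurrence_from_literature(name, region, reference):
    """Build a skeletal occurrence record inferred from a published statement."""
    return {
        # Assumed value; not taken from any official basisOfRecord vocabulary.
        "basisOfRecord": "Literature",
        "scientificName": name,
        "nameAccordingTo": reference,
        "stateProvince": region,
        "associatedReferences": reference,
        # The publication gives no date or coordinates, so these stay empty.
        "eventDate": None,
        "decimalLatitude": None,
        "decimalLongitude": None,
    }

record = occurrence_from_literature("Aus bus", "Hawaii", "Jones (1950)")
missing = sorted(key for key, value in record.items() if value is None)
print(missing)
```

The empty fields are the point: the record asserts only what the publication
supports, and a consumer can see exactly which details are unknown.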

But this leads us into what I think is the real crux of the issue, which is
to identify what the scope of an "Occurrence" is or should be.

These obviously fall within scope:

- Captured one individual of an organism I identify as "Aus bus" at place
and time. 
- Captured a thousand individuals of an organism I identify as "Aus bus" at
place and time [think plankton tow].  
- Observed one individual of an organism I identify as "Aus bus" at place
and time. 
- Observed a thousand individuals of an organism I identify as "Aus bus" at
place and time [think large school of fish, herd of wildebeest, flock of birds].

Not In-Scope:
- Hawaiian Islands; Oahu; Kaneohe (21.410458, -157.774881)
- Aus bus (Linnaeus 1758) sec. Jones 1950

But the question is whether the following statement falls within scope of an
Occurrence:

- A population of an organism identified as "Aus bus" occurs at place and
time.

Perhaps the differences in perspectives we're seeing on this thread are a
result of differences on how we would answer that question.

If the answer is "yes, it's within scope", then I would argue that
"nativeness" is a property of an Occurrence (as it already is in DwC, in the
form of establishmentMeans).

If the answer is "no, it's not in scope", then OK -- but in that case, how
does one represent "nativeness" within DwC?  That is, what class of object
would establishmentMeans be a property of?

> The answer will always depend on the question asked.

Yes!  Exactly the point of my previous post.

> As Kevin says, when I observe an individual (or flock of individuals) 
> I do not observe their "introducedness" or their "nativeness" this 
> is something that is derived from combining multiple observations 
> of occurrence of individuals.

If the scope of "Occurrence" is limited to specific individuals at specific
place and time, then I would agree.  But if the scope of "Occurrence"
includes statements about populations of organisms in less-precise places
and times, then I think it does.

> I would therefore advocate that we just have a flag on an occurrence 
> record that says "intended for distribution" i.e. this is not 
> maintained here in a garden/zoo/farm etc. To say any more on a 
> occurrence record is misleading and there are occasions when 
> even this flag will be ignored in analysis. I think we already 
> have this field.
> There are of course grey areas (biology always has grey areas). 
> A Scots Pine growing in the highlands may be part of a 150 year 
> old naturalistic plantation. It is therefore native to the region, 
> possibly of local genetic stock but has been planted in that position. 
> For some applications this could be considered managed and for others not.

"Managed" is only one metric of consideration for how to score "intended for
distribution" (and a relatively-straightforward one at that).  A bit more
subtle is the issue that Gail was driving at (i.e., Ross's gull in
Massachusetts).  For some use-cases, you would want to score that one as
"intended for distribution" (if your question was about the potential for
the species to disperse without the aid of humans); in other use-cases,
you'd want to filter it out (if your question was about where the sustained
breeding populations were).  There are many other metrics of this sort
which, I'm certain, would get hopelessly lost in a simple "intended for
distribution" flag.
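
A small Python sketch of why one flag is not enough: the two questions above
need different filters over the same records. The keys "managed" and
"breeding" are hypothetical illustration fields, not Darwin Core terms, and
the records are made up.

```python
# Hypothetical records; "managed" and "breeding" are illustrative keys only.
records = [
    {"name": "Ross's gull", "region": "Massachusetts",
     "managed": False, "breeding": False},
    {"name": "Aus bus", "region": "Hawaii",
     "managed": False, "breeding": True},
    {"name": "Aus bus", "region": "zoo enclosure",
     "managed": True, "breeding": True},
]

def natural_dispersal(recs):
    """Question 1: where can the species get to without human aid?"""
    return [r for r in recs if not r["managed"]]

def sustained_breeding(recs):
    """Question 2: where are the sustained breeding populations?"""
    return [r for r in recs if not r["managed"] and r["breeding"]]

print([r["region"] for r in natural_dispersal(records)])
print([r["region"] for r in sustained_breeding(records)])
```

The vagrant gull is kept by the first filter and dropped by the second, so no
single yes/no value on the record could serve both questions.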

> The status of taxa in regions is a completely different thing. 

Well....not completely different.  We're talking shades of grey; not black
and white.

> As soon as we talk about aggregating multiple observations (or lack of
> then we are talking about the results of analysis instead of primary

Hmmmm....I get where you're coming from on the analysis thing -- but our
databases are absolutely loaded with instances of aggregated multiple
observations (and even aggregated specimens).  And getting back to the
"population of Aus bus occurs at place and time" example, clearly
this is an aggregation, and probably an interpolation, but I'm not so sure
it's merely the result of an analysis.

> Only at this point should we be talking about the status of the
> in terms of native/invasive/naturalised etc. This may not even be based on

> extant records. For example, a taxon can be invasive in an area without 
> actually occurring there. i.e. it used to be there but is presumed to be

Since when do we limit our "proper" occurrence records to extant only?

> Does the problem occur because we are using the same term "occurrence" to
> both a primary unit of data gathering and the result of an analysis
> even just a hypothesis if it is the result of niche modelling)? How could
> differentiate between these two? The discussion probably comes back to 
> 'basisOfRecord' again and our fundamental classes of object. 

Again, I think this is the crux of the issue.  I wasn't even going to reply
at all until I got to this paragraph; which triggered the enormous tome
above. The distinction between "primary unit of data gathering" and "result
of an analysis" is not as stark as you make it out to be.  There's a lot
in-between, which unfortunately includes things to which we would logically
apply the notion of "nativeness".

> Sorry to be long winded. 


