ARGHH!!
I had *promised* myself (and others) that I would not get dragged into a debate of this sort. But Roger just gave me too many opportunities to comment (I changed the subject line to protect the innocent).
(Tim -- now would be a good time to get yourself a cup of tea/coffee...)
The vocabulary I briefly presented at TDWG was aimed at occurrence of taxa in regions but the general thrust of my talk was intended to pose the questions: Why should we score taxa to regions at all? Shouldn't this always be the results of a query on occurrence records?
In an ideal world, yes it should. There are two answers to the question "Why should we score taxa to regions at all?". At one level, the reason is that many, many people conceptualize space in terms of named regions (in much the same way that we conceptualize the diversity of organisms as named taxon concepts), and it turns out that many end-users are interested in answering the question, "What lives here?". But I think Roger's point in asking that question was more along the lines of, "Why do we want to model our data in such a way that taxa are linked directly to regions, rather than derive such distributional information from occurrence records?" My answer to that is: "We don't!" (a conclusion I came to over a decade ago). For a long time I have been a firm believer that all Taxon-Locality statements should pass through (be derived from) Occurrence records.
But now we come back to the real world. If I were to compile the list of organisms that are known to have occurred in Hawaii, then my list would probably only be about 70% complete if I relied only on documented observations and collected specimens. The other 30% of the list would come from statements published in historical literature, which often do not record specific details about individual collected specimens or observations. All we have in such cases are statements along the lines of "Jones (1950) reports that the organism he calls 'Aus bus' occurs in Hawaii".
How does such information get recorded in our databases and shared via DwC terms? The easy answer is to establish the link directly between a taxon and a location, as is done in Dave's DwCA Species Distribution extension (http://rs.gbif.org/extension/gbif/1.0/distribution.xml). This is probably fine for "abstracting" distribution information, and is perhaps appropriate for a DwCA extension. But in my opinion, it's a suboptimal approach to structuring the original information. Another approach is to re-frame the statement above as:
"We infer from Jones (1950) that at least one organism that he called 'Aus bus' was observed or collected in Hawaii"
... which allows us to represent this information in the form of an Occurrence record (albeit a somewhat skeletal one).
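As a rough sketch of that re-framing (not how any particular database does it), the literature statement can sit alongside a documented specimen as a skeletal Occurrence record, with the taxon-by-region list produced by a query rather than stored directly. The dictionary keys borrow Darwin Core term names; the record identifiers, the "Cus dus" specimen, its date, the `LiteratureInference` value, and the `taxa_by_region` helper are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical records; keys borrow Darwin Core term names.
occurrences = [
    {   # a documented specimen with full detail (invented example)
        "occurrenceID": "example-1",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Cus dus",
        "stateProvince": "Hawaii",
        "eventDate": "1998-06-12",
    },
    {   # skeletal record inferred from the Jones (1950) statement
        "occurrenceID": "example-2",
        # Assumption: this value is NOT part of any DwC vocabulary.
        "basisOfRecord": "LiteratureInference",
        "scientificName": "Aus bus",
        "stateProvince": "Hawaii",
        "associatedReferences": "Jones (1950)",
    },
]


def taxa_by_region(records, region_term="stateProvince"):
    """Derive a region -> sorted taxon list from occurrence records."""
    result = defaultdict(set)
    for rec in records:
        region = rec.get(region_term)
        name = rec.get("scientificName")
        if region and name:
            result[region].add(name)
    return {region: sorted(names) for region, names in result.items()}


print(taxa_by_region(occurrences))
```

Both records contribute to the Hawaii list, and the skeletal one still carries its source, so a user can filter it out if only documented specimens are wanted.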
But this leads us into what I think is the real crux of the issue, which is to identify what the scope of an "Occurrence" is or should be.
These are the obvious ones:
In-Scope:
- Captured one individual of an organism I identify as "Aus bus" at place and time.
- Captured a thousand individuals of an organism I identify as "Aus bus" at place and time [think plankton tow].
- Observed one individual of an organism I identify as "Aus bus" at place and time.
- Observed a thousand individuals of an organism I identify as "Aus bus" at place and time [think large school of fish, herd of wildebeest, flock of birds].
Not In-Scope:
- Hawaiian Islands; Oahu; Kaneohe (21.410458, -157.774881)
- Aus bus (Linnaeus 1758) sec. Jones 1950
But the question is whether the following statement falls within the scope of an "Occurrence":
- A population of an organism identified as "Aus bus" occurs at place and time
Perhaps the differences in perspectives we're seeing on this thread are a result of differences on how we would answer that question.
If the answer is "yes, it's within scope", then I would argue that "nativeness" is a property of an Occurrence (as it already is in DwC, in the form of establishmentMeans).
If the answer is "no, it's not in scope", then OK -- but in that case, how does one represent "nativeness" within DwC? That is, what class of object would establishmentMeans be a property of?
The answer will always depend on the question asked.
Yes! Exactly the point of my previous post.
As Kevin says, when I observe an individual (or flock of individuals) I do not observe their "introducedness" or their "nativeness"; this is something that is derived from combining multiple observations of occurrence of individuals.
If the scope of "Occurrence" is limited to specific individuals at specific place and time, then I would agree. But if the scope of "Occurrence" includes statements about populations of organisms in less-precise places and times, then I think nativeness does apply.
I would therefore advocate that we just have a flag on an occurrence record that says "intended for distribution" i.e. this is not maintained here in a garden/zoo/farm etc. To say any more on an occurrence record is misleading and there are occasions when even this flag will be ignored in analysis. I think we already have this field.
There are of course grey areas (biology always has grey areas). A Scots Pine growing in the highlands may be part of a 150-year-old naturalistic plantation. It is therefore native to the region, possibly of local genetic stock but has been planted in that position. For some applications this could be considered managed and for others not.
"Managed" is only one metric of consideration for how to score "intended for distribution" (and a relatively straightforward one at that). A bit more subtle is the issue that Gail was driving at (i.e., Ross's gull in Massachusetts). For some use-cases, you would want to score that one as "intended for distribution" (if your question was about the potential for the species to disperse without the aid of humans); in other use-cases, you'd want to filter it out (if your question was about where the sustained breeding populations were). There are many other metrics of this sort which, I'm certain, would get hopelessly lost in a simple "intended for distribution" flag.
The status of taxa in regions is a completely different thing.
Well....not completely different. We're talking shades of grey; not black and white.
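To make the use-case point about the Ross's gull concrete, here is a minimal sketch in which each record keeps its separate metrics and each question applies its own filter, instead of collapsing everything into one "intended for distribution" flag. The field names `managed` and `sustained_breeding`, the record identifiers, the two "Aus bus" records, and the filter functions are hypothetical and not drawn from DwC.

```python
# Hypothetical records: each metric is scored separately.
records = [
    {"id": "r1", "taxon": "Ross's gull", "region": "Massachusetts",
     "managed": False, "sustained_breeding": False},   # stray visitor
    {"id": "r2", "taxon": "Aus bus", "region": "Hawaii",
     "managed": False, "sustained_breeding": True},    # wild population
    {"id": "r3", "taxon": "Aus bus", "region": "Hawaii",
     "managed": True, "sustained_breeding": False},    # garden/zoo/farm
]


def unaided_dispersal(rec):
    """Question: where can the species get to without human aid?"""
    return not rec["managed"]


def breeding_range(rec):
    """Question: where are the sustained breeding populations?"""
    return not rec["managed"] and rec["sustained_breeding"]


def select(recs, use_case):
    """Return the ids of the records relevant to one use-case."""
    return [rec["id"] for rec in recs if use_case(rec)]


print(select(records, unaided_dispersal))
print(select(records, breeding_range))
```

The gull record is kept for the dispersal question and dropped for the breeding-range question, which a single yes/no flag could not express.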
As soon as we talk about aggregating multiple observations (or lack of them) then we are talking about the results of analysis instead of primary observations.
Hmmmm....I get where you're coming from on the analysis thing -- but our databases are absolutely loaded with instances of aggregated multiple observations (and even aggregated specimens). And getting back to the "population of Aus bus occurs at place and time" example, clearly this is an aggregation, and probably an interpolation, but I'm not so sure it's merely the result of an analysis.
Only at this point should we be talking about the status of the "occurrence" in terms of native/invasive/naturalised etc. This may not even be based on extant records. For example, a taxon can be invasive in an area without actually occurring there, i.e. it used to be there but is presumed to be eradicated.
Since when do we limit our "proper" occurrence records to extant only?
Does the problem occur because we are using the same term "occurrence" to mean both a primary unit of data gathering and the result of an analysis (possibly even just a hypothesis if it is the result of niche modelling)? How could we differentiate between these two? The discussion probably comes back to 'basisOfRecord' again and our fundamental classes of object.
Again, I think this is the crux of the issue. I wasn't even going to reply at all until I got to this paragraph, which triggered the enormous tome above. The distinction between "primary unit of data gathering" and "result of an analysis" is not as stark as you make it out to be. There's a lot in between, which unfortunately includes things to which we would logically apply the notion of "nativeness".
Sorry to be long winded.
Likewise!
Aloha, Rich