Dear All,

Completely agree with Rich’s analysis.

May I coin two new terms? But first, I think we should separate the definition of occurrences from their use, which removes many of the questions raised in previous messages:

Occurrence:

At least a triplet (a taxon name, a location, a time); whatever the precision of each member of the triplet is.

The difference lies only in the use we can make of an occurrence, depending on the respective precision of each member of the triplet. Ontologies should by definition reflect patterns, not processes (although technically I suppose that processes can be described by ontologies, but that extends the meaning of the word).

Name:

From “living organism” (extant or fossil) down to infrasubspecific rank if needed. Can it be a common name? Yes, it may decrease the precision or even the accuracy, that is all.

Location:

From Earth/continent/ocean/catchment down to precise geocoordinates. Earth is always implicit and by default until we find life out in space.

Time:

From 4.5 billion year range / geological era down to a precise date/time stamp.

Here are the two new terms I propose (and more could be coined using the same way):

Geoccurrence: an occurrence with geocoordinates.

Loccurrence: an occurrence with only a locality/geographic name.
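
To make the definitions above concrete, here is a minimal sketch in Python. It is only an illustration: the class name, the field names, the default values and the kind() helper are all my assumptions, not part of Darwin Core or any TDWG standard.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical triplet (name, location, time), at any precision."""
    # From "living organism" down to infrasubspecific rank, or a common name.
    name: str
    # Locality / geographic name; "Earth" is the implicit default.
    locality: str = "Earth"
    # (latitude, longitude) in decimal degrees, when known.
    coordinates: Optional[Tuple[float, float]] = None
    # From a geological range down to a precise date/time stamp.
    time: str = "past 4.5 billion years"


def kind(occ: Occurrence) -> str:
    """Classify an occurrence by the precision of its location only."""
    if occ.coordinates is not None:
        return "geoccurrence"  # has geocoordinates
    return "loccurrence"       # only a locality / geographic name


vague = Occurrence("Animalia")
precise = Occurrence("Puma concolor", "Berkeley", (37.87, -122.27), "2010-10-13")
print(kind(vague), kind(precise))
```

Both records are valid occurrences under the definition; only their fitness for a given use differs.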

Should we coin terms for occurrences resulting from modeling?

Another consideration: species distribution modeling is a rationalization of the production of distribution maps, just like cladistics is a rationalization of the production of phylogenies.

For cladistics, in essence we sample individuals in the real genealogical tree (= tokogenetic tree of Hennig). But can we say that all characters used in cladistics actually lead back to a given individual? This may be true for molecular data, but the statement needs more thought. I don’t think it is true for morphology, and the same applies to synthetic descriptions and older works, which, as Rich described, rely on imprecise old records.

Likewise for distribution, we sample individuals, and also make the best of loccurrences based on imprecise locations (cf. Jeremy Jackson’s work on historical records and trends).

As for recording nativeness, I would suggest that it is a general issue for all controlled vocabularies that try to establish categories over a continuum. The only way to get rid of all these problematic definitions, most probably including that of occurrences, is to express them with fuzzy logic. We can say that a species is more or less native, especially if its abundance follows a gradient from a center to peripheral areas. That degree of nativeness could then be derived from species distribution modeling based on geoccurrences and loccurrences expressed as fuzzy functions.
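
As a rough illustration of what a fuzzy function for nativeness could look like, here is a small Python sketch. The piecewise-linear shape, the use of distance from the center of the range, and the names nativeness, core_km and edge_km are all assumptions made for the example; a real membership function would have to come out of the distribution modeling itself.

```python
def nativeness(distance_km: float, core_km: float, edge_km: float) -> float:
    """Hypothetical fuzzy membership: degree of nativeness in [0, 1].

    Fully native (1.0) within core_km of the center of the range,
    not native (0.0) beyond edge_km, and a linear gradient in between.
    """
    if not 0 <= core_km < edge_km:
        raise ValueError("expected 0 <= core_km < edge_km")
    if distance_km <= core_km:
        return 1.0
    if distance_km >= edge_km:
        return 0.0
    # Linear decrease from 1.0 at the core boundary to 0.0 at the edge.
    return (edge_km - distance_km) / (edge_km - core_km)


for d in (0, 300, 600):
    print(d, nativeness(d, core_km=100, edge_km=500))
```

With these made-up parameters, a record 300 km from the center gets a degree of 0.5, instead of being forced into a "native" or "non-native" category.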

So the next step is to include fuzzy logic in ontologies ;-). And TDWG becoming a fuzzy think tank ;-).

Nicolas.

From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Richard Pyle
Sent: Wednesday 13 October 2010 07:08
To: tuco@berkeley.edu
Cc: tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] How to record "Nativeness"?

Thanks, John. I agree that there will be little value in trying to define and name distinct units of space and time, but there may be value in defining units along the taxonomic axis. However, we should first come to a community consensus on what the maximum scope of each axis is.

My sense is that the maximum scope of space is "Earth" (at least until we begin documenting populations of extraterrestrial life).

My sense is that the maximum scope of time is effectively "any window of time during the past 4 billion years or so".

But I don't have a clear sense for what the maximum scope of "one or more organisms" ought to be. I'm content with extending it to "populations" as a unit of "organisms", because I see a smooth transition from two individual organisms all the way up to a population of organisms. But should we accept taxonConcept (which can be thought of as an implied set of populations) as an extension of "organisms"? If so, then "Animalia Occurred on Earth sometime during the past 2 billion years" is a legitimate Occurrence record (pretty damn useless...but still legitimate).

I think it matters, and is relevant to this exchange -- both because of Steve's point about more clearly defining what an "Occurrence" can be, and because we still don't have a good idea of how and where to score "nativeness" (for which there is clearly an expressed need).

I agree that fitness-for-use should be determined from the content of the records, but coming back to Donald's (and others') point about filtering "non-native" records, there needs to be a way to include this information in the content of the records in order to determine fitness-for-use. I believe that a controlled vocabulary for establishmentMeans will probably be all we have to do to satisfy 95% of the user need. But before we can nail down what that controlled vocabulary would encompass, I think we need to come to some sort of consensus on the issues that Steve has articulated.

Aloha,

Rich

From: gtuco.btuco@gmail.com [mailto:gtuco.btuco@gmail.com] On Behalf Of John Wieczorek
Sent: Tuesday, October 12, 2010 11:29 AM
To: Richard Pyle
Cc: Steve Baskauf; joel sachs; tdwg-content@lists.tdwg.org; tdwg-bioblitz@googlegroups.com
Subject: Re: [tdwg-content] How to record "Nativeness"?

Occurrence is admittedly a problematic term. Its current definition is vague, following in the grand tradition of Dublin Core term definitions. Rich's interpretation echoes what Steve wrote and comes closest in my mind to what an occurrence really is meant to be, namely "evidence of one or more organisms occurring at a place and time." This leaves open all of the vast continuum of scales - geographic, temporal, and taxonomic - at which occurrences can be described. I'm not sure exactly what is solved by trying to make named distinctions between different scales or levels of detail (on any of the three axes) of Occurrence. The core of the issue really boils down to fitness-for-use of records and a potential user's capacity to accurately determine that. These should be characteristics that can be determined from the content of the records.