Re: [tdwg-content] Another example of non-overlapping concepts

19 May 2011

      Hi Nico,

I don't think that we are very far apart.

   The way I recall discussions as the TCS was designed, the role for the
...
DarwinCore was to allow data providers to include sufficient information in
the DC so that the vouchers/observations could be identified to a suitably
authoritative concept. In another realm of the biodiversity informatics net,
that concept would be represented in more depth, and ideally have multiple
relationships mapped and/or inferred to relevant past, present, and future
concepts.
In my mind it would be best to link to an open resolvable concept since even
the original descriptions are often not very informative regarding which
individuals should be considered instances of the same concept.
One reading of the TCS is that this is something that is applied long after
the specimen is collected and identified, most likely by someone that did
not collect or may not have even seen the actual specimen.
So in some sense this is probably more error prone than the imperfectly
defined concepts I am proposing. Or at least more likely to differ from the
intended meaning of the original observer / identifier.
...
For what it's worth (really not much), I agree that the eBird project is
finding a (the?) pragmatic solution to expanding their contributor base, and
in all likelihood have a pretty good to excellent taxonomy to work with
already. Keep in mind that I study weevils, with 65k species described and
220k species (conservatively) estimated to exist, and a mid-level
classification of ~200 tribes based largely on Lacordaire's 1-2 external
character system established in 1863 (claws single, versus paired; virtually
none of these have any phylogenetic value). So experientially I come from
that part of the knowledge spectrum where we're likely centuries away from a
sufficiently stable naming system that includes, say, more than 2/3 of the
actual species diversity.
Yes, this is one reason most of my examples involve entities that are not
particularly controversial in their own right.

For now the idea is to concentrate on the model and how different things are
related to each other.

For instance, I noticed that I had some parrots that were incorrectly marked
up as "expected in" North America.

This will be fixed in the next RDF dump, along with the Freebase cougar link,
which is now fixed in the online RDF but not in the RDF dump.

This is less about having everything perfect and more about figuring out how
one might mark up these kinds of relations.

I had been watching for URI changes in DBpedia and Wikipedia but not
Freebase.

I read a blog recently that described GBIF's efforts to clean up their
occurrence records and thought that their 1-degree areas might make good
candidates for URIs.

http://ff.im/-DwvMu

This would allow them to make a cleaned version of the species occurrences
that tags each occurrence with a particular "degree_area".

If they existed I would try to add them to my examples.
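
A minimal sketch of what such "degree_area" identifiers might look like, assuming each cell is named after its south-west corner. The base URI and the naming pattern are made up for illustration; GBIF does not publish these as far as this thread knows.

```python
import math

# Hypothetical base URI, for illustration only.
BASE_URI = "http://example.org/degree_area/"


def degree_area_id(lat, lon):
    """Return a label for the 1-degree cell that contains (lat, lon).

    The cell is named after its south-west corner (an assumed convention).
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # Clamp the upper edges so lat == 90 and lon == 180 fall in the last cell.
    south = min(math.floor(lat), 89)
    west = min(math.floor(lon), 179)
    ns = "N" if south >= 0 else "S"
    ew = "E" if west >= 0 else "W"
    return f"{ns}{abs(south):02d}_{ew}{abs(west):03d}"


def degree_area_uri(lat, lon):
    """Return the (hypothetical) URI for the cell that contains (lat, lon)."""
    return BASE_URI + degree_area_id(lat, lon)


# A point in Madison, WI falls in the cell whose south-west corner is 43N, 90W.
print(degree_area_uri(43.0766, -89.4125))
```

Every occurrence in the same cell would then share one URI, which is what would make the cleaned records easy to tag and to query.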

Also note that I include name variations that link to the GNI. These are not
true synonyms; they should be interpreted as "*someone said this might be a
synonym of the current name*".

In addition, what I call BasionymName is not the same as what others call
it.

I use this field to put in what appears to be the first name used for the
species. In Zoology this would be the name whose author citation is not
enclosed in ().

It might be best to change to some other term in the future to avoid
confusion.
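
A rough sketch of that heuristic, assuming zoological names that carry a trailing author citation. The function names are hypothetical, and the rule does not carry over to botanical names, where parentheses are used differently.

```python
import re

# A trailing parenthesised author citation, e.g. "(Linnaeus, 1771)".
# Parentheses elsewhere in the name (such as a subgenus) are ignored.
_PAREN_AUTHOR = re.compile(r"\([^()]+\)\s*$")


def is_first_name_candidate(name_with_author):
    """True if the author citation at the end of the name is not in ()."""
    return _PAREN_AUTHOR.search(name_with_author.strip()) is None


def first_name_candidates(names):
    """Keep only the names whose authorship is not parenthesised."""
    return [n for n in names if is_first_name_candidate(n)]


names = [
    "Puma concolor (Linnaeus, 1771)",
    "Felis concolor Linnaeus, 1771",
]
print(first_name_candidates(names))
```

The check only looks at the punctuation of the name string, so it can suggest a candidate but cannot confirm that the name really was the first one used.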

   I'm not opposed to pragmatic solutions for taxa where it makes sense
...
(again, as if anyone cared..). But, trying to foresee the very substantive
classificatory shifts that many other groups likely still will experience
10, 20, 50, 100 years down the road from now, I think just the same that
there are solid grounds for working out an admittedly non-pragmatic, but
semantically maximally powerful solution. I think it's not unreasonable to
assume that for some groups, classification in 250 years from now will look
just as different from today's system as Linnaeus' 1758 system looks to us
today. He recognized 2 weevil genera and about 90 species. Now we have 5800
genera. We may end up with 15,000.
Yes, you are correct. I am particularly concerned about tying the concepts
to one taxonomic hierarchy.

For your weevils, I think that there would be some advantages in marking up
what you think exists, documenting those with photos and links to specimens
and name variations.
Open, accessible versions of these are much more likely to be improved than
ones that are hidden in a dispersed collection of hard-to-find journals,
many of which have limits on the number of photographs etc.

This would certainly make it easier for someone with a specimen that they
think applies to find you and the other potential concept candidates.

I don't have many weevils so you might want to think about doing this
yourself. My only concern is how to keep the vocabularies etc in sync while
everything is being discussed.

   The problem that at present only taxonomic experts can understand how to
...
retrace the meanings of concepts proposed in the history of the field, and
computers can't yet do this because the data are not marked up and linked to
each other precisely enough (as precisely as an ontological representation
would demand it), is not just going to go away by some top-down adherence to
a "consensus". Most taxonomists jump into the field precisely because they
come to realize that the "consensus" has serious problems (read: "sucks").
So, as long as there is justified taxonomic research, there will be
reclassification. [And no, IMO coming up with a synapomorphy-based or
node-pointing naming system will not miraculously allow us to have a
reliable system.]
Yes, this will never really go away. My point with eBird and The Plant List
is that there are groups doing this, seemingly without much controversy, so
why is what I propose so controversial?

   To the extent that there is a discussion here on the list as to where the
...
DC and a possible successor of the TCS is going, I think that's a worthwhile
discussion to have.
I hope that others agree with you. :-)

- Pete
...
Regards,
Nico
On 5/13/2011 4:41 PM, Peter DeVries wrote:
Hi Nico,
Thanks for posting this.
I have something in the concept model to indicate the basis for the
species concept.
For now I have three types. An individual species concept can have a
combination of one, two or all three.
In the RDF they look like this:
<txn:speciesConceptBasedOn rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel"/>
The first is what I call the #ObjectiveSpeciesModel - this indicates that
it is a species concept because we say it is.
All the species concepts are at least an #ObjectiveSpeciesModel
* This is in part a way to handle things like the domestic cat, which you
want to be seen as different from the African Wildcat.
There are also tags for
txn:PhylogeneticSpeciesModel
txn:BiologicalSpeciesModel
For now I don't have these other models set in the example data, but the
fields are in the database and the code, so that an editor could state the
basis for the model.
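
A small sketch of the "one, two or all three" rule, with every concept carrying at least the objective model. The full URIs for the phylogenetic and biological tags assume they sit in the same txn.owl namespace as #ObjectiveSpeciesModel, and the concept URI is hypothetical.

```python
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
BASED_ON = TXN + "speciesConceptBasedOn"
OBJECTIVE = TXN + "ObjectiveSpeciesModel"
# Assumption: the other two tags live in the same namespace.
PHYLOGENETIC = TXN + "PhylogeneticSpeciesModel"
BIOLOGICAL = TXN + "BiologicalSpeciesModel"
ALLOWED = frozenset({OBJECTIVE, PHYLOGENETIC, BIOLOGICAL})


def concept_bases(*extra):
    """Return the set of bases for a concept; the objective model is always in."""
    bases = {OBJECTIVE, *extra}
    unknown = bases - ALLOWED
    if unknown:
        raise ValueError(f"unknown species model(s): {sorted(unknown)}")
    return frozenset(bases)


def basis_triples(concept_uri, bases):
    """Return one (subject, predicate, object) triple per basis, sorted by object."""
    return [(concept_uri, BASED_ON, b) for b in sorted(bases)]


# Hypothetical concept URI, for illustration only.
cat = "http://example.org/concept/domestic-cat"
for triple in basis_triples(cat, concept_bases(BIOLOGICAL)):
    print(triple)
```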
I can think of a couple of different ways to handle the issue of
alternative species concepts.
* Note that the identifications as proposed by DarwinCore don't seem to
indicate what kind of model the identifications were based on.
  So it is not clear to me if a straight DarwinCore data set would allow
the analysis above.
Instead of having multiple different statements like
txn:occurrenceHasSpeciesConcept <> in the record for each occurrence,
one could use different predicates to link to different kinds of species
concepts.
txn:occurrenceHasUniprotConcept => <http://purl.uniprot.org/taxonomy/9696>
This would allow someone to query for the occurrences of
<http://purl.uniprot.org/taxonomy/9696>
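
A sketch of that kind of query over a handful of triples held in plain Python tuples. The occurrence and species-concept URIs are hypothetical, and the txn: namespace binding is assumed from the resource URI quoted earlier in this message.

```python
# Assumed namespace binding for the txn: prefix.
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
HAS_UNIPROT = TXN + "occurrenceHasUniprotConcept"
HAS_SPECIES = TXN + "occurrenceHasSpeciesConcept"
UNIPROT_9696 = "http://purl.uniprot.org/taxonomy/9696"

# Hypothetical occurrence and species-concept URIs, for illustration only.
TRIPLES = [
    ("http://example.org/occurrence/1", HAS_UNIPROT, UNIPROT_9696),
    ("http://example.org/occurrence/1", HAS_SPECIES, "http://example.org/concept/a"),
    ("http://example.org/occurrence/2", HAS_UNIPROT, UNIPROT_9696),
    ("http://example.org/occurrence/3", HAS_SPECIES, "http://example.org/concept/b"),
]


def occurrences_of(triples, predicate, concept):
    """Return the subjects linked to `concept` through `predicate`, sorted."""
    return sorted({s for s, p, o in triples if p == predicate and o == concept})


print(occurrences_of(TRIPLES, HAS_UNIPROT, UNIPROT_9696))
```

Because each kind of concept has its own predicate, a query for one kind never has to guess which of several identifications on an occurrence was meant.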
That said, it is not clear to me what people mean by different
identifications.
Is the intent to have identifications with different homotypic synonyms
to be an identification of the same thing or not?
The way it works now in many data sets is that Felis concolor, Puma
concolor and Puma conncolor are treated as identifications of different
things.
This is another way of asking: is the namestring the concept?
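
To make the difference concrete, here is a small sketch that groups the same three records first by namestring and then by concept. The record ids, the concept label, and the name-to-concept mapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from namestrings to a single concept label.
NAME_TO_CONCEPT = {
    "Felis concolor": "concept:cougar",
    "Puma concolor": "concept:cougar",
    "Puma conncolor": "concept:cougar",  # misspelled variant
}

# Hypothetical occurrence records: (record id, namestring as recorded).
RECORDS = [
    ("occ1", "Felis concolor"),
    ("occ2", "Puma concolor"),
    ("occ3", "Puma conncolor"),
]


def group_by_namestring(records):
    """Treat every distinct namestring as its own 'thing'."""
    groups = defaultdict(list)
    for rec_id, name in records:
        groups[name].append(rec_id)
    return dict(groups)


def group_by_concept(records, name_to_concept):
    """Group records by the concept their namestring maps to."""
    groups = defaultdict(list)
    for rec_id, name in records:
        groups[name_to_concept[name]].append(rec_id)
    return dict(groups)


print(len(group_by_namestring(RECORDS)))                # 3 separate groups
print(len(group_by_concept(RECORDS, NAME_TO_CONCEPT)))  # 1 group
```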
My understanding of the eBird project is that it allows citizen scientists
to contribute their own observations. This creates a much larger data set
for analysis etc.
They have created a curated list of species and a ~6 letter code for
each. This serves as a guide for observers on how to encode their
observations.
I think their progress would be inhibited, the occurrence coding
inconsistent, and contributors frustrated, if they had a list that included
many overlapping species concepts.
Thanks again for your comments,
- Pete
On Fri, May 13, 2011 at 3:05 AM, Nico Franz <nico.franz@upr.edu> wrote:
...
Hello Pete (et al.):
For birds, Town Peterson at KU and colleagues have published these
papers showing how alternative bird taxonomies affect the ranking of
conservation priorities.
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/PN_CB_...
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/NP_BN_...
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/P_BCI_...
Here's the abstract of the 1999 paper:
Analysis of geographic concentrations of endemic taxa is often used to
determine priorities for conservation action; nevertheless, assumptions
inherent in the taxonomic authority list used as the basis for analysis are
not always considered. We analyzed foci of avian endemism in Mexico under two
alternate species concepts. Under the biological species concept, 101 bird
species are endemic to Mexico and are concentrated in the mountains of the
western and southern portions of the country. Under the phylogenetic species
concept, however, total endemic species rises to 249, which are concentrated
in the mountains and lowlands of western Mexico. Twenty-four narrow endemic
biological species are concentrated on offshore islands, but 97 narrow
endemic phylogenetic species show a concentration in the Transvolcanic Belt
of the mainland and on several offshore islands. Our study demonstrates that
conservation priorities based on concentrations of endemic taxa depend
critically on the particular taxonomic authority employed and that
biodiversity evaluations need to be developed in collaboration or
consultation with practicing systematic specialists.
There was a debate recently on Taxacom that was started and
subsequently neatly summarized by Fabian Haas. The topic was "let's
summarize reasons why 'donors' seem to not fund taxonomy". One point from
the summary was this:
3) Taxonomy is over-accurate for most applications
Most (not all) decisions in e.g. modelling and conservation are done and
can be done without complete knowledge of taxa. As it is, decisions for
conservation areas are often based on flagship species (e.g. elephants), on
taxa which have an excellent research background, e.g. birds (IBAs), on
availability of land (e.g. land with a high Tsetse burden), importance as
corridor and other factors, but never on a complete view on an all
biodiversity in a specific area. Even if an inventory existed, it would be
an illusion that we could collect data on ecological requirements and
population dynamics for most of the species necessary for informed
decisions. A complete inventory does not seem to provide an advantage for
conservation.
   I personally think there's some truth to that. I also think that, while
it's understandable that an accurate representation of the (sometimes)
fleetingness of taxonomic consensus is not a priority for applied ecological
projects, if taxonomists themselves don't find better ways to document and
link these alternative perspectives, then it's not the best science we can
do. That would be fine too if adopted outright as a pragmatic stance.
Regards,
Nico
On 5/13/2011 1:08 AM, Peter DeVries wrote:
I thought that I would also mention that in addition to The Plant List,
the eBird project also uses non-overlapping concepts in its bird list (it
does have concepts for common hybrids).
What is clear to me is that you cannot create graphs like these if every
observation can have X number of species (especially overlapping ones)
without any indication of which is the most appropriate one.
eBird Occurrence Maps Northern Cardinal
http://ebird.org/content/ebird/about/occurrence-maps/northern-cardinal
NCBI is also similar.
Perhaps a member of the consensus committee can comment?
-- Pete
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept <http://www.taxonconcept.org/> & GeoSpecies <http://about.geospecies.org/> Knowledge Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
--------------------------------------------------------------------------------------
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content