Hi Nico,

I don't think that we are very far apart.

   The way I recall discussions as the TCS was designed, the role for the DarwinCore was to allow data providers to include sufficient information in the DC so that the vouchers/observations could be identified to a suitably authoritative concept. In another realm of the biodiversity informatics net, that concept would be represented in more depth, and ideally have multiple relationships mapped and/or inferred to relevant past, present, and future concepts.

In my mind it would be best to link to an open resolvable concept, since even the original descriptions are often not very informative regarding which individuals should be considered instances of the same concept.
One reading of the TCS is that this is something that is applied long after the specimen is collected and identified, most likely by someone who did not collect or may not have even seen the actual specimen.
So in some sense this is probably more error prone than the imperfectly defined concepts I am proposing. Or at least more likely to differ from the intended meaning of the original observer / identifier.
 

   For what it's worth (really not much), I agree that the eBird project is finding a (the?) pragmatic solution to expanding their contributor base, and in all likelihood has a pretty good to excellent taxonomy to work with already. Keep in mind that I study weevils, with 65k species described and 220k species (conservatively) estimated to exist, and a mid-level classification of ~200 tribes based largely on Lacordaire's 1-2 external character system established in 1863 (claws single, versus paired; virtually none of these have any phylogenetic value). So experientially I come from that part of the knowledge spectrum where we're likely centuries away from a sufficiently stable naming system that includes, say, more than 2/3 of the actual species diversity.

Yes, this is one reason most of my examples involve entities that are not particularly controversial in their own right.

For now the idea is to concentrate on the model and how different things are related to each other. 

For instance, I noticed that I had some parrots that were incorrectly marked up as "expected in" North America.

This will be fixed in the next RDF dump, along with the Freebase cougar link, which is now fixed in the online RDF but not in the RDF dump.

This is less about having everything perfect and more about figuring out how one might mark up these kinds of relations.

I had been watching for URI changes in DBpedia and Wikipedia but not Freebase.

I read a blog post recently that described GBIF's efforts to clean up their occurrence records and thought that their 1-degree areas might make good candidates for URIs.

http://ff.im/-DwvMu

This would allow them to make a cleaned version of the species occurrences that tags each occurrence with a particular "degree_area".

If they existed I would try to add them to my examples.
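
A minimal sketch of how such a "degree_area" identifier could be derived from a coordinate pair. The base URI, the function name, and the choice to name each cell by its south-west corner are all assumptions made for illustration; no such URIs exist yet.

```python
import math

# Hypothetical base URI, for illustration only. No such service exists.
BASE = "http://example.org/degree_area/"


def degree_area_uri(lat, lon):
    """Return a URI for the 1-degree cell that contains (lat, lon).

    The cell is named by its south-west corner (an assumed convention).
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    # floor() moves negative coordinates toward the south-west corner.
    # min() keeps the north pole and longitude 180 inside the last cell.
    cell_lat = min(math.floor(lat), 89)
    cell_lon = min(math.floor(lon), 179)
    return f"{BASE}{cell_lat}_{cell_lon}"


# A point near Madison, WI lands in the cell with corner (43, -90).
madison_cell = degree_area_uri(43.07, -89.40)
```

Every occurrence falling in the same cell gets the same URI, so occurrences could then be grouped or queried by cell rather than by raw coordinates.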

Also note that I include name variations that link to the GNI. These are not true synonyms; they should be interpreted as "someone said this might be a synonym of the current name".

In addition, what I call BasionymName is not the same as what others call it. 

I use this field to put in what appears to be the first name used for the species. In Zoology this would be the name that does not contain ().

It might be best to change to some other term in the future to avoid confusion.
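
The rule of thumb above for the zoological case could be sketched roughly as follows. This is only a sketch: the function names are made up, the example name strings are illustrative, and it assumes the name variations carry authorship so that a trailing ")" marks a later combination.

```python
def is_original_combination(name_with_author):
    """Return True when the authorship is NOT wrapped in parentheses.

    Only the end of the string is checked, so a subgenus written in
    parentheses in the middle of the name does not count.
    """
    return not name_with_author.strip().endswith(")")


def first_used_name(variations):
    """Return the one variation that looks like the first name used.

    Returns None when zero or several candidates remain, since the
    heuristic cannot decide in that case.
    """
    originals = [n for n in variations if is_original_combination(n)]
    return originals[0] if len(originals) == 1 else None


# Illustrative name strings with authorship.
variations = [
    "Puma concolor (Linnaeus, 1771)",
    "Felis concolor Linnaeus, 1771",
]
candidate = first_used_name(variations)
```

When more than one variation lacks the parentheses, the sketch deliberately returns nothing rather than guessing.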


   I'm not opposed to pragmatic solutions for taxa where it makes sense (again, as if anyone cared..). But, trying to foresee the very substantive classificatory shifts that many other groups likely still will experience 10, 20, 50, 100 years down the road from now, I think just the same that there are solid grounds for working out an admittedly non-pragmatic, but semantically maximally powerful solution. I think it's not unreasonable to assume that for some groups, classification in 250 years from now will look just as different from today's system as Linnaeus' 1758 system looks to us today. He recognized 2 weevil genera and about 90 species. Now we have 5800 genera. We may end up with 15,000.


Yes, you are correct. I am particularly concerned about tying the concepts to one taxonomic hierarchy.

For your weevils, I think that there would be some advantages in marking up what you think exists, documenting those with photos and links to specimens and name variations.
Open accessible versions of these are much more likely to be improved than ones that are hidden in a dispersed collection of hard to find journals, many of which have limits on the number of photographs etc.

This would certainly make it easier for someone with a specimen that they think applies to find you and the other potential concept candidates.

I don't have many weevils so you might want to think about doing this yourself. My only concern is how to keep the vocabularies etc in sync while everything is being discussed.

   The problem that at present only taxonomic experts can understand how to retrace the meanings of concepts proposed in the history of the field, and computers can't yet do this because the data are not marked up and linked to each other precisely enough (as precisely as an ontological representation would demand it), is not just going to go away by some top-down adherence to a "consensus". Most taxonomists jump into the field precisely because they come to realize that the "consensus" has serious problems (read: "sucks"). So, as long as there is justified taxonomic research, there will be reclassification. [And no, IMO coming up with a synapomorphy-based or node-pointing naming system will not miraculously allow us to have a reliable system.]


Yes, this will never really go away. My point with eBird and The Plant List is that there are groups doing this, seemingly without much controversy, so why is what I propose so controversial? 

   To the extent that there is a discussion here on the list as to where the DC and a possible successor of the TCS is going, I think that's a worthwhile discussion to have.


I hope that others agree with you. :-)

- Pete

 
Regards,

Nico



On 5/13/2011 4:41 PM, Peter DeVries wrote:
Hi Nico,

Thanks for posting this.

I have something in the concept model to indicate the basis for the species concept.

For now I have three types. An individual species concept can have a combination of one, two or all three.

In the RDF they look like this:

<txn:speciesConceptBasedOn rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel"/>

The first is what I call the #ObjectiveSpeciesModel - this indicates that it is a species concept because we say it is.

All the species concepts are at least an #ObjectiveSpeciesModel

*This is in part a way to handle things like the domestic cat which you want to be seen as different from the African Wildcat.

There are also tags for 

txn:PhylogeneticSpeciesModel
txn:BiologicalSpeciesModel

For now I don't have these other models set in the example data, but the fields are in the database and the code so that an editor could state the basis for the model.
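
A rough sketch of the rule that a concept can carry one, two or all three bases, but always at least the #ObjectiveSpeciesModel. The function name is made up, and the full URIs for the phylogenetic and biological models are an assumption: they are simply the txn: prefix expanded the same way as in the RDF line above.

```python
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"

OBJECTIVE = TXN + "ObjectiveSpeciesModel"
# Assumed expansions of txn:PhylogeneticSpeciesModel and
# txn:BiologicalSpeciesModel.
PHYLOGENETIC = TXN + "PhylogeneticSpeciesModel"
BIOLOGICAL = TXN + "BiologicalSpeciesModel"

ALLOWED = {OBJECTIVE, PHYLOGENETIC, BIOLOGICAL}


def check_concept_bases(bases):
    """Validate the speciesConceptBasedOn values of one concept.

    Returns the bases as a set, or raises ValueError when a value is
    unknown or the objective model is missing.
    """
    bases = set(bases)
    unknown = bases - ALLOWED
    if unknown:
        raise ValueError(f"unknown basis: {sorted(unknown)}")
    if OBJECTIVE not in bases:
        raise ValueError("every concept is at least an ObjectiveSpeciesModel")
    return bases


# A concept stated to be both an objective and a biological species.
example = check_concept_bases([OBJECTIVE, BIOLOGICAL])
```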

I can think of a couple of different ways to handle the issue of alternative species concepts.

* Note that the identifications as proposed by DarwinCore don't seem to indicate what kind of model the identifications were based on.
  So it is not clear to me if a straight DarwinCore data set would allow the analysis above.

Instead of having multiple different statements like 

txn:occurrenceHasSpeciesConcept <> in the record for each occurrence

one could use different predicates to link to different kinds of species concepts.

txn:occurrenceHasUniprotConcept => <http://purl.uniprot.org/taxonomy/9696>

This would allow someone to query for the occurrences of <http://purl.uniprot.org/taxonomy/9696>
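
As a small illustration of that query, here is a sketch over plain (subject, predicate, object) tuples rather than a real triple store. The occurrence URIs, the second concept URI, and the function name are invented for the example; only the predicate and the UniProt URI come from the text above.

```python
PREDICATE = "txn:occurrenceHasUniprotConcept"
UNIPROT_9696 = "http://purl.uniprot.org/taxonomy/9696"

# Hypothetical occurrence records expressed as triples.
triples = [
    ("http://example.org/occurrence/1", PREDICATE, UNIPROT_9696),
    ("http://example.org/occurrence/2", PREDICATE, UNIPROT_9696),
    ("http://example.org/occurrence/3", PREDICATE,
     "http://example.org/concept/other"),
]


def occurrences_of(triples, predicate, concept):
    """Return the subjects linked to `concept` through `predicate`."""
    return [s for (s, p, o) in triples if p == predicate and o == concept]


matches = occurrences_of(triples, PREDICATE, UNIPROT_9696)
```

Because the kind of concept is carried by the predicate, the same occurrence could also hold a separate link to another kind of concept without the two being confused in a query.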

That said, it is not clear to me what people mean by different identifications.

Is the intent for identifications with different homotypic synonyms to be identifications of the same thing or not?

The way it works now in many data sets is that Felis concolor, Puma concolor and Puma conncolor are treated as identifications of different things.

This is another way of asking: is the namestring the concept?
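
To make the difference concrete, here is a small sketch that counts the same identifications first by name string and then by concept. The concept key and the mapping table are hypothetical; the three name strings are the ones mentioned above.

```python
from collections import Counter

# Identifications as they might appear in occurrence records.
identifications = [
    "Felis concolor",
    "Puma concolor",
    "Puma conncolor",
    "Puma concolor",
]

# Hypothetical mapping from name strings to one concept identifier.
name_to_concept = {
    "Felis concolor": "concept:cougar",
    "Puma concolor": "concept:cougar",
    "Puma conncolor": "concept:cougar",  # misspelled variant
}

# Counting by name string treats the variants as different things.
by_name = Counter(identifications)

# Counting by concept collapses them into a single entity.
by_concept = Counter(name_to_concept[name] for name in identifications)
```

Counted by name string there appear to be three different things; counted by concept there is one thing with four occurrences.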

My understanding of the eBird project is that it allows citizen scientists to contribute their own observations. This creates a much larger data set for analysis etc.

They have created a curated list of species and a ~6 letter code for each. This serves as a guide for observers on how to encode their observations.

I think their progress would be inhibited, the occurrence coding inconsistent, and contributors frustrated, if they had a list that included many overlapping species concepts.

Thanks again for your comments,

- Pete
 
On Fri, May 13, 2011 at 3:05 AM, Nico Franz <nico.franz@upr.edu> wrote:
Hello Pete (et al.):

   For birds, Town Peterson at KU and colleagues have published these papers showing how alternative bird taxonomies affect the ranking of conservation priorities.


http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/PN_CB_1999.pdf
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/NP_BN_2004.pdf
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/P_BCI_2006.pdf

   Here's the abstract of the 1999 paper:

Analysis of geographic concentrations of endemic taxa is often used to determine priorities for conservation
action; nevertheless, assumptions inherent in the taxonomic authority list used as the basis for
analysis are not always considered. We analyzed foci of avian endemism in Mexico under two alternate species
concepts. Under the biological species concept, 101 bird species are endemic to Mexico and are concentrated
in the mountains of the western and southern portions of the country. Under the phylogenetic species
concept, however, total endemic species rises to 249, which are concentrated in the mountains and lowlands
of western Mexico. Twenty-four narrow endemic biological species are concentrated on offshore islands, but
97 narrow endemic phylogenetic species show a concentration in the Transvolcanic Belt of the mainland and
on several offshore islands. Our study demonstrates that conservation priorities based on concentrations of
endemic taxa depend critically on the particular taxonomic authority employed and that biodiversity evaluations
need to be developed in collaboration or consultation with practicing systematic specialists.

   There was a debate recently on Taxacom that was started and subsequently neatly summarized by Fabian Haas. The topic was "let's summarize reasons why 'donors' seem to not fund taxonomy". One point from the summary was this:

3) Taxonomy is over-accurate for most applications

Most (not all) decisions in e.g. modelling and conservation are done and can be done without complete knowledge of taxa. As it is, decisions for conservation areas are often based on flagship species (e.g. elephants), on taxa which have an excellent research background, e.g. birds (IBAs), on availability of land (e.g. land with a high Tsetse burden), importance as corridor and other factors, but never on a complete view on an all biodiversity in a specific area. Even if an inventory existed, it would be an illusion that we could collect data on ecological requirements and population dynamics for most of the species necessary for informed decisions. A complete inventory does not seem to provide an advantage for conservation.

   I personally think there's some truth to that. I also think that, while it's understandable that an accurate representation of the (sometimes) fleetingness of taxonomic consensus is not a priority for applied ecological projects, if taxonomists themselves don't find better ways to document and link these alternative perspectives, then it's not the best science we can do. That would be fine too if adopted outright as a pragmatic stance.

Regards,

Nico



On 5/13/2011 1:08 AM, Peter DeVries wrote:
I thought that I would also mention that in addition to The Plant List, the eBird project also uses non-overlapping concepts in its bird list (it does have concepts for common hybrids).

What is clear to me is that you cannot create graphs like these if every observation can have X number of species (especially ones that overlap) without any indication of which is the most appropriate one.

eBird Occurrence Maps Northern Cardinal
NCBI is also similar.

Perhaps a member of the consensus committee can comment?

-- Pete
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept  &  GeoSpecies Knowledge Bases
A Semantic Web, Linked Open Data  Project
--------------------------------------------------------------------------------------
_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content

