Hi Matt,

It took me a while to ponder your question. There is a long answer, which is complex and easily misinterpreted, and there is a shorter answer.

For now I think the "shorter" answer set in a historical context is best.

The best use of my abilities seems to be recognizing an "ability gap" and figuring out a technical solution or tool to address it.

The most visible of these involved microscopy and visualization tools that make complex ideas understandable.

My interest in the species problem dates back to when I had the opportunity to talk with E.O. Wilson in 1991/1992.

At that time he said that, if you have a knack for computers, we need all this information in databases so that it is accessible.

*One of his former Ph.D. students is on my committee.

Years later I had the opportunity to work on questions like this and started to think about how to connect all these disparate facts about species together in a usable, queryable knowledge base.

I noticed that several groups and individuals were marking up data sets, including observations, with different scientific names even though they clearly meant the same "species".

* These groups would agree that they were communicating about the same species, but would not always agree on the name.

This prevents large-scale data integration and analysis, a problem described in part here: http://about.geospecies.org/

With the advent of the web, and the semantic web in particular, this "database" could be global and almost infinitely scalable.

I started lobbying TDWG in 2006 for two things:

1) A GUID for the "species" that was not tied to a particular name string.
2) A system that followed semantic web best practices, which LSID etc. do not.

Since my TDWG efforts were not successful, I started GeoSpecies and, based on comments from a semantic web expert, modified these somewhat into what is now TaxonConcept.org.

The TCS is an XML standard for transmitting information about taxon concepts that I think maps best to a "name use concept" (Rich's TNUs).

The TaxonConcepts are identified with semantic web GUIDs that follow semantic web best practices and resolve to informative documents.

In their current form these documents are not ideal because they do not do a good enough job clearing up what would be the best concept match for a given individual or specimen.

They do however have most of the plumbing for this in that they allow semantic web links to name uses, specimens, occurrence records, images, DNA, authors and publications including the original description.

They also link to similar entities that are on the semantic web, most notably DBpedia, Uniprot, Freebase, Bio2RDF etc. 

This linking may not seem valuable to humans, but it is valuable for machines that need to determine which entities are similar and which are different.

This also increases the "findability" of these other data sets.

I see my current set of about 105,000 species as an example set that people can use to try out these models.

In their final form these should be authored by editors that determine what specimens and other data are good examples of instances of these concepts.

The editors will be linked via a URI so it is easy to track attribution.

The final concepts do not have to be in one place; they could be distributed. But to avoid the kinds of nomenclatural differences that have occurred between zoology, botany, etc., it would be best to have one code base for now.

They don't have to have the same underlying stack, which now is based on Ruby on Rails, but could be ported to anything.

What they do need is a common structure and a common understanding as to what each attribute means and how it can be appropriately used.

For some use cases it is appropriate to consider the following the same "thing":

http://lod.taxonconcept.org/ses/v6n7p#Species

http://purl.uniprot.org/taxonomy/9696

http://www.freebase.com/view/en/cougar

http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA

http://www.bbc.co.uk/nature/species/Cougar#species

For other use cases, this sameAs is not appropriate.
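
As a rough illustration only, this use-case dependence could be sketched in Python as follows. The use-case labels ("data-discovery", "specimen-identification") and the function same_thing are hypothetical names made up for this example and are not part of TaxonConcept; only the URIs come from the list above.

```python
# Hypothetical sketch: the URIs are the ones listed above; the use-case
# labels and the function name are assumptions made for illustration.
COUGAR_URIS = frozenset([
    "http://lod.taxonconcept.org/ses/v6n7p#Species",
    "http://purl.uniprot.org/taxonomy/9696",
    "http://www.freebase.com/view/en/cougar",
    "http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA",
    "http://www.bbc.co.uk/nature/species/Cougar#species",
])

# Use cases (assumed) in which treating the URIs as one entity is acceptable.
MERGE_OK = {"data-discovery"}


def same_thing(uri_a, uri_b, use_case):
    """Return True if the two URIs may be treated as the same entity."""
    if uri_a == uri_b:
        return True
    if use_case not in MERGE_OK:
        return False
    return uri_a in COUGAR_URIS and uri_b in COUGAR_URIS


tc = "http://lod.taxonconcept.org/ses/v6n7p#Species"
up = "http://purl.uniprot.org/taxonomy/9696"
print(same_thing(tc, up, "data-discovery"))           # True
print(same_thing(tc, up, "specimen-identification"))  # False
```

The point of the sketch is that the equivalence is a decision made per use case rather than a global fact about the URIs.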

Wikipedia is very valuable, but if someone changes the article title then the URI changes in DBpedia.

Uniprot and Bio2RDF are useful in that they link to lots of related data, but they don't really give you any information about what specimens are instances of that concept, and they only have those species which have NCBI IDs.

What I want is a set of GUIDs that resolve to a human-readable HTML page and an RDF representation, and that people can use to "tag" their data.

For instance:

I am going to assert that what I have under the microscope is an instance of the concept described on this page. I do not tie this assertion to a particular name or classification hierarchy.

Because it makes no sense to replicate the functionality of the Encyclopedia of Life etc., I am mainly concentrating on the RDF representations and testing if they behave as expected in SPARQL queries.

* The HTML pages are not especially pretty, nor are they as informative as the RDF or as the concept viewed in the knowledge base.

I have been working with the Encyclopedia of Life and GNI groups for a while exploring how these may or may not be useful to them.

During my visit to Woods Hole I said that I have no interest in building an empire; I just want to build a solution and would like to partner with them and GBIF.

Although I remain active on TDWG, I find that the most valuable suggestions come from the LOD community, since we seem to have a common goal: creating something that works in a reasonable amount of time.

Also, in the LOD cloud every linked data set increases the value of all the other data sets.

This is probably more than your question required, but it provides some explanation as to what these are and why I have implemented them in the way I have.

Respectfully,

- Pete



On Fri, May 13, 2011 at 4:14 PM, Matt Jones <jones@nceas.ucsb.edu> wrote:
Hi Peter,

Does your idea of #ObjectiveSpeciesModel correspond 1:1 with the TCS standard's idea of a Nominal Concept (i.e., <TaxonConcept type="nominal">)?  Can you outline how your concept types differ from TCS concept types?

Thanks,
Matt

On Fri, May 13, 2011 at 12:41 PM, Peter DeVries <pete.devries@gmail.com> wrote:
Hi Nico,

Thanks for posting this.

I have something in the concept model to indicate the basis for the species concept.

For now I have three types. An individual species concept can have one, two, or all three.

In the RDF they look like this:

<txn:speciesConceptBasedOn rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel"/>

The first is what I call the #ObjectiveSpeciesModel - this indicates that it is a species concept because we say it is.

All the species concepts are at least an #ObjectiveSpeciesModel.

* This is in part a way to handle things like the domestic cat, which you want to be seen as different from the African wildcat.

There are also tags for:

txn:PhylogeneticSpeciesModel
txn:BiologicalSpeciesModel

For now I don't have these other models set in the example data, but the fields are in the database and the code so that an editor could state the basis for the model.

I can think of a couple of different ways to handle the issue of alternative species concepts.

* Note that the identifications as proposed by DarwinCore don't seem to indicate what kind of model the identifications were based on.
  So it is not clear to me if a straight DarwinCore data set would allow the analysis above.

Instead of having multiple different statements like

txn:occurrenceHasSpeciesConcept <> in the record for each occurrence,

one could use different predicates to link to different kinds of species concepts:

txn:occurrenceHasUniprotConcept => <http://purl.uniprot.org/taxonomy/9696>

This would allow someone to query for the occurrences of <http://purl.uniprot.org/taxonomy/9696>
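
A minimal sketch of that kind of query, written in Python over plain (subject, predicate, object) tuples rather than a real triple store. The occurrence URIs and the "other" concept URI are invented for this example, and occurrences_of is a hypothetical helper; the predicates are the ones discussed above.

```python
# Hypothetical sketch: the occurrence URIs and the "other" concept URI are
# made-up examples; the predicates are the ones discussed in the text.
UNIPROT_COUGAR = "http://purl.uniprot.org/taxonomy/9696"

TRIPLES = [
    ("http://example.org/occurrence/1", "txn:occurrenceHasSpeciesConcept",
     "http://lod.taxonconcept.org/ses/v6n7p#Species"),
    ("http://example.org/occurrence/1", "txn:occurrenceHasUniprotConcept",
     UNIPROT_COUGAR),
    ("http://example.org/occurrence/2", "txn:occurrenceHasUniprotConcept",
     UNIPROT_COUGAR),
    ("http://example.org/occurrence/3", "txn:occurrenceHasUniprotConcept",
     "http://example.org/concept/other"),
]


def occurrences_of(triples, predicate, concept):
    """Return the sorted subjects linked to `concept` via `predicate`."""
    return sorted({s for s, p, o in triples if p == predicate and o == concept})


result = occurrences_of(TRIPLES, "txn:occurrenceHasUniprotConcept", UNIPROT_COUGAR)
print(result)
```

Because each kind of concept has its own predicate, a query against one kind does not pick up links made to another kind for the same occurrence.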

That said, it is not clear to me what people mean by different identifications.

Is the intent for identifications with different homotypic synonyms to count as identifications of the same thing or not?

The way it works now in many data sets is that Felis concolor, Puma concolor and Puma conncolor are treated as identifications of different things.

This is another way of asking: is the name string the concept?
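
To make the difference concrete, here is a small Python sketch that counts the same three records first by name string and then by concept URI. The records themselves are invented, and tagging all three with the cougar concept URI is an assumption made for the example.

```python
from collections import Counter

# Assumption for illustration: all three records refer to the cougar concept.
CONCEPT = "http://lod.taxonconcept.org/ses/v6n7p#Species"

# Invented example records: one older synonym, the current name, and a typo.
RECORDS = [
    {"name": "Felis concolor", "concept": CONCEPT},
    {"name": "Puma concolor", "concept": CONCEPT},
    {"name": "Puma conncolor", "concept": CONCEPT},
]

# Grouping by name string splits the records into separate "things".
by_name = Counter(r["name"] for r in RECORDS)

# Grouping by concept URI keeps them together.
by_concept = Counter(r["concept"] for r in RECORDS)

print(len(by_name))     # 3
print(len(by_concept))  # 1
```

If the name string is treated as the concept, the synonym and the misspelling each become their own "species" in any downstream analysis.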

My understanding of the eBird project is that it allows citizen scientists to contribute their own observations. This creates a much larger data set for analysis etc.

They have created a curated list of species and a ~6 letter code for each. This serves as a guide for observers on how to encode their observations.

I think their progress would be inhibited, the occurrence coding inconsistent, and contributors frustrated, if they had a list that included many overlapping species concepts.

Thanks again for your comments,

- Pete
 
On Fri, May 13, 2011 at 3:05 AM, Nico Franz <nico.franz@upr.edu> wrote:
Hello Pete (et al.):

   For birds, Town Peterson at KU and colleagues have published these papers showing how alternative bird taxonomies affect the ranking of conservation priorities.

http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/PN_CB_1999.pdf
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/NP_BN_2004.pdf
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/P_BCI_2006.pdf

   Here's the abstract of the 1999 paper:

Analysis of geographic concentrations of endemic taxa is often used to determine priorities for conservation
action; nevertheless, assumptions inherent in the taxonomic authority list used as the basis for
analysis are not always considered. We analyzed foci of avian endemism in Mexico under two alternate species
concepts. Under the biological species concept, 101 bird species are endemic to Mexico and are concentrated
in the mountains of the western and southern portions of the country. Under the phylogenetic species
concept, however, total endemic species rises to 249, which are concentrated in the mountains and lowlands
of western Mexico. Twenty-four narrow endemic biological species are concentrated on offshore islands, but
97 narrow endemic phylogenetic species show a concentration in the Transvolcanic Belt of the mainland and
on several offshore islands. Our study demonstrates that conservation priorities based on concentrations of
endemic taxa depend critically on the particular taxonomic authority employed and that biodiversity evaluations
need to be developed in collaboration or consultation with practicing systematic specialists.

   There was a debate recently on Taxacom that was started and subsequently neatly summarized by Fabian Haas. The topic was "let's summarize reasons why 'donors' seem to not fund taxonomy". One point from the summary was this:

3) Taxonomy is over-accurate for most applications

Most (not all) decisions in e.g. modelling and conservation are done and can be done without complete knowledge of taxa. As it is, decisions for conservation areas are often based on flagship species (e.g. elephants), on taxa which have an excellent research background, e.g. birds (IBAs), on availability of land (e.g. land with a high Tsetse burden), importance as corridor and other factors, but never on a complete view of all biodiversity in a specific area. Even if an inventory existed, it would be an illusion that we could collect data on ecological requirements and population dynamics for most of the species necessary for informed decisions. A complete inventory does not seem to provide an advantage for conservation.

   I personally think there's some truth to that. I also think that, while it's understandable that an accurate representation of the (sometimes) fleetingness of taxonomic consensus is not a priority for applied ecological projects, if taxonomists themselves don't find better ways to document and link these alternative perspectives, then it's not the best science we can do. That would be fine too if adopted outright as a pragmatic stance.

Regards,

Nico



On 5/13/2011 1:08 AM, Peter DeVries wrote:
I thought that I would also mention that in addition to The Plant List, the eBird project also uses non-overlapping concepts in its bird list (it does have concepts for common hybrids).

What is clear to me is that you cannot create graphs like these if every observation can have X number of species (especially those that overlap) without any indication of which is the most appropriate one.

eBird Occurrence Maps Northern Cardinal
NCBI is also similar.

Perhaps a member of the consensus committee can comment?

-- Pete
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept  &  GeoSpecies Knowledge Bases
A Semantic Web, Linked Open Data  Project
--------------------------------------------------------------------------------------
_______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content