Another example of non-overlapping concepts
I thought that I would also mention that, in addition to The Plant List, the eBird project also uses non-overlapping concepts in its bird list (it does have concepts for common hybrids).
What is clear to me is that you cannot create graphs like these if every observation can be assigned to any number of species (especially overlapping ones) without any indication of which is the most appropriate one.
eBird Occurrence Maps: Northern Cardinal http://ebird.org/content/ebird/about/occurrence-maps/northern-cardinal
NCBI is also similar.
Perhaps a member of the consensus committee can comment?
-- Pete
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
Hello Pete (et al.):
For birds, Town Peterson at KU and colleagues have published these papers showing how alternative bird taxonomies affect the ranking of conservation priorities.
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/PN_CB_...
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/NP_BN_...
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/P_BCI_...
Here's the abstract of the 1999 paper:
Analysis of geographic concentrations of endemic taxa is often used to determine priorities for conservation action; nevertheless, assumptions inherent in the taxonomic authority list used as the basis for analysis are not always considered. We analyzed foci of avian endemism in Mexico under two alternate species concepts. Under the biological species concept, 101 bird species are endemic to Mexico and are concentrated in the mountains of the western and southern portions of the country. Under the phylogenetic species concept, however, total endemic species rises to 249, which are concentrated in the mountains and lowlands of western Mexico. Twenty-four narrow endemic biological species are concentrated on offshore islands, but 97 narrow endemic phylogenetic species show a concentration in the Transvolcanic Belt of the mainland and on several offshore islands. Our study demonstrates that conservation priorities based on concentrations of endemic taxa depend critically on the particular taxonomic authority employed and that biodiversity evaluations need to be developed in collaboration or consultation with practicing systematic specialists.
There was a debate recently on Taxacom that was started and subsequently neatly summarized by Fabian Haas. The topic was "let's summarize reasons why 'donors' seem to not fund taxonomy". One point from the summary was this:
3) Taxonomy is over-accurate for most applications
Most (not all) decisions in e.g. modelling and conservation are done and can be done without complete knowledge of taxa. As it is, decisions for conservation areas are often based on flagship species (e.g. elephants), on taxa which have an excellent research background, e.g. birds (IBAs), on availability of land (e.g. land with a high tsetse burden), importance as a corridor and other factors, but never on a complete view of all the biodiversity in a specific area. Even if an inventory existed, it would be an illusion that we could collect data on ecological requirements and population dynamics for most of the species necessary for informed decisions. A complete inventory does not seem to provide an advantage for conservation.
I personally think there's some truth to that. I also think that, while it's understandable that an accurate representation of the (sometimes) fleeting nature of taxonomic consensus is not a priority for applied ecological projects, if taxonomists themselves don't find better ways to document and link these alternative perspectives, then it's not the best science we can do. That would be fine too, if adopted outright as a pragmatic stance.
Regards,
Nico
On 5/13/2011 1:08 AM, Peter DeVries wrote:
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Nico,
Thanks for posting this.
I have something in the concept model to indicate the basis for the species concept.
For now I have three types. An individual species concept can have a combination of one, two, or all three.
In the RDF they look like this:
<txn:speciesConceptBasedOn rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel"/>
The first is what I call the #ObjectiveSpeciesModel: it indicates that something is a species concept because we say it is.
All the species concepts are at least an #ObjectiveSpeciesModel.
* This is in part a way to handle cases like the domestic cat, which you want to be treated as distinct from the African wildcat.
There are also tags for txn:PhylogeneticSpeciesModel and txn:BiologicalSpeciesModel.
For now I don't have these other models set in the example data, but the fields are in the database and the code is in place so that an editor can state the basis for the model.
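A minimal sketch of how the speciesConceptBasedOn tags could be queried, assuming each concept's tags are held as plain (subject, predicate, object) tuples rather than in a real RDF store. The concept URIs are hypothetical placeholders, and expanding the `txn:` prefix to the ontology URL shown above is an assumption.

```python
# A plain list of (subject, predicate, object) tuples stands in for an RDF store.
# Assumption: the txn: prefix expands to the ontology URL used in the message.
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
BASED_ON = TXN + "speciesConceptBasedOn"

OBJECTIVE = TXN + "ObjectiveSpeciesModel"
PHYLOGENETIC = TXN + "PhylogeneticSpeciesModel"
BIOLOGICAL = TXN + "BiologicalSpeciesModel"

# Hypothetical concept URIs, for illustration only.
CONCEPT_A = "http://example.org/concept/A"
CONCEPT_B = "http://example.org/concept/B"

triples = [
    (CONCEPT_A, BASED_ON, OBJECTIVE),
    (CONCEPT_A, BASED_ON, BIOLOGICAL),
    (CONCEPT_B, BASED_ON, OBJECTIVE),
    (CONCEPT_B, BASED_ON, PHYLOGENETIC),
]


def models_for(concept, triples):
    """Return the set of species models a concept is declared to be based on."""
    return {o for s, p, o in triples if s == concept and p == BASED_ON}


def concepts_based_on(model, triples):
    """Return the set of concepts that declare the given species model."""
    return {s for s, p, o in triples if p == BASED_ON and o == model}


def all_objective(triples):
    """Check the rule that every concept is at least an ObjectiveSpeciesModel."""
    concepts = {s for s, p, _ in triples if p == BASED_ON}
    return all(OBJECTIVE in models_for(c, triples) for c in concepts)
```

With a real RDF store the same lookups would be SPARQL patterns on `txn:speciesConceptBasedOn`; the tuple list only keeps the sketch dependency-free.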
I can think of a couple of different ways to handle the issue of alternative species concepts.
* Note that identifications as proposed by Darwin Core don't seem to indicate what kind of model they were based on, so it is not clear to me whether a straight Darwin Core data set would allow the analysis above.
Instead of having multiple different statements like
txn:occurrenceHasSpeciesConcept <> in the record for each occurrence,
one could use different predicates to link to different kinds of species concepts, for example:
txn:occurrenceHasUniprotConcept => <http://purl.uniprot.org/taxonomy/9696>
This would allow someone to query for the occurrences of <http://purl.uniprot.org/taxonomy/9696>.
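A sketch of the predicate-per-concept-kind idea: each occurrence can carry several links, and a query picks the classification it wants by choosing the predicate. Only the two `txn:` predicate names and the UniProt taxonomy URI come from the message; the namespace expansion, the occurrence IDs, and the TaxonConcept URI are hypothetical.

```python
# Plain (subject, predicate, object) tuples stand in for an RDF store.
# Assumption: the txn: prefix expands to the TaxonConcept ontology URL.
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
HAS_SPECIES_CONCEPT = TXN + "occurrenceHasSpeciesConcept"
HAS_UNIPROT_CONCEPT = TXN + "occurrenceHasUniprotConcept"

UNIPROT_9696 = "http://purl.uniprot.org/taxonomy/9696"
# Hypothetical TaxonConcept URI and occurrence IDs, for illustration only.
TXN_CONCEPT = "http://example.org/taxonconcept/concept-1"

triples = [
    ("occ:1", HAS_SPECIES_CONCEPT, TXN_CONCEPT),
    ("occ:1", HAS_UNIPROT_CONCEPT, UNIPROT_9696),
    ("occ:2", HAS_SPECIES_CONCEPT, TXN_CONCEPT),
    ("occ:3", HAS_UNIPROT_CONCEPT, UNIPROT_9696),
]


def occurrences_of(concept, predicate, triples):
    """Occurrences linked to `concept` through one specific predicate, sorted."""
    return sorted(s for s, p, o in triples if p == predicate and o == concept)
```

In SPARQL (with the `txn:` prefix declared) the UniProt query would look roughly like `SELECT ?occ WHERE { ?occ txn:occurrenceHasUniprotConcept <http://purl.uniprot.org/taxonomy/9696> }`.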
That said, it is not clear to me what people mean by different identifications.
Is the intent that identifications using different homotypic synonyms count as identifications of the same thing, or not?
The way it works now in many data sets is that Felis concolor, Puma concolor, and Puma conncolor are treated as identifications of different things.
This is another way of asking: *is the name string the concept?*
My understanding of the eBird project is that it allows citizen scientists to contribute their own observations. This creates a much larger data set for analysis.
They have created a curated list of species and a ~6-letter code for each. This serves as a guide for observers on how to encode their observations.
I think their progress would be inhibited, the occurrence coding inconsistent, and contributors frustrated if they had a list that included many overlapping species concepts.
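To make the name-string question concrete, this sketch counts the same identifications two ways: keyed by the raw name string, and keyed by a concept GUID through a curated lookup table, roughly the role a curated code list plays for eBird observers. The table and the GUID are hypothetical; deciding which strings map to one concept is exactly the editorial call under discussion.

```python
from collections import Counter

# Identifications as they might appear in a data set (names taken from the message).
identifications = ["Felis concolor", "Puma concolor", "Puma conncolor", "Puma concolor"]

# Hypothetical curated lookup from name string to concept GUID.
NAME_TO_CONCEPT = {
    "Felis concolor": "urn:example:concept:cougar",
    "Puma concolor": "urn:example:concept:cougar",
    "Puma conncolor": "urn:example:concept:cougar",  # apparent misspelling, mapped by an editor
}


def count_by_name(ids):
    """Treat every distinct name string as a separate 'thing'."""
    return Counter(ids)


def count_by_concept(ids, table):
    """Resolve name strings to concepts; names missing from the table stay separate."""
    return Counter(table.get(name, "unresolved:" + name) for name in ids)
```

Counted by name, the four records look like three different things; counted by concept, they are four observations of one.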
Thanks again for your comments,
- Pete
On Fri, May 13, 2011 at 3:05 AM, Nico Franz nico.franz@upr.edu wrote:
Hi Peter,
Does your idea of #ObjectiveSpeciesModel correspond 1:1 with the TCS standard's idea of a Nominal Concept (i.e., <TaxonConcept type="nominal">)? Can you outline how your concept types differ from TCS concept types?
Thanks, Matt
On Fri, May 13, 2011 at 12:41 PM, Peter DeVries pete.devries@gmail.com wrote:
Hi Matt,
It took me a while to ponder your question. There is a long answer, which is complex and easily misinterpreted, and there is a shorter answer.
For now I think the "shorter" answer, set in a historical context, is best.
The best use of my abilities seems to be recognizing an "ability gap" and figuring out a technical solution or tool to address it.
The most visible of these efforts have involved microscopy and visualization tools that make complex ideas understandable.
My interest in the species problem dates back to when I had the opportunity to talk with E.O. Wilson in 1991/1992.
At that time he said that if you had a knack for computers, we needed all this information in databases so that it would be accessible.
* One of his former Ph.D. students is on my committee.
Years later I had the opportunity to work on questions like this and started to think about how to connect all these disparate facts about species in a usable, queryable knowledge base.
I noticed that several groups and individuals were marking up data sets, including observations, with different scientific names even though they clearly meant the same "species".
* These groups would agree that they were communicating about the same species, but would not always agree on the name.
This prevents large-scale data integration and analysis, which is described in part here: http://about.geospecies.org/
With the advent of the web, and the semantic web in particular, this "database" could be global and almost infinitely scalable.
I started lobbying TDWG in 2006 for two things:
1) A GUID for the "species" that was not tied to a particular name string
2) A system that followed semantic web best practices, which LSIDs etc. do not.
Since my TDWG efforts were not successful, I started GeoSpecies and, based on comments from a semantic web expert, modified these somewhat into what is now TaxonConcept.org.
TCS is an XML standard for transmitting information about taxon concepts; I think its concepts map best to a "name use concept" (Rich's TNUs).
The TaxonConcepts are identified with GUIDs that follow semantic web best practices and resolve to informative documents.
In their current form these documents are not ideal because they do not do a good enough job clearing up what would be the best concept match for a given individual or specimen.
They do, however, have most of the plumbing for this, in that they allow semantic web links to name uses, specimens, occurrence records, images, DNA, authors, and publications, including the original description.
They also link to similar entities that are on the semantic web, most notably DBpedia, Uniprot, Freebase, Bio2RDF etc.
This linking may not seem valuable to humans, but it is valuable for machines that need to determine which entities are similar and which are different.
This also increases the "findability" of these other data sets.
I see my current set of about 105,000 species as an example set that people can use to try out these models.
In their final form these should be authored by editors who determine which specimens and other data are good examples of instances of these concepts.
The editors will be linked via a URI so it is easy to track attribution.
The final concepts do not have to be in one place; they could be distributed. But to avoid the kinds of nomenclatural differences that have arisen between zoology, botany, etc., it would be best to have one code base for now.
They don't have to share the same underlying stack, which is currently based on Ruby on Rails; it could be ported to anything.
What they do need is a common structure and a common understanding as to what each attribute means and how it can be appropriately used.
For some use cases it is appropriate to consider the following the same "thing":
http://lod.taxonconcept.org/ses/v6n7p#Species
http://purl.uniprot.org/taxonomy/9696
http://www.freebase.com/view/en/cougar
http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA
http://www.bbc.co.uk/nature/species/Cougar#species
For other use cases, this sameAs is not appropriate.
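One way to honor "the same thing for some use cases, not for others" is to keep the sameAs links as data and compute equivalence classes only when a use case asks for them. This is a minimal union-find sketch; chaining the listed URIs pairwise and filtering links by a per-use-case set of trusted hosts are both assumptions for illustration, not part of any TaxonConcept API.

```python
from urllib.parse import urlparse

# The URIs listed in the message, chained pairwise as sameAs links (pairing is an assumption).
SAME_AS = [
    ("http://lod.taxonconcept.org/ses/v6n7p#Species", "http://purl.uniprot.org/taxonomy/9696"),
    ("http://purl.uniprot.org/taxonomy/9696", "http://www.freebase.com/view/en/cougar"),
    ("http://www.freebase.com/view/en/cougar",
     "http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA"),
    ("http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA",
     "http://www.bbc.co.uk/nature/species/Cougar#species"),
]


def equivalence_classes(links, trusted_hosts=None):
    """Group URIs joined by sameAs links using union-find.

    If trusted_hosts is given (hypothetical per-use-case policy), a link is
    only followed when both ends are on a trusted host; every URI still
    appears in exactly one class.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def trusted(uri):
        return trusted_hosts is None or urlparse(uri).hostname in trusted_hosts

    for a, b in links:
        ra, rb = find(a), find(b)  # registers both URIs even if the link is skipped
        if ra != rb and trusted(a) and trusted(b):
            parent[rb] = ra

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(groups.values(), key=sorted)
```

With every link trusted, the five URIs collapse into one class; a use case that does not trust the Freebase entry gets three separate classes instead.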
Wikipedia is very valuable, but if someone changes the article title then the URI changes in DBpedia.
Uniprot and Bio2RDF are useful in that they link to lots of related data, but they don't really give you any information about which specimens are instances of that concept, and they only have those species which have NCBI IDs.
What I want is a set of GUIDs that resolve to a human-readable HTML page and an RDF representation that people can use to "tag" their data.
For instance: "I am going to assert that what I have under the microscope is an instance of the concept described on this page. I do not tie this assertion to a particular name or classification hierarchy."
Because it makes no sense to replicate the functionality of the Encyclopedia of Life etc., I am mainly concentrating on the RDF representations and testing if they behave as expected in SPARQL queries.
* The HTML pages are not very pretty, nor are they as informative as the RDF or as the concept viewed in the knowledge base.
I have been working with the Encyclopedia of Life and GNI groups for a while exploring how these may or may not be useful to them.
During my visit to Woods Hole I said that I have no interest in building an empire; I just want to build a solution and would like to partner with them and GBIF.
Although I remain active in TDWG, I find that the most valuable suggestions come from the LOD community, since we seem to share a common goal: creating something that works in a reasonable amount of time.
Also, in the LOD cloud every linked data set increases the value of all the other data sets.
This is probably more than your question required, but it provides some explanation as to what these are and why I have implemented them in the way I have.
Respectfully,
- Pete
On Fri, May 13, 2011 at 4:14 PM, Matt Jones jones@nceas.ucsb.edu wrote:
Hi Peter,
Does your idea of #ObjectiveSpeciesModel correspond 1:1 with the TCS standard's idea of a Nominal Concept (i.e., <TaxonConcept type="nominal">) ? Can you outline how your concept types differ from TCS concept types?
Thanks, Matt
On Fri, May 13, 2011 at 12:41 PM, Peter DeVries pete.devries@gmail.comwrote:
Hi Nico,
Thanks for posting this.
I have something in the concept model to indicate the basis for the species concept.
For now I have three types. An individual species concept can have a combination of one, two or all three
In the RDF they look like this
<txn:speciesConceptBasedOn rdf:resource=" http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel%22/%3E
The first is what I call the #ObjectiveSpeciesModel - this indicates that it is a species concept because we say it is.
All the species concepts are at least an #ObjectiveSpeciesModel
*This is in part a way to handle things like the domestic cat which you want to be seen as different from the African Wildcat.
There are also tags for
txn:PhylogeneticSpeciesModel txn:BiologicalSpeciesModel
For now I don't have these other models set in the example data, but fields are in the database and the code for that an editor could state the basis for the model.
I can think of a couple of different ways to handle the issue of alternative species concepts.
- Note that the identifications as proposed by DarwinCore don't seem to
indicate what kind of model the identifications were based on. So it is not clear to me if a straight DarwinCore data set would allow the analysis above.
Instead of having multiple different statements like
*txn:occurrenceHasSpeciesConcept <> *in the record for each occurrence
one could use different predicates to link to different kinds of species concepts.
*txn:occurrenceHasUniprotConcept* => < http://purl.uniprot.org/taxonomy/9696%3E
This would allow someone to query for the occurrences of < http://purl.uniprot.org/taxonomy/9696%3E
That said, it is not clear to me what people mean by different identifications.
Is the intent to have identifications with different homotypic synonyms to be an identification of the same thing or not?
The way it works now in many data sets is that Felis concolor, Puma concolor and Puma conncolor are treated as identifications of different things.
This is another way of saying* is the namestring the concept?*
My understanding of the eBird project is that it allows citizen scientists to contribute their own observations. This creates a much larger data set for analysis etc.
They have a created a curated list of species and a ~6 letter code for each. This serves as a guide for observers on how to encode their observations.
I think their progress would be inhibited, the occurrence coding inconsistant, and contributors frustrated, if they have a list that included many overlapping species concepts.
Thanks again for you comments,
- Pete
On Fri, May 13, 2011 at 3:05 AM, Nico Franz nico.franz@upr.edu wrote:
Hello Pete (et al.):
For bird, Town Peterson at KU and colleagues have published these papers showing how alternative bird taxonomies affect the ranking of conservation priorities.
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/PN_CB_...
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/NP_BN_...
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/P_BCI_...
Here's the abstract of the 1999 paper:
Analysis of geographic concentrations of endemic taxa is often used to determine priorities for conservation action; nevertheless, assumptions inherent in the taxonomic authority list used as the basis for analysis are not always considered. We analyzed foci of avian endemism in Mexico under two alternate species concepts. Under the biological species concept, 101 bird species are endemic to Mexico and are concentrated in the mountains of the western and southern portions of the country. Under the phylogenetic species concept, however, total endemic species rises to 249, which are concentrated in the mountains and lowlands of western Mexico. Twenty-four narrow endemic biological species are concentrated on offshore islands, but 97 narrow endemic phylogenetic species show a concentration in the Transvolcanic Belt of the mainland and on several offshore islands. Our study demonstrates that conservation priorities based on concentrations of endemic taxa depend critically on the particular taxonomic authority employed and that biodiversity evaluations need to be developed in collaboration or consultation with practicing systematic specialists.
There was a debate recently on Taxacom that was started and subsequently neatly summarized by Fabian Haas. The topic was "let's summarize reasons why 'donors' seem to not fund taxonomy". One point from the summary was this:
- Taxonomy is over-accurate for most applications
Most (not all) decisions in e.g. modelling and conservation are done and can be done without complete knowledge of taxa. As it is, decisions for conservation areas are often based on flagship species (e.g. elephants), on taxa which have an excellent research background, e.g. birds (IBAs), on availability of land (e.g. land with a high Tsetse burden), importance as corridor and other factors, but never on a complete view on an all biodiversity in a specific area. Even if an inventory existed, it would be an illusion that we could collect data on ecological requirements and population dynamics for most of the species necessary for informed decisions. A complete inventory does not seem to provide an advantage for conservation. I personally think there's some truth to that. I also think that, while it's understandable that an accurate representation of the (sometimes) fleetingness of taxonomic consensus it not a priority for applied ecological projects, if taxonomists themselves don't find better ways to document and link these alternatives perspectives, then it's not the best science we can do. That would be fine too if adopted outright as a pragmatic stance.
Regards,
Nico
On 5/13/2011 1:08 AM, Peter DeVries wrote:
I thought that I would also mention that in addition to The Plant List, the eBird project also uses non-overlapping concepts in its bird list (it does have concepts for common hybrids).
What is clear to me is that you cannot create graphs like these if every observation can have X number of species (especially overlapping ones) without any indication of which is the most appropriate one.
eBird Occurrence Maps Northern Cardinal http://ebird.org/content/ebird/about/occurrence-maps/northern-cardinal
NCBI is also similar.
Perhaps a member of the consensus committee can comment?
-- Pete
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 Email: pdevries@wisc.edu TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases A Semantic Web, Linked Open Data http://linkeddata.org/ Project
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
hmmm... a http://www.freebase.com/view/en/cougar seems to be an 'armoured fighting vehicle'...
which might be part of the point you are making... wondering if NotEvenRemotelyTheSameAs is a valid relationship type... ;)
jim
On Wed, May 18, 2011 at 8:00 AM, Peter DeVries pete.devries@gmail.com wrote:
Hi Matt,
It took me a while to ponder your question. There is a long answer, which is complex and easily misinterpreted, and there is a shorter answer. For now I think the "shorter" answer, set in a historical context, is best.
The best use of my abilities seems to be recognizing an "ability gap" and figuring out a technical solution or tool to address it. The most visible of these involved microscopy and visualization tools to make complex ideas understandable.
My interest in the species problem dates back to when I had the opportunity to talk with E.O. Wilson in 1991/1992. At that time he said that if you have a knack for computers, we need all this information in databases so it is accessible. *One of his former Ph.D. students is on my committee.
Years later I had the opportunity to work on questions like this and started to think about how to connect all these disparate facts about species together in a usable, queryable knowledge base.
I noticed that several groups and individuals were marking up data sets, including observations, with different scientific names even though they clearly meant the same "species".
- These groups would agree that they were communicating about the same species, but would not always agree on the name.
This prevents large-scale data integration and analysis, which is described in part here: http://about.geospecies.org/
With the advent of the web, and the semantic web in particular, this "database" could be global and almost infinitely scalable.
I started lobbying TDWG in 2006 for two things:
- A GUID for the "species" that was not tied to a particular name string
- A system that followed semantic web best practices, which LSID etc. do not.
Since my TDWG efforts were not successful, I started GeoSpecies and, based on comments from a semantic web expert, modified these somewhat into what is now TaxonConcept.org.
The TCS is an XML standard for transmitting information about taxon concepts that I think maps best to a "name use concept" (Rich's TNUs).
The TaxonConcepts are identified with semantic web GUIDs that follow semantic web best practices and resolve to informative documents. In their current form these documents are not ideal because they do not do a good enough job of clearing up what would be the best concept match for a given individual or specimen. They do, however, have most of the plumbing for this, in that they allow semantic web links to name uses, specimens, occurrence records, images, DNA, authors and publications, including the original description.
They also link to similar entities that are on the semantic web, most notably DBpedia, UniProt, Freebase, Bio2RDF, etc. This linking may not seem valuable to humans, but it is valuable for machines that need to determine which entities are similar and which are different. It also increases the "findability" of these other data sets.
I see my current set of about 105,000 species as an example set that people can use to try out these models. In their final form these should be authored by editors who determine which specimens and other data are good examples of instances of these concepts. The editors will be linked via a URI so it is easy to track attribution.
The final concepts do not have to be in one place; they could be distributed. But to avoid the kinds of nomenclatural differences that have occurred between zoology, botany, etc., it would be best to have one code base for now. They don't have to have the same underlying stack, which is now based on Ruby on Rails, but could be ported to anything. What they do need is a common structure and a common understanding of what each attribute means and how it can be appropriately used.
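Resolving one GUID to either a human-readable page or an RDF document is commonly done with HTTP content negotiation on the `Accept` header. The sketch below only builds the request with Python's standard library and does not send it; whether the TaxonConcept server negotiates this way, the media types chosen, and the helper name `build_concept_request` are all assumptions for illustration, not a description of how TaxonConcept.org actually works.

```python
from urllib.request import Request

# Assumed media types for the two representations of a concept.
MEDIA_TYPES = {
    "html": "text/html",
    "rdf": "application/rdf+xml",
}


def build_concept_request(guid: str, representation: str = "rdf") -> Request:
    """Build (but do not send) a GET request for a concept GUID.

    The '#fragment' part of a URI is never sent over HTTP, so it is dropped
    here; the server is expected to pick HTML or RDF from the Accept header.
    """
    if representation not in MEDIA_TYPES:
        raise ValueError(f"unknown representation: {representation!r}")
    document_url = guid.split("#", 1)[0]
    return Request(document_url, headers={"Accept": MEDIA_TYPES[representation]})


req = build_concept_request("http://lod.taxonconcept.org/ses/v6n7p#Species")
print(req.full_url)              # http://lod.taxonconcept.org/ses/v6n7p
print(req.get_header("Accept"))  # application/rdf+xml
# urllib.request.urlopen(req) would perform the actual fetch (needs network access).
```

The same GUID then serves both audiences: a browser asking for `text/html` gets the page, while an RDF client asking for `application/rdf+xml` gets machine-readable data.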
For some use cases it is appropriate to consider the following to be the same "thing":
http://lod.taxonconcept.org/ses/v6n7p#Species
http://purl.uniprot.org/taxonomy/9696
http://www.freebase.com/view/en/cougar
http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA
http://www.bbc.co.uk/nature/species/Cougar#species
For other use cases, this sameAs is not appropriate. Wikipedia is very valuable, but if someone changes the article title then the URI changes in DBpedia. UniProt and Bio2RDF are useful in that they link to lots of related data, but they don't really give you any information about which specimens are instances of that concept, and they only have those species that have NCBI IDs.
What I want is a set of GUIDs that resolve to a human-readable HTML page and an RDF representation that people can use to "tag" their data. For instance: I am going to assert that what I have under the microscope is an instance of the concept described on this page. I do not tie this assertion to a particular name or classification hierarchy.
Because it makes no sense to replicate the functionality of the Encyclopedia of Life etc., I am mainly concentrating on the RDF representations and testing whether they behave as expected in SPARQL queries.
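The point that sameAs is right for some use cases and wrong for others can be made concrete by tagging each candidate equivalence link with the use cases that accept it, and merging identifiers only for the use case at hand. A minimal union-find sketch; the use-case labels `"browse"` and `"specimens"`, and which links carry them, are hypothetical and not taken from any of the data sets named above.

```python
from collections import defaultdict

CONCEPT = "http://lod.taxonconcept.org/ses/v6n7p#Species"

# Hypothetical candidate links: each carries the set of use cases in which
# treating the two URIs as the same "thing" is considered acceptable.
LINKS = [
    (CONCEPT, "http://purl.uniprot.org/taxonomy/9696", {"browse"}),
    (CONCEPT, "http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA", {"browse"}),
    (CONCEPT, "http://www.bbc.co.uk/nature/species/Cougar#species", {"browse"}),
]


def equivalence_classes(links, use_case):
    """Partition all URIs into groups, merging only links accepted for use_case."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    for a, b, use_cases in links:
        root_a, root_b = find(a), find(b)  # registers both URIs either way
        if use_case in use_cases:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


print(len(equivalence_classes(LINKS, "browse")))     # 1: all four URIs merged
print(len(equivalence_classes(LINKS, "specimens")))  # 4: nothing merged
```

Keeping the links and deciding at query time (rather than asserting a global owl:sameAs) avoids one use case's equivalences leaking into another.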
- The HTML pages are not really pretty, or as informative as the RDF or as the concept as viewed in the knowledge base.
I have been working with the Encyclopedia of Life and GNI groups for a while, exploring how these may or may not be useful to them. During my visit to Woods Hole I said that I have no interest in building an empire; I just want to build a solution and would like to partner with them and GBIF.
Although I remain active in TDWG, I find that the most valuable suggestions seem to come from the LOD community, since we seem to have a common goal: creating something that works in a reasonable amount of time. Also, in the LOD cloud every linked data set increases the value of all the other data sets.
This is probably more than your question required, but it provides some explanation of what these are and why I have implemented them the way I have.
Respectfully,
- Pete
On Fri, May 13, 2011 at 4:14 PM, Matt Jones jones@nceas.ucsb.edu wrote:
Hi Peter,
Does your idea of #ObjectiveSpeciesModel correspond 1:1 with the TCS standard's idea of a Nominal Concept (i.e., <TaxonConcept type="nominal">)? Can you outline how your concept types differ from TCS concept types?
Thanks,
Matt
On Fri, May 13, 2011 at 12:41 PM, Peter DeVries pete.devries@gmail.com wrote:
Hi Nico,
Thanks for posting this.
I have something in the concept model to indicate the basis for the species concept. For now I have three types. An individual species concept can have a combination of one, two or all three. In the RDF they look like this:
<txn:speciesConceptBasedOn rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel"/>
The first is what I call the #ObjectiveSpeciesModel: this indicates that it is a species concept because we say it is. All the species concepts are at least an #ObjectiveSpeciesModel. *This is in part a way to handle things like the domestic cat, which you want to be seen as different from the African Wildcat.
There are also tags for txn:PhylogeneticSpeciesModel and txn:BiologicalSpeciesModel. For now I don't have these other models set in the example data, but the fields are in the database, along with the code, so that an editor could state the basis for the model.
I can think of a couple of different ways to handle the issue of alternative species concepts.
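Reading these statements back out of RDF/XML is straightforward. The sketch below uses only Python's standard `xml.etree.ElementTree` and handles just the simple `rdf:Description` plus `rdf:resource` form shown above; it is not a general RDF/XML parser (an RDF library would be the usual choice). It assumes the `txn:` prefix is bound to the ontology namespace in the URI above, and the `BiologicalSpeciesModel` statement in the sample record is invented for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the txn: prefix is bound to the ontology namespace used above.
TXN_NS = "http://lod.taxonconcept.org/ontology/txn.owl#"

# Hypothetical record: the BiologicalSpeciesModel statement is invented here
# (the example data described above only sets ObjectiveSpeciesModel).
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:txn="{TXN_NS}">
  <rdf:Description rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Species">
    <txn:speciesConceptBasedOn rdf:resource="{TXN_NS}ObjectiveSpeciesModel"/>
    <txn:speciesConceptBasedOn rdf:resource="{TXN_NS}BiologicalSpeciesModel"/>
  </rdf:Description>
</rdf:RDF>"""


def species_concept_bases(rdf_xml):
    """Map each rdf:Description URI to the local names of the models it is based on."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{RDF_NS}}}about"        # ElementTree expands prefixes to {namespace}name
    resource = f"{{{RDF_NS}}}resource"
    bases = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        bases[desc.get(about)] = {
            el.get(resource).rsplit("#", 1)[-1]
            for el in desc.findall(f"{{{TXN_NS}}}speciesConceptBasedOn")
            if el.get(resource) is not None
        }
    return bases


for uri, models in species_concept_bases(SAMPLE).items():
    print(uri, sorted(models))
# http://lod.taxonconcept.org/ses/v6n7p#Species ['BiologicalSpeciesModel', 'ObjectiveSpeciesModel']
```

Because a concept can carry one, two or all three statements, the result is a set per concept rather than a single value.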
- Note that the identifications as proposed by DarwinCore don't seem to indicate what kind of model the identifications were based on. So it is not clear to me whether a straight DarwinCore data set would allow the analysis above.
Instead of having multiple different statements like txn:occurrenceHasSpeciesConcept <> in the record for each occurrence, one could use different predicates to link to different kinds of species concepts:
txn:occurrenceHasUniprotConcept => http://purl.uniprot.org/taxonomy/9696
This would allow someone to query for the occurrences of http://purl.uniprot.org/taxonomy/9696.
That said, it is not clear to me what people mean by different identifications. Is the intent that identifications with different homotypic synonyms be identifications of the same thing or not? The way it works now in many data sets is that Felis concolor, Puma concolor and Puma conncolor are treated as identifications of different things. This is another way of asking: is the name string the concept?
My understanding of the eBird project is that it allows citizen scientists to contribute their own observations. This creates a much larger data set for analysis, etc. They have created a curated list of species and a ~6-letter code for each. This serves as a guide for observers on how to encode their observations. I think their progress would be inhibited, the occurrence coding inconsistent, and contributors frustrated if they had a list that included many overlapping species concepts.
Thanks again for your comments,
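The difference between treating the name string as the concept and tagging each occurrence with a concept GUID shows up as soon as occurrences are counted or queried. A minimal sketch with plain Python tuples standing in for RDF triples; the occurrence IDs, the mapping of all three name strings to one concept, and the UniProt link on every record are hypothetical, while the two `txn:` predicate names and the URIs come from the messages above.

```python
from collections import Counter

PUMA_CONCEPT = "http://lod.taxonconcept.org/ses/v6n7p#Species"
PUMA_UNIPROT = "http://purl.uniprot.org/taxonomy/9696"

# Hypothetical occurrence records: (occurrence id, name string as recorded).
OCCURRENCES = [
    ("occ1", "Felis concolor"),
    ("occ2", "Puma concolor"),
    ("occ3", "Puma conncolor"),  # misspelled variant, as in the example above
]

# Hypothetical curated mapping from name strings to one concept GUID.
NAME_TO_CONCEPT = {name: PUMA_CONCEPT for _, name in OCCURRENCES}

# (subject, predicate, object) triples: one predicate per kind of concept.
TRIPLES = []
for occ_id, name in OCCURRENCES:
    TRIPLES.append((occ_id, "txn:occurrenceHasSpeciesConcept", NAME_TO_CONCEPT[name]))
    TRIPLES.append((occ_id, "txn:occurrenceHasUniprotConcept", PUMA_UNIPROT))


def occurrences_of(triples, predicate, concept):
    """Return the occurrence ids linked to `concept` through `predicate`."""
    return sorted(s for s, p, o in triples if p == predicate and o == concept)


by_name = Counter(name for _, name in OCCURRENCES)  # name string treated as the concept
by_concept = Counter(o for _, p, o in TRIPLES if p == "txn:occurrenceHasSpeciesConcept")
print(len(by_name))     # 3 "different things"
print(len(by_concept))  # 1 concept
print(occurrences_of(TRIPLES, "txn:occurrenceHasUniprotConcept", PUMA_UNIPROT))
# ['occ1', 'occ2', 'occ3']
```

Keying on the name string splits one taxon into three buckets; keying on the concept GUID (through whichever predicate names the concept source) keeps them together.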
- Pete
Sorry,
I meant http://www.freebase.com/view/en/puma.
Sort of; I have an "all different" in my people ontology.
DBpedia has the same link to Freebase. I wonder if that Freebase URI has changed recently.
- Pete
On Tue, May 17, 2011 at 5:17 PM, Jim Croft jim.croft@gmail.com wrote:
hmmm... a http://www.freebase.com/view/en/cougar seems to be an 'armoured fighting vehicle'...
which might be part of the point you are making... wondering if NotEvenRemotelyTheSameAs is a valid relationship type... ;)
jim
-- _________________ Jim Croft ~ jim.croft@gmail.com ~ +61-2-62509499 ~ http://about.me/jrc 'A civilized society is one which tolerates eccentricity to the point of doubtful sanity.'
- Robert Frost, poet (1874-1963)
Please send URLs, not attachments: http://www.gnu.org/philosophy/no-word-attachments.html
Hi Pete (et al.):
I can unfortunately only comment on some of this. One thing I'd like to clarify, though perhaps it isn't needed: there's a way in which taxonomists sometimes say "different names, same concept". Meaning: there's a synonymy relationship between two or more Latin (species) names, but no real disagreement about the underlying circumscription of the taxon that's being referred to. I suppose your "Felis concolor, Puma concolor and Puma conncolor" example fits this situation.
Now, that meaning of "sameness" ("same concept" as above) doesn't strictly work in the taxon concept "world", because concepts are in the first instance distinguished by their LABELS, i.e. the Latin name (name author) + sec. author/publication combo, as in Quercus robur L. sec. Nixon 2003 (making this one up). So if you have Quercus robur L. sec. Linnaeus 1758 and Quercus robur L. sec. Nixon 2003, these are two different labels and thus two different concepts in TCS speak, even if their meaning (referential extension) is "the same". Sorry if this was obvious and thus missed the mark. The problem is that there are at least two common meanings of "concept" in the mix.
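The label-based identity described here can be made concrete: if a concept is identified by its label (name, name author, sec. reference), two labels with the same name are two concepts, and any claim that they mean the same thing has to be recorded as a separate relationship. A minimal sketch; the `TaxonConceptLabel` class, the `congruent` set and `same_meaning` are hypothetical names, and the Nixon 2003 label is the made-up example from the message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConceptLabel:
    """A concept label: name + name author + 'sec.' (according to) reference."""
    name: str
    name_author: str
    sec: str

    def __str__(self) -> str:
        return f"{self.name} {self.name_author} sec. {self.sec}"


linnaeus = TaxonConceptLabel("Quercus robur", "L.", "Linnaeus 1758")
nixon = TaxonConceptLabel("Quercus robur", "L.", "Nixon 2003")

# Same name, different 'sec.' reference: two distinct concepts by label.
assert linnaeus != nixon

# Whether they also mean the same thing is a separate, explicit assertion
# (hypothetical here), not something implied by the shared name.
congruent = {frozenset({linnaeus, nixon})}


def same_meaning(a: TaxonConceptLabel, b: TaxonConceptLabel) -> bool:
    """True if a and b are the same label or asserted to be congruent."""
    return a == b or frozenset({a, b}) in congruent


print(str(nixon))                     # Quercus robur L. sec. Nixon 2003
print(same_meaning(linnaeus, nixon))  # True
```

This keeps the two meanings of "concept" apart: label identity is structural, while "same meaning" lives only in asserted relationships.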
As I recall the discussions while the TCS was being designed, the role of the DarwinCore was to allow data providers to include sufficient information in DC records so that the vouchers/observations could be identified to a suitably authoritative concept. In another realm of the biodiversity informatics net, that concept would be represented in more depth, and ideally have multiple relationships mapped and/or inferred to relevant past, present, and future concepts.
For what it's worth (really not much), I agree that the eBird project is finding a (the?) pragmatic solution to expanding their contributor base, and in all likelihood they have a pretty good to excellent taxonomy to work with already. Keep in mind that I study weevils, with 65k species described and 220k species (conservatively) estimated to exist, and a mid-level classification of ~200 tribes based largely on Lacordaire's system of one or two external characters established in 1863 (claws single versus paired; virtually none of these have any phylogenetic value). So experientially I come from the part of the knowledge spectrum where we're likely centuries away from a sufficiently stable naming system that includes, say, more than 2/3 of the actual species diversity.
I'm not opposed to pragmatic solutions for taxa where it makes sense (again, as if anyone cared..). But, trying to foresee the very substantive classificatory shifts that many other groups likely still will experience 10, 20, 50, 100 years down the road from now, I think just the same that there are solid grounds for working out an admittedly non-pragmatic, but sematically maximally powerful solution. I think it's not unreasonable to assume that from some groups, classification in 250 years from now will look just as different from today's system as Linnaeus' 1758 system looks to us today. He recognized 2 weevil genera and about 90 species. Now we have 5800 genera. We may end up with 15,000.
The problem that at present only taxonomic experts can understand how to retrace the meanings of concepts proposed in the history of the field, and computers can't yet do this because the data are not marked up and linked to each other precisely enough (as precisely as an ontological representation would demand it), is not just going to go away by some top-down adherence to a "consensus". Most taxonomists jump into the field precisely because they come to realize that the "consensus" has serious problems (read: "sucks"). So, as long as there is justified taxonomic research, there will be reclassification. ]And no, IMO coming up with a synapomorphy-based or node-pointing naming system will not miraculously allow us to have a reliable system.]
To the extent that there is a discussion here on the list as to where the DC and a possible successor of the TCS is going, I think that's a worthwhile discussion to have.
Regards,
Nico
On 5/13/2011 4:41 PM, Peter DeVries wrote:
Hi Nico,
Thanks for posting this.
I have something in the concept model to indicate the basis for the species concept.
For now I have three types. An individual species concept can have a combination of one, two, or all three.
In the RDF they look like this:
<txn:speciesConceptBasedOn rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel"/>
The first is what I call the #ObjectiveSpeciesModel: this indicates that it is a species concept because we say it is.
All the species concepts are at least an #ObjectiveSpeciesModel.
This is in part a way to handle things like the domestic cat, which you want to be seen as different from the African Wildcat.
There are also tags for:
txn:PhylogeneticSpeciesModel and txn:BiologicalSpeciesModel
For now I don't have these other models set in the example data, but the fields are in the database and the code is in place so that an editor could state the basis for the model.
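A small sketch of how an editor's choice of models could be serialized, written in Python and emitting N-Triples lines. It assumes the txn: prefix expands to the same ontology namespace used in the rdf:resource value above, and the concept URI in the usage line is a made-up example.org placeholder. The function always includes #ObjectiveSpeciesModel, matching the rule that every concept is at least that.

```python
# Assumption: txn: expands to the namespace seen in the rdf:resource above.
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"

# The three model types described in the message.
ALLOWED_MODELS = {
    "ObjectiveSpeciesModel",
    "PhylogeneticSpeciesModel",
    "BiologicalSpeciesModel",
}


def species_model_triples(concept_uri, extra_models=()):
    """Return N-Triples lines stating the basis (or bases) for one species concept.

    Every concept is at least an ObjectiveSpeciesModel, so that model is always
    included; extra_models may add the phylogenetic and/or biological models.
    """
    models = {"ObjectiveSpeciesModel", *extra_models}
    unknown = models - ALLOWED_MODELS
    if unknown:
        raise ValueError(f"unknown species model(s): {sorted(unknown)}")
    predicate = f"<{TXN}speciesConceptBasedOn>"
    return [f"<{concept_uri}> {predicate} <{TXN}{model}> ." for model in sorted(models)]


# Hypothetical concept URI; one concept based on two models.
for line in species_model_triples("http://example.org/concept/felis-catus",
                                  ["PhylogeneticSpeciesModel"]):
    print(line)
```

Rejecting unknown model names keeps the vocabulary closed to the three types listed, which is one way to keep data consistent while the vocabulary is still being discussed.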
I can think of a couple of different ways to handle the issue of alternative species concepts.
- Note that the identifications as proposed by DarwinCore don't seem to indicate what kind of model the identifications were based on, so it is not clear to me whether a straight DarwinCore data set would allow the analysis above.
Instead of having multiple different statements like
txn:occurrenceHasSpeciesConcept <> in the record for each occurrence,
one could use different predicates to link to different kinds of species concepts.
txn:occurrenceHasUniprotConcept => http://purl.uniprot.org/taxonomy/9696
This would allow someone to query for the occurrences of http://purl.uniprot.org/taxonomy/9696.
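To illustrate the per-kind predicate idea, here is a plain-Python sketch over in-memory triples rather than a real triple store (where the same selection would normally be a SPARQL pattern). The two predicate names come from the message; the occurrence and concept URIs under example.org are hypothetical placeholders.

```python
# Hypothetical occurrence statements as (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/occurrence/1", "txn:occurrenceHasSpeciesConcept",
     "http://example.org/concept/A"),
    ("http://example.org/occurrence/1", "txn:occurrenceHasUniprotConcept",
     "http://purl.uniprot.org/taxonomy/9696"),
    ("http://example.org/occurrence/2", "txn:occurrenceHasUniprotConcept",
     "http://purl.uniprot.org/taxonomy/9696"),
    ("http://example.org/occurrence/3", "txn:occurrenceHasSpeciesConcept",
     "http://example.org/concept/B"),
]


def occurrences_linked_to(concept, predicate, triples=TRIPLES):
    """Return, sorted, the occurrences linked to `concept` through `predicate`."""
    return sorted(s for s, p, o in triples if p == predicate and o == concept)


print(occurrences_linked_to("http://purl.uniprot.org/taxonomy/9696",
                            "txn:occurrenceHasUniprotConcept"))
# ['http://example.org/occurrence/1', 'http://example.org/occurrence/2']
```

Because the predicate itself names the kind of concept, a user picks a classification by picking a predicate, and an occurrence can carry links to several classifications without any one of them being ambiguous.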
That said, it is not clear to me what people mean by different identifications.
Is the intent for identifications using different homotypic synonyms to count as identifications of the same thing, or not?
The way it works now in many data sets is that Felis concolor, Puma concolor and Puma conncolor are treated as identifications of different things.
This is another way of asking: is the namestring the concept?
My understanding of the eBird project is that it allows citizen scientists to contribute their own observations. This creates a much larger data set for analysis, etc.
They have created a curated list of species and a ~6-letter code for each. This serves as a guide for observers on how to encode their observations.
I think their progress would be inhibited, the occurrence coding inconsistent, and contributors frustrated, if they had a list that included many overlapping species concepts.
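The namestring-versus-concept question can be made concrete with a short Python sketch of a curated list in the spirit of the one described for eBird. The mapping and the "concept:" identifiers are hypothetical; eBird's actual codes are not reproduced here. Every namestring an observer might use (accepted name, homotypic synonym, or known misspelling) resolves to exactly one concept.

```python
from collections import Counter

# Hypothetical curated list: each namestring resolves to exactly one concept.
NAME_TO_CONCEPT = {
    "Puma concolor": "concept:puma-concolor",
    "Felis concolor": "concept:puma-concolor",   # homotypic synonym
    "Puma conncolor": "concept:puma-concolor",   # misspelled namestring
}


def count_by_concept(namestrings):
    """Count identifications per concept; an unlisted name raises KeyError."""
    return Counter(NAME_TO_CONCEPT[name] for name in namestrings)


identifications = ["Felis concolor", "Puma concolor", "Puma conncolor", "Puma concolor"]

print(len(Counter(identifications)))      # 3: three "things" if the namestring is the concept
print(count_by_concept(identifications))  # Counter({'concept:puma-concolor': 4})
```

Letting an unlisted name fail loudly, instead of silently creating a new "thing", is one way a curated list keeps occurrence coding consistent.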
Thanks again for your comments,
- Pete
On Fri, May 13, 2011 at 3:05 AM, Nico Franz <nico.franz@upr.edu> wrote:
Hello Pete (et al.):
For birds, Town Peterson at KU and colleagues have published these papers showing how alternative bird taxonomies affect the ranking of conservation priorities.
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/PN_CB_1999.pdf
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/NP_BN_2004.pdf
http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/P_BCI_2006.pdf
Here's the abstract of the 1999 paper:
Analysis of geographic concentrations of endemic taxa is often used to determine priorities for conservation action; nevertheless, assumptions inherent in the taxonomic authority list used as the basis for analysis are not always considered. We analyzed foci of avian endemism in Mexico under two alternate species concepts. Under the biological species concept, 101 bird species are endemic to Mexico and are concentrated in the mountains of the western and southern portions of the country. Under the phylogenetic species concept, however, total endemic species rises to 249, which are concentrated in the mountains and lowlands of western Mexico. Twenty-four narrow endemic biological species are concentrated on offshore islands, but 97 narrow endemic phylogenetic species show a concentration in the Transvolcanic Belt of the mainland and on several offshore islands. Our study demonstrates that conservation priorities based on concentrations of endemic taxa depend critically on the particular taxonomic authority employed and that biodiversity evaluations need to be developed in collaboration or consultation with practicing systematic specialists.
There was a debate recently on Taxacom that was started and subsequently neatly summarized by Fabian Haas. The topic was "let's summarize reasons why 'donors' seem to not fund taxonomy". One point from the summary was this:
3) Taxonomy is over-accurate for most applications
Most (not all) decisions in e.g. modelling and conservation are done and can be done without complete knowledge of taxa. As it is, decisions for conservation areas are often based on flagship species (e.g. elephants), on taxa which have an excellent research background, e.g. birds (IBAs), on availability of land (e.g. land with a high Tsetse burden), importance as a corridor and other factors, but never on a complete view of all biodiversity in a specific area. Even if an inventory existed, it would be an illusion that we could collect data on ecological requirements and population dynamics for most of the species necessary for informed decisions. A complete inventory does not seem to provide an advantage for conservation.
I personally think there's some truth to that. I also think that, while it's understandable that an accurate representation of the (sometimes) fleetingness of taxonomic consensus is not a priority for applied ecological projects, if taxonomists themselves don't find better ways to document and link these alternative perspectives, then it's not the best science we can do. That would be fine too if adopted outright as a pragmatic stance.
Regards,
Nico
On 5/13/2011 1:08 AM, Peter DeVries wrote:
I thought that I would also mention that in addition to The Plant List, the eBird project also uses non-overlapping concepts in its bird list (it does have concepts for common hybrids).
What is clear to me is that you cannot create graphs like these if every observation can have X number of species (especially overlapping ones) without any indication of which is the most appropriate one.
eBird Occurrence Maps Northern Cardinal http://ebird.org/content/ebird/about/occurrence-maps/northern-cardinal
NCBI is also similar.
Perhaps a member of the consensus committee can comment?
-- Pete
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 Email: pdevries@wisc.edu TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases A Semantic Web, Linked Open Data http://linkeddata.org/ Project
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Nico,
I don't think that we are very far apart.
As I recall the discussions while the TCS was being designed, the role of the DarwinCore was to allow data providers to include sufficient information in the DC so that the vouchers/observations could be identified to a suitably authoritative concept. In another realm of the biodiversity informatics net, that concept would be represented in more depth, and would ideally have multiple relationships mapped and/or inferred to relevant past, present, and future concepts.
In my mind it would be best to link to an open, resolvable concept, since even the original descriptions are often not very informative regarding which individuals should be considered instances of the same concept. One reading of the TCS is that this is something that is applied long after the specimen is collected and identified, most likely by someone who did not collect, and may not even have seen, the actual specimen. So in some sense this is probably more error-prone than the imperfectly defined concepts I am proposing, or at least more likely to differ from the meaning intended by the original observer/identifier.
For what it's worth (really not much), I agree that the eBird project is finding a (the?) pragmatic solution to expanding their contributor base, and in all likelihood they already have a pretty good to excellent taxonomy to work with. Keep in mind that I study weevils, with 65k species described and 220k species (conservatively) estimated to exist, and a mid-level classification of ~200 tribes based largely on Lacordaire's system of 1-2 external characters established in 1863 (e.g., claws single versus paired; virtually none of these characters have any phylogenetic value). So, experientially, I come from that part of the knowledge spectrum where we're likely centuries away from a sufficiently stable naming system that includes, say, more than 2/3 of the actual species diversity.
Yes, this is one reason most of my examples involve entities that are not particularly controversial in their own right.
For now the idea is to concentrate on the model and how different things are related to each other.
For instance, I noticed that some parrots were incorrectly marked up as "expected in" North America.
This will be fixed in the next RDF dump, along with the Freebase cougar link, which is already fixed in the online RDF but not yet in the RDF dump.
This is less about having everything perfect and more about figuring out how one might markup these kinds of relations.
I had been watching for URI changes in DBpedia and Wikipedia but not Freebase.
I recently read a blog post that described GBIF's efforts to clean up their occurrence records, and thought that their 1-degree areas might make good candidates for URIs.
This would allow them to make a cleaned version of species occurrences that tags each species occurrence to a particular "degree_area".
If they existed, I would try to add them to my examples.
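Since such degree-area URIs do not exist yet, the following Python sketch only shows one possible way to mint them. The URI base is a hypothetical example.org placeholder, and naming each 1 x 1 degree cell by its south-west corner is an assumed convention, not GBIF's.

```python
import math

# Hypothetical URI base; no such degree-area URIs are published.
DEGREE_AREA_BASE = "http://example.org/degree_area/"


def degree_area_uri(lat, lon):
    """Name the 1 x 1 degree cell containing (lat, lon) by its south-west corner."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    # floor() keeps negative coordinates in the right cell (-89.4 -> -90);
    # min() folds the north pole and the 180th meridian into the last cell.
    lat_cell = min(math.floor(lat), 89)
    lon_cell = min(math.floor(lon), 179)
    return f"{DEGREE_AREA_BASE}{lat_cell}_{lon_cell}"


# Madison, WI, roughly at 43.07 N, 89.40 W
print(degree_area_uri(43.07, -89.40))  # http://example.org/degree_area/43_-90
```

A cleaned occurrence record could then carry one link from the occurrence to the URI returned here, so that occurrences in the same cell share a resolvable identifier.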
Also note that I include name variations that link to the GNI. These are not true synonyms; they should be interpreted as "someone said this might be a synonym of the current name".
In addition, what I call BasionymName is not the same as what others call it.
I use this field to put in what appears to be the first name used for the species. In zoology this would be the name whose author citation does not contain parentheses.
It might be best to change it to some other term in the future to avoid confusion.
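The zoological rule mentioned above can be sketched in a few lines of Python. This is an illustrative heuristic only: it assumes the input already lists names for one and the same species, and it encodes just the zoological convention that parentheses around the author and year mean the species now sits in a different genus from the one it was described in.

```python
def is_original_combination(author_citation):
    """Zoological convention: the author citation is put in parentheses when the
    species is now combined with a genus other than the one it was described in."""
    return not author_citation.strip().startswith("(")


# (name, author citation) pairs for one species; illustrative data.
NAMES = [
    ("Puma concolor", "(Linnaeus, 1771)"),
    ("Felis concolor", "Linnaeus, 1771"),
]

original = [name for name, citation in NAMES if is_original_combination(citation)]
print(original)  # ['Felis concolor']
```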
I'm not opposed to pragmatic solutions for taxa where it makes sense (again, as if anyone cared...). But, trying to foresee the very substantive classificatory shifts that many other groups will likely still experience 10, 20, 50, or 100 years from now, I think there are nonetheless solid grounds for working out an admittedly non-pragmatic, but semantically maximally powerful, solution. I think it's not unreasonable to assume that for some groups, classification 250 years from now will look just as different from today's system as Linnaeus' 1758 system looks to us today. He recognized 2 weevil genera and about 90 species. Now we have 5800 genera. We may end up with 15,000.
Yes, you are correct. I am particularly concerned about tying the concepts to one taxonomic hierarchy.
For your weevils, I think there would be some advantages in marking up what you think exists, documenting those concepts with photos, links to specimens, and name variations. Open, accessible versions of these are much more likely to be improved than ones that are hidden in a dispersed collection of hard-to-find journals, many of which have limits on the number of photographs, etc.
This would certainly make it easier for someone with a specimen that they think applies to find you and the other potential concept candidates.
I don't have many weevils, so you might want to think about doing this yourself. My only concern is how to keep the vocabularies etc. in sync while everything is being discussed.
The problem is that at present only taxonomic experts can understand how to retrace the meanings of concepts proposed over the history of the field, and computers can't yet do this because the data are not marked up and linked to each other precisely enough (as precisely as an ontological representation would demand). That problem is not just going to go away through some top-down adherence to a "consensus". Most taxonomists jump into the field precisely because they come to realize that the "consensus" has serious problems (read: "sucks"). So, as long as there is justified taxonomic research, there will be reclassification. (And no, IMO coming up with a synapomorphy-based or node-pointing naming system will not miraculously give us a reliable system.)
Yes, this will never really go away. My point with eBird and The Plant List is that there are groups doing this, seemingly without much controversy, so why is what I propose so controversial?
To the extent that there is a discussion here on the list as to where the DC and a possible successor of the TCS are going, I think that's a worthwhile discussion to have.
I hope that others agree with you. :-)
- Pete
participants (4)
- Jim Croft
- Matt Jones
- Nico Franz
- Peter DeVries