[tdwg-content] Another example of non-overlapping concepts

Jim Croft jim.croft at gmail.com
Wed May 18 00:17:13 CEST 2011

hmmm... a  http://www.freebase.com/view/en/cougar seems to be an
'armoured fighting vehicle'...

which might be part of the point you are making... wondering if
NotEvenRemotelyTheSameAs is a valid relationship type... ;)


On Wed, May 18, 2011 at 8:00 AM, Peter DeVries <pete.devries at gmail.com> wrote:
> Hi Matt,
> It took me a while to ponder your question. There is a long answer, which is
> complex and easily misinterpreted, and there is a shorter answer.
> For now I think the "shorter" answer set in a historical context is best.
> The best use of my abilities seems to be recognizing an "ability gap" and
> figuring out a technical solution or tool to address it.
> The most visible of these involved microscopy and visualization tools
> to make complex ideas understandable.
> My interest in the species problem dates back to when I had the opportunity
> to talk with E.O. Wilson in 1991/1992.
> At that time he said that if you have a knack for computers we need all this
> information in databases so it is accessible.
> *One of his former Ph.D. students is on my committee.
> Years later I had the opportunity to work on questions like this and started
> to think about how to connect all these disparate facts about species
> together in a usable queryable knowledge base.
> I noticed that several groups and individuals were marking up data sets
> including observations with different scientific names even though they were
> clearly meaning the same "species".
> * These groups would agree that they were communicating about the same
> species, but not always agree on the name
> This prevents large scale data integration and analysis which in part is
> described here: http://about.geospecies.org/
> With the advent of the web, and the semantic web in particular, this
> "database" could be global and almost infinitely scalable.
> I began lobbying TDWG in 2006 for two things:
> 1) A GUID for the "species" that was not tied to a particular name string
> 2) A system that followed semantic web best practices which LSID etc. do
> not.
> Since my TDWG efforts were not successful, I started GeoSpecies and, based on
> comments from a semantic web expert, modified these somewhat into what is now
> TaxonConcept.org.
> The TCS is an XML standard for transmitting information about taxon
> concepts that I think maps best to a "name use concept" (Rich's TNUs).
> The TaxonConcepts are identified with semantic web GUIDs that follow
> semantic web best practices and resolve to informative documents.
> In their current form these documents are not ideal because they do not do a
> good enough job clearing up what would be the best concept match for a given
> individual or specimen.
> They do however have most of the plumbing for this in that they allow
> semantic web links to name uses, specimens, occurrence records, images, DNA,
> authors and publications including the original description.
> They also link to similar entities that are on the semantic web, most
> notably DBpedia, Uniprot, Freebase, Bio2RDF etc.
> This linking may not seem valuable to humans, but it is valuable for machines
> that need to determine what entities are similar and what entities are
> different.
> This also increases the "findability" of these other data sets.
> I see my current set of about 105,000 species as an example set that people
> can use to try out these models.
> In their final form these should be authored by editors that determine what
> specimens and other data are good examples of instances of these concepts.
> The editors will be linked via a URI so it is easy to track attribution.
> The final concepts do not have to be in one place, they could be distributed
> but to avoid the kinds of nomenclatural differences that have occurred
> between zoology / botany etc it would be best to have one code base for now.
> They don't have to have the same underlying stack, which now is based on
> Ruby on Rails, but could be ported to anything.
> What they do need is a common structure and a common understanding as to
> what each attribute means and how it can be appropriately used.
> For some use cases it is appropriate to consider the following the same
> "thing"
>  http://lod.taxonconcept.org/ses/v6n7p#Species
>  http://purl.uniprot.org/taxonomy/9696
>  http://www.freebase.com/view/en/cougar
> http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA
> http://www.bbc.co.uk/nature/species/Cougar#species
> For other use cases, this sameAs is not appropriate.
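One way to picture "same for some use cases, not for others" is to scope the equivalence by use case. The following is only a minimal Python sketch under that assumption. The URIs are the ones listed above; the use-case labels and the function name are hypothetical and are not part of TaxonConcept.org or any vocabulary mentioned in this thread.

```python
# Hypothetical sketch: treat a set of linked URIs as "the same thing"
# only for use cases where that is considered appropriate.

# URIs copied from the message above.
COUGAR_LINKS = {
    "http://lod.taxonconcept.org/ses/v6n7p#Species",
    "http://purl.uniprot.org/taxonomy/9696",
    "http://www.freebase.com/view/en/cougar",
    "http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA",
    "http://www.bbc.co.uk/nature/species/Cougar#species",
}

# Assumed labels: use cases in which merging the links is acceptable.
EQUIVALENT_FOR = {"data-discovery"}


def same_thing(uri_a, uri_b, use_case):
    """Return True if the two URIs may be merged for the given use case."""
    if uri_a == uri_b:
        return True
    if use_case not in EQUIVALENT_FOR:
        return False
    return uri_a in COUGAR_LINKS and uri_b in COUGAR_LINKS
```

With this shape, a discovery tool could merge the TaxonConcept and Uniprot URIs, while a stricter use case would keep them apart.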
> Wikipedia is very valuable, but if someone changes the article title then
> the URI changes in DBpedia.
> Uniprot and Bio2RDF are useful in that they link to lots of related data but
> they don't really give you any information about what specimens are
> instances of that concept and they only have those species which have NCBI
> ID's.
> What I want is a set of GUID's that resolve to a human readable HTML page
> and an RDF representation that people can use to "tag" their data.
> For instance:
> I am going to assert that what I have under the microscope is an instance of
> the concept described on this page. I do not tie this assertion to a
> particular name or classification hierarchy.
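The assertion described above can be sketched as a small record in which the concept URI carries the identification and the name string is optional. This is an illustrative Python sketch only; the class, field, and function names are assumptions made for the example and do not come from TaxonConcept.org.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptTag:
    """Assertion that an observed individual is an instance of a concept."""
    observation_id: str                # hypothetical local identifier
    concept_uri: str                   # GUID that resolves to HTML and RDF
    name_string: Optional[str] = None  # optional label, not the identification


def same_concept(a: ConceptTag, b: ConceptTag) -> bool:
    """Two tags agree when they point at the same concept URI,
    regardless of any name string recorded alongside them."""
    return a.concept_uri == b.concept_uri
```

Two observers who record different names but tag the same concept URI would then be treated as reporting the same species.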
> Because it makes no sense to replicate the functionality of the Encyclopedia
> of Life etc., I am mainly concentrating on the RDF representations and
> testing if they behave as expected in SPARQL queries.
> * The HTML pages are not really pretty or as informative as the RDF or as
> the concept as viewed in the knowledge base.
> I have been working with the Encyclopedia of Life and GNI groups for a while
> exploring how these may or may not be useful to them.
> During my visit to Woods Hole I said that I have no interest in building an
> empire; I just want to build a solution and would like to partner with them
> and GBIF.
> Although I remain active on TDWG I find the most valuable suggestions seem
> to come from the LOD community since we seem to have a common goal - that is
> creating something that works in a reasonable amount of time.
> Also, in the LOD cloud every linked data set increases the value of all the
> other data sets.
> This is probably more than your question required, but it provides some
> explanation as to what these are and why I have implemented them in the way
> I have.
> Respectfully,
> - Pete
> On Fri, May 13, 2011 at 4:14 PM, Matt Jones <jones at nceas.ucsb.edu> wrote:
>> Hi Peter,
>> Does your idea of #ObjectiveSpeciesModel correspond 1:1 with the TCS
>> standard's idea of a Nominal Concept (i.e., <TaxonConcept type="nominal">) ?
>>  Can you outline how your concept types differ from TCS concept types?
>> Thanks,
>> Matt
>> On Fri, May 13, 2011 at 12:41 PM, Peter DeVries <pete.devries at gmail.com>
>> wrote:
>>> Hi Nico,
>>> Thanks for posting this.
>>> I have something in the concept model to indicate the basis for the
>>> species concept.
>>> For now I have three types. An individual species concept can have a
>>> combination of one, two or all three.
>>> In the RDF they look like this:
>>> <txn:speciesConceptBasedOn
>>> rdf:resource="http://lod.taxonconcept.org/ontology/txn.owl#ObjectiveSpeciesModel"/>
>>> The first is what I call the #ObjectiveSpeciesModel - this indicates that
>>> it is a species concept because we say it is.
>>> All the species concepts are at least an #ObjectiveSpeciesModel
>>> *This is in part a way to handle things like the domestic cat which you
>>> want to be seen as different from the African Wildcat.
>>> There are also tags for
>>> txn:PhylogeneticSpeciesModel
>>> txn:BiologicalSpeciesModel
>>> For now I don't have these other models set in the example data, but the
>>> fields are in the database and the code so that an editor could state the
>>> basis for the model.
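The "one, two or all three" combination, with every concept being at least an ObjectiveSpeciesModel, can be sketched with a flag type. This is a hedged Python illustration only: the enum and helper names are assumptions, and it is not how the TaxonConcept database actually stores the basis.

```python
from enum import Flag, auto


class SpeciesModel(Flag):
    """Basis tags named after the txn: terms mentioned above."""
    OBJECTIVE = auto()
    PHYLOGENETIC = auto()
    BIOLOGICAL = auto()


def make_basis(*extra):
    """Build a basis that always includes OBJECTIVE, plus any extra models."""
    basis = SpeciesModel.OBJECTIVE
    for model in extra:
        basis |= model
    return basis
```

For example, `make_basis(SpeciesModel.BIOLOGICAL)` yields a basis containing both the objective and the biological model.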
>>> I can think of a couple of different ways to handle the issue of
>>> alternative species concepts.
>>> * Note that the identifications as proposed by DarwinCore don't seem to
>>> indicate what kind of model the identifications were based on.
>>>   So it is not clear to me if a straight DarwinCore data set would allow
>>> the analysis above.
>>> Instead of having multiple different statements like
>>> txn:occurrenceHasSpeciesConcept <> in the record for each occurrence
>>> one could use different predicates to link to different kinds of species
>>> concepts.
>>> txn:occurrenceHasUniprotConcept =>
>>> <http://purl.uniprot.org/taxonomy/9696>
>>> This would allow someone to query for the occurrences
>>> of <http://purl.uniprot.org/taxonomy/9696>
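The idea of one predicate per kind of concept can be sketched with plain triples. This is a minimal Python illustration, not SPARQL and not the actual knowledge base: the occurrence identifiers and the example.org URI are made-up placeholders, and only the two predicate names and the Uniprot and TaxonConcept URIs are taken from the thread.

```python
# Each triple is (occurrence, predicate, concept URI).
TRIPLES = [
    ("occ:1", "txn:occurrenceHasSpeciesConcept",
     "http://lod.taxonconcept.org/ses/v6n7p#Species"),
    ("occ:1", "txn:occurrenceHasUniprotConcept",
     "http://purl.uniprot.org/taxonomy/9696"),
    ("occ:2", "txn:occurrenceHasUniprotConcept",
     "http://purl.uniprot.org/taxonomy/9696"),
    # Placeholder URI standing in for some other taxon.
    ("occ:3", "txn:occurrenceHasUniprotConcept",
     "http://example.org/taxon/other"),
]


def occurrences_of(triples, predicate, concept_uri):
    """Return the sorted occurrence ids linked to concept_uri via predicate."""
    return sorted({s for s, p, o in triples
                   if p == predicate and o == concept_uri})
```

Querying with the Uniprot predicate and the Uniprot URI returns only the occurrences tagged that way, without touching links made through other predicates.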
>>> That said, it is not clear to me what people mean by different
>>> identifications.
>>> Is the intent for identifications with different homotypic synonyms
>>> to count as identifications of the same thing or not?
>>> The way it works now in many data sets is that Felis concolor, Puma
>>> concolor and Puma conncolor are treated as identifications of different
>>> things.
>>> This is another way of asking: is the name string the concept?
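The difference between counting by name string and counting by concept can be shown with a small tally. This Python sketch is illustrative only: the lookup table is a hypothetical mapping built for the example (including the misspelled string), and the records are invented.

```python
from collections import Counter

CONCEPT = "http://lod.taxonconcept.org/ses/v6n7p#Species"

# Hypothetical lookup from name strings to a single concept URI.
NAME_TO_CONCEPT = {
    "Felis concolor": CONCEPT,
    "Puma concolor": CONCEPT,
    "Puma conncolor": CONCEPT,  # misspelling, still the same concept
}

# Invented occurrence records, identified only by name string.
RECORDS = ["Felis concolor", "Puma concolor", "Puma conncolor", "Puma concolor"]


def count_by_name(records):
    """Tally records as if each distinct name string were its own thing."""
    return Counter(records)


def count_by_concept(records, lookup):
    """Tally records after resolving each name string to its concept URI."""
    return Counter(lookup[name] for name in records)
```

Counting by name splits these four records across three "species", while counting by concept keeps them together.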
>>> My understanding of the eBird project is that it allows citizen
>>> scientists to contribute their own observations. This creates a much larger
>>> data set for analysis etc.
>>> They have created a curated list of species and a ~6 letter code for
>>> each. This serves as a guide for observers on how to encode their
>>> observations.
>>> I think their progress would be inhibited, the occurrence coding
>>> inconsistent, and contributors frustrated, if they had a list that included
>>> many overlapping species concepts.
>>> Thanks again for your comments,
>>> - Pete
>>> On Fri, May 13, 2011 at 3:05 AM, Nico Franz <nico.franz at upr.edu> wrote:
>>>> Hello Pete (et al.):
>>>>    For birds, Town Peterson at KU and colleagues have published these
>>>> papers showing how alternative bird taxonomies affect the ranking of
>>>> conservation priorities.
>>>> http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/PN_CB_1999.pdf
>>>> http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/NP_BN_2004.pdf
>>>> http://specify5.specifysoftware.org/Informatics/bios/biostownpeterson/P_BCI_2006.pdf
>>>>    Here's the abstract of the 1999 paper:
>>>> Analysis of geographic concentrations of endemic taxa is often used to
>>>> determine priorities for conservation
>>>> action; nevertheless, assumptions inherent in the taxonomic authority
>>>> list used as the basis for
>>>> analysis are not always considered. We analyzed foci of avian endemism
>>>> in Mexico under two alternate species
>>>> concepts. Under the biological species concept, 101 bird species are
>>>> endemic to Mexico and are concentrated
>>>> in the mountains of the western and southern portions of the country.
>>>> Under the phylogenetic species
>>>> concept, however, total endemic species rises to 249, which are
>>>> concentrated in the mountains and lowlands
>>>> of western Mexico. Twenty-four narrow endemic biological species are
>>>> concentrated on offshore islands, but
>>>> 97 narrow endemic phylogenetic species show a concentration in the
>>>> Transvolcanic Belt of the mainland and
>>>> on several offshore islands. Our study demonstrates that conservation
>>>> priorities based on concentrations of
>>>> endemic taxa depend critically on the particular taxonomic authority
>>>> employed and that biodiversity evaluations
>>>> need to be developed in collaboration or consultation with practicing
>>>> systematic specialists.
>>>>    There was a debate recently on Taxacom that was started and
>>>> subsequently neatly summarized by Fabian Haas. The topic was "let's
>>>> summarize reasons why 'donors' seem to not fund taxonomy". One point from
>>>> the summary was this:
>>>> 3) Taxonomy is over-accurate for most applications
>>>> Most (not all) decisions in e.g. modelling and conservation are done and
>>>> can be done without complete knowledge of taxa. As it is, decisions for
>>>> conservation areas are often based on flagship species (e.g. elephants), on
>>>> taxa which have an excellent research background, e.g. birds (IBAs), on
>>>> availability of land (e.g. land with a high Tsetse burden), importance as
>>>> corridor and other factors, but never on a complete view of all
>>>> biodiversity in a specific area. Even if an inventory existed, it would be
>>>> an illusion that we could collect data on ecological requirements and
>>>> population dynamics for most of the species necessary for informed
>>>> decisions. A complete inventory does not seem to provide an advantage for
>>>> conservation.
>>>>    I personally think there's some truth to that. I also think that,
>>>> while it's understandable that an accurate representation of the (sometimes)
>>>> fleetingness of taxonomic consensus is not a priority for applied ecological
>>>> projects, if taxonomists themselves don't find better ways to document and
>>>> link these alternative perspectives, then it's not the best science we can
>>>> do. That would be fine too if adopted outright as a pragmatic stance.
>>>> Regards,
>>>> Nico
>>>> On 5/13/2011 1:08 AM, Peter DeVries wrote:
>>>> I thought that I would also mention that in addition to The Plants List,
>>>> the eBird project also uses non-overlapping concepts in its bird list (it
>>>> does have concepts for common hybrids)
>>>> What is clear to me is that you cannot create graphs like these if every
>>>> observation can have X number of species (especially ones that overlap)
>>>> without any indication of which is the most appropriate one.
>>>> eBird Occurrence Maps Northern Cardinal
>>>> http://ebird.org/content/ebird/about/occurrence-maps/northern-cardinal
>>>> NCBI is also similar.
>>>> Perhaps a member of the consensus committee can comment?
>>>> -- Pete
>>>> ------------------------------------------------------------------------------------
>>>> Pete DeVries
>>>> Department of Entomology
>>>> University of Wisconsin - Madison
>>>> 445 Russell Laboratories
>>>> 1630 Linden Drive
>>>> Madison, WI 53706
>>>> Email: pdevries at wisc.edu
>>>> TaxonConcept  &  GeoSpecies Knowledge Bases
>>>> A Semantic Web, Linked Open Data  Project
>>>> --------------------------------------------------------------------------------------
>>>> _______________________________________________
>>>> tdwg-content mailing list
>>>> tdwg-content at lists.tdwg.org
>>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content

Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~ http://about.me/jrc
'A civilized society is one which tolerates eccentricity to the point
of doubtful sanity.'
 - Robert Frost, poet (1874-1963)

Please send URLs, not attachments:
