I thought that it might be useful to provide some initial comments on the GBIF KOS Report.

There are several issues but I will mention only a few in this email.

The first is "There appear to be no systematic attempts to develop use cases, competency questions,  or other goals for use of KOS in the biodiversity informatics community."

What about these resources and efforts that have been going on for several years?

http://about.geospecies.org/

http://about.geospecies.org/sparql.xhtml

http://www.taxonconcept.org/example-sparql-queries/

http://www.taxonconcept.org/

Note that this seems to be the only open SPARQL endpoint that is devoted to biodiversity informatics.

http://www.taxonconcept.org/sparql-endpoint/

It is also the SPARQL endpoint for a number of the data sets that are mentioned in the report.

It also has the only examples which use the "IETF scheme for URIs for geographic locations" mentioned in the report.

Also this: "there appears to be no semantically enabled discovery of these resources.  Work across subdisciplines is hampered by this, as scientists haphazardly locate resources which may or may not be the most fit for their purpose. For example, a field biologist made aware of ITIS might never become aware of its relationship to the Catalog of Life."

The RDF snippet below is from this record (http://lod.geospecies.org/ses/v6n7p.html), one of several thousand that have been around for years.

By querying one of the various LOD services a human or machine would find this interlinking.

    <skos:closeMatch rdf:resource="urn:lsid:ubio.org:namebank:105509"/>

    <skos:closeMatch rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>

    <skos:closeMatch rdf:resource="http://www.uniprot.org/taxonomy/9696"/>

    <skos:closeMatch rdf:resource="http://bio2rdf.org/taxon:9696"/>

    <rdfs:seeAlso rdf:resource="http://bio2rdf.org/taxon:9696"/>

    <skos:closeMatch rdf:resource="http://dbpedia.org/resource/Cougar"/>

    <rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Cougar"/>

    <skos:closeMatch rdf:resource="http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000008b5a3"/>

    <skos:closeMatch rdf:resource="http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA"/>

    <skos:closeMatch rdf:resource="http://www.bbc.co.uk/nature/species/Cougar#species"/>

    <rdfs:seeAlso rdf:resource="http://www.bbc.co.uk/nature/species/Cougar.rdf"/>

    <geospecies:hasGBIF>13815711</geospecies:hasGBIF>

    <geospecies:hasGBIFPage rdf:resource="http://data.gbif.org/species/13815711"/>

    <foaf:page rdf:resource="http://data.gbif.org/species/13815711"/>

    <geospecies:hasITIS>552479</geospecies:hasITIS>

    <foaf:page rdf:resource="http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&amp;search_value=552479"/>

    <geospecies:hasNCBI>9696</geospecies:hasNCBI>

    <foaf:page rdf:resource="http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9696&amp;lvl=0"/>

    <geospecies:hasBioLib>id1995</geospecies:hasBioLib>

    <geospecies:hasBioLibPage rdf:resource="http://www.biolib.cz/en/taxon/id1995"/>

    <foaf:page rdf:resource="http://www.biolib.cz/en/taxon/id1995"/>

    <geospecies:hasBBCPage rdf:resource="http://www.bbc.co.uk/nature/species/Cougar"/>

    <foaf:page rdf:resource="http://www.bbc.co.uk/nature/species/Cougar"/>

    <geospecies:hasGNI>505310</geospecies:hasGNI>

    <geospecies:hasGNIPage rdf:resource="http://globalnames.org/?search_term=id:505310"/>

    <geospecies:hasWikipediaArticle rdf:resource="http://en.wikipedia.org/wiki/Cougar"/>

    <foaf:page rdf:resource="http://en.wikipedia.org/wiki/Cougar"/>

    <geospecies:hasWikispeciesArticle rdf:resource="http://species.wikimedia.org/wiki/Puma_concolor"/>

    <foaf:page rdf:resource="http://species.wikimedia.org/wiki/Puma_concolor"/>

    <geospecies:hasToLPage rdf:resource="http://tolweb.org/Puma_concolor"/>

    <foaf:page rdf:resource="http://tolweb.org/Puma_concolor"/>
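
As a rough illustration of how a machine could pick up this interlinking, here is a minimal Python sketch that reads a few of the lines above and groups the links by property. It is not part of the GeoSpecies tooling. The wrapper element and the namespace URI bound to the geospecies prefix are placeholders I am assuming so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    # Assumption: placeholder URI, only here so the prefix resolves.
    "geospecies": "http://example.org/geospecies#",
}

# A few lines copied from the record shown above.
FRAGMENT = """
<skos:closeMatch rdf:resource="http://www.uniprot.org/taxonomy/9696"/>
<skos:closeMatch rdf:resource="http://dbpedia.org/resource/Cougar"/>
<rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Cougar"/>
<geospecies:hasITIS>552479</geospecies:hasITIS>
<foaf:page rdf:resource="http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&amp;search_value=552479"/>
<skos:closeMatch rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
"""


def wrap(fragment):
    """Wrap the bare property elements in a hypothetical rdf:Description."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in NS.items())
    return f"<rdf:RDF {decls}><rdf:Description>{fragment}</rdf:Description></rdf:RDF>"


def links(fragment):
    """Return {prefixed property: [resource URI or literal value, ...]}."""
    root = ET.fromstring(wrap(fragment))
    description = root.find("rdf:Description", NS)
    resource_attr = f"{{{NS['rdf']}}}resource"
    found = {}
    for child in description:
        # ElementTree tags look like "{namespace-uri}localname".
        uri, local = child.tag[1:].split("}")
        prefix = next(p for p, u in NS.items() if u == uri)
        value = child.get(resource_attr) or (child.text or "").strip()
        found.setdefault(f"{prefix}:{local}", []).append(value)
    return found


if __name__ == "__main__":
    for prop, values in links(FRAGMENT).items():
        print(prop, values)
```

The same record that carries the ITIS identifier also carries the skos:closeMatch to the Catalogue of Life LSID, so anything that reads the record sees both.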



I would also like to address this statement: "For example, at this writing, LOD statistics reveal only 42 bioscience datasets holding 2.7B triples"

The data sets in the Linked Open Data list are only those LOD data sets that are documented here: http://www.ckan.net/

The Bio2RDF data set is over 15 billion triples on its own: http://www.slideshare.net/micheldumontier/bio2rdf-and-beyond

The authors of the report don't seem to be aware of the significance of the Linked Data movement.

http://data.nytimes.com/

http://www.guardian.co.uk/open-platform/blog/linked-data-open-platform

Oddly, GeoSpecies/TaxonConcept seems to be visible to the New York Times and The Guardian, yet not to the expert authors of the GBIF report.

Facebook's Open Graph is also Linked Data, so all those "liked" pages are Linked Data.

http://go-to-hellman.blogspot.com/2010/06/global-warming-of-linked-data-in.html

Here is an interesting recent blog post from O'Reilly Radar.

http://radar.oreilly.com/2010/11/semantic-web-linked-data.html

The Linked Data Community is much larger and more significant than the GBIF report has implied.

Also, you should look at what resources show up when you query "Quercus alba" in the following LOD services.

http://sindice.com/
http://sindice.com/search?q=Quercus+alba&qt=term

http://uriburner.com/fct/
http://uriburner.com/fct/facet.vsp?cmd=text&sid=9244

http://lsd.taxonconcept.org/fct/
http://lsd.taxonconcept.org/fct/facet.vsp?cmd=text&sid=71

It would seem that this would have been a little hard to miss.
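
To repeat this check for other names, the Sindice term-search link above follows a simple pattern. The small Python sketch below rebuilds it from a name. The function name is my own, and I am only assuming that the q and qt parameters behave as they do in the link above. The faceted-browser links carry session ids, so they cannot be rebuilt this way.

```python
from urllib.parse import urlencode


def sindice_term_search_url(term):
    """Build a Sindice term-search URL in the same form as the link above."""
    # urlencode turns the space in the name into '+', as in the original link.
    return "http://sindice.com/search?" + urlencode({"q": term, "qt": "term"})


if __name__ == "__main__":
    print(sindice_term_search_url("Quercus alba"))
```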


I was originally somewhat skeptical about Gregor Hagedorn's earlier statement.

"I fully believe you and all who are doing it do it with careful
consideration of the needs as they see it. I just believe that those
taking these decisions have a specific perspective and use case
scenarios, that involves biologists only after the perfect software
user interface system is finished. I challenge the last assumption ..."

But after reading through the GBIF report, I think he makes a good point.

Reasoning will only work if everyone has a common conceptualization of what each of these things is, and how they relate to other things.

What is a species?

What is a species' relationship to a particular classification hierarchy? Can there be more than one hierarchy?

The report authors have experience in creating highly engineered systems where each entity is modelled in a formal strict way.

It is not clear to me that we have agreement on some of the fundamental entities, let alone how they relate to each other.

In my own work I have been thinking that the model of a species for occurrences etc. might not be the best model for addressing phylogenetic questions.

This is because you might want to have one standard, agreed-upon classification so you can search for species in a given family that are potential pathogen vectors.

Others might want a different kind of entity that is not tied to one particular classification so they can address phylogenetic questions.

What you don't want is to prevent people from asking questions that relate to subfamilies etc. because the model does not support them.

It might be best to have separate but loosely linked models for these different kinds of "perspectives".

This is similar to the way I model relationships between various LOD species entities.

For some uses you could interpret 

<Puma concolor se:v6n7p> as being the same thing as <http://purl.uniprot.org/taxonomy/9696>

For other uses, they are not the same thing. (The latter is in a nested set of NCBI classification subclasses.)

Because of this, I tie them together loosely with a skos:closeMatch.

This keeps them "findable" without entailing the other entity's potentially incompatible conceptualization.

This allows the end user of the data to determine if they want to convert these to an owl:sameAs relationship or not.
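
To make the difference concrete, here is a minimal Python sketch under my own assumptions: triples are plain tuples, "se:v6n7p" is just a short label for the GeoSpecies record, and the function names are hypothetical. It is an illustration and not how any particular triple store or reasoner works. A closeMatch link leaves the two resources as separate things, and only the links a user chooses to promote to owl:sameAs get merged.

```python
from collections import defaultdict

SE = "se:v6n7p"  # short label for the GeoSpecies record (assumed notation)
UNIPROT = "http://purl.uniprot.org/taxonomy/9696"
DBPEDIA = "http://dbpedia.org/resource/Cougar"

TRIPLES = [
    (SE, "skos:closeMatch", UNIPROT),
    (SE, "skos:closeMatch", DBPEDIA),
]


def promote(triples, trusted_targets):
    """Copy the triples, turning closeMatch links to trusted targets into owl:sameAs."""
    return [
        (s, "owl:sameAs", o)
        if p == "skos:closeMatch" and o in trusted_targets
        else (s, p, o)
        for s, p, o in triples
    ]


def same_as_groups(triples):
    """Group resources joined by owl:sameAs, using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        find(s)
        find(o)
        if p == "owl:sameAs":
            parent[find(s)] = find(o)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # With closeMatch only, every resource stays in its own group.
    print(same_as_groups(TRIPLES))
    # A user who accepts the UniProt link as identity merges just that pair.
    print(same_as_groups(promote(TRIPLES, {UNIPROT})))
```

The links stay in the data either way, so the resources remain findable. The merge happens only where the user asks for it.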

My guess is that this is exactly the kind of thing that at least some of the authors of the GBIF report might not like.

For reasoning etc. they would probably prefer that these species concepts fall under some highly constrained classification hierarchy - probably the one produced by the Catalog of Life.

Regarding issues like this, I think Gregor has made an interesting point.

Respectfully,

- Pete
---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base / GeoSpecies Knowledge Base
About the GeoSpecies Knowledge Base
------------------------------------------------------------