[tdwg-content] First Comments on the GBIF KOS Report

Peter DeVries pete.devries at gmail.com
Sat Nov 27 09:47:58 CET 2010


I thought that it might be useful to provide some initial comments on the
GBIF KOS Report.

There are several issues but I will mention only a few in this email.

The first is *"There appear to be no systematic attempts to develop use
cases, competency questions,  or other goals for use of KOS in the
biodiversity informatics community."*

What about these resources and efforts that have been going on for several
years?

http://about.geospecies.org/

http://about.geospecies.org/sparql.xhtml
http://www.taxonconcept.org/example-sparql-queries/

http://www.taxonconcept.org/

Note that this seems to be the only open SPARQL endpoint that is devoted to
biodiversity informatics.

http://www.taxonconcept.org/sparql-endpoint/

It is also the SPARQL endpoint for a number of the data sets that are
mentioned.

It also has the only examples which use the "IETF scheme for URIs for
geographic locations" mentioned in the report.

Also this: *"there appears to be no semantically enabled discovery of these
resources.  Work across subdisciplines is hampered by this, as scientists
haphazardly locate resources which may or may not be the most fit for their
purpose. For example, a field biologist made aware of ITIS might never
become aware of its relationship to the Catalog of Life."*

This RDF snippet is from this record
(http://lod.geospecies.org/ses/v6n7p.html), one of several thousand that have
been around for years.

By querying one of the various LOD services a human or machine would find
this interlinking.

    <skos:closeMatch rdf:resource="urn:lsid:ubio.org:namebank:105509"/>
    <skos:closeMatch rdf:resource="urn:lsid:catalogueoflife.org:taxon:24e7d624-60a7-102d-be47-00304854f810:ac2010"/>
    <skos:closeMatch rdf:resource="http://www.uniprot.org/taxonomy/9696"/>
    <skos:closeMatch rdf:resource="http://bio2rdf.org/taxon:9696"/>
    <rdfs:seeAlso rdf:resource="http://bio2rdf.org/taxon:9696"/>
    <skos:closeMatch rdf:resource="http://dbpedia.org/resource/Cougar"/>
    <rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Cougar"/>
    <skos:closeMatch rdf:resource="http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000008b5a3"/>
    <skos:closeMatch rdf:resource="http://sw.opencyc.org/concept/Mx4rvVj5o5wpEbGdrcN5Y29ycA"/>
    <skos:closeMatch rdf:resource="http://www.bbc.co.uk/nature/species/Cougar#species"/>
    <rdfs:seeAlso rdf:resource="http://www.bbc.co.uk/nature/species/Cougar.rdf"/>
    <geospecies:hasGBIF>13815711</geospecies:hasGBIF>
    <geospecies:hasGBIFPage rdf:resource="http://data.gbif.org/species/13815711"/>
    <foaf:page rdf:resource="http://data.gbif.org/species/13815711"/>
    <geospecies:hasITIS>552479</geospecies:hasITIS>
    <foaf:page rdf:resource="http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&amp;search_value=552479"/>
    <geospecies:hasNCBI>9696</geospecies:hasNCBI>
    <foaf:page rdf:resource="http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9696&amp;lvl=0"/>
    <geospecies:hasBioLib>id1995</geospecies:hasBioLib>
    <geospecies:hasBioLibPage rdf:resource="http://www.biolib.cz/en/taxon/id1995"/>
    <foaf:page rdf:resource="http://www.biolib.cz/en/taxon/id1995"/>
    <geospecies:hasBBCPage rdf:resource="http://www.bbc.co.uk/nature/species/Cougar"/>
    <foaf:page rdf:resource="http://www.bbc.co.uk/nature/species/Cougar"/>
    <geospecies:hasGNI>505310</geospecies:hasGNI>
    <geospecies:hasGNIPage rdf:resource="http://globalnames.org/?search_term=id:505310"/>
    <geospecies:hasWikipediaArticle rdf:resource="http://en.wikipedia.org/wiki/Cougar"/>
    <foaf:page rdf:resource="http://en.wikipedia.org/wiki/Cougar"/>
    <geospecies:hasWikispeciesArticle rdf:resource="http://species.wikimedia.org/wiki/Puma_concolor"/>
    <foaf:page rdf:resource="http://species.wikimedia.org/wiki/Puma_concolor"/>
    <geospecies:hasToLPage rdf:resource="http://tolweb.org/Puma_concolor"/>
    <foaf:page rdf:resource="http://tolweb.org/Puma_concolor"/>
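
To show how a machine could pick up this interlinking, here is a minimal
Python sketch that reads a few of the statements above and lists the
skos:closeMatch targets. It uses only the standard library. The wrapping
rdf:RDF and rdf:Description elements, the rdf:about subject URI, and the
function name linked_resources are my assumptions for illustration; they are
not copied from the record.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# A few statements from the record, wrapped so they parse on their own.
# The rdf:about value is an assumed subject URI, not taken from the record.
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}" xmlns:rdfs="{RDFS}">
  <rdf:Description rdf:about="http://lod.geospecies.org/ses/v6n7p">
    <skos:closeMatch rdf:resource="urn:lsid:ubio.org:namebank:105509"/>
    <skos:closeMatch rdf:resource="http://www.uniprot.org/taxonomy/9696"/>
    <skos:closeMatch rdf:resource="http://dbpedia.org/resource/Cougar"/>
    <rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Cougar"/>
  </rdf:Description>
</rdf:RDF>"""


def linked_resources(rdf_xml, predicate):
    """Return the rdf:resource target of every element named `predicate`."""
    root = ET.fromstring(rdf_xml)
    # ElementTree spells namespaced names as "{namespace-uri}localname".
    return [el.get(f"{{{RDF}}}resource") for el in root.iter(predicate)]


close_matches = linked_resources(SNIPPET, f"{{{SKOS}}}closeMatch")
for uri in close_matches:
    print(uri)
```

A real client would more likely load the whole record with an RDF library or
ask a SPARQL endpoint, but the idea is the same: the links are already in the
data and can be followed without a human in the loop.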


I would also like to address this statement *"For example, at this writing,
LOD statistics reveal only 42 bioscience datasets holding 2.7B triples"*

The data sets counted in the Linked Open Data list are only those LOD data
sets that are documented here: http://www.ckan.net/

The Bio2RDF data set is over 15 billion triples on its own:
http://www.slideshare.net/micheldumontier/bio2rdf-and-beyond

The authors of the report don't seem to be aware of the significance of the
Linked Data movement.

http://data.nytimes.com/

http://www.guardian.co.uk/open-platform/blog/linked-data-open-platform

Oddly GeoSpecies/TaxonConcept seems to be visible to the New York Times and
The Guardian, yet not the expert authors of the GBIF report?

Facebook's Open Graph is also Linked Data, so all those "liked" pages are
linked data.

http://go-to-hellman.blogspot.com/2010/06/global-warming-of-linked-data-in.html

Here is an interesting recent blog post from O'Reilly Radar.

http://radar.oreilly.com/2010/11/semantic-web-linked-data.html

The Linked Data Community is much larger and more significant than the GBIF
report has implied.

Also you should look at what resources show up when you query "Quercus
alba" in the following LOD services.

http://sindice.com/
http://sindice.com/search?q=Quercus+alba&qt=term

http://uriburner.com/fct/
http://uriburner.com/fct/facet.vsp?cmd=text&sid=9244

http://lsd.taxonconcept.org/fct/
http://lsd.taxonconcept.org/fct/facet.vsp?cmd=text&sid=71

It would seem that this would have been a little hard to miss.


I was originally somewhat skeptical about Gregor Hagedorn's earlier
statement.

"*I fully believe you and all who are doing it do it with careful
consideration of the needs as they see it. I just believe that those
taking these decisions have a specific perspective and use case
scenarios, that involves biologists only after the perfect software
user interface system is finished. I challenge the last assumption ...*"

But after reading through the GBIF report, I think he makes a good point.

Reasoning will only work if everyone has a common conceptualization of what
each of these things is, and how they relate to other things.

What is a species?

What is a species' relationship to a particular classification hierarchy? Can
there be more than one hierarchy?

The report authors have experience in creating highly engineered systems
where each entity is modelled in a formal strict way.

It is not clear to me that we have agreement on some of the fundamental
entities, let alone how they relate to each other.

In my own work I have been thinking that the model of a species for
occurrences etc. might not be the best model for addressing phylogenetic
questions.

This is because you might want to have one standard, agreed-on classification
so you can search for species in a given family that are potential pathogen
vectors.

Others might want a different kind of entity that is not tied to one
particular classification so they can address phylogenetic questions.

What you don't want is to prevent people from asking questions that relate
to subfamilies etc. because the model does not support them.

It might be best to have separate, but loosely linked models for these
different kinds of "perspectives"
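
As a rough illustration of what "separate, but loosely linked" could mean,
here is a small Python sketch in which the species identifiers stand on their
own and each classification is just a separate child-to-parent map over them.
All identifiers, both maps, and the function name members_of are hypothetical
and exist only for this example.

```python
# Two hypothetical classifications over the same species identifiers.
# Each maps a child node to its parent; the entries are illustrative only.
CLASSIFICATION_A = {
    "species:puma_concolor": "family:Felidae",
}
CLASSIFICATION_B = {
    "species:puma_concolor": "subfamily:Felinae",
    "subfamily:Felinae": "family:Felidae",
}


def members_of(classification, ancestor):
    """Return the nodes whose parent chain reaches `ancestor` in this classification."""
    found = []
    for node in classification:
        current, seen = node, set()
        # Walk upward; `seen` guards against accidental cycles in the map.
        while current in classification and current not in seen:
            seen.add(current)
            current = classification[current]
            if current == ancestor:
                found.append(node)
                break
    return sorted(found)


# The family-level question works in both classifications.
in_family_a = members_of(CLASSIFICATION_A, "family:Felidae")
in_family_b = members_of(CLASSIFICATION_B, "family:Felidae")

# The subfamily-level question only has an answer where that rank is modelled.
in_subfamily_a = members_of(CLASSIFICATION_A, "subfamily:Felinae")
in_subfamily_b = members_of(CLASSIFICATION_B, "subfamily:Felinae")
```

Because the species identifier does not carry a classification inside it,
neither hierarchy has to win; the person asking the question chooses which
one to walk.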

This is similar to the way I model relationships between various LOD species
entities.

For some uses you could interpret

<Puma concolor se:v6n7p <http://lod.taxonconcept.org/ses/v6n7p#Species>> as
being the same thing as <http://purl.uniprot.org/taxonomy/9696>

For other uses, they are not the same thing. (The latter is in a nested set
of NCBI classification subclasses.)

Because of this, I tie them together loosely with a skos:closeMatch.

This keeps them "findable" without entailing the other entity's potentially
incompatible conceptualization.

This allows the end user of the data to determine whether they want to
convert these to an owl:sameAs relationship or not.
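
A minimal Python sketch of that choice, assuming the statements are held as
plain (subject, predicate, object) tuples: the consumer names the sources it
trusts, and only those skos:closeMatch links are rewritten to owl:sameAs. The
function name promote_close_matches and the trusted-prefix list are
hypothetical; the two predicate URIs are the standard SKOS and OWL ones.

```python
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def promote_close_matches(triples, trusted_prefixes):
    """Rewrite closeMatch to sameAs only when the object starts with a trusted prefix."""
    result = []
    for subject, predicate, obj in triples:
        if predicate == SKOS_CLOSE_MATCH and obj.startswith(tuple(trusted_prefixes)):
            predicate = OWL_SAME_AS
        result.append((subject, predicate, obj))
    return result


species = "http://lod.taxonconcept.org/ses/v6n7p#Species"
triples = [
    (species, SKOS_CLOSE_MATCH, "http://purl.uniprot.org/taxonomy/9696"),
    (species, SKOS_CLOSE_MATCH, "http://dbpedia.org/resource/Cougar"),
]

# This hypothetical consumer accepts only the DBpedia entity as identical,
# so the UniProt link stays a loose closeMatch.
merged = promote_close_matches(triples, ["http://dbpedia.org/resource/"])
```

The published data stays loose, and any stronger identity claim is made
locally by whoever needs it for their own reasoning.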

My guess is that this is exactly the kind of thing that at least some of the
authors of the GBIF report might not like.

For reasoning etc. they would probably prefer that these species concepts
fall under some highly constrained classification hierarchy - probably the
one produced by the Catalog of Life.

It is regarding issues like this that I think Gregor has made an interesting
point.

Respectfully,

- Pete
      ---------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies
Knowledge Base <http://lod.geospecies.org/>
About the GeoSpecies Knowledge Base <http://about.geospecies.org/>
------------------------------------------------------------