[tdwg-content] If you need something for referring to a population, then it is probably best to do it as a related class

Steve Baskauf steve.baskauf at vanderbilt.edu
Wed May 4 16:14:53 CEST 2011

OK, I think that I have already probably said more than people want to 
hear on this subject.  So I will stop with this:
1. It does not appear that there is anything "wrong" with 
taxonconcept.org from a technical standpoint.  It does what Pete wants 
it to do and that is very cool.
2. I believe that there are aspects of taxonconcept.org (introduced 
for convenience in querying) that make it much more complicated than I 
think is necessary to represent the core conceptual entities in the 
biodiversity informatics community.  I believe (for reasons articulated 
previously) that some of those complexities may introduce problems in a 
distributed system where people at different institutions are linking to 
each other's URIs. 
3. I believe that the way that taxonconcept.org conceptualizes some of 
these core entities is not congruent with the most common opinions that 
I have heard expressed on this list.  Note that I am not saying that the 
taxonconcept.org conceptualization is "wrong".  I am saying that in some 
ways it differs significantly from what I perceive to be the community 
consensus.  On the issue of the representation of taxa and names I am 
going to have to defer to the opinion of others (and there is no 
shortage of people on the list who are experts on this subject).  
However, I will say that if one says:
> And get results free of all inappropriate identifications. 
> Do you want the misidentifications showing up in these species lists?
> How would a general user correctly determine which of these 
> identifications are correct?
who is going to be the judge of "correct"?  I don't want to be around 
when that cat fight erupts.  I do think there ought to be some way that 
a determiner can indicate that they may have made a mistake on their own 
Identification.  But I think multiple Identifications better be multiple 
opinions or else there will never be a system that will be supported by 
diverse participants.
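
To make the trade-off in the quoted message below concrete, here is a 
minimal Python sketch of the two modeling styles: one Occurrence class 
whose instances carry properties, versus one class per state.  The 
triples are plain tuples and every URI, prefix, and function name is a 
made-up placeholder, not a term from Darwin-SW or taxonconcept.org.  The 
counting helper only restates the straw-man arithmetic (a guessed 400 
states/provinces times seven kinds of things).

```python
# Toy triples as (subject, predicate, object) tuples. Every URI below is
# a hypothetical placeholder, not a term from any real vocabulary.
RDF_TYPE = "rdf:type"

# Design A: one Occurrence class; place and date are instance properties.
DESIGN_A = [
    ("ex:occ1", RDF_TYPE, "ex:Occurrence"),
    ("ex:occ1", "ex:stateProvince", "Ohio"),
    ("ex:occ1", "ex:month", "1959-05"),
    ("ex:occ2", RDF_TYPE, "ex:Occurrence"),
    ("ex:occ2", "ex:stateProvince", "Tennessee"),
    ("ex:occ2", "ex:month", "2005-05"),
]

# Design B: the place is baked into the class, one class per state.
DESIGN_B = [
    ("ex:occ1", RDF_TYPE, "geo:ohio#occurrence"),
    ("ex:occ2", RDF_TYPE, "geo:tennessee#occurrence"),
]


def classes_used(triples):
    """Return the set of classes that appear as the object of rdf:type."""
    return {o for (_s, p, o) in triples if p == RDF_TYPE}


def occurrences_where(triples, prop, value):
    """Design A query: Occurrence instances whose property has the value."""
    typed = {s for (s, p, o) in triples
             if p == RDF_TYPE and o == "ex:Occurrence"}
    return sorted(s for (s, p, o) in triples
                  if p == prop and o == value and s in typed)


def classes_needed(distinct_values, kinds_of_things):
    """Classes to mint when every value gets its own class per kind."""
    return distinct_values * kinds_of_things


if __name__ == "__main__":
    print(classes_used(DESIGN_A))
    print(sorted(classes_used(DESIGN_B)))
    print(occurrences_where(DESIGN_A, "ex:stateProvince", "Ohio"))
    print(classes_needed(400, 7))
```

In Design A the same two records can be filtered by place or by month 
without minting anything new; in Design B each additional axis (time, 
taxon) needs its own family of classes, which is the growth the quoted 
message worries about.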

> On Wed, May 4, 2011 at 6:04 AM, Steve Baskauf 
> <steve.baskauf at vanderbilt.edu> wrote:
>     Thanks, Bob, for the examples.  I will try to dig my way through
>     them. 
>     I don't want to give the impression that Darwin-SW was not
>     intended to facilitate any reasoning.  That is, after all why it
>     is called "Darwin-SW" instead of "Darwin-data-markup".  I know
>     that Cam is quite interested in the "semantic" end of it, and when
>     he has Internet again I hope he will chime in on this.  I'm simply
>     confessing what my primary concern is (data markup).  When we
>     started working on the ontology, we decided to make it as simple
>     as possible while still trying to permit every (or almost every)
>     kind of class and relationship that was discussed in the Oct/Nov
>     discussion.  The result was to have a single class Occurrence
>     whose instances are described by properties, not 1.7 million
>     classes N#occurrence and so on for the other six classes in the
>     model.  The intention was that DSW 1.0 would be constructed in
>     such a way that it could support the addition of more complex
>     components (Cam has actually marked the posted version at version
>     0.2 which means that it is certainly subject to improvement) and
>     possibly more complex reasoning.  But the more complex stuff was
>     not put into the model at the start because we wanted something
>     that (hopefully) most people could agree represents reality
>     reasonably well (at least a TDWG form of reality since it uses the
>     structure of DwC as its basis) and hence it would actually have
>     the possibility of being used by more than two people. 
>     I hope that people realize that I'm not making these comments to
>     give Pete a hard time or anything.  I really am trying to
>     understand the relative benefits and problems of modeling one class
>     of cats with many properties vs. creating a class of cats for every
>     property we care about.  Clearly Pete's interest is in Taxon
>     Concepts in the sense that he has defined them.  OK, just to set
>     up a straw man, let's say that I am interested in geography more
>     than taxonomy.  So I define a class and URI for every state and
>     province in the world.  I have no idea how many of those there
>     are, but I'll guess 400.  Now I want to describe other things in
>     the biodiversity informatics world.  So I mint classes
>     http://baskaufgeo.org/lod/ohio#occurrence for occurrences that
>     happen in Ohio, http://baskaufgeo.org/lod/swaziland#occurrence for
>     occurrences that happen in Swaziland,
>     http://baskaufgeo.org/lod/tennessee#occurrence,
>     http://baskaufgeo.org/lod/ohio#taxon,
>     http://baskaufgeo.org/lod/swaziland#taxon,
>     http://baskaufgeo.org/lod/tennessee#taxon, etc. etc. for all 400
>     state/provinces and all seven basic types of things in the
>     biodiversity domain.  I can now do cool queries that involve
>     geography. 
>     OK, maybe I'm somebody else and I love thinking about temporal
>     relationships.  So I create
>     http://baskauf-time.org/lod/1959may#occurrence for occurrences
>     that happen in May of 1959,
>     http://baskauf-time.org/lod/2005may#occurrence for occurrences
>     that happened in May of 2005, etc.  Given a billion or so years of
>     life on earth, that would give me about 12 billion classes for
>     each of the six other basic kinds of things I want to model.  I
>     could do all kinds of cool queries that involve time now. 
>     So which one of these three ontologies are we going to adopt?  The
>     taxon based one?  The time based one?  The geography based one? 
>     Now we are not just having to choose whether to model things as a
>     single class of cats whose instances have many color and
>     reproductiveMethod properties vs. many classes of cats each
>     defined on the basis of its color.  We must decide whether it's
>     better to have many classes of colors each defined by the kind of
>     animal that has that color, or many kinds of reproductive systems,
>     each with different kinds of animals, etc.  Where is it going to
>     end and how could we agree on which system to use?  It seems to me
>     that it would be better to have a class of cats, a class of
>     reproductive systems, etc. and connect their instances with
>     properties. 
>     Am I somehow thinking about this incorrectly?
>     Steve
>     Bob Morris wrote:
>>     See, for example,
>>     Mungall et al., “Integrating phenotype ontologies across multiple
>>     species”, Genome Biology 2010, 11:R2 (doi:10.1186/gb-2010-11-1-r2)
>>     Ward Blondé et al.  "Reasoning with bio-ontologies: using relational
>>     closure rules to enable practical querying", Bioinformatics (2011)
>>     doi: 10.1093/bioinformatics/btr164
>>     Calder, et al. "Machine Reasoning about Anomalous Sensor Data"
>>     http://dx.doi.org/10.1016/j.ecoinf.2009.08.007 or in manuscript form
>>     at http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf
>>     ...
>>     OK, so maybe these knowledge domains are all hypothesis-driven
>>     sciences (i.e., sciences), and <whatever dsw is modelling> is not.
>>     But that would be sad.
>>     Bob
>>     p.s. I had almost finished something else on this thread when Hilmar
>>     beat me to the punch. But here's a slightly different expression of
>>     his point:
>>     It turns out that the difference between instances and classes is
>>     mainly important in contexts in which you have disclaimed interest,
>>     namely reasoning.  In the RDF/RDFS/OWL stack, enforcing a distinction
>>     between classes and instances only occurs pretty high up in the stack,
>>     when one desires an OWL variant that will offer guarantees that
>>     reasoners will finish any inference they are asked to verify,
>>     preferably in less than exponential time. I guess, but am not
>>     certain, that even in an LOD context, if data are described with an
>>     OWL ontology that is known to be intractable, e.g. not in OWL DL, that
>>     it is possible to design SPARQL queries that will never complete. In
>>     fact, I believe that even with tractable ontologies, there are SPARQL
>>     queries that are fundamentally exponential in the number of variables.
>>     p.p.s. Irrelevant, but equivalent, aside about mathematics. At the
>>     turn of the 20th century, Whitehead and Russell tried (and failed) to
>>     show that everything about numbers could be logically derived from an
>>     axiomatic description of the natural numbers (i.e. non-negative
>>     integers). It was later shown to be the case that you must include in
>>     your logical foundations something deeper, namely the ability to have
>>     sets that are elements of other sets (roughly, classes that are
>>     individuals in other classes.).  Without this, and starting only with
>>     the natural numbers, you can logically derive all rational numbers
>>     (fractions) and their arithmetic properties, and even all the
>>     irrational numbers that are the solutions of polynomial equations
>>     with integer coefficients ("algebraic numbers") such as sqrt(2), and
>>     even solutions of the polynomials that have coefficients that are
>>     algebraic numbers.  But without introducing the notion of the set of
>>     subsets of a set, you cannot logically derive all the interesting
>>     transcendental numbers (i.e. those which are not the roots of
>>     polynomials), such as e and pi.  So if you love calculus, you better
>>     not insist on  distinguishing instances from classes. But if you are
>>     content with polynomials, you can probably be ontologically sloppy.
>>     Or, if you don't care about logical foundations of your science, you
>>     can forget about the whole thing. :-)
>>     On Tue, May 3, 2011 at 11:51 PM, Steve Baskauf
>>     <steve.baskauf at vanderbilt.edu> wrote:
>>>     [snip]
>>>     OK, so let's imagine that we mark up several million records of specimens,
>>>     tissue samples, and images as RDF.  (We don't have to imagine very hard, I
>>>     think the BiSciCol group is planning to actually do this within the next
>>>     several months.)  I would really like to hear from some of the people who
>>>     actually use "DL reasoners" (a group which certainly does not include me) to
>>>     know what it is that we could actually find out that would be useful about
>>>     that big data blob using reasoners.  I have already confessed that my
>>>     primary concern is enabling data discovery, transfer, and aggregation using
>>>     GUIDs and RDF.  I'm still somewhat of a "semantic web" skeptic as far as the
>>>     whole inferencing thing is concerned.  Aside from inferring "duplicates",
>>>     I'm really wanting to know what else there is useful that could be reasoned
>>>     outside of the Taxon/TaxonConcept class.  (I can imagine useful reasoning
>>>     being done about things in that class like the relationships among names,
>>>     concepts, parent taxa, etc. e.g. Rod Page's Biodiversity Informatics 3:1-15
>>>     article https://journals.ku.edu/index.php/jbi/article/view/25)  I think this
>>>     (data markup priority vs. inferencing priority) is an important discussion
>>>     to have before the tdwg community can settle on some kind of consensus way
>>>     of turning database records into RDF, particularly if it is going to have a
>>>     big influence on the way the RDF model is set up.  To me, there is a clear
>>>     and immediate need to be able to mark data up in a straightforward way.  If
>>>     we can get the semantic part, too, that would be great but not at the
>>>     expense of data markup.  I just was at a meeting of a bunch of herbarium
>>>     curators.  They desperately need a way to implement GUIDs and aggregate data
>>>     and they need it now.  I really don't think they care one whit about
>>>     inferencing.  If we coalesce on a model that is great for doing cool things
>>>     with 10 records but which can't handle hundreds of thousands of records
>>>     easily and simply, then we are wasting our time.  I don't think we need to
>>>     dither about this for another five years.
>>>       I would hate to have to draw an RDF graph of that model
>>>     I would as much hate to have to draw an RDF graph of 1.7 million instances.
>>>     The point being, in order to draw a graph of how someone models a domain you
>>>     don't draw a graph of the entire RDF triple store.
>>>     That was the point I was trying to make (I think).
>>>     Thanks for the clarification, Hilmar.
>>>     Steve
>>>     -hilmar
>>>     --
>>>     ===========================================================
>>>     : Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
>>>     ===========================================================
>>>     --
>>>     Steven J. Baskauf, Ph.D., Senior Lecturer
>>>     Vanderbilt University Dept. of Biological Sciences
>>>     postal mail address:
>>>     VU Station B 351634
>>>     Nashville, TN  37235-1634,  U.S.A.
>>>     delivery address:
>>>     2125 Stevenson Center
>>>     1161 21st Ave., S.
>>>     Nashville, TN 37235
>>>     office: 2128 Stevenson Center
>>>     phone: (615) 343-4582,  fax: (615) 343-6707
>>>     http://bioimages.vanderbilt.edu
>>>     _______________________________________________
>>>     tdwg-content mailing list
>>>     tdwg-content at lists.tdwg.org
>>>     http://lists.tdwg.org/mailman/listinfo/tdwg-content
> -- 
> ------------------------------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> Email: pdevries at wisc.edu
> TaxonConcept <http://www.taxonconcept.org/>  &  GeoSpecies 
> <http://about.geospecies.org/> Knowledge Bases
> A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
> --------------------------------------------------------------------------------------

Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
