[tdwg-content] If you need something for referring to a population, then it is probably best to do it as a related class

Steve Baskauf steve.baskauf at vanderbilt.edu
Wed May 4 14:17:15 CEST 2011

Oh yeah, I forgot to say in the interest of defining acronyms used, TDWG 
stands for "Biodiversity Information Standards".  It supposedly had 
grown beyond "Taxonomic Databases Working Group" and "focuses on the 
development of standards for the exchange of biological/biodiversity 
data." (http://www.tdwg.org/about-tdwg/). ;-)

Steve Baskauf wrote:
> Thanks, Bob, for the examples.  I will try to dig my way through them. 
> I don't want to give the impression that Darwin-SW was not intended to 
> facilitate any reasoning.  That is, after all why it is called 
> "Darwin-SW" instead of "Darwin-data-markup".  I know that Cam is quite 
> interested in the "semantic" end of it, and when he has Internet again 
> I hope he will chime in on this.  I'm simply confessing what my 
> primary concern is (data markup).  When we started working on the 
> ontology, we decided to make it as simple as possible while still 
> trying to permit every (or almost every) kind of class and 
> relationship that was discussed in the Oct/Nov discussion.  The result 
> was to have a single class Occurrence whose instances are described by 
> properties, not 1.7 million classes N#occurrence and so on for the 
> other six classes in the model.  The intention was that DSW 1.0 would 
> be constructed in such a way that it could support the addition of 
> more complex components (Cam has actually marked the posted version at 
> version 0.2 which means that it is certainly subject to improvement) 
> and possibly more complex reasoning.  But the more complex stuff was 
> not put into the model at the start because we wanted something that 
> (hopefully) most people could agree represents reality reasonably well 
> (at least a TDWG form of reality since it uses the structure of DwC as 
> its basis) and hence it would actually have the possibility of being 
> used by more than two people. 
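
A minimal sketch of the single-class approach described above, using plain Python tuples rather than a real triple store. The `ex:` identifiers and the property names `ex:stateProvince` and `ex:month` are made up for illustration and are not taken from Darwin-SW; only `rdf:type` is an actual RDF term.

```python
# Triples are (subject, predicate, object) tuples. All "ex:" names are
# hypothetical placeholders standing in for real URIs.
RDF_TYPE = "rdf:type"

TRIPLES = {
    ("ex:occ1", RDF_TYPE, "ex:Occurrence"),
    ("ex:occ1", "ex:stateProvince", "Ohio"),
    ("ex:occ1", "ex:month", "1959-05"),
    ("ex:occ2", RDF_TYPE, "ex:Occurrence"),
    ("ex:occ2", "ex:stateProvince", "Tennessee"),
    ("ex:occ2", "ex:month", "2005-05"),
    ("ex:occ3", RDF_TYPE, "ex:Occurrence"),
    ("ex:occ3", "ex:stateProvince", "Ohio"),
    ("ex:occ3", "ex:month", "2005-05"),
}


def instances_of(cls, triples=TRIPLES):
    """Return every subject typed as the given class."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def having(prop, value, triples=TRIPLES):
    """Return every subject that carries the given property value."""
    return {s for s, p, o in triples if p == prop and o == value}


# One class, filtered by whichever property the question is about.
occurrences = instances_of("ex:Occurrence")
in_ohio = occurrences & having("ex:stateProvince", "Ohio")
in_may_2005 = occurrences & having("ex:month", "2005-05")
in_ohio_may_2005 = in_ohio & in_may_2005
```

With this layout a geographic question and a temporal question are both answered against the same `ex:Occurrence` class; no additional classes are minted for either.
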
> I hope that people realize that I'm not making these comments to give 
> Pete a hard time or anything.  I really am trying to understand the 
> relative benefits and problems of modeling one class of cats with many 
> properties vs. creating a class of cats for every property we care 
> about.  Clearly Pete's interest is in Taxon Concepts in the sense that 
> he has defined them.  OK, just to set up a straw man, let's say that I 
> am interested in geography more than taxonomy.  So I define a class 
> and URI for every state and province in the world.  I have no idea how 
> many of those there are, but I'll guess 400.  Now I want to describe 
> other things in the biodiversity informatics world.  So I mint classes 
> http://baskaufgeo.org/lod/ohio#occurrence for occurrences that happen 
> in Ohio, http://baskaufgeo.org/lod/swaziland#occurrence for 
> occurrences that happen in Swaziland, 
> http://baskaufgeo.org/lod/tennessee#occurrence, 
> http://baskaufgeo.org/lod/ohio#taxon, 
> http://baskaufgeo.org/lod/swaziland#taxon, 
> http://baskaufgeo.org/lod/tennessee#taxon, etc. etc. for all 400 
> state/provinces and all seven basic types of things in the 
> biodiversity domain.  I can now do cool queries that involve geography. 
> OK, maybe I'm somebody else and I love thinking about temporal 
> relationships.  So I create 
> http://baskauf-time.org/lod/1959may#occurrence for occurrences that 
> happen in May of 1959, http://baskauf-time.org/lod/2005may#occurrence 
> for occurrences that happened in May of 2005, etc.  Given a billion or 
> so years of life on earth, that would give me about 12 billion classes 
> for each of the six other basic kinds of things I want to model.  I 
> could do all kinds of cool queries that involve time now. 
> So which one of these three ontologies are we going to adopt?  The 
> taxon based one?  The time based one?  The geography based one?  Now 
> we are not just having to choose whether to model things as a single 
> class of cats whose instances have many color and reproductiveMethod 
> properties vs. many classes of cats each defined on the basis of its 
> color.  We must decide whether it's better to have many classes of 
> colors each defined by the kind of animal that has that color, or many 
> kinds of reproductive systems, each with different kinds of animals, 
> etc.  Where is it going to end and how could we agree on which system 
> to use?  It seems to me that it would be better to have a class of 
> cats, a class of reproductive systems, etc. and connect their 
> instances with properties. 
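
The class counts in the straw man above can be worked out in a few lines. This is only back-of-the-envelope arithmetic using the figures given in the message (7 basic kinds of things, a guessed 400 states/provinces, about a billion years of monthly periods); the function and variable names are invented for the sketch.

```python
# Figures quoted in the message; the region count is explicitly a guess there.
KINDS = 7                       # basic kinds of things in the model
REGIONS = 400                   # states and provinces
MONTHS = 1_000_000_000 * 12     # monthly periods in a billion years


def classes_needed(facet_values, kinds=KINDS):
    """Classes minted when every (facet value, kind) pair gets its own class."""
    return facet_values * kinds


geo_classes = classes_needed(REGIONS)      # geography-based ontology
time_classes = classes_needed(MONTHS)      # time-based ontology

# If one ontology had to support both facets by subclassing alone, every
# (region, month, kind) combination would need its own class.
combined_classes = REGIONS * MONTHS * KINDS

# With one class per kind and properties for region and month, the class
# count does not depend on the number of facet values at all.
property_design_classes = KINDS

print(f"geography-based: {geo_classes:,}")
print(f"time-based:      {time_classes:,}")
print(f"both facets:     {combined_classes:,}")
print(f"property-based:  {property_design_classes:,}")
```

The multiplication in `combined_classes` is one way to read the "where is it going to end" question: each extra facet handled by classes multiplies the total, while each extra facet handled by a property only adds one property.
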
> Am I somehow thinking about this incorrectly?
> Steve
> Bob Morris wrote:
>> See, for example,
>> Mungall et al., “Integrating phenotype ontologies across multiple
>> species”, Genome
>> Biology 2010, 11:R2, doi:10.1186/gb-2010-11-1-r2
>> Ward Blondé et al.  "Reasoning with bio-ontologies: using relational
>> closure rules to enable practical querying", Bioinformatics (2011)
>> doi: 10.1093/bioinformatics/btr164
>> Calder, et al. "Machine Reasoning about Anomalous Sensor Data"
>> http://dx.doi.org/10.1016/j.ecoinf.2009.08.007 or in manuscript form
>> at http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf
>> ...
>> OK, so  maybe these knowledge domains are all hypothesis-driven
>> sciences (i.e.,  sciences), and <whatever dsw is modelling> is not.
>> But that would be sad.
>> Bob
>> p.s. I had almost finished something else on this thread when Hilmar
>> beat me to the punch. But here's a slightly different expression of
>> his point:
>> It turns out that the difference between instances and classes is
>> mainly important in contexts in which you have disclaimed interest,
>> namely reasoning.  In the RDF/RDFS/OWL stack, enforcing a distinction
>> between classes and instances only occurs pretty high up in the stack,
>> when one desires an OWL variant that will offer guarantees that
>> reasoners will finish any inference they are asked to verify,
>> preferably in less than exponential time. I guess, but am not
>> certain, that even in an LOD context, if data are described with an
>> OWL ontology that is known to be intractable, e.g. not in OWL DL, that
>> it is possible to design SPARQL queries that will never complete. In
>> fact, I believe that even with tractable ontologies, there are SPARQL
>> queries that are fundamentally exponential in the number of variables.
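
As a rough illustration of the class/instance distinction mentioned above, the following sketch flags terms that a set of triples uses both as a class and as an individual. It is plain Python over made-up `ex:` data, not a reasoner, and it does not check OWL DL conformance in any formal sense (OWL 2 DL, for instance, tolerates some of this through punning). The helper names are hypothetical.

```python
RDF_TYPE = "rdf:type"
# Objects that only declare "this term is a class" and are ignored below.
CLASS_DECLARATIONS = {"owl:Class", "rdfs:Class"}

# Hypothetical data: ex:Cat has an instance and is itself typed as a Taxon.
TRIPLES = [
    ("ex:Cat", RDF_TYPE, "owl:Class"),
    ("ex:Taxon", RDF_TYPE, "owl:Class"),
    ("ex:felix", RDF_TYPE, "ex:Cat"),    # an individual cat
    ("ex:Cat", RDF_TYPE, "ex:Taxon"),    # the class Cat used as an instance
]


def used_as_class(triples):
    """Terms that appear as the object of rdf:type (something is typed by them)."""
    return {o for s, p, o in triples
            if p == RDF_TYPE and o not in CLASS_DECLARATIONS}


def used_as_individual(triples):
    """Terms typed by something other than a bare class declaration."""
    return {s for s, p, o in triples
            if p == RDF_TYPE and o not in CLASS_DECLARATIONS}


def class_and_individual(triples):
    """Terms playing both roles, i.e. where the two levels are mixed."""
    return used_as_class(triples) & used_as_individual(triples)


mixed = class_and_individual(TRIPLES)
```

Plain RDF accepts such data without complaint; the question of whether a term like `ex:Cat` may play both roles only arises once a reasoner with termination guarantees is involved.
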
>> p.p.s. Irrelevant, but equivalent, aside about mathematics. At the
>> turn of the 20th century, Whitehead and Russell tried (and failed) to
>> show that everything about numbers could be logically derived from an
>> axiomatic description of the natural numbers (i.e. non-negative
>> integers). It was later shown to be the case that you must include in
>> your logical foundations something deeper, namely the ability to have
>> sets that are elements of other sets (roughly, classes that are
>> individuals in other classes.).  Without this, and starting only with
>> the natural numbers, you can logically derive all rational numbers
>> (fractions) and their arithmetic properties, and even all the
>> irrational numbers that are the solutions of polynomial equations
>> with integer coefficients ("algebraic numbers") such as sqrt(2), and
>> even solutions of the polynomials that have coefficients that are
>> algebraic numbers.  But without introducing the notion of the set of
>> subsets of a set, you cannot logically derive all the interesting
>> transcendental numbers (i.e. those which are not the roots of
>> polynomials), such as e and pi.  So if you love calculus, you better
>> not insist on  distinguishing instances from classes. But if you are
>> content with polynomials, you can probably be ontologically sloppy.
>> Or, if you don't care about logical foundations of your science, you
>> can forget about the whole thing. :-)
>> On Tue, May 3, 2011 at 11:51 PM, Steve Baskauf
>> <steve.baskauf at vanderbilt.edu> wrote:
>>> [snip]
>>> OK, so let's imagine that we mark up several million records of specimens,
>>> tissue samples, and images as RDF.  (We don't have to imagine very hard, I
>>> think the BiSciCol group is planning to actually do this within the next
>>> several months.)  I would really like to hear from some of the people who
>>> actually use "DL reasoners" (a group which certainly does not include me) to
>>> know what it is that we could actually find out that would be useful about
>>> that big data blob using reasoners.  I have already confessed that my
>>> primary concern is enabling data discovery, transfer, and aggregation using
>>> GUIDs and RDF.  I'm still somewhat of a "semantic web" skeptic as far as the
>>> whole inferencing thing is concerned.  Aside from inferring "duplicates",
>>> I'm really wanting to know what else there is useful that could be reasoned
>>> outside of the Taxon/TaxonConcept class.  (I can imagine useful reasoning
>>> being done about things in that class like the relationships among names,
>>> concepts, parent taxa, etc. e.g. Rod Page's Biodiversity Informatics 3:1-15
>>> article https://journals.ku.edu/index.php/jbi/article/view/25)  I think this
>>> (data markup priority vs. inferencing priority) is an important discussion
>>> to have before the tdwg community can settle on some kind of consensus way
>>> of turning database records into RDF, particularly if it is going to have a
>>> big influence on the way the RDF model is set up.  To me, there is a clear
>>> and immediate need to be able to mark data up in a straightforward way.  If
>>> we can get the semantic part, too, that would be great but not at the
>>> expense of data markup.  I just was at a meeting of a bunch of herbarium
>>> curators.  They desperately need a way to implement GUIDs and aggregate data
>>> and they need it now.  I really don't think they care one whit about
>>> inferencing.  If we coalesce on a model that is great for doing cool things
>>> with 10 records but which can't handle hundreds of thousands of records
>>> easily and simply, then we are wasting our time.  I don't think we need to
>>> dither about this for another five years.
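
The "data markup" priority described above, turning a database record into RDF keyed by a GUID, might look roughly like the following. This is a simplified sketch and not a proposed TDWG mapping: the GUID, the class name `ex:Occurrence`, and the function `record_to_triples` are invented for illustration, the field names are borrowed from Darwin Core terms, and values are kept as plain strings instead of typed RDF literals.

```python
def record_to_triples(guid, record, type_uri="ex:Occurrence"):
    """Turn one flat database record into (subject, predicate, object) triples.

    guid     -- identifier used as the subject of every triple
    record   -- dict mapping Darwin Core style field names to values
    type_uri -- class assigned to the record (hypothetical placeholder)
    """
    triples = [(guid, "rdf:type", type_uri)]
    for field, value in record.items():
        if value is None or value == "":
            continue  # skip empty database fields instead of emitting blanks
        triples.append((guid, f"dwc:{field}", str(value)))
    return triples


# A made-up specimen record and a made-up GUID.
record = {
    "catalogNumber": "12345",
    "stateProvince": "Tennessee",
    "scientificName": "Quercus alba",
    "recordedBy": "",
}
triples = record_to_triples(
    "urn:uuid:00000000-0000-0000-0000-000000000001", record
)

for triple in triples:
    print(triple)
```

Because each record is handled independently and produces a handful of triples, the cost of this kind of markup grows linearly with the number of records, which matches the concern about handling hundreds of thousands of records simply.
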
>>>   I would hate to have to draw an RDF graph of that model
>>> I would as much hate to have to draw an RDF graph of 1.7 million instances.
>>> The point being, in order to draw a graph of how someone models a domain you
>>> don't draw a graph of the entire RDF triple store.
>>> That was the point I was trying to make (I think).
>>> Thanks for the clarification, Hilmar.
>>> Steve
>>> -hilmar
>>> --
>>> ===========================================================
>>> : Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
>>> ===========================================================
>>> --
>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>> Vanderbilt University Dept. of Biological Sciences
>>> postal mail address:
>>> VU Station B 351634
>>> Nashville, TN  37235-1634,  U.S.A.
>>> delivery address:
>>> 2125 Stevenson Center
>>> 1161 21st Ave., S.
>>> Nashville, TN 37235
>>> office: 2128 Stevenson Center
>>> phone: (615) 343-4582,  fax: (615) 343-6707
>>> http://bioimages.vanderbilt.edu
