[tdwg-content] If you need something for referring to a population, then it is probably best to do it as a related class
Steve Baskauf
steve.baskauf at vanderbilt.edu
Wed May 4 16:14:53 CEST 2011
OK, I think that I have already probably said more than people want to
hear on this subject. So I will stop with this:
1. It does not appear that there is anything "wrong" with
taxonconcept.org from a technical standpoint. It does what Pete wants
it to do and that is very cool.
2. I believe that there are aspects of taxonconcept.org (introduced
for convenience in querying) that make it much more complicated than I
think is necessary to represent the core conceptual entities in the
biodiversity informatics community. I believe (for reasons articulated
previously) that some of those complexities may introduce problems in a
distributed system where people at different institutions are linking to
each other's URIs.
3. I believe that the way that taxonconcept.org conceptualizes some of
these core entities is not congruent with the most common opinions that
I have heard expressed on this list. Note that I am not saying that the
taxonconcept.org conceptualization is "wrong". I am saying that in some
ways it differs significantly from what I perceive to be the community
consensus. On the issue of the representation of taxa and names I am
going to have to defer to the opinion of others (and there is no
shortage of people on the list who are experts on this subject).
However, I will say that if one says:
> And get results free of all inappropriate identifications.
>
> Do you want the misidentifications showing up in these species lists?
>
> How would a general user correctly determine which of these
> identifications are correct?
who is going to be the judge of "correct"? I don't want to be around
when that cat fight erupts. I do think there ought to be some way that
a determiner can indicate that they may have made a mistake on their own
Identification. But I think multiple Identifications better be multiple
opinions or else there will never be a system that will be supported by
diverse participants.
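
To make that last point concrete, here is a minimal sketch in Python, not anything taken from Darwin-SW or taxonconcept.org. It assumes each Identification is kept as its own record of one determiner's opinion, and that the only "correction" the data allows is a determiner flagging doubt about their own record. The class name, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Identification:
    """One determiner's opinion about one organism (hypothetical fields)."""
    organism_id: str
    taxon: str
    determiner: str
    date_identified: str  # ISO 8601 date string
    doubted_by_determiner: bool = False


def flag_own_doubt(records: List[Identification], organism_id: str,
                   determiner: str) -> List[Identification]:
    """Return a new list in which only this determiner's own opinions
    about the organism are flagged; nobody else's records are touched."""
    return [
        replace(r, doubted_by_determiner=True)
        if r.organism_id == organism_id and r.determiner == determiner
        else r
        for r in records
    ]


def standing_opinions(records: List[Identification],
                      organism_id: str) -> List[Identification]:
    """All opinions about an organism that their own authors still stand by."""
    return [
        r for r in records
        if r.organism_id == organism_id and not r.doubted_by_determiner
    ]


# Two determiners disagree; both opinions stay in the data.
records = [
    Identification("org-1", "Quercus alba", "A. Smith", "2009-06-01"),
    Identification("org-1", "Quercus stellata", "B. Jones", "2010-08-15"),
]
# A. Smith later flags doubt about their own Identification.
records = flag_own_doubt(records, "org-1", "A. Smith")
```

Nothing here decides which opinion is "correct". A consumer who trusts a particular determiner can filter on that, but the records themselves stay neutral.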
Steve
>
> On Wed, May 4, 2011 at 6:04 AM, Steve Baskauf
> <steve.baskauf at vanderbilt.edu> wrote:
>
> Thanks, Bob, for the examples. I will try to dig my way through
> them.
>
> I don't want to give the impression that Darwin-SW was not
> intended to facilitate any reasoning. That is, after all why it
> is called "Darwin-SW" instead of "Darwin-data-markup". I know
> that Cam is quite interested in the "semantic" end of it, and when
> he has Internet again I hope he will chime in on this. I'm simply
> confessing what my primary concern is (data markup). When we
> started working on the ontology, we decided to make it as simple
> as possible while still trying to permit every (or almost every)
> kind of class and relationship that was discussed in the Oct/Nov
> discussion. The result was to have a single class Occurrence
> whose instances are described by properties, not 1.7 million
> classes N#occurrence and so on for the other six classes in the
> model. The intention was that DSW 1.0 would be constructed in
> such a way that it could support the addition of more complex
> components (Cam has actually marked the posted version at version
> 0.2 which means that it is certainly subject to improvement) and
> possibly more complex reasoning. But the more complex stuff was
> not put into the model at the start because we wanted something
> that (hopefully) most people could agree represents reality
> reasonably well (at least a TDWG form of reality since it uses the
> structure of DwC as its basis) and hence it would actually have
> the possibility of being used by more than two people.
>
> I hope that people realize that I'm not making these comments to
> give Pete a hard time or anything. I really am trying to
> understand the relative benefits and problems of modeling one class
> of cat with many properties vs. creating a class of cats for every
> property we care about. Clearly Pete's interest is in Taxon
> Concepts in the sense that he has defined them. OK, just to set
> up a straw man, let's say that I am interested in geography more
> than taxonomy. So I define a class and URI for every state and
> province in the world. I have no idea how many of those there
> are, but I'll guess 400. Now I want to describe other things in
> the biodiversity informatics world. So I mint classes
> http://baskaufgeo.org/lod/ohio#occurrence for occurrences that
> happen in Ohio, http://baskaufgeo.org/lod/swaziland#occurrence for
> occurrences that happen in Swaziland,
> http://baskaufgeo.org/lod/tennessee#occurrence,
> http://baskaufgeo.org/lod/ohio#taxon,
> http://baskaufgeo.org/lod/swaziland#taxon,
> http://baskaufgeo.org/lod/tennessee#taxon, etc. etc. for all 400
> state/provinces and all seven basic types of things in the
> biodiversity domain. I can now do cool queries that involve
> geography.
>
> OK, maybe I'm somebody else and I love thinking about temporal
> relationships. So I create
> http://baskauf-time.org/lod/1959may#occurrence for occurrences
> that happen in May of 1959,
> http://baskauf-time.org/lod/2005may#occurrence for occurrences
> that happened in May of 2005, etc. Given a billion or so years of
> life on earth, that would give me about 12 billion classes for
> each of the six other basic kinds of things I want to model. I
> could do all kinds of cool queries that involve time now.
>
> So which one of these three ontologies are we going to adopt? The
> taxon based one? The time based one? The geography based one?
> Now we are not just having to choose whether to model things as a
> single class of cats whose instances have many color and
> reproductiveMethod properties vs. many classes of cats each
> defined on the basis of its color. We must decide whether it's
> better to have many classes of colors each defined by the kind of
> animal that has that color, or many kinds of reproductive systems,
> each with different kinds of animals, etc. Where is it going to
> end and how could we agree on which system to use? It seems to me
> that it would be better to have a class of cats, a class of
> reproductive systems, etc. and connect their instances with
> properties.
>
> Am I somehow thinking about this incorrectly?
> Steve
>
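
The class-count argument in the quoted message can be sketched roughly as follows. The figures (400 state/provinces, seven basic kinds of things, about a billion years of months) are the guesses given in the message. The `Occurrence` fields, the `select` helper, and the sample records are hypothetical illustrations of the "one class, many properties" alternative, not part of any published ontology.

```python
from dataclasses import dataclass
from typing import List

STATE_PROVINCES = 400                 # guess from the message
BASIC_KINDS = 7                       # basic types of things in the domain
MONTHS_OF_LIFE = 12 * 1_000_000_000   # "a billion or so years", by month


def class_per_value(values: int, kinds: int) -> int:
    """Classes needed if every (value, kind) pair gets its own class."""
    return values * kinds


geography_classes = class_per_value(STATE_PROVINCES, BASIC_KINDS)
time_classes_per_kind = class_per_value(MONTHS_OF_LIFE, 1)


# The alternative: one Occurrence class whose instances carry properties.
@dataclass(frozen=True)
class Occurrence:
    taxon: str
    state_province: str
    year_month: str


def select(occurrences: List[Occurrence], **criteria: str) -> List[Occurrence]:
    """Filter instances on any combination of property values."""
    return [
        o for o in occurrences
        if all(getattr(o, name) == value for name, value in criteria.items())
    ]


occurrences = [
    Occurrence("Quercus alba", "Ohio", "1959-05"),
    Occurrence("Quercus alba", "Tennessee", "2005-05"),
    Occurrence("Acer rubrum", "Ohio", "2005-05"),
]

in_ohio = select(occurrences, state_province="Ohio")
alba_may_2005 = select(occurrences, taxon="Quercus alba", year_month="2005-05")
```

With a single class, the same records answer geography, time, and taxon questions without anyone having to pick which axis gets to define the classes.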
>
> Bob Morris wrote:
>> See, for example,
>>
>> Mungall et al., “Integrating phenotype ontologies across multiple
>> species”, Genome
>> Biology 2010, 11:R2 doi:10.1186/gb-2010-11-1-r2
>>
>> Ward Blondé et al. "Reasoning with bio-ontologies: using relational
>> closure rules to enable practical querying", Bioinformatics (2011)
>> doi: 10.1093/bioinformatics/btr164
>>
>> Calder, et al. "Machine Reasoning about Anomalous Sensor Data"
>> http://dx.doi.org/10.1016/j.ecoinf.2009.08.007 or in manuscript form
>> at http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf
>>
>> ...
>>
>> OK, so maybe these knowledge domains are all hypothesis-driven
>> sciences (i.e., sciences), and <whatever dsw is modelling> is not.
>> But that would be sad.
>>
>> Bob
>> p.s. I had almost finished something else on this thread when Hilmar
>> beat me to the punch. But here's a slightly different expression of
>> his point:
>>
>> It turns out that the difference between instances and classes is
>> mainly important in contexts in which you have disclaimed interest,
>> namely reasoning. In the RDF/RDFS/OWL stack, enforcing a distinction
>> between classes and instances only occurs pretty high up in the stack,
>> when one desires an OWL variant that will offer guarantees that
>> reasoners will finish any inference they are asked to verify,
>> preferably in less than exponential time. I guess, but am not
>> certain, that even in an LOD context, if data are described with an
>> OWL ontology that is known to be intractable, e.g. not in OWL DL, that
>> it is possible to design SPARQL queries that will never complete. In
>> fact, I believe that even with tractable ontologies, there are SPARQL
>> queries that are fundamentally exponential in the number of variables.
>>
>> p.p.s. Irrelevant, but equivalent, aside about mathematics. At the
>> turn of the 20th century, Whitehead and Russell tried (and failed) to
>> show that everything about numbers could be logically derived from an
>> axiomatic description of the natural numbers (i.e. non-negative
>> integers). It was later shown to be the case that you must include in
>> your logical foundations something deeper, namely the ability to have
>> sets that are elements of other sets (roughly, classes that are
>> individuals in other classes). Without this, and starting only with
>> the natural numbers, you can logically derive all rational numbers
>> (fractions) and their arithmetic properties, and even all the
>> irrational numbers that are the solutions of polynomial equations
>> with integer coefficients ("algebraic numbers") such as sqrt(2), and
>> even solutions of the polynomials that have coefficients that are
>> algebraic numbers. But without introducing the notion of the set of
>> subsets of a set, you cannot logically derive all the interesting
>> transcendental numbers (i.e. those which are not the roots of
>> polynomials), such as e and pi. So if you love calculus, you better
>> not insist on distinguishing instances from classes. But if you are
>> content with polynomials, you can probably be ontologically sloppy.
>> Or, if you don't care about logical foundations of your science, you
>> can forget about the whole thing. :-)
>>
>>
>>
>>
>>
>>
>> On Tue, May 3, 2011 at 11:51 PM, Steve Baskauf
>> <steve.baskauf at vanderbilt.edu> wrote:
>>
>>> [snip]
>>> OK, so let's imagine that we mark up several million records of specimens,
>>> tissue samples, and images as RDF. (We don't have to imagine very hard, I
>>> think the BiSciCol group is planning to actually do this within the next
>>> several months.) I would really like to hear from some of the people who
>>> actually use "DL reasoners" (a group which certainly does not include me) to
>>> know what it is that we could actually find out that would be useful about
>>> that big data blob using reasoners. I have already confessed that my
>>> primary concern is enabling data discovery, transfer, and aggregation using
>>> GUIDs and RDF. I'm still somewhat of a "semantic web" skeptic as far as the
>>> whole inferencing thing is concerned. Aside from inferring "duplicates",
>>> I'm really wanting to know what else there is useful that could be reasoned
>>> outside of the Taxon/TaxonConcept class. (I can imagine useful reasoning
>>> being done about things in that class like the relationships among names,
>>> concepts, parent taxa, etc. e.g. Rod Page's Biodiversity Informatics 3:1-15
>>> article https://journals.ku.edu/index.php/jbi/article/view/25) I think this
>>> (data markup priority vs. inferencing priority) is an important discussion
>>> to have before the tdwg community can settle on some kind of consensus way
>>> of turning database records into RDF, particularly if it is going to have a
>>> big influence on the way the RDF model is set up. To me, there is a clear
>>> and immediate need to be able to mark data up in a straightforward way. If
>>> we can get the semantic part, too, that would be great but not at the
>>> expense of data markup. I just was at a meeting of a bunch of herbarium
>>> curators. They desperately need a way to implement GUIDs and aggregate data
>>> and they need it now. I really don't think they care one whit about
>>> inferencing. If we coalesce on a model that is great for doing cool things
>>> with 10 records but which can't handle hundreds of thousands of records
>>> easily and simply, then we are wasting our time. I don't think we need to
>>> dither about this for another five years.
>>>
>>> I would hate to have to draw an RDF graph of that model
>>>
>>> I would as much hate to have to draw an RDF graph of 1.7 million instances.
>>> The point being, in order to draw a graph of how someone models a domain you
>>> don't draw a graph of the entire RDF triple store.
>>>
>>> That was the point I was trying to make (I think).
>>>
>>> Thanks for the clarification, Hilmar.
>>> Steve
>>>
>>> -hilmar
>>> --
>>> ===========================================================
>>> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
>>> ===========================================================
>>>
>>>
>>>
>>> --
>>> Steven J. Baskauf, Ph.D., Senior Lecturer
>>> Vanderbilt University Dept. of Biological Sciences
>>>
>>> postal mail address:
>>> VU Station B 351634
>>> Nashville, TN 37235-1634, U.S.A.
>>>
>>> delivery address:
>>> 2125 Stevenson Center
>>> 1161 21st Ave., S.
>>> Nashville, TN 37235
>>>
>>> office: 2128 Stevenson Center
>>> phone: (615) 343-4582, fax: (615) 343-6707
>>> http://bioimages.vanderbilt.edu
>>>
>>> _______________________________________________
>>> tdwg-content mailing list
>>> tdwg-content at lists.tdwg.org
>>> http://lists.tdwg.org/mailman/listinfo/tdwg-content
>>>
>>>
>>>
>>
>
> --
> ------------------------------------------------------------------------------------
> Pete DeVries
> Department of Entomology
> University of Wisconsin - Madison
> 445 Russell Laboratories
> 1630 Linden Drive
> Madison, WI 53706
> Email: pdevries at wisc.edu
> TaxonConcept <http://www.taxonconcept.org/> & GeoSpecies
> <http://about.geospecies.org/> Knowledge Bases
> A Semantic Web, Linked Open Data <http://linkeddata.org/> Project
> --------------------------------------------------------------------------------------
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu