Re: [tdwg-content] If you need something for referring to a population, then it is probably best to do it as a related class

4 May 2011

Steve,

You are making this sound as if the alternative identifications are
discarded; they are not.

Those who are interested in the identification history can look at the
information at the level of the individual or below.

If an individual was misidentified then the occurrence record can be updated
to link to the corrected species concept.

If someone wants to create their own alternative set of identifications,
they can freely do so, and those will be linked to the other data so people
can choose.
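A minimal sketch of the arrangement described above, assuming the full identification history is stored on the individual and that the most recent identification is treated as the correction. The class names, the `#occurrence` URI pattern, and the example URIs are hypothetical illustrations, not the taxonconcept.org implementation.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Identification:
    concept_uri: str        # species concept assigned by the determiner
    determiner: str
    date_identified: date


@dataclass
class Individual:
    uri: str
    # Full identification history; entries are appended, never deleted.
    identifications: list = field(default_factory=list)

    def current_concept(self) -> str:
        """Concept from the most recent identification (assumed to be the correction)."""
        latest = max(self.identifications, key=lambda i: i.date_identified)
        return latest.concept_uri


def occurrence_link(individual: Individual) -> tuple:
    """The one occurrence -> species concept triple shown to general users.

    The '#occurrence' URI pattern is a hypothetical convention for this sketch.
    """
    return (
        individual.uri + "#occurrence",
        "txn:occurrenceHasSpeciesConcept",
        individual.current_concept(),
    )
```

Correcting a misidentification then means appending a new Identification; the published occurrence link changes, while the earlier opinion stays in the individual's history.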

There are a number of ways that alternative identifications could be
handled.

For instance, let's say that TaxonomistA and TaxonomistB never agree on an
identification.

These could be separated by the use of different predicates.

txn:occurrenceHasSpeciesConcept => Concept_A_URI
bioimages:occurrenceHasSpeciesConcept => Concept_B_URI

Now these do not conflict.
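To make the "no conflict" point concrete, here is a small sketch using plain Python tuples as triples. The two predicate names are the ones above; the occurrence and concept URIs are hypothetical. Each source's view stays single-valued because it reads only its own predicate, and a user who wants to compare can still see both side by side.

```python
# Hypothetical occurrence and concept URIs; the two predicates are the ones above.
TXN = "txn:occurrenceHasSpeciesConcept"
BIOIMAGES = "bioimages:occurrenceHasSpeciesConcept"

triples = {
    ("ex:occurrence1", TXN, "ex:Concept_A_URI"),
    ("ex:occurrence1", BIOIMAGES, "ex:Concept_B_URI"),
}


def view(graph, predicate):
    """One source's identifications: {occurrence: concept} using only its predicate."""
    return {s: o for (s, p, o) in graph if p == predicate}


def side_by_side(graph, predicates):
    """Every source's concept per occurrence, keyed by predicate, for users who want to compare."""
    table = {}
    for s, p, o in graph:
        if p in predicates:
            table.setdefault(s, {})[p] = o
    return table
```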

In your arboretum, are the trees labeled with all the scientific names and
concepts, including the incorrect ones, or just one?

If they were, wouldn't visiting children and congressmen ask, *so which one
is it?*

Respectfully,

- Pete

On Wed, May 4, 2011 at 9:14 AM, Steve Baskauf
<steve.baskauf@vanderbilt.edu> wrote:
...
OK, I think that I have already probably said more than people want to
hear on this subject.  So I will stop with this:
1. It does not appear that there is anything "wrong" with
taxonconcept.org from a technical standpoint.  It does what Pete wants it
to do, and that is very cool.
2. I believe that there are aspects of taxonconcept.org (introduced
for convenience in querying) that make it much more complicated than I think
is necessary to represent the core conceptual entities in the biodiversity
informatics community.  I believe (for reasons articulated previously) that
some of those complexities may introduce problems in a distributed system
where people at different institutions are linking to each other's URIs.
3. I believe that the way that taxonconcept.org conceptualizes some of
these core entities is not congruent with the most common opinions that I
have heard expressed on this list.  Note that I am not saying that the
taxonconcept.org conceptualization is "wrong".  I am saying that in some
ways it differs significantly from what I perceive to be the community
consensus.  On the issue of the representation of taxa and names I am going
to have to defer to the opinion of others (and there is no shortage of
people on the list who are experts on this subject).  However, I will say
that if one says:

> And get results free of all inappropriate identifications.
> Do you want the misidentifications showing up in these species lists?
> How would a general user correctly determine which of these
> identifications are correct?

then who is going to be the judge of "correct"?  I don't want to be around
when that cat fight erupts.  I do think there ought to be some way that a
determiner can indicate that they may have made a mistake on their own
Identification.  But I think multiple Identifications had better be multiple
opinions, or else there will never be a system that will be supported by
diverse participants.
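One way to read this proposal, sketched under stated assumptions: every Identification is kept as an opinion, and the only filter a species list applies is a flag that the determiner sets on their own Identification. The `flagged_by_determiner` field and the example taxa are hypothetical; nothing here is part of Darwin-SW.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    occurrence: str
    taxon: str
    identified_by: str
    # Hypothetical flag: only the determiner who made this Identification sets it.
    flagged_by_determiner: bool = False


def species_list(identifications):
    """Taxa backed by at least one opinion its own determiner has not flagged.

    No third party decides which opinion is "correct"; conflicting
    unflagged opinions all contribute.
    """
    return sorted({i.taxon for i in identifications if not i.flagged_by_determiner})


def opinions_for(identifications, occurrence):
    """Every opinion about one occurrence, flagged or not."""
    return [i for i in identifications if i.occurrence == occurrence]
```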
Steve
On Wed, May 4, 2011 at 6:04 AM, Steve Baskauf
<steve.baskauf@vanderbilt.edu> wrote:
...
Thanks, Bob, for the examples.  I will try to dig my way through them.
I don't want to give the impression that Darwin-SW was not intended to
facilitate any reasoning.  That is, after all, why it is called "Darwin-SW"
instead of "Darwin-data-markup".  I know that Cam is quite interested in the
"semantic" end of it, and when he has Internet again I hope he will chime in
on this.  I'm simply confessing what my primary concern is (data markup).
When we started working on the ontology, we decided to make it as simple as
possible while still trying to permit every (or almost every) kind of class
and relationship that was discussed in the Oct/Nov discussion.  The result
was to have a single class Occurrence whose instances are described by
properties, not 1.7 million classes N#occurrence and so on for the other six
classes in the model.  The intention was that DSW 1.0 would be constructed
in such a way that it could support the addition of more complex components
(Cam has actually marked the posted version at version 0.2 which means that
it is certainly subject to improvement) and possibly more complex
reasoning.  But the more complex stuff was not put into the model at the
start because we wanted something that (hopefully) most people could agree
represents reality reasonably well (at least a TDWG form of reality since it
uses the structure of DwC as its basis) and hence it would actually have the
possibility of being used by more than two people.
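The difference between the two shapes can be shown on a toy data set. This sketch assumes tuples as triples and hypothetical concept ids N1 and N2; `dwc:Occurrence` is the Darwin Core class, while `ex:hasConcept` is a made-up property name. In the class-per-concept shape the number of classes grows with the number of concepts (1.7 million in the case discussed); in the single-class shape it stays at one.

```python
# Hypothetical records: (occurrence id, taxon concept id).
records = [("occ1", "N1"), ("occ2", "N1"), ("occ3", "N2")]
RDF_TYPE = "rdf:type"


def class_per_concept(records):
    """Mint a class 'N#occurrence' for each concept N and type occurrences with it."""
    return {(occ, RDF_TYPE, f"{concept}#occurrence") for occ, concept in records}


def single_class(records):
    """Type every occurrence as dwc:Occurrence; the concept becomes a property value."""
    triples = set()
    for occ, concept in records:
        triples.add((occ, RDF_TYPE, "dwc:Occurrence"))
        triples.add((occ, "ex:hasConcept", concept))  # hypothetical property name
    return triples


def classes_used(triples):
    """Distinct classes appearing as objects of rdf:type."""
    return {o for (s, p, o) in triples if p == RDF_TYPE}
```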
I hope that people realize that I'm not making these comments to give Pete
a hard time or anything.  I really am trying to understand the relative
benefits and problems of modeling one class of cat with many properties vs.
creating a class of cats for every property we care about.  Clearly Pete's
interest is in Taxon Concepts in the sense that he has defined them.  OK,
just to set up a straw man, let's say that I am interested in geography more
than taxonomy.  So I define a class and URI for every state and province in
the world.  I have no idea how many of those there are, but I'll guess 400.
Now I want to describe other things in the biodiversity informatics world.
So I mint classes http://baskaufgeo.org/lod/ohio#occurrence for
occurrences that happen in Ohio,
http://baskaufgeo.org/lod/swaziland#occurrence for occurrences that
happen in Swaziland, http://baskaufgeo.org/lod/tennessee#occurrence,
http://baskaufgeo.org/lod/ohio#taxon,
http://baskaufgeo.org/lod/swaziland#taxon,
http://baskaufgeo.org/lod/tennessee#taxon, etc. etc. for all 400
state/provinces and all seven basic types of things in the biodiversity
domain.  I can now do cool queries that involve geography.
OK, maybe I'm somebody else and I love thinking about temporal
relationships.  So I create
http://baskauf-time.org/lod/1959may#occurrence for occurrences that
happen in May of 1959, http://baskauf-time.org/lod/2005may#occurrence for
occurrences that happen in May of 2005, etc.  Given a billion or so years
of life on earth, that would give me about 12 billion classes for each of
the six other basic kinds of things I want to model.  I could do all kinds
of cool queries that involve time now.
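The class counts in the two straw men work out as follows, using only the figures given above (400 states and provinces, seven basic kinds, a billion years of monthly classes for each of the six other kinds):

```python
STATES_AND_PROVINCES = 400        # Steve's guess for the geography straw man
BASIC_KINDS = 7                   # the seven basic kinds of things in the model
YEARS_OF_LIFE = 1_000_000_000     # "a billion or so years"
MONTHS_PER_YEAR = 12

geography_classes = STATES_AND_PROVINCES * BASIC_KINDS
month_classes_per_kind = YEARS_OF_LIFE * MONTHS_PER_YEAR
month_classes_other_six = month_classes_per_kind * (BASIC_KINDS - 1)

print(f"{geography_classes:,}")        # 2,800
print(f"{month_classes_per_kind:,}")   # 12,000,000,000
print(f"{month_classes_other_six:,}")  # 72,000,000,000
```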
So which one of these three ontologies are we going to adopt?  The
taxon-based one?  The time-based one?  The geography-based one?  Now we are
not just having to choose whether to model things as a single class of cats
whose instances have many color and reproductiveMethod properties vs. many classes
of cats each defined on the basis of its color.  We must decide whether it's
better to have many classes of colors each defined by the kind of animal
that has that color, or many kinds of reproductive systems, each with
different kinds of animals, etc.  Where is it going to end and how could we
agree on which system to use?  It seems to me that it would be better to
have a class of cats, a class of reproductive systems, etc. and connect
their instances with properties.
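A rough illustration of the alternative argued for here: with one class of occurrences and taxon, place, and time as properties, no axis has to be chosen in advance, and any combination can be queried. Plain dictionaries stand in for RDF; the property keys and values are hypothetical.

```python
# Hypothetical occurrences: one class, with taxon, place, and time as properties.
occurrences = [
    {"id": "occ1", "taxon": "ex:concept/A", "state": "Ohio", "month": "1959-05"},
    {"id": "occ2", "taxon": "ex:concept/A", "state": "Tennessee", "month": "2005-05"},
    {"id": "occ3", "taxon": "ex:concept/B", "state": "Ohio", "month": "2005-05"},
]


def select(rows, **criteria):
    """Ids of occurrences whose property values match every given criterion."""
    return [r["id"] for r in rows if all(r.get(k) == v for k, v in criteria.items())]


print(select(occurrences, state="Ohio"))                           # ['occ1', 'occ3']
print(select(occurrences, taxon="ex:concept/A", month="2005-05"))  # ['occ2']
```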
Am I somehow thinking about this incorrectly?
Steve
Bob Morris wrote:
See, for example,
Mungall et al., "Integrating phenotype ontologies across multiple
species", Genome Biology 2010, 11:R2, doi:10.1186/gb-2010-11-1-r2
Ward Blondé et al., "Reasoning with bio-ontologies: using relational
closure rules to enable practical querying", Bioinformatics (2011),
doi:10.1093/bioinformatics/btr164
Calder et al., "Machine Reasoning about Anomalous Sensor Data",
http://dx.doi.org/10.1016/j.ecoinf.2009.08.007, or in manuscript form
at http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf
...
OK, so maybe these knowledge domains are all hypothesis-driven
sciences (i.e., sciences), and <whatever dsw is modelling> is not.
But that would be sad.
Bob
p.s. I had almost finished something else on this thread when Hilmar
beat me to the punch. But here's a slightly different expression of
his point:
It turns out that the distinction between instances and classes is
mainly important in contexts in which you have disclaimed interest,
namely reasoning.  In the RDF/RDFS/OWL stack, enforcing a distinction
between classes and instances only occurs pretty high up in the stack,
when one desires an OWL variant that will offer guarantees that
reasoners will finish any inference they are asked to verify,
preferably in less than exponential time.  I guess, but am not
certain, that even in an LOD context, if data are described with an
OWL ontology that is known to be intractable, e.g. not in OWL DL,
it is possible to design SPARQL queries that will never complete.  In
fact, I believe that even with tractable ontologies, there are SPARQL
queries that are fundamentally exponential in the number of variables.
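For anyone wanting to see where a data set mixes the two roles, here is a hedged sketch that lists terms used both as a class (the object of rdf:type) and as an ordinary instance (typed by something other than rdfs:Class or owl:Class). Plain RDF and RDFS accept such graphs; OWL 1 DL required class names and individual names to be kept apart, while OWL 2 DL permits a restricted form of this reuse known as punning. The example URIs are hypothetical.

```python
RDF_TYPE = "rdf:type"
METACLASSES = {"rdfs:Class", "owl:Class"}


def class_instance_overlap(triples):
    """Terms used as a class (object of rdf:type) that are also typed as
    ordinary individuals (subject of rdf:type with a non-metaclass object)."""
    used_as_class = {o for (s, p, o) in triples if p == RDF_TYPE}
    used_as_instance = {
        s for (s, p, o) in triples if p == RDF_TYPE and o not in METACLASSES
    }
    return used_as_class & used_as_instance


# Hypothetical graph: ex:ConceptA classifies an occurrence and is itself an instance.
graph = {
    ("ex:occ1", RDF_TYPE, "ex:ConceptA"),
    ("ex:ConceptA", RDF_TYPE, "owl:Class"),
    ("ex:ConceptA", RDF_TYPE, "ex:SpeciesConcept"),
    ("ex:occ2", RDF_TYPE, "dwc:Occurrence"),
}
print(class_instance_overlap(graph))  # {'ex:ConceptA'}
```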
p.p.s. Irrelevant, but equivalent, aside about mathematics. At the
turn of the 20th century, Whitehead and Russell tried (and failed) to
show that everything about numbers could be logically derived from an
axiomatic description of the natural numbers (i.e. non-negative
integers). It was later shown to be the case that you must include in
your logical foundations something deeper, namely the ability to have
sets that are elements of other sets (roughly, classes that are
individuals in other classes).  Without this, and starting only with
the natural numbers, you can logically derive all rational numbers
(fractions) and their arithmetic properties, and even all the
irrational numbers that are the solutions of polynomial equations
with integer coefficients ("algebraic numbers") such as sqrt(2), and
even solutions of polynomials whose coefficients are themselves
algebraic numbers.  But without introducing the notion of the set of
subsets of a set, you cannot logically derive all the interesting
transcendental numbers (i.e. those which are not the roots of any
polynomial with integer coefficients), such as e and pi.  So if you
love calculus, you had better not insist on distinguishing instances
from classes. But if you are content with polynomials, you can
probably be ontologically sloppy.
Or, if you don't care about logical foundations of your science, you
can forget about the whole thing. :-)
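The number-theoretic terms in this aside can be stated compactly. These are standard definitions and results (Hermite proved e transcendental in 1873, Lindemann did the same for pi in 1882); they do not restate Bob's claims about which logical foundations are required.

```latex
\alpha \in \mathbb{C} \text{ is algebraic} \iff \exists\, p \in \mathbb{Z}[x],\; p \neq 0,\; p(\alpha) = 0

\sqrt{2}:\quad p(x) = x^{2} - 2,\qquad p(\sqrt{2}) = 2 - 2 = 0

p(x) = \sum_{k=0}^{n} a_k x^{k},\; a_k \text{ algebraic},\; p \neq 0,\; p(\alpha) = 0 \;\Longrightarrow\; \alpha \text{ algebraic}

e,\ \pi:\quad p(e) \neq 0,\; p(\pi) \neq 0 \quad \text{for every nonzero } p \in \mathbb{Z}[x]
```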
On Tue, May 3, 2011 at 11:51 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
[snip]
OK, so let's imagine that we mark up several million records of specimens,
tissue samples, and images as RDF.  (We don't have to imagine very hard; I
think the BiSciCol group is planning to actually do this within the next
several months.)  I would really like to hear from some of the people who
actually use "DL reasoners" (a group which certainly does not include me) to
know what it is that we could actually find out that would be useful about
that big data blob using reasoners.  I have already confessed that my
primary concern is enabling data discovery, transfer, and aggregation using
GUIDs and RDF.  I'm still somewhat of a "semantic web" skeptic as far as the
whole inferencing thing is concerned.  Aside from inferring "duplicates",
I really want to know what else useful could be reasoned
outside of the Taxon/TaxonConcept class.  (I can imagine useful reasoning
being done about things in that class like the relationships among names,
concepts, parent taxa, etc. e.g. Rod Page's Biodiversity Informatics 3:1-15
article https://journals.ku.edu/index.php/jbi/article/view/25)  I think this
(data markup priority vs. inferencing priority) is an important discussion
to have before the tdwg community can settle on some kind of consensus way
of turning database records into RDF, particularly if it is going to have a
big influence on the way the RDF model is set up.  To me, there is a clear
and immediate need to be able to mark data up in a straightforward way.  If
we can get the semantic part, too, that would be great but not at the
expense of data markup.  I just was at a meeting of a bunch of herbarium
curators.  They desperately need a way to implement GUIDs and aggregate data
and they need it now.  I really don't think they care one whit about
inferencing.  If we coalesce on a model that is great for doing cool things
with 10 records but which can't handle hundreds of thousands of records
easily and simply, then we are wasting our time.  I don't think we need to
dither about this for another five years.
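Inferring duplicates, the one kind of reasoning mentioned above, can be approximated without a DL reasoner. Because owl:sameAs is reflexive, symmetric, and transitive, the records that denote the same thing are the connected components of the sameAs links. The sketch below computes them with union-find; the herbarium, aggregator, and image URIs are hypothetical.

```python
def same_as_clusters(same_as_pairs):
    """Group URIs into the equivalence classes implied by owl:sameAs links.

    owl:sameAs is reflexive, symmetric, and transitive, so the clusters are
    the connected components of the link graph (computed here with union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in same_as_pairs:
        union(a, b)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


pairs = [("herb:spec1", "agg:rec9"), ("agg:rec9", "img:img4"), ("herb:spec2", "agg:rec10")]
print(same_as_clusters(pairs))
# [['agg:rec10', 'herb:spec2'], ['agg:rec9', 'herb:spec1', 'img:img4']]
```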
I would hate to have to draw an RDF graph of that model
I would as much hate to have to draw an RDF graph of 1.7 million instances.
The point being, in order to draw a graph of how someone models a domain you
don't draw a graph of the entire RDF triple store.
That was the point I was trying to make (I think).
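Hilmar's point can be illustrated by collapsing an instance-level graph into the class-level picture one would actually draw: each property edge between two typed instances becomes an edge between their classes, so millions of instance triples reduce to a handful of schema edges. This is a sketch with tuples as triples; `ex:Individual` and `ex:occurrenceOf` are hypothetical names.

```python
RDF_TYPE = "rdf:type"


def schema_graph(triples):
    """Summarize instance-level triples as (subject class, property, object class) edges.

    Only triples whose subject and object both carry an rdf:type are summarized.
    """
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    edges = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for cs in types.get(s, ()):
            for co in types.get(o, ()):
                edges.add((cs, p, co))
    return edges


instances = {
    ("ex:occ1", RDF_TYPE, "dwc:Occurrence"),
    ("ex:occ2", RDF_TYPE, "dwc:Occurrence"),
    ("ex:ind1", RDF_TYPE, "ex:Individual"),
    ("ex:occ1", "ex:occurrenceOf", "ex:ind1"),
    ("ex:occ2", "ex:occurrenceOf", "ex:ind1"),
}
print(schema_graph(instances))  # {('dwc:Occurrence', 'ex:occurrenceOf', 'ex:Individual')}
```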
Thanks for the clarification, Hilmar.
Steve
-hilmar
--
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
===========================================================
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept <http://www.taxonconcept.org/> & GeoSpecies <http://about.geospecies.org/> Knowledge Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/>  Project
--------------------------------------------------------------------------------------

Peter DeVries