Hi Steve, Bob and Hilmar,<div><br></div><div>It might be helpful to think of it this way.</div><div><br></div><div>This species concept <a href="http://lod.taxonconcept.org/ses/mCcSp#Species" target="_blank">http://lod.taxonconcept.org/ses/mCcSp#Species</a><br>

<br></div><div>is <b>both</b> an instance of txn:SpeciesConcept and an owl:Class</div><div><br></div><div>The occurrence record <a href="http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence" target="_blank">http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence</a></div>

<div><br></div><div>is <b>both</b> an instance of txn:Occurrence and an instance of <a href="http://lod.taxonconcept.org/ses/ICmLC#Occurrence" target="_blank">http://lod.taxonconcept.org/ses/ICmLC#Occurrence</a></div>
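<div><br></div><div>Here is a minimal Python sketch of this dual typing. It uses a plain set of (subject, predicate, object) tuples as a stand-in for a triple store. The URIs are copied from above, but the triples are a hand-written excerpt, not the actual TaxonConcept data. The helper names types_of and instances_of are hypothetical.</div>
<pre>
```python
# Stand-in triple store: a set of (subject, predicate, object) tuples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"

SPECIES = "http://lod.taxonconcept.org/ses/mCcSp#Species"
SPECIES_OCC_CLASS = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
OCCURRENCE = ("http://ocs.taxonconcept.org/ocs/"
              "f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence")

triples = {
    # The species concept is an instance of txn:SpeciesConcept...
    (SPECIES, RDF_TYPE, TXN + "SpeciesConcept"),
    # ...and is also itself declared to be a class.
    (SPECIES, RDF_TYPE, OWL_CLASS),
    # The occurrence has a generic type and a species-specific type.
    (OCCURRENCE, RDF_TYPE, TXN + "Occurrence"),
    (OCCURRENCE, RDF_TYPE, SPECIES_OCC_CLASS),
}


def types_of(subject):
    """Return every rdf:type asserted for a subject."""
    return {o for (s, p, o) in triples if s == subject and p == RDF_TYPE}


def instances_of(cls):
    """Return every subject asserted to have rdf:type cls."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}
```
</pre>
<div>Because the occurrence carries both types, two different queries reach it without any reasoner: a generic one (all txn:Occurrence instances) and a species-specific one (instances of ICmLC#Occurrence).</div>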

<div><br></div><div>This identification record has links back to the species concept, the occurrence and the individual:</div><div><br></div><div>&lt; <a href="http://lsd.taxonconcept.org/describe/?url=http://ocs.taxonconcept.org/ocs/1de0579b-086f-456a-8dbe-89f32dfbee68%23Identification_0001" target="_blank">http://lsd.taxonconcept.org/describe/?url=http://ocs.taxonconcept.org/ocs/1de0579b-086f-456a-8dbe-89f32dfbee68%23Identification_0001</a> &gt; (short link: <a href="http://bit.ly/jtLgNu" target="_blank">http://bit.ly/jtLgNu</a>)</div>

<div><br></div><div>My reasoning behind the current structure is that you want to be able to query easily for:</div><div><br></div><div><b>Occurrences at the TDWG BioBlitz</b></div><div><br></div>

<div><div><font face="&#39;courier new&#39;, monospace">PREFIX txn:     &lt;<a href="http://lod.taxonconcept.org/ontology/txn.owl#" target="_blank">http://lod.taxonconcept.org/ontology/txn.owl#</a>&gt;</font></div>

<div><font face="&#39;courier new&#39;, monospace">PREFIX rdf:     &lt;<a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#" target="_blank">http://www.w3.org/1999/02/22-rdf-syntax-ns#</a>&gt;</font></div><div><font face="&#39;courier new&#39;, monospace">PREFIX dcterms: &lt;<a href="http://purl.org/dc/terms/" target="_blank">http://purl.org/dc/terms/</a>&gt;</font></div>

<div><font face="&#39;courier new&#39;, monospace">PREFIX foaf:    &lt;<a href="http://xmlns.com/foaf/0.1/" target="_blank">http://xmlns.com/foaf/0.1/</a>&gt;</font></div><div><font face="&#39;courier new&#39;, monospace"><br>

</font></div><div><font face="&#39;courier new&#39;, monospace"><br></font></div><div><font face="&#39;courier new&#39;, monospace">select distinct ?s, ?o as ?image,  ?kingdom, ?phylum, ?class, ?order, ?family, ?genus, ?sciname, ?cname where {</font></div>

<div><font face="&#39;courier new&#39;, monospace"> ?s rdf:type txn:Occurrence.</font></div><div><font face="&#39;courier new&#39;, monospace"> ?s dcterms:isPartOf &lt;<a href="http://lod.taxonconcept.org/ontology/txn.owl#TDWG2010_BioBlitz" target="_blank">http://lod.taxonconcept.org/ontology/txn.owl#TDWG2010_BioBlitz</a>&gt;.</font></div>

<div><font face="&#39;courier new&#39;, monospace"> ?s txn:kingdom ?kingdom.</font></div><div><font face="&#39;courier new&#39;, monospace"> ?s txn:phylum  ?phylum.</font></div>

<div><font face="&#39;courier new&#39;, monospace"> ?s txn:class   ?class.</font></div><div><font face="&#39;courier new&#39;, monospace"> ?s txn:order   ?order.</font></div>

<div><font face="&#39;courier new&#39;, monospace"> ?s txn:family  ?family.</font></div><div><font face="&#39;courier new&#39;, monospace"> ?s txn:genus   ?genus.</font></div>

<div><font face="&#39;courier new&#39;, monospace"> ?s txn:hasScientificName   ?sciname.</font></div><div><font face="&#39;courier new&#39;, monospace"> optional {?s  foaf:depiction ?o.</font></div>

<div><font face="&#39;courier new&#39;, monospace">           ?s  txn:commonName ?cname}.</font></div><div><font face="&#39;courier new&#39;, monospace"> }</font></div></div>

<div><br></div><div><br></div><div>Run This Query:  <a href="http://bit.ly/kZ8C1Q" target="_blank">bit.ly/kZ8C1Q</a></div><div><br></div><div><b>Species expected in Massachusetts</b></div><div><b><br>

</b></div><div><div><font face="&#39;courier new&#39;, monospace">PREFIX txn:     &lt;<a href="http://lod.taxonconcept.org/ontology/txn.owl#" target="_blank">http://lod.taxonconcept.org/ontology/txn.owl#</a>&gt;</font></div>

<div><font face="&#39;courier new&#39;, monospace">PREFIX rdf:     &lt;<a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#" target="_blank">http://www.w3.org/1999/02/22-rdf-syntax-ns#</a>&gt;</font></div><div><font face="&#39;courier new&#39;, monospace">PREFIX dcterms: &lt;<a href="http://purl.org/dc/terms/" target="_blank">http://purl.org/dc/terms/</a>&gt;</font></div>

<div><font face="&#39;courier new&#39;, monospace">PREFIX massachusetts: &lt;<a href="http://sws.geonames.org/6254926/" target="_blank">http://sws.geonames.org/6254926/</a>&gt;</font></div><div><font face="&#39;courier new&#39;, monospace"><br>

</font></div><div><font face="&#39;courier new&#39;, monospace"><br></font></div><div><font face="&#39;courier new&#39;, monospace">select distinct ?s,  ?sciname, ?cname,  ?concept where {</font></div>

<div><font face="&#39;courier new&#39;, monospace">  ?s rdf:type txn:SpeciesConcept.</font></div><div><font face="&#39;courier new&#39;, monospace">  ?s txn:isExpectedIn massachusetts:.</font></div>

<div><font face="&#39;courier new&#39;, monospace">  ?s txn:hasScientificName   ?sciname.</font></div><div><font face="&#39;courier new&#39;, monospace">  ?s dcterms:identifier ?concept.</font></div>

<div><font face="&#39;courier new&#39;, monospace">  optional {?s  txn:commonName ?cname}.</font></div><div><font face="&#39;courier new&#39;, monospace">  }</font></div><div style="font-weight:bold">

<br></div></div><div>Run This Query:  <a href="http://bit.ly/mt3Tsx" target="_blank">http://bit.ly/mt3Tsx</a></div><div><br></div><div><br></div><div>Either way, you get results free of all inappropriate identifications.</div>

<div><br></div><div>Do you want the misidentifications showing up in these species lists?</div><div><br></div><div>How would a general user correctly determine which of these identifications are correct?</div><div><br></div>

<div>Those who are interested in looking at the identification history of a particular specimen can still do so.</div><div><br></div><div>It would also be possible to create your own identification RDF file and apply that to the data set.</div>
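<div><br></div><div>Here is a Python sketch of the idea. It assumes each identification record carries a hypothetical accepted flag that marks the identification currently in force; the actual TaxonConcept records may encode this differently. Only accepted identifications feed the species list, and the full history stays available for each occurrence. A user-supplied overlay, standing in for "your own identification RDF file", replaces the accepted identification for the occurrences it mentions. All IDs and function names are hypothetical.</div>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    occurrence: str
    species: str    # species concept (shortened, hypothetical IDs)
    date: str       # ISO 8601, so string order is chronological order
    accepted: bool  # hypothetical flag: identification currently in force


history = [
    Identification("occ1", "ses:SpeciesX", "2010-09-30", False),  # superseded
    Identification("occ1", "ses:SpeciesB", "2010-10-02", True),
    Identification("occ2", "ses:SpeciesA", "2010-09-30", True),
]


def current_types(records, overlay=None):
    """Map each occurrence to the species concept used to type it.

    `overlay` is a hypothetical {occurrence: species} mapping standing in
    for a user's own identification file; it wins over the accepted record.
    """
    types = {r.occurrence: r.species for r in records if r.accepted}
    types.update(overlay or {})
    return types


def species_list(types):
    """Distinct species concepts, as a species-list query would return them."""
    return sorted(set(types.values()))


def identification_history(records, occurrence):
    """Every identification of one occurrence, oldest first."""
    return [r.species for r in sorted(records, key=lambda r: r.date)
            if r.occurrence == occurrence]
```
</pre>
<div>In this sketch, the superseded identification ses:SpeciesX never reaches the species list, but it still appears in the history of occ1.</div>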

<div><br></div><div>As to which vocabulary to use, I think it is best to use what already exists as long as it works properly, giving more weight to URIs that many other data sets also use.</div>

<div><br></div><div>I use GeoNames for locations and DBpedia for taxonomic authors. (I also link to many related data sets, either through their URIs or through their identifiers, as with ITIS.)</div><div><br></div><div>I wish the BHL would expose URIs for publications and GBIF would expose URIs for specimens, especially type specimens.</div>

<div><br></div><div>There are efforts to document the well-known LOD vocabularies and to work out interoperability issues. Here are two examples:</div><div><br></div><div><a href="http://www4.wiwiss.fu-berlin.de/lodcloud/state/">http://www4.wiwiss.fu-berlin.de/lodcloud/state/</a>  &lt;= lists best practices and which data sets seem to follow them (GeoSpecies ~ TaxonConcept)<br>

</div><div><br></div><div><a href="http://labs.mondeca.com/dataset/lov/index.html">http://labs.mondeca.com/dataset/lov/index.html</a>  &lt;= Linked Open Vocabularies (LOV)<br></div><div><br></div><div>I am still thinking about how to handle multiple classifications.</div>

<div><br></div><div>The current thinking has been to mark up the different hierarchies with fragments such as #Taxonomy and #NCBI_Taxonomy.</div><div><br></div><div>If someone then chooses to tie their version of the TaxonConcept species concepts to a specific hierarchy, they can create an owl:sameAs mapping file containing statements such as</div>

<div><br></div><div>ID#Taxonomy owl:sameAs ID#Species</div><div><br></div><div>I would also like to use URIs for the different clades, as I have done with GeoSpecies, but that will take some work.</div><div><br></div><div>
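<div>Such a mapping file is mechanical to produce. Here is a Python sketch that writes it as N-Triples. It assumes that ID is the species-concept URI without its fragment (for example http://lod.taxonconcept.org/ses/mCcSp), and that the hierarchy fragment is #Taxonomy or #NCBI_Taxonomy as above. The function name same_as_mapping is hypothetical.</div>
<pre>
```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_mapping(concept_ids, hierarchy="Taxonomy"):
    """Yield one N-Triples line per concept: <ID#hierarchy> owl:sameAs <ID#Species> ."""
    for cid in concept_ids:
        yield f"<{cid}#{hierarchy}> <{OWL_SAME_AS}> <{cid}#Species> ."
```
</pre>
<div>Joining the yielded lines with newlines gives the contents of a file that a user can load alongside the TaxonConcept data. Keeping the mapping in its own file means it can be applied or left out without touching the species concepts themselves.</div>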

Another option would be something like this:</div><div><br></div><div>txn_kingdom: Animalia or URI_To_Animalia</div><div>txn_phylum:  Chordata or URI_To_Chordata</div><div>

<br></div><div>ncbi_kingdom: URI_To_UniProt_Animalia</div><div>ncbi_phylum: URI_To_UniProt_Chordata</div><div><br></div><div>That is, use different predicates so that one species concept can carry several different taxonomic hierarchies.</div>
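<div><br></div><div>Here is a Python sketch of hierarchies as parallel sets of tag predicates. It uses the txn_ and ncbi_ predicate names and placeholder values above, plus a hypothetical concept ID, ses:ExampleSp. Selecting a hierarchy is just a matter of choosing a predicate prefix.</div>
<pre>
```python
# Hypothetical tag triples: one concept, two hierarchies kept apart by predicate.
tags = [
    ("ses:ExampleSp", "txn_kingdom", "Animalia"),
    ("ses:ExampleSp", "txn_phylum", "Chordata"),
    ("ses:ExampleSp", "ncbi_kingdom", "URI_To_UniProt_Animalia"),
    ("ses:ExampleSp", "ncbi_phylum", "URI_To_UniProt_Chordata"),
]


def classification(concept, hierarchy):
    """Return {rank: value} for one hierarchy, selected by predicate prefix."""
    prefix = hierarchy + "_"
    return {p[len(prefix):]: o for (s, p, o) in tags
            if s == concept and p.startswith(prefix)}
```
</pre>
<div>Nothing is inferred from these values. Adding a set of ncbi_* tags therefore does not make the concept a member of any UniProt class.</div>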

<div><br></div><div>These operate as tags, not as subclasses.</div><div><br></div><div>One issue with UniProt and Bio2RDF is that their clades are subclassed, so you don&#39;t want to assert owl:sameAs to them unless you want to entail that subclassing.</div>
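<div><br></div><div>To see why owl:sameAs is risky here, consider a Python sketch of naive forward chaining. It applies three rules in the spirit of the OWL 2 RL equality and subclass rules:</div><div>(1) owl:sameAs is symmetric;</div><div>(2) a term inherits the superclasses of anything it is owl:sameAs;</div><div>(3) rdfs:subClassOf is transitive.</div><div>The clade names are hypothetical placeholders, not real UniProt identifiers, and a production reasoner applies many more rules.</div>
<pre>
```python
def closure(triples):
    """Naive forward chaining over three rules (a small subset of OWL 2 RL):
    1. owl:sameAs is symmetric;
    2. if x owl:sameAs y and y rdfs:subClassOf z, then x rdfs:subClassOf z;
    3. rdfs:subClassOf is transitive.
    Repeats until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for (a, p, b) in facts:
            if p == "owl:sameAs":
                new.add((b, "owl:sameAs", a))
            if p in ("owl:sameAs", "rdfs:subClassOf"):
                # Rules 2 and 3 share a shape: follow b's superclasses.
                for (c, q, d) in facts:
                    if c == b and q == "rdfs:subClassOf":
                        new.add((a, "rdfs:subClassOf", d))
        if new.issubset(facts):
            return facts
        facts |= new


# Hypothetical clades standing in for a subclassed UniProt-style hierarchy.
hierarchy = {
    ("uniprot:CladeA", "rdfs:subClassOf", "uniprot:CladeB"),
    ("uniprot:CladeB", "rdfs:subClassOf", "uniprot:CladeC"),
}
concept = "ses:ExampleSp#Species"
```
</pre>
<div>Linking the concept to uniprot:CladeA with owl:sameAs makes it an inferred subclass of CladeB and CladeC. Linking it with a plain tag predicate such as ncbi_phylum leaves it untouched.</div>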

<div><br></div><div>Here is an example of a UniProt taxon: <a href="http://www.uniprot.org/taxonomy/6426.rdf">http://www.uniprot.org/taxonomy/6426.rdf</a></div><div><br></div><div>As for missing potential identifications or occurrences, I don&#39;t know how much of a problem this would actually be, since they should show up in the LOD cloud.</div>

<div><br></div><div>However, it might be interesting to create a listener that watches for ?:Occurrence and ?:Identification resources and harvests them.</div><div><br></div><div>Or to build a PingTheSemanticWeb-style service for them.</div>
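<div><br></div><div>The filtering core of such a listener could be as small as this Python sketch. It assumes the incoming feed has already been parsed into (subject, predicate, object) tuples; the feed would come from a crawler or a ping service, which is not shown. Any rdf:type URI ending in #Occurrence or #Identification counts as a match, whatever its namespace, which is one reading of ?:Occurrence. The function name harvest is hypothetical.</div>
<pre>
```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Any type URI whose fragment is exactly Occurrence or Identification,
# whatever the namespace (one reading of "?:Occurrence").
TYPE_PATTERN = re.compile(r"#(Occurrence|Identification)$")


def harvest(triples):
    """Return {subject: "Occurrence" or "Identification"} for matching subjects."""
    found = {}
    for (s, p, o) in triples:
        if p == RDF_TYPE:
            match = TYPE_PATTERN.search(o)
            if match:
                found[s] = match.group(1)
    return found
```
</pre>
<div>Because the match is on the fragment, species-specific types such as ICmLC#Occurrence are caught as well as txn:Occurrence.</div>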

<div><br></div><div>Respectfully,</div><div><br></div><div>- Pete</div><div><br><div class="gmail_quote">On Wed, May 4, 2011 at 6:04 AM, Steve Baskauf <span dir="ltr">&lt;<a href="mailto:steve.baskauf@vanderbilt.edu" target="_blank">steve.baskauf@vanderbilt.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#ffffff" text="#000000">

Thanks, Bob, for the examples.  I will try to dig my way through them. 

<br>

<br>

I don&#39;t want to give the impression that Darwin-SW was not intended to

facilitate any reasoning.  That is, after all, why it is called

&quot;Darwin-SW&quot; instead of &quot;Darwin-data-markup&quot;.  I know that Cam is quite

interested in the &quot;semantic&quot; end of it, and when he has Internet again

I hope he will chime in on this.  I&#39;m simply confessing what my primary

concern is (data markup).  When we started working on the ontology, we

decided to make it as simple as possible while still trying to permit

every (or almost every) kind of class and relationship that was

discussed in the Oct/Nov discussion.  The result was to have a single

class Occurrence whose instances are described by properties, not 1.7

million classes N#occurrence and so on for the other six classes in the

model.  The intention was that DSW 1.0 would be constructed in such a

way that it could support the addition of more complex components (Cam

has actually marked the posted version as version 0.2, which means that

it is certainly subject to improvement) and possibly more complex

reasoning.  But the more complex stuff was not put into the model at

the start because we wanted something that (hopefully) most people

could agree represents reality reasonably well (at least a TDWG form of

reality since it uses the structure of DwC as its basis) and hence it

would actually have the possibility of being used by more than two

people.  <br>

<br>

I hope that people realize that I&#39;m not making these comments to give

Pete a hard time or anything.  I really am trying to understand the

relative benefits and problems of modeling one class of cats with many

properties vs. creating a class of cats for every property we care

about.  Clearly Pete&#39;s interest is in Taxon Concepts in the sense that

he has defined them.  OK, just to set up a straw man, let&#39;s say that I

am interested in geography more than taxonomy.  So I define a class and

URI for every state and province in the world.  I have no idea how many

of those there are, but I&#39;ll guess 400.  Now I want to describe other

things in the biodiversity informatics world.  So I mint classes

<a href="http://baskaufgeo.org/lod/ohio#occurrence" target="_blank">http://baskaufgeo.org/lod/ohio#occurrence</a> for occurrences that happen

in Ohio, <a href="http://baskaufgeo.org/lod/swaziland#occurrence" target="_blank">http://baskaufgeo.org/lod/swaziland#occurrence</a> for occurrences

that happen in Swaziland,

<a href="http://baskaufgeo.org/lod/tennessee#occurrence" target="_blank">http://baskaufgeo.org/lod/tennessee#occurrence</a>,

<a href="http://baskaufgeo.org/lod/ohio#taxon" target="_blank">http://baskaufgeo.org/lod/ohio#taxon</a>,

<a href="http://baskaufgeo.org/lod/swaziland#taxon" target="_blank">http://baskaufgeo.org/lod/swaziland#taxon</a>,

<a href="http://baskaufgeo.org/lod/tennessee#taxon" target="_blank">http://baskaufgeo.org/lod/tennessee#taxon</a>, etc. etc. for all 400

state/provinces and all seven basic types of things in the biodiversity

domain.  I can now do cool queries that involve geography.  <br>

<br>

OK, maybe I&#39;m somebody else and I love thinking about temporal

relationships.  So I create

<a href="http://baskauf-time.org/lod/1959may#occurrence" target="_blank">http://baskauf-time.org/lod/1959may#occurrence</a> for occurrences that

happen in May of 1959, <a href="http://baskauf-time.org/lod/2005may#occurrence" target="_blank">http://baskauf-time.org/lod/2005may#occurrence</a>

for occurrences that happened in May of 2005, etc.  Given a billion or

so years of life on earth, that would give me about 12 billion classes

for each of the six other basic kinds of things I want to model.  I

could do all kinds of cool queries that involve time now.  <br>

<br>

So which one of these three ontologies are we going to adopt?  The

taxon based one?  The time based one?  The geography based one?  Now we

are not just having to choose whether to model things as a single class

of cats whose instances have many color and reproductiveMethod

properties vs. many classes of cats each defined on the basis of its

color.  We must decide whether it&#39;s better to have many classes of

colors each defined by the kind of animal that has that color, or many

kinds of reproductive systems, each with different kinds of animals,

etc.  Where is it going to end and how could we agree on which system

to use?  It seems to me that it would be better to have a class of

cats, a class of reproductive systems, etc. and connect their instances

with properties.  <br>
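<div><br></div><div>Here is a Python sketch of the property-based alternative, using made-up records. A single Occurrence class whose instances carry state, year, month and taxon as property values can answer geographic, temporal and taxonomic questions with one filter function. The constants at the end reproduce the straw-man arithmetic above: 400 regions times 7 kinds of things, and twelve monthly classes per year over about a billion years. All names are hypothetical.</div>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One class; place, time and taxon are property values, not classes."""
    occurrence_id: str
    state: str
    year: int
    month: int
    taxon: str


# Made-up records for illustration only.
occurrences = [
    Occurrence("o1", "Ohio", 1959, 5, "Felis catus"),
    Occurrence("o2", "Tennessee", 2005, 5, "Felis catus"),
    Occurrence("o3", "Swaziland", 2005, 5, "Panthera leo"),
]


def find(**criteria):
    """IDs of occurrences whose properties equal every given criterion."""
    return [o.occurrence_id for o in occurrences
            if all(getattr(o, name) == value for name, value in criteria.items())]


# The straw-man "class per value" designs, counted as in the message.
REGIONS = 400                        # guessed number of states/provinces
KINDS = 7                            # basic kinds of things in the model
GEO_CLASSES = REGIONS * KINDS        # one class per region per kind
MONTH_CLASSES_PER_KIND = 12 * 10**9  # monthly classes over ~1e9 years
```
</pre>
<div>Adding a new dimension of interest, such as habitat, then means adding one property rather than another family of classes.</div>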

<br>

Am I somehow thinking about this incorrectly?<br><font color="#888888">

Steve</font><div><div></div><div><br>

<br>

Bob Morris wrote:

<blockquote type="cite">

  <pre>See, for example,

Mungall et al., “Integrating phenotype ontologies across multiple

species”, Genome

Biology 2010, 11:R2 doi:10.1186/gb-2010-11-1-r2

Ward Blondé et al.  &quot;Reasoning with bio-ontologies: using relational

closure rules to enable practical querying&quot;, Bioinformatics (2011)

doi: 10.1093/bioinformatics/btr164

Calder, et al. &quot;Machine Reasoning about Anomalous Sensor Data&quot;

<a href="http://dx.doi.org/10.1016/j.ecoinf.2009.08.007" target="_blank">http://dx.doi.org/10.1016/j.ecoinf.2009.08.007</a> or in manuscript form

at <a href="http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf" target="_blank">http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf</a>

...

OK, so  maybe these knowledge domains are all hypothesis-driven

sciences (i.e.,  sciences), and &lt;whatever dsw is modelling&gt; is not.

But that would be sad.

Bob

p.s. I had almost finished something else on this thread when Hilmar

beat me to the punch. But here&#39;s a slightly different expression of

his point:

It turns out that the difference between instances and classes is

mainly important in contexts in which you have disclaimed interest,

namely reasoning.  In the RDF/RDFS/OWL stack, enforcing a distinction

between classes and instances only occurs pretty high up in the stack,

when one desires an OWL variant that will offer guarantees that

reasoners will finish any inference they are asked to verify,

preferably in less than exponential time. I guess, but am not

certain, that even in an LOD context, if data are described with an

OWL ontology that is known to be intractable, e.g. not in OWL DL,

it is possible to design SPARQL queries that will never complete. In

fact, I believe that even with tractable ontologies, there are SPARQL

queries that are fundamentally exponential in the number of variables.

p.p.s. Irrelevant, but equivalent, aside about mathematics. At the

turn of the 20th century, Whitehead and Russell tried (and failed) to

show that everything about numbers could be logically derived from an

axiomatic description of the natural numbers (i.e. non-negative

integers). It was later shown to be the case that you must include in

your logical foundations something deeper, namely the ability to have

sets that are elements of other sets (roughly, classes that are

individuals in other classes).  Without this, and starting only with

the natural numbers, you can logically derive all rational numbers

(fractions) and their arithmetic properties, and even all the

irrational numbers that are the solutions of polynomial equations

with integer coefficients (&quot;algebraic numbers&quot;) such as sqrt(2), and

even solutions of the polynomials that have coefficients that are

algebraic numbers.  But without introducing the notion of the set of

subsets of a set, you cannot logically derive all the interesting

transcendental numbers (i.e. those which are not the roots of

polynomials), such as e and pi.  So if you love calculus, you had better

not insist on  distinguishing instances from classes. But if you are

content with polynomials, you can probably be ontologically sloppy.

Or, if you don&#39;t care about logical foundations of your science, you

can forget about the whole thing. :-)

On Tue, May 3, 2011 at 11:51 PM, Steve Baskauf

<a href="mailto:steve.baskauf@vanderbilt.edu" target="_blank">&lt;steve.baskauf@vanderbilt.edu&gt;</a> wrote:

  </pre>

  <blockquote type="cite">

    <pre>[snip]

OK, so let&#39;s imagine that we mark up several million records of specimens,

tissue samples, and images as RDF.  (We don&#39;t have to imagine very hard, I

think the BiSciCol group is planning to actually do this within the next

several months.)  I would really like to hear from some of the people who

actually use &quot;DL reasoners&quot; (a group which certainly does not include me) to

know what it is that we could actually find out that would be useful about

that big data blob using reasoners.  I have already confessed that my

primary concern is enabling data discovery, transfer, and aggregation using

GUIDs and RDF.  I&#39;m still somewhat of a &quot;semantic web&quot; skeptic as far as the

whole inferencing thing is concerned.  Aside from inferring &quot;duplicates&quot;,

I&#39;m really wanting to know what else there is useful that could be reasoned

outside of the Taxon/TaxonConcept class.  (I can imagine useful reasoning

being done about things in that class like the relationships among names,

concepts, parent taxa, etc. e.g. Rod Page&#39;s Biodiversity Informatics 3:1-15

article <a href="https://journals.ku.edu/index.php/jbi/article/view/25" target="_blank">https://journals.ku.edu/index.php/jbi/article/view/25</a>)  I think this

(data markup priority vs. inferencing priority) is an important discussion

to have before the tdwg community can settle on some kind of consensus way

of turning database records into RDF, particularly if it is going to have a

big influence on the way the RDF model is set up.  To me, there is a clear

and immediate need to be able to mark data up in a straightforward way.  If

we can get the semantic part, too, that would be great but not at the

expense of data markup.  I just was at a meeting of a bunch of herbarium

curators.  They desperately need a way to implement GUIDs and aggregate data

and they need it now.  I really don&#39;t think they care one whit about

inferencing.  If we coalesce on a model that is great for doing cool things

with 10 records but which can&#39;t handle hundreds of thousands of records

easily and simply, then we are wasting our time.  I don&#39;t think we need to

dither about this for another five years.

  I would hate to have to draw an RDF graph of that model

I would as much hate to have to draw an RDF graph of 1.7 million instances.

The point being, in order to draw a graph of how someone models a domain you

don&#39;t draw a graph of the entire RDF triple store.

That was the point I was trying to make (I think).

Thanks for the clarification, Hilmar.

Steve

-hilmar

--

===========================================================

: Hilmar Lapp  -:- Durham, NC -:- <a href="http://informatics.nescent.org" target="_blank">informatics.nescent.org</a> :

===========================================================

--

Steven J. Baskauf, Ph.D., Senior Lecturer

Vanderbilt University Dept. of Biological Sciences

postal mail address:

VU Station B 351634

Nashville, TN  37235-1634,  U.S.A.

delivery address:

2125 Stevenson Center

1161 21st Ave., S.

Nashville, TN 37235

office: 2128 Stevenson Center

phone: <a href="tel:%28615%29%20343-4582" value="+16153434582" target="_blank">(615) 343-4582</a>,  fax: <a href="tel:%28615%29%20343-6707" value="+16153436707" target="_blank">(615) 343-6707</a>

<a href="http://bioimages.vanderbilt.edu" target="_blank">http://bioimages.vanderbilt.edu</a>

_______________________________________________

tdwg-content mailing list

<a href="mailto:tdwg-content@lists.tdwg.org" target="_blank">tdwg-content@lists.tdwg.org</a>

<a href="http://lists.tdwg.org/mailman/listinfo/tdwg-content" target="_blank">http://lists.tdwg.org/mailman/listinfo/tdwg-content</a>

    </pre>

  </blockquote>

  <pre>  </pre>

</blockquote>

<br>


</div></div></div>


<br></blockquote></div><br><br clear="all"><br>-- <br>

------------------------------------------------------------------------------------<br>Pete DeVries<br>Department of Entomology<br>University of Wisconsin - Madison<br>445 Russell Laboratories<br>1630 Linden Drive<br>Madison, WI 53706<br>

Email: <a href="mailto:pdevries@wisc.edu" target="_blank">pdevries@wisc.edu</a><br><a href="http://www.taxonconcept.org/" target="_blank">TaxonConcept</a>  &amp;  <a href="http://about.geospecies.org/" target="_blank">GeoSpecies</a> Knowledge Bases<br>

A Semantic Web, <a href="http://linkeddata.org/" target="_blank">Linked Open Data</a>  Project<br>--------------------------------------------------------------------------------------<br>

</div>