[tdwg-content] If you need something for referring to a population, then it is probably best to do it as a related class

Steve Baskauf steve.baskauf at vanderbilt.edu
Wed May 4 05:51:57 CEST 2011


Comments inline:

Hilmar Lapp wrote:
>
> On May 3, 2011, at 9:00 PM, Steve Baskauf wrote:
>
>> But I was under the impression that one models things by describing 
>> classes and the properties that connect them.
>
> In OWL, properties connect instances, not classes. RDF allows 
> metaclasses (things that are classes and instances), but doing this 
> will throw most (all?) reasoners off the track.
I knew I would get in trouble talking about this among experts. :-)  
Thanks for the correction.  I should have said "properties that connect 
instances of those classes".  I think that is what I meant.  My point 
was that in creating a model, one doesn't have to enumerate every 
particular instance, particularly if there are many of them.  One can 
describe the class in general and let the users create the instances 
that are appropriate for that class. 
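
A minimal sketch of that idea in Python. The names Taxon, Specimen, identified_as, and the "EX-" catalog numbers are made up for illustration and are not TDWG vocabulary. The model is just the two class definitions; the property links one instance to another, and users create whatever instances they need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    """Hypothetical class; the model describes it once, in general."""
    name: str


@dataclass(frozen=True)
class Specimen:
    """Hypothetical class whose property points at a Taxon instance."""
    catalog_number: str
    identified_as: Taxon  # links a Specimen instance to a Taxon instance


# The model above never enumerates instances; users add them as needed.
felis = Taxon("Felis catus")
specimens = [Specimen(f"EX-{n:04d}", felis) for n in range(1, 4)]
```

The model stays two classes and one property no matter how many Specimen instances are created.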
>
>>   Classes are (to me) a very different thing than instances of 
>> classes.  A model containing more than 13.6 million classes is at 
>> least 1.9 million times as complicated as a model with 7 classes.
>
> Yes and no. I can model a taxonomy as a subclass hierarchy of classes, 
> or as a property-based (memberOf or some such) hierarchy of 
> individuals that all instantiate a single "Taxon" class. The former 
> isn't 1 million times more complex than the latter. However, they are 
> not identical either, and which approach one chooses has significant 
> consequences for how easy it is to express things about those taxa, 
> and for inferring new things from those with a DL reasoner.
Well, I guess I was influenced in my thinking about this by 
http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot , particularly the 
part about the cats, which I can actually understand pretty well.  To 
me, the part of that wiki page that is most relevant to this discussion is:

"Whether the subclassing option is preferable to the tagging approach 
depends on the use of the ontology. The TDWG ontology's principal role 
is not modeling the entire domain to permit inference but allowing the 
mark up of data so that it will flow between applications as freely as 
possible. It has to be something that is easy to map into multiple 
technologies and something that people can agree on rapidly.

This strongly suggests that the tagging approach should be taken 
wherever possible. First agree on the basic semantic units and model the 
rest of the semantics with tagging. Only subclass when absolutely 
necessary."

The really great thing about this is that I can dodge further 
responsibility by just blaming my way of thinking on the people who 
posted that page (Roger Hyam modified by Bob Morris, I think). :-)  But 
seriously, I think that the statement above pretty well summarizes what 
may be the difference between what Pete and I are saying.  My primary 
concern is to allow "the mark up of data so that it will flow between 
applications as freely as possible".  Pete's point may be to permit 
inferencing.
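
As a rough illustration of the two options the wiki page contrasts (subclassing vs. tagging), here is a minimal Python sketch. It is only a sketch under stated assumptions: triples are plain tuples rather than a real RDF library, and every "ex:" identifier (including ex:memberOf and ex:identifiedAs) is hypothetical and not a TDWG term.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
MEMBER_OF = "ex:memberOf"  # assumed tagging property, for illustration only

# Option 1 (subclassing): every taxon is its own class.
subclass_model = [
    ("ex:Felidae", SUBCLASS_OF, "ex:Carnivora"),
    ("ex:Felis", SUBCLASS_OF, "ex:Felidae"),
    ("ex:specimen1", RDF_TYPE, "ex:Felis"),
]

# Option 2 (tagging): a single Taxon class; taxa are individuals
# linked to their parents by a property.
tagging_model = [
    ("ex:Carnivora", RDF_TYPE, "ex:Taxon"),
    ("ex:Felidae", RDF_TYPE, "ex:Taxon"),
    ("ex:Felis", RDF_TYPE, "ex:Taxon"),
    ("ex:Felidae", MEMBER_OF, "ex:Carnivora"),
    ("ex:Felis", MEMBER_OF, "ex:Felidae"),
    ("ex:specimen1", "ex:identifiedAs", "ex:Felis"),
]


def classes_in(triples):
    """Collect resources used as classes: objects of rdf:type,
    plus both ends of every rdfs:subClassOf statement."""
    found = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            found.add(o)
        elif p == SUBCLASS_OF:
            found.update((s, o))
    return found


def ancestors(triples, node, predicate):
    """Walk one predicate upward from node (assumes a single parent each)."""
    parents = {s: o for s, p, o in triples if p == predicate}
    chain = []
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain


print(len(classes_in(subclass_model)))  # 3 classes, one per taxon
print(len(classes_in(tagging_model)))   # 1 class, ex:Taxon
```

Both versions can answer "what are the parents of ex:Felis?" by walking either rdfs:subClassOf or ex:memberOf, but only the first one grows the class list with every taxon added.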
OK, so let's imagine that we mark up several million records of 
specimens, tissue samples, and images as RDF.  (We don't have to imagine 
very hard; I think the BiSciCol group is planning to actually do this 
within the next several months.)  I would really like to hear from some 
of the people who actually use "DL reasoners" (a group which certainly 
does not include me) what useful things we could actually find out 
about that big data blob using reasoners.  I have already confessed 
that my primary concern is enabling data discovery, transfer, and 
aggregation using GUIDs and RDF.  I'm still somewhat of a "semantic 
web" skeptic as far as the whole inferencing thing is concerned.  Aside 
from inferring "duplicates", I really want to know what else useful 
could be inferred outside of the Taxon/TaxonConcept class.  (I can 
imagine useful reasoning being done about things in that class, like 
the relationships among names, concepts, parent taxa, etc., e.g. Rod 
Page's Biodiversity Informatics 3:1-15 article 
https://journals.ku.edu/index.php/jbi/article/view/25)

I think this (data markup priority vs. inferencing priority) is an 
important discussion to have before the TDWG community can settle on 
some kind of consensus way of turning database records into RDF, 
particularly if it is going to have a big influence on the way the RDF 
model is set up.  To me, there is a clear and immediate need to be able 
to mark data up in a straightforward way.  If we can get the semantic 
part, too, that would be great, but not at the expense of data markup.

I was just at a meeting of a bunch of herbarium curators.  They 
desperately need a way to implement GUIDs and aggregate data, and they 
need it now.  I really don't think they care one whit about 
inferencing.  If we coalesce on a model that is great for doing cool 
things with 10 records but which can't handle hundreds of thousands of 
records easily and simply, then we are wasting our time.  I don't think 
we need to dither about this for another five years.
>
>>   I would hate to have to draw an RDF graph of that model
>
> I would as much hate to have to draw an RDF graph of 1.7 million 
> instances. The point being, in order to draw a graph of how someone 
> models a domain you don't draw a graph of the entire RDF triple store.
That was the point I was trying to make (I think).

Thanks for the clarification, Hilmar.
Steve
>
> -hilmar
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
> ===========================================================
>
>
>

-- 
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
VU Station B 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 343-6707
http://bioimages.vanderbilt.edu
