[tdwg-content] class design, generalization, L(O)D

Mon Nov 15 22:24:06 CET 2010

I sent this to tdwg-tag instead of this more appropriate list.  My
apologies to those who see it twice, along with any replies to it.

Jonathan Rees, an employee of Science Commons and a TDWG member
(who knows way more about the semantic web than I do), recently sent me
this. I copy it here with his permission. Each of the paragraphs seems
to me to be germane in different ways to the discussions about what
should be an Individual. For those not deep into RDF, you could loosely
read "axiom" as "rule", although "rule" also has a technical meaning
that is sometimes a little different. Jonathan
raises an important use case in the second paragraph, which is data
quality control.  That's a topic of interest to many, but especially
those following the new Annotation Interest Group. Originally, this
was part of a discussion we had about my favorite hobby horse,
rdfs:domain.  He is not on my side.  When people who know more than I
do about something are skeptical of my arguments about it, I usually
suspend disbelief and temporarily adopt their position.

Jonathan's first point is pretty much what Paul Murray observed
yesterday in response to a question from Kevin Richards.

"(a) subclassing is the way in RDFS or OWL you would connect the more
specific to the less specific, so that you can apply general theorems
to a more specific entity.  That is, a well-documented data set would
be rendered using classes and properties that were very specific so as
to not lose information, and then could be merged with a
badly-documented data set by relaxing to more general classes and
properties using subclass and subproperty knowledge.

(b) axioms (i.e. specificity) are valuable not only for expressing
operational and inferential semantics, but also for "sanity checking"
e.g. consistency, satisfiability, Clark/Parsia integrity checks (
http://clarkparsia.com/pellet/icv/ ), and similar. Being able to
detect ill-formed inputs is incredibly valuable.

People talk past one another because there are many distinct use cases
for RDF and assumptions are rarely surfaced. For L(O)D, you're
interested in making lots of links with little effort. Semantics is
the enemy because it drives up costs. For semantic web, on the other
hand, you're interested in semantics, i.e. understanding and
documenting the import of what's asserted and making a best effort to
only assert things that are true, even in the presence of open world
assumption and data set extensibility. Semantics is expensive because
it requires real thought and often a lot of reverse engineering.
People coming from these two places will never be able to get along."
---Jonathan Rees in email to Bob Morris
================
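
For anyone who wants something concrete to poke at, here is a minimal
sketch of points (a) and (b) in plain Python rather than a real RDF
toolkit. Everything in it is an assumption made for illustration: the
ex: class and property names, the single-parent hierarchy with no
cycles, and the one disjointness axiom are all invented, and the
"reasoner" is a toy that only follows subclass and subproperty links.

```python
# Hypothetical vocabulary: every ex: name below is invented for illustration.
SUBCLASS_OF = {
    "ex:PreservedSpecimen": "ex:Specimen",
    "ex:Specimen": "ex:Individual",
}
SUBPROPERTY_OF = {
    "ex:collectedBy": "ex:associatedAgent",
}
# One axiom: nothing may be both an Individual and an Agent.
DISJOINT = {frozenset({"ex:Individual", "ex:Agent"})}


def ancestors(term, parent_of):
    """Return term followed by everything reached via parent links (assumes no cycles)."""
    chain = [term]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def relax(triples):
    """Point (a): add the more general triples implied by subclass/subproperty."""
    out = set(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            out.update((s, p, c) for c in ancestors(o, SUBCLASS_OF))
        else:
            out.update((s, q, o) for q in ancestors(p, SUBPROPERTY_OF))
    return out


def violations(triples):
    """Point (b): list subjects typed with two classes declared disjoint."""
    types = {}
    for s, p, o in relax(triples):
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(s for s, cs in types.items()
                  if any(pair <= cs for pair in DISJOINT))


# A specific, well-documented data set and a vague one.
well_documented = {
    ("ex:s1", "rdf:type", "ex:PreservedSpecimen"),
    ("ex:s1", "ex:collectedBy", "ex:morris"),
}
poorly_documented = {
    ("ex:s2", "rdf:type", "ex:Individual"),
    ("ex:s2", "ex:associatedAgent", "ex:rees"),
}

# Merge at the general level: both subjects are now findable as ex:Individual.
merged = relax(well_documented) | poorly_documented
individuals = sorted(s for s, p, o in merged
                     if p == "rdf:type" and o == "ex:Individual")

# An ill-formed input: ex:s1 is also asserted to be an Agent.
bad_input = well_documented | {("ex:s1", "rdf:type", "ex:Agent")}
flagged = violations(bad_input)
```

The specific data set loses nothing, since its original triples are
still in merged, yet a query at the general level sees both data sets.
The disjointness check only catches the bad input because the specific
class carries enough semantics to reach ex:Individual, which is the
sense in which specificity pays for itself in quality control.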

Bob Morris

-- 
Robert A. Morris
Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob at gmail.com
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)