[tdwg-content] Schema-last and crazy: correlated? [SEC=UNCLASSIFIED]

Paul Murray pmurray at anbg.gov.au
Wed Mar 2 02:12:12 CET 2011


On 24/02/2011, at 9:05 AM, joel sachs wrote:

> Is that Organism#hasIdentification URI from the TDWG ontology? I thought the TDWG ontology was de facto deprecated. Am I wrong about that?

(Sorry about the delay in replying. We have had Tony Rees up here and are trying to integrate Taxamatch into our search service. Could be good.)

I was speaking more about the idea of subclassing properties in general than making specific comment of particular "real" ontology terms.

> You wrote: "a person who uses that predicate to describe a
> painting is misusing the vocabulary and deserves what they get."
> 
> The problem is that it's not just the person who misuses a vocabulary that
> gets a mess of incorrect inferences. We all do.

True, but there's just no way to avoid that. Although we like to talk about the semantic web as "all predicates everywhere", that ontology is inconsistent. In practice, whenever you reason, you have to take a set of ontologies that you trust - whether you specify them by literal filenames or simply by saying "I trust everything at some SPARQL endpoint".

With a tighter vocabulary, if someone has a bad predicate somewhere then anyone who uses that ontology - whether directly or by indirect inclusion - winds up with an inconsistent vocabulary that can't be reasoned over.

But - the curators of that data are simply getting it wrong, *provided* that the documentation is clear enough about how the predicates *should* be used.

I suppose a parallel is the DNS system. One bad DNS has a ripple effect. For that reason, DNS servers don't take records from just anyone - there's a network of trust and responsibility.

The benefit of a tighter vocabulary is that "getting it wrong" becomes a machine-detectable occurrence.

As for usability: the situation is (say) that someone wants to say that something that is not an occurrence has an identification, and the TDWG vocabulary declares that hasIdentification has domain Occurrence. Well ... then they simply don't mean "hasIdentification" in the TDWG vocabulary sense of the word.
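
To make "machine-detectable" concrete, here is a rough sketch in plain Python rather than a real reasoner. It assumes a domain declaration for hasIdentification plus a disjointness statement between Occurrence and Painting; the disjointness, the dictionaries, the function name and the sample data are all invented for illustration and are not taken from the TDWG vocabulary. A domain declaration alone only lets a reasoner infer a type; it takes something like disjointness to turn the misuse into a detectable clash.

```python
# Hypothetical declarations; not the real TDWG ontology.
DOMAIN = {"hasIdentification": "Occurrence"}      # predicate -> domain class
DISJOINT = {frozenset({"Occurrence", "Painting"})}  # assumed disjoint classes


def find_clashes(triples):
    """Return (subject, inferred_class, asserted_class) for each domain clash."""
    # Collect the explicitly asserted types of every subject.
    asserted = {}
    for s, p, o in triples:
        if p == "rdf:type":
            asserted.setdefault(s, set()).add(o)

    clashes = []
    for s, p, o in triples:
        inferred = DOMAIN.get(p)
        if inferred is None:
            continue  # no domain declared for this predicate
        # Using the predicate implies the subject is in the domain class;
        # that clashes with any asserted class disjoint from it.
        for cls in sorted(asserted.get(s, ())):
            if frozenset({inferred, cls}) in DISJOINT:
                clashes.append((s, inferred, cls))
    return clashes


data = [
    ("specimen1", "rdf:type", "Occurrence"),
    ("specimen1", "hasIdentification", "id1"),
    ("monaLisa", "rdf:type", "Painting"),
    ("monaLisa", "hasIdentification", "id2"),
]

print(find_clashes(data))  # [('monaLisa', 'Occurrence', 'Painting')]
```

The specimen passes and the painting is flagged, which is the point: the person describing a painting gets told, by a machine, that they are not using the term in the vocabulary's sense.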

A: "Green" means any colour whose HSB equivalent has a hue of 0.22 to 0.44
B: my car is green
A: no it isn't, it's teal
B: well, *I* think it's green
A: Cool! Use your own vocabulary namespace and define green how you like.
B: But I want to use *your* term.
A: why?
B: so that people looking for what you call green cars will find mine
A: but your car isn't what we call green. It's what we call teal. If someone searches for what we call green and gets your car, they will not get what they want to find.
B: but my car *is* green!

And round and round it goes.
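
A's definition is at least something a machine can check. A minimal sketch, assuming the hue is measured on a 0..1 scale (which is what Python's standard colorsys module uses, HSV being the same thing as HSB) and that the car colours are given as 0-255 RGB values; the function name and the sample colours are mine:

```python
import colorsys

GREEN_HUE_RANGE = (0.22, 0.44)  # A's definition, hue assumed to be on a 0..1 scale


def is_green(r, g, b):
    """True if the RGB colour (0-255 per channel) falls in A's green hue range."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return GREEN_HUE_RANGE[0] <= h <= GREEN_HUE_RANGE[1]


print(is_green(0, 128, 0))    # True: hue is 1/3
print(is_green(0, 128, 128))  # False: teal has hue 1/2
```

B can insist all he likes; under A's definition the answer for a teal car is simply "no".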

C: Ok - how about we define a colour "greenish" and declare that anything that is green or teal is therefore greenish?
A: I don't want to add that to my vocabulary of colours
C: cool. make a separate vocabulary, host it somewhere. B can use that.
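
What C is proposing can be sketched like this, again in plain Python rather than a real reasoner. The "a:" and "c:" prefixes, the class names and the car data are all invented for illustration. The broader term lives in C's vocabulary and points at A's terms; A's vocabulary is not touched.

```python
# C's separate vocabulary: each of A's colours is declared a subclass of
# c:Greenish. Nothing is added to A's own vocabulary. Names are hypothetical.
SUBCLASS_OF = {
    "a:Green": {"c:Greenish"},
    "a:Teal": {"c:Greenish"},
}


def classes_of(asserted):
    """Return the asserted class plus every superclass reachable from it."""
    seen, todo = set(), [asserted]
    while todo:
        cls = todo.pop()
        if cls not in seen:
            seen.add(cls)
            todo.extend(SUBCLASS_OF.get(cls, ()))
    return seen


cars = {"carA": "a:Green", "carB": "a:Teal"}


def search(colour):
    """Find cars whose asserted or inferred colour classes include `colour`."""
    return sorted(car for car, c in cars.items() if colour in classes_of(c))


print(search("a:Green"))     # ['carA']
print(search("c:Greenish"))  # ['carA', 'carB']
```

A search on A's "green" still returns only what A calls green, while anyone who chooses to import C's vocabulary can find both cars under "greenish".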

B: but people won't know to search for "greenish" if it's in a separate vocabulary
Me: ask A to add it
B: A?
A: Nope. "greenish" is right out.
B: C?
C: Dude, you simply don't own A's vocabulary, and that's all there is to it. Define your own, import it in your ontology, and anyone who doesn't like it just has to live without your data.

D: Hey! I want to use B's data, but I don't like B's vocabulary, particularly not this "greenish" thing.
A, B, C: I'm sorry, D, but what you are asking for is inherently impossible.
