[taxon-model] Taxon Data Model list now operational

Eamonn O Tuama eotuama at gbif.org
Tue May 8 10:02:16 CEST 2007



The Taxon Data Model list is now operational. To maintain continuity, I have
copied the relevant emails of the past week into an "archive" below. Please
use this list from now on to continue the discussions.

Best regards,

Éamonn

-----Discussion on TDM that took place between 2 and 7 May 2007 -----


-------------------  from Renato
--------------------------------------------------------

OK, so I "think" I'm starting to understand the problem that has led to the
current approach taken by the taxon data model.

To make a rather simplified analogy with specimen data, imagine a
collection that ran a simple OCR process over all its labels and now has only
one table with a single text field. Since different labels contain different
things (some may have coordinates, others not, some may have a collecting
date, etc.), the suggestion for this kind of situation would then be to tag all
records individually. For instance saying that "in this specific record you
may find something about the location, in this other record you may find
something about location and date" and so on.

Now back to species databases, if tagging is really something at the record
level (and I suppose it is), I would be really surprised to see a species
database which is ready to use some kind of tagging mechanism. Tagging at
the record level would therefore require changing the data structure and
revising all records. 

If this kind of work is being considered, then why not restructure
everything according to the new terminology that was proposed during the
meeting? Unless we are talking about some kind of data that simply cannot be
separated and structured according to the proposed terms...

Looking at the results of the meeting, it's really tempting to take all
terms and put them into a simple conceptual schema like DarwinCore. It would
not only provide a common XML vocabulary, but we would almost instantly
benefit from the existing technology for sharing/accessing distributed data.
From the TAPIR perspective, data exchange schemas like PlinianCore could be
seen as output models. All providers from the different networks could still
map the same agreed terms/concepts.

If tagging will not take place at the record level but at the field level,
like "I have field X which sometimes has content about behaviour, sometimes
about evolution, sometimes both, so I will automatically tag all records
with both terms", then I see little difference from the current way of
using TAPIR, where we would just take the two corresponding concepts and map
both of them to the same local field.
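A minimal Python sketch of that field-level case, assuming a provider-side mapping table of the kind a TAPIR configuration holds. The concept identifiers and column names below are hypothetical, chosen only for illustration: with field-level tagging, two concepts simply point at the same local column.

```python
# Hypothetical provider-side mapping: concept identifier -> local column.
# Neither the concept ids nor the column names come from a real TAPIR setup.
CONCEPT_MAPPING = {
    "tdmt:Behaviour": "species.natural_history",
    "tdmt:Evolution": "species.natural_history",  # same field, second concept
    "tdmt:Habitat": "species.habitat",
}


def columns_for(concepts):
    """Distinct local columns needed to answer a request for these concepts."""
    return sorted({CONCEPT_MAPPING[c] for c in concepts if c in CONCEPT_MAPPING})


def concepts_for(column):
    """Inverse lookup: every concept that a given local column is mapped to."""
    return sorted(c for c, col in CONCEPT_MAPPING.items() if col == column)


print(columns_for(["tdmt:Behaviour", "tdmt:Evolution"]))  # ['species.natural_history']
print(concepts_for("species.natural_history"))  # ['tdmt:Behaviour', 'tdmt:Evolution']
```

Nothing new is needed on the provider side here; the tagging is implicit in the mapping itself.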

RDF could still be one of the TAPIR outputs, but the ontology would probably
need a different approach (as discussed in previous messages).

Best Regards,
--
Renato

-------------------  from William
-------------------------------------------------------

Hi Renato, Bob and all,

I am new to this thread and I have not been able to go through it all, but
as some of you already know, for the IABIN Species and Specimens Thematic
Network we will be setting up TAPIR providers with information on Species
structured with the Plinian Core (www.pliniancore.org). The first providers
will hopefully be ready by mid-year... our datasource (~4K Species records)
will hopefully be online in the next month...

Additionally, I would really like to try encoding some of the information from
our Timber Tree Species from the Flora of Costa Rica and setting up the TDM on
top of that, looking in particular for the items that do not map well.

I promise to write back next week,

William Ulate
Coordinator
Biodiversity Informatics Unit
Instituto Nacional de Biodiversidad, INBio.

P.S.: Where can I find the minutes/results of the Workshop?


-------------------  from Bob
------------------------------------------------------------



Renato De Giovanni wrote:
> 
> 
> Anyway, I'm not quite familiar with species-level data sources. From 
> the previous messages, it seems that the main reason for using the 
> generic tagging approach is that most data sources will have chunks of 
> text including information about one or more TDM categories, and it 
> will be impractical to separate this information in a more structured 
> way. Did I understand the problem correctly?

Yes, but it is worse. Many such sources have \both/ textual---but
categorized---data and structured data. And both may need ontological
mapping so that both machine integration and human display applications have
a chance of putting together the right stuff and also not ignoring what the
client wishes not to be ignored.
> 
> In this case, then you're right that it would be interesting if 
> someone could investigate this a bit more, make some tests and give us 
> more practical feedback. If most participants of the species model 
> workshop have this kind of database, maybe they could try to map their 
> fields to the TDM categories.

I am presently doing some of that, albeit first trying to hand-code some
instances with Protege and Altova SemanticWorks. I guess the interesting
part will come for stuff that \doesn't/ map well. At the moment, I am
somewhat at a loss as to what our intent was in this case, but maybe in
another few hours I will have figured that out. ...

Bob

-------------------  from Renato
-----------------------------------------------------

Hello Roger,

Markus was right in his comment. I wasn't thinking about any particular
implementation of TAPIR, I just wanted to warn about some implications of
using generic models like TDM in the TAPIR context.

Take DarwinCore as an example:

* Most providers of specimen data use relational databases where DarwinCore
concepts correspond to table columns, so the mapping process is easier.
* If I'm a client and I'm interested in providers that have content for
lat/long, I can just inspect the capabilities response to see if they mapped
the corresponding concepts.
* Since we have different concept ids for each kind of data, we have more
possibilities when designing output models.

Now if I understood correctly, TDM is so generic that the same kind of model
could be used for DarwinCore: just replace the TDM terms with the DarwinCore
concepts. And although there's nothing intrinsically wrong with this
approach:

* If most providers will have databases where TDM categories correspond to
table columns, then they will need to prepare a super view to make all data
appear under a single InfoItem column, just beside another column with the
corresponding category value. It's possible, but it's more work for
providers and performance will not be good.
* If I'm a client and I'm interested in providers that have content for
habitat, I cannot simply inspect the capabilities response, because it will
just show me that the providers have InfoItems. I'll need to send additional
search/inventory requests to discover what kind of data is available.
* Since there will be only a few generic concepts, output models will be
very limited in TAPIR. As you know, at the moment we cannot have conditional
mappings in TAPIR, for instance: InfoItem corresponds to element habitat
only when category equals habitat.
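To make that cost concrete, here is a minimal sketch using Python's built-in sqlite3 module, assuming a provider whose categories are plain columns. The table, view and column names are hypothetical. It builds the "super view" described above and then shows the two extra steps it forces: discovering which categories actually exist, and filtering InfoItems by category, which is the conditional mapping that TAPIR cannot currently express.

```python
import sqlite3

# Hypothetical provider schema in which TDM categories are ordinary columns.
SETUP = """
CREATE TABLE species (
    taxon     TEXT PRIMARY KEY,
    habitat   TEXT,
    behaviour TEXT
);
INSERT INTO species VALUES
    ('Taxon A', 'Some stuff about habitat', 'Some stuff about behaviour'),
    ('Taxon B', 'More stuff about habitat', NULL);

-- The "super view": all values in one info_item column, next to a column
-- holding the category name. Every category column needs its own branch.
CREATE VIEW info_items AS
    SELECT taxon, 'Habitat' AS category, habitat AS info_item
      FROM species WHERE habitat IS NOT NULL
    UNION ALL
    SELECT taxon, 'Behaviour', behaviour
      FROM species WHERE behaviour IS NOT NULL;
"""


def open_provider():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def available_categories(conn):
    """What a client must ask for explicitly (inventory-style), because the
    capabilities would only say that InfoItems are mapped."""
    rows = conn.execute("SELECT DISTINCT category FROM info_items ORDER BY category")
    return [category for (category,) in rows]


def items_in_category(conn, category):
    """The filter a conditional mapping would need, e.g. "InfoItem maps to
    the habitat element only when category = Habitat"."""
    rows = conn.execute(
        "SELECT taxon, info_item FROM info_items WHERE category = ? ORDER BY taxon",
        (category,),
    )
    return rows.fetchall()


conn = open_provider()
print(available_categories(conn))  # ['Behaviour', 'Habitat']
print(items_in_category(conn, "Behaviour"))  # [('Taxon A', 'Some stuff about behaviour')]
```

Every query now goes through a UNION over all category columns, which is where the performance concern comes from.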

I'm not against generic models. I have also used them myself in specific
circumstances like meta modelling applications, or when the application had
such a mutable nature that it was better to use a more generic approach
(even at the cost of performance penalties and other additional work).

Anyway, I'm not quite familiar with species-level data sources. From the
previous messages, it seems that the main reason for using the generic
tagging approach is that most data sources will have chunks of text
including information about one or more TDM categories, and it will be
impractical to separate this information in a more structured way. Did I
understand the problem correctly?

In this case, then you're right that it would be interesting if someone
could investigate this a bit more, make some tests and give us more
practical feedback. If most participants of the species model workshop have
this kind of database, maybe they could try to map their fields to the TDM
categories.

Best Regards,
--
Renato


-------------------  from Markus
-------------------------------------------------------

Hello,
some small corrections below regarding my understanding
--
Markus



On 04.05.2007, at 12:58, Roger Hyam wrote:

> Hi Markus,
>
> I am replying to this and cc'ing the TAG list because I really think 
> we should be having the discussion there. I am sure there are other 
> people who might like to be involved from a technical standpoint. I 
> hope they can read this message thread backwards to catch up.
>
> If I can summarize:
>
> We are talking about the data models that were dreamt up at the 
> SpeciesDataModel workshop
>
> http://rs.tdwg.org/ontology/voc/TaxonDataModel
> http://rs.tdwg.org/ontology/voc/TDMTerm
>
> The choice is whether to have an inherited hierarchy of classes of 
> object to represent information items or to have a single information 
> item and 'tag' it with categories (instances).
Neither approach needs an inherited hierarchy of classes for its semantics.
We can derive *all* classes directly from the base InfoItem class, just as
you are planning to do with the controlled vocabulary terms, deriving them
from BaseTerm. This kind of inheritance is technical by nature, simply
passing on all properties of InfoItem. The semantic hierarchy
InfoItem->BehaviorInfoItem->EvolutionaryBehaviorInfoItem can be modelled in
both approaches if we want. But in neither approach do we have to!


> Having info items as different classes means that they would 
> possibly be clearer in a straight serialization.
>
> <tdm:TaxonDataModel>
> 	<tdm:aboutTaxon>.....</tdm:aboutTaxon>
> 	<tdm:hasInformation>
> 		<tdmt:Behaviour>
> 			<tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
> 		</tdmt:Behaviour>
> 	</tdm:hasInformation>
> 	<tdm:hasInformation>
> 		<tdmt:Evolution>
> 			<tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
> 		</tdmt:Evolution>
> 	</tdm:hasInformation>
> 	<tdm:hasInformation>
> 		<tdmt:BehaviouralEvolution>
> 			<tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
> 		</tdmt:BehaviouralEvolution>
> 	</tdm:hasInformation>
> </tdm:TaxonDataModel>
>
> But taking the tagging approach:
>
> <tdm:TaxonDataModel>
> 	<tdm:aboutTaxon>.....</tdm:aboutTaxon>
> 	<tdm:hasInformation>
> 		<tdm:InfoItem>
> 			<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
> 			<tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
> 		</tdm:InfoItem>
> 	</tdm:hasInformation>
> 	<tdm:hasInformation>
> 		<tdm:InfoItem>
> 			<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
> 			<tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
> 		</tdm:InfoItem>
> 	</tdm:hasInformation>
> 	<tdm:hasInformation>
> 		<tdm:InfoItem>
> 			<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
> 			<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
> 			<tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
> 		</tdm:InfoItem>
> 	</tdm:hasInformation>
> </tdm:TaxonDataModel>
>
> Renato raised questions about serving that tagged version with TAPIR 
> by which I think he meant TAPIRLink as it would not be possible to do 
> the above example as a flat schema. This is the same problem as 
> serving multiple identifications for a specimen I guess
> - is this right?
Not really. I think the problem has to do with creating an XML schema to be
used for a TAPIR model that has *different* node paths to be mapped to.
Using the generic InfoItem "metamodel" you always end up with this path that
you map to:

/tdm:TaxonDataModel/tdm:hasInformation/tdm:InfoItem/tdmt:hasContent


instead of having different ones when using inheritance:
/tdm:TaxonDataModel/tdm:hasInformation/tdmt:Behaviour/tdmt:hasContent
/tdm:TaxonDataModel/tdm:hasInformation/tdmt:Evolution/tdmt:hasContent
/tdm:TaxonDataModel/tdm:hasInformation/tdmt:BehaviouralEvolution/tdmt:hasContent
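A small Python sketch (standard-library ElementTree) that lists the element paths leading to hasContent in both of Roger's serializations, trimmed to two items each. The namespace URIs bound to the tdm and tdmt prefixes are assumptions, since the snippets in this thread never declare them. The class-based document yields one path per category; the tagged one collapses to a single path, so a mapping has nothing but the category attribute to tell the items apart.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the snippets in this thread never declare them.
NS = {
    "tdm": "http://rs.tdwg.org/ontology/voc/TaxonDataModel#",
    "tdmt": "http://rs.tdwg.org/ontology/voc/TDMTerm#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
DECL = " ".join(f'xmlns:{p}="{uri}"' for p, uri in NS.items())
TERM = NS["tdmt"]

CLASS_BASED = f"""<tdm:TaxonDataModel {DECL}>
  <tdm:hasInformation><tdmt:Behaviour>
    <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
  </tdmt:Behaviour></tdm:hasInformation>
  <tdm:hasInformation><tdmt:Evolution>
    <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
  </tdmt:Evolution></tdm:hasInformation>
</tdm:TaxonDataModel>"""

TAGGED = f"""<tdm:TaxonDataModel {DECL}>
  <tdm:hasInformation><tdm:InfoItem>
    <tdm:category rdf:resource="{TERM}Behaviour"/>
    <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
  </tdm:InfoItem></tdm:hasInformation>
  <tdm:hasInformation><tdm:InfoItem>
    <tdm:category rdf:resource="{TERM}Evolution"/>
    <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
  </tdm:InfoItem></tdm:hasInformation>
</tdm:TaxonDataModel>"""

PREFIX = {uri: p for p, uri in NS.items()}
CONTENT_TAG = "{%s}hasContent" % NS["tdmt"]


def prefixed(tag):
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    uri, local = tag[1:].split("}", 1)
    return f"{PREFIX[uri]}:{local}"


def content_paths(xml_text):
    """Distinct root-to-hasContent paths, i.e. what a mapping would target."""
    paths = set()

    def walk(elem, trail):
        trail = trail + [prefixed(elem.tag)]
        if elem.tag == CONTENT_TAG:
            paths.add("/" + "/".join(trail))
        for child in elem:
            walk(child, trail)

    walk(ET.fromstring(xml_text), [])
    return sorted(paths)


for doc in (CLASS_BASED, TAGGED):
    print(content_paths(doc))
```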


> This reminds me of the point I think Markus raised at the beginning. 
> Why not have InfoItem as the top level element and move the taxon  
> into it?
>
> <InfoItem>
> 	<aboutTaxon>...</aboutTaxon>
> 	<category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
> 	<hasContent>Some stuff about evolution</hasContent>
> </InfoItem>
>
> Info item is then like a DwC record and the category property is  
> like the BasisOfRecord. (TAPIRLink couldn't do multiple category  
> stuff).
>
> The argument against this is that the metadata would have to be  
> repeated for multiple InfoItems. Most requests would be for  
> multiple InfoItems about the same species, I guess, but I really 
> need clearer examples as to what this will be applied to. Who is  
> going to implement this in the near future? Perhaps they should  
> have a go and decide? Isn't Wouter doing something on it? I don't  
> have the time just now to try out some examples and I think that is  
> what is needed.
>
> What does everyone else think?
>
> All the best,
>
> Roger
>
-------------------  from Roger
---------------------------------------------------------

Hi Markus,

I am replying to this and cc'ing the TAG list because I really think we
should be having the discussion there. I am sure there are other people who
might like to be involved from a technical standpoint. I hope they can read
this message thread backwards to catch up.

If I can summarize:

We are talking about the data models that were dreamt up at the
SpeciesDataModel workshop

http://rs.tdwg.org/ontology/voc/TaxonDataModel
http://rs.tdwg.org/ontology/voc/TDMTerm

The choice is whether to have an inherited hierarchy of classes of object to
represent information items or to have a single information item and 'tag'
it with categories (instances).

Having info items as different classes means that they would possibly be
clearer in a straight serialization.

<tdm:TaxonDataModel>
	<tdm:aboutTaxon>.....</tdm:aboutTaxon>
	<tdm:hasInformation>
		<tdmt:Behaviour>
			<tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
		</tdmt:Behaviour>
	</tdm:hasInformation>
	<tdm:hasInformation>
		<tdmt:Evolution>
			<tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
		</tdmt:Evolution>
	</tdm:hasInformation>
	<tdm:hasInformation>
		<tdmt:BehaviouralEvolution>
			<tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
		</tdmt:BehaviouralEvolution>
	</tdm:hasInformation>
</tdm:TaxonDataModel>

But taking the tagging approach:

<tdm:TaxonDataModel>
	<tdm:aboutTaxon>.....</tdm:aboutTaxon>
	<tdm:hasInformation>
		<tdm:InfoItem>
			<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
			<tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
		</tdm:InfoItem>
	</tdm:hasInformation>
	<tdm:hasInformation>
		<tdm:InfoItem>
			<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
			<tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
		</tdm:InfoItem>
	</tdm:hasInformation>
	<tdm:hasInformation>
		<tdm:InfoItem>
			<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
			<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
			<tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
		</tdm:InfoItem>
	</tdm:hasInformation>
</tdm:TaxonDataModel>

Renato raised questions about serving that tagged version with TAPIR by
which I think he meant TAPIRLink as it would not be possible to do the above
example as a flat schema. This is the same problem as serving multiple
identifications for a specimen I guess - is this right?

This reminds me of the point I think Markus raised at the beginning.
Why not have InfoItem as the top level element and move the taxon into it?

<InfoItem>
	<aboutTaxon>...</aboutTaxon>
	<category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
	<hasContent>Some stuff about evolution</hasContent>
</InfoItem>

Info item is then like a DwC record and the category property is like the
BasisOfRecord. (TAPIRLink couldn't do multiple category stuff).
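A minimal Python sketch of what "InfoItem as a DwC-like record" implies, assuming a flat output with a single-valued category column. The InfoItem dataclass and the to_flat_records function are hypothetical. An item tagged with two categories can only travel as two rows, and the taxon (plus any other metadata) is repeated on every row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoItem:
    """Hypothetical in-memory InfoItem: one taxon, one or more categories."""
    about_taxon: str
    categories: tuple
    content: str


def to_flat_records(items):
    """Flatten to one DwC-style row per (item, category) pair, because a flat
    record has room for exactly one category value."""
    return [
        {"aboutTaxon": item.about_taxon, "category": category, "hasContent": item.content}
        for item in items
        for category in item.categories
    ]


items = [
    InfoItem("Taxon A", ("Evolution",), "Some stuff about evolution"),
    InfoItem("Taxon A", ("Behaviour", "Evolution"),
             "Some stuff about evolution of behaviour"),
]
rows = to_flat_records(items)
print(len(rows))  # 3: the two-category item becomes two rows
```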

The argument against this is that the metadata would have to be repeated for
multiple InfoItems. Most requests would be for multiple InfoItems about the
same species, I guess, but I really need clearer examples as to what this
will be applied to. Who is going to implement this in the near future?
Perhaps they should have a go and decide? Isn't Wouter doing something on
it? I don't have the time just now to try out some examples and I think that
is what is needed.

What does everyone else think?

All the best,

Roger


-------------------  from Markus
-------------------------------------------------------

On 03.05.2007, at 11:18, Roger Hyam wrote:

> Hi Markus,
>
> I have downsized the cc list for this discussion as I think it may be 
> just confusing to the less technically focussed or otherwise involved 
> people who would rather just hear the answer.
>
> I am not sure I totally follow you. Currently InfoItem.category 
> property has a range of DefinedTerm, which means that it 
> should contain an instance of DefinedTerm, i.e. the simplified 
> controlled vocabulary things we are using. It should perhaps have a 
> range of http://rs.tdwg.org/ontology/voc/TDMTerm#TDMTerm.
Yes, so the definition of what the InfoItem is about is a separate ontology;
that's what I naively called the "domain" ontology before. This can be a simple
list of terms or a hierarchical list. I suspect you are aiming at the flat list
to avoid inheritance for the reasons given in the TAG wiki.


> Are you suggesting that we have different InfoItem child classes one 
> for each of the categories we are talking about (as listed here 
> http://rs.tdwg.org/ontology/voc/TDMTerm)?
>
> So we would have
>
>  <owl:Class rdf:ID="EvolutionInfoItem">
> 	<rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
> </owl:Class>
>
> and
>
>  <owl:Class rdf:ID="BehaviourInfoItem">
> 	<rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
> </owl:Class>
>
> etc.
Yes. As Renato has mentioned in a related parallel discussion, this also
allows us to create TAPIR models, because an XML schema for these classes
would have different element names and would thus be easy to map.


>
> And that instance data would look like this:
>
> <bii:BehaviourInfoItem>
> 	<ii:hasContent>Some stuff about behaviour</ii:hasContent>
> </bii:BehaviourInfoItem>
Yes, that's what I was thinking of.


> If you had evolutionary-behaviour data you might do this
>
> <bii:BehaviourInfoItem rdf:about="1233">
> 	<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/EvolutionInfoItem#EvolutionInfoItem"/>
> 	<ii:hasContent>Some stuff about evolution of behaviour</ii:hasContent>
> </bii:BehaviourInfoItem>
I don't understand your intention here. If you want a more specific InfoItem
about evolutionary behaviour, why not define a new class?

<eii:EvolutionInfoItem rdf:about="1233">
	<ii:hasContent>Some stuff about evolution of behaviour</ii:hasContent>
</eii:EvolutionInfoItem>


> Here we say that the thing 1233 is an instance of both classes.
>
> The same instance data the way we came up with it at the meeting would 
> look like this:
>
> <tdm:InfoItem>
> 	<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
> 	<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
> 	<tdm:hasContent>Some stuff about evolution of behaviour</tdm:hasContent>
> </tdm:InfoItem>
alright, so one idea about having a category property is that it allows you
to "tag" one fact (infoitem) with several categories. Is that a requirement
you had in mind when designing TDM?


> The attraction of doing it this way to me (and I think Donald 
> suggested it) was that it is easy to write a client that will digest 
> InfoItems without knowing what they are. If the client hadn't heard of 
> Behaviour it could do nothing with a class-based example unless it 
> was capable of exploring the class hierarchy and finding something it 
> did know about and even then there may have been restrictions on the 
> properties that it didn't understand.
> Effectively every client would need to know OWL.
Well, not really. If all InfoItem instances are bundled through the
TDMClass, you know that all of the instances in there are InfoItems. And I
can't see any difference, for an ignorant application, between not
understanding the class name and not understanding the category class.
For OWL-aware applications, on the other hand, this gives extra knowledge
which is not as easy to get if you have to understand the category "domain
ontology" / controlled vocabulary (because you would need to know OWL AND
know how to interpret the non-OWL category property).

> If an application (particular thematic network) really did want a 
> BehaviourInfoItem class it could define one itself.
>
> <owl:Class rdf:ID="BehaviourInfoItem">
>   <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
>   <owl:equivalentClass>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="tdm:category" />
>       <owl:hasValue rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour" />
>     </owl:Restriction>
>   </owl:equivalentClass>
> </owl:Class>
>
> Which I believe would give the same inferences that would be found by 
> going with subclasses (though I am no expert).
>
> The important thing is that we keep the instance data as simple and 
> stable as possible and impose meaning later.
>
> If we were working in a pure semantic web world I would be more 
> inclined to go down the class based route but we have to also deal 
> with instance data as if it were plain XML documents that we can use 
> through TAPIR, validate with XML Schema and transform with XSLT.
Exactly, this will be a problem if everything is an InfoItem...


> We could always change the definitions of the tdm:hasValue property 
> (and the others) so that they inherit from a high level property.
> This kind of change is good because it doesn't affect the instance 
> data.
>
> Have I understood your points correctly or have I just gone off on a 
> circle explaining something that is completely off track?
I think we are on the same track.
Thanks for the insight, Roger!

Markus

-------------------  from Bob
------------------------------------------------------------

On 5/3/07, Roger Hyam <roger at hyam.net> wrote:
Hi Markus,
[...] 



The same instance data the way we came up with it at the meeting 
would look like this:

<tdm:InfoItem>
        <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
        <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
        <tdm:hasContent>Some stuff about evolution of behaviour</tdm:hasContent>
</tdm:InfoItem>

 


The attraction of doing it this way to me (and I think Donald 
suggested it) was that it is easy to write a client that will digest
InfoItems without knowing what they are. If the client hadn't heard
of Behaviour it could do nothing with a class-based example unless
it was capable of exploring the class hierarchy and finding something
it did know about and even then there may have been restrictions on
the properties that it didn't understand. Effectively every client
would need to know OWL. 

<naive>
The ability to cleanly ignore stuff is very important to me as an application
writer. I don't yet have a clear vision of what this means for the current
discussion. I might later this week or next, because one of the too many
things I am doing at the moment is trying to hand-code some TDM instances
from some of the species data that we have in UMB Electronic Field Guide
(EFG) data.  This is more an exercise in learning the tools (Protege and
Altova SemanticWorks) than anything else, but it has given me an
appreciation of the simultaneous advantage and pain of being able to ignore
classes, or not, with OWL. My experience is very possibly due to naive,
learning-effect use of the tools so may not be worth much. It arises from my
initial annoyance with Protege for not recursively importing what seemed to
me to be the most central TDWG ontologies that TDM itself references
directly or indirectly, notably TDMTerm and TaxonConcept. [I don't yet know
what Altova does about this]. My annoyance began to turn to gratitude
when--with more and more stuff in TDM that happens to not be represented in
EFG data--I became relieved about the more and more stuff that \wasn't/ in
my face when I would never use it. [The instance management tools ought
themselves to be semantically aware. I should be able to tell Protege---or
better, it should learn---what I care about...] 

</naive>

If an application (particular thematic network) really did want a 
BehaviourInfoItem class it could define one itself.

<owl:Class rdf:ID="BehaviourInfoItem">
   <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
   <owl:equivalentClass>
     <owl:Restriction>
       <owl:onProperty rdf:resource="tdm:category" />
       <owl:hasValue rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour" />
     </owl:Restriction>
   </owl:equivalentClass>
</owl:Class>

Yes, but... this will not solve the problem of proliferation of concepts and
the need to relate them to one another when applications \don't/ want to
ignore them. The only maxim that the open world assumption addresses is
"ignorance is bliss". It leaves you on your own to deal with "a little bit
of knowledge is a dangerous thing." 


"A little learning is a dangerous thing; 
drink deep, or taste not the Pierian spring: 
there shallow draughts intoxicate the brain, 
and drinking largely sobers us again." 


Alexander Pope (1688 - 1744) in An Essay on Criticism, 1709. First use of
the maxim, according to phrase.com 

"Yet ah! why should they know their fate? 
Since sorrow never comes too late, 
And happiness too swiftly flies. 
Thought would destroy their paradise. 
No more; where ignorance is bliss, 
'Tis folly to be wise" 

Thomas Gray
Ode on a Distant Prospect of Eton College, 1742


I propose these two poems as the TDWG standard for the poetic representation
of RDF. :-)


Which I believe would give the same inferences that would be found by
going with subclasses (though I am no expert).


Nor I, but this smells like a place where application writers would find
themselves staring into the fiery pit of OWL Full. Remember that classes
cannot be instances in OWL DL, i.e. there is no Class class. This is very
annoying in OOP and makes code reuse harder. If reasoning with subclasses
turns out to require a Class class, then modeling TDM that way removes it
from first order logic.



The important thing is that we keep the instance data as simple and
stable as possible and impose meaning later. 


As we know, "instance data is simple" to biologists means "instance data is
a row in a spreadsheet". Hah, hah, just serious. Further, about the only
tolerable cell entries are text, a URL, or the file name of an image.




If we were working in a pure semantic web world I would be more
inclined to go down the class based route but we have to also deal 
with instance data as if it were plain XML documents that we can use
through TAPIR, validate with XML Schema and transform with XSLT.

We could always change the definitions of the tdm:hasValue property
(and the others) so that they inherit from a high level property. 
This kind of change is good because it doesn't affect the instance data.

Have I understood your points correctly or have I just gone off on a
circle explaining something that is completely off track?

All the best,

Roger

Bob

-------------------  from Roger
----------------------------------------------------------
Hi Markus,

I have downsized the cc list for this discussion as I think it may be just
confusing to the less technically focussed or otherwise involved people who
would rather just hear the answer.

I am not sure I totally follow you. Currently the InfoItem.category property
has a range of DefinedTerm, which means that it should contain an
instance of DefinedTerm, i.e. the simplified controlled vocabulary things
we are using. It should perhaps have a range of
http://rs.tdwg.org/ontology/voc/TDMTerm#TDMTerm.

Are you suggesting that we have different InfoItem child classes one for
each of the categories we are talking about (as listed here
http://rs.tdwg.org/ontology/voc/TDMTerm)?

So we would have

  <owl:Class rdf:ID="EvolutionInfoItem">
	<rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
</owl:Class>

and

  <owl:Class rdf:ID="BehaviourInfoItem">
	<rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
</owl:Class>

etc.

And that instance data would look like this:

<bii:BehaviourInfoItem>
	<ii:hasContent>Some stuff about behaviour</ii:hasContent>
</bii:BehaviourInfoItem>

If you had evolutionary-behaviour data you might do this

<bii:BehaviourInfoItem rdf:about="1233">
	<rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/EvolutionInfoItem#EvolutionInfoItem"/>
	<ii:hasContent>Some stuff about evolution of behaviour</ii:hasContent>
</bii:BehaviourInfoItem>

Here we say that the thing 1233 is an instance of both classes.

The same instance data the way we came up with it at the meeting would look
like this:

<tdm:InfoItem>
	<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
	<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
	<tdm:hasContent>Some stuff about evolution of behaviour</tdm:hasContent>
</tdm:InfoItem>

The attraction of doing it this way to me (and I think Donald suggested it)
was that it is easy to write a client that will digest InfoItems without
knowing what they are. If the client hadn't heard of Behaviour it could do
nothing with a class-based example unless it was capable of exploring the
class hierarchy and finding something it did know about and even then there
may have been restrictions on the properties that it didn't understand.
Effectively every client would need to know OWL.
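To illustrate the "digest without knowing" argument, here is a minimal Python sketch of such a generic client, using the standard-library ElementTree. It assumes the tdm prefix is bound to http://rs.tdwg.org/ontology/voc/TaxonDataModel# (the snippets do not say), and its table of known categories is hypothetical. When it meets a category it does not recognise, it still returns the content and simply labels the category as unrecognised.

```python
import xml.etree.ElementTree as ET

TDM = "http://rs.tdwg.org/ontology/voc/TaxonDataModel#"  # assumed namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TERM = "http://rs.tdwg.org/ontology/voc/TDMTerm#"

# This hypothetical client only knows one category.
KNOWN_CATEGORIES = {TERM + "Behaviour": "Behaviour"}


def digest(xml_text):
    """Return (category labels, content) for every InfoItem in the document,
    without needing to understand every category it finds."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter(f"{{{TDM}}}InfoItem"):
        uris = [c.get(f"{{{RDF}}}resource") for c in item.findall(f"{{{TDM}}}category")]
        labels = [KNOWN_CATEGORIES.get(u, "(unrecognised category)") for u in uris]
        content = item.findtext(f"{{{TDM}}}hasContent", default="")
        results.append((labels, content))
    return results


SAMPLE = f"""<tdm:InfoItem xmlns:tdm="{TDM}" xmlns:rdf="{RDF}">
  <tdm:category rdf:resource="{TERM}Evolution"/>
  <tdm:category rdf:resource="{TERM}Behaviour"/>
  <tdm:hasContent>Some stuff about evolution of behaviour</tdm:hasContent>
</tdm:InfoItem>"""

print(digest(SAMPLE))
```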

If an application (particular thematic network) really did want a
BehaviourInfoItem class it could define one itself.

<owl:Class rdf:ID="BehaviourInfoItem">
   <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
   <owl:equivalentClass>
     <owl:Restriction>
       <owl:onProperty rdf:resource="tdm:category" />
       <owl:hasValue rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour" />
     </owl:Restriction>
   </owl:equivalentClass>
</owl:Class>

Which I believe would give the same inferences that would be found by going
with subclasses (though I am no expert).
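The restriction says, in effect, "anything whose category has the value Behaviour is a BehaviourInfoItem". A minimal Python sketch of that classification step, assuming items are plain dictionaries of property values (all names hypothetical). It is not an OWL reasoner; it only mimics the owl:onProperty/owl:hasValue test. Applied to the evolution-of-behaviour item, it places the item in both derived classes, the same outcome as typing instance 1233 with two rdf:type values.

```python
TERM = "http://rs.tdwg.org/ontology/voc/TDMTerm#"

# Each derived class is defined only by (property, required value), mirroring
# owl:onProperty / owl:hasValue. Class names here are illustrative.
DERIVED_CLASSES = {
    "BehaviourInfoItem": ("category", TERM + "Behaviour"),
    "EvolutionInfoItem": ("category", TERM + "Evolution"),
}


def inferred_classes(item):
    """Classes an item is shown to belong to because it carries the value.

    `item` maps a property name to the set of values it has. An empty result
    only means "not shown to be a member"; under OWL's open-world assumption
    that is not the same as "not a member"."""
    return sorted(
        name
        for name, (prop, value) in DERIVED_CLASSES.items()
        if value in item.get(prop, set())
    )


item_1233 = {
    "category": {TERM + "Evolution", TERM + "Behaviour"},
    "hasContent": {"Some stuff about evolution of behaviour"},
}
print(inferred_classes(item_1233))  # ['BehaviourInfoItem', 'EvolutionInfoItem']
```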

The important thing is that we keep the instance data as simple and stable
as possible and impose meaning later.

If we were working in a pure semantic web world I would be more inclined to
go down the class based route but we have to also deal with instance data as
if it were plain XML documents that we can use through TAPIR, validate with
XML Schema and transform with XSLT.

We could always change the definitions of the tdm:hasValue property (and the
others) so that they inherit from a high level property.  
This kind of change is good because it doesn't affect the instance data.

Have I understood your points correctly or have I just gone off on a circle
explaining something that is completely off track?

All the best,

Roger

--------------  from Markus ---------------------------------------

Roger,
thanks for this. The wiki guide really is good advice and we should
probably not use inheritance to model the domain ontology. We might use it
carefully for more "technical" decisions like shared "global"  
properties, e.g. in the case of the Base#DefinedTerm class you created to
derive all terms used for a controlled vocabulary from.

But I still believe there is a difference between the Cat example in the
wiki and the InfoItem class. The InfoItem class doesn't use concrete
properties, like Cat::hasMarkings in your example, but rather a very
flexible, generic property, Info::hasValue or Info::hasContent.
It is exactly that abstraction that makes me uncomfortable. We have to use
another property, "Info::category", to give semantics to the value/content
property. I doubt that any reasoner understands that (even if we don't make
use of reasoners).

The alternatives in my previous message don't need much inheritance in any
case.
Applying A (using a common InfoItem base class) leaves us with the same
situation that we have now. Instead of deriving terms in the domain ontology
from Base#DefinedTerm, we derive them from Base#InfoItem. And voilà, we don't
need the category property anymore.

Applying "pattern B", i.e. deriving all properties from a base property,
doesn't mean we have to use inheritance to model the domain. We can still
derive all properties from a basic hasFact property, for example. In this
case (which I still feel is the most natural way of doing this) hasSize and
hasDescription exist in parallel, but you would at least know that they are
properties of a taxon and that they have a value (either free text or a term
from a list, using a datatype or an object property respectively). We would
use inheritance mainly as a "technical" means and not to model a hierarchy
of properties for a taxon.
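Pattern B, and the earlier point that re-parenting properties doesn't touch instance data, can be sketched in Python with the RDFS subproperty rule: if (x P y) and P is a subPropertyOf Q, then (x Q y). Here `hasFact`, `hasSize` and `hasDescription` are the names used above, written as bare strings rather than full URIs. The taxon and its values are invented for illustration, and this is a hand-written rule, not an RDFS library:

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def super_properties(schema, prop):
    """prop plus every property it reaches via subPropertyOf (transitively)."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if s == current and p == SUB_PROPERTY_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def entail(schema, data):
    """Subproperty rule: (x P y) and (P subPropertyOf Q) give (x Q y)."""
    inferred = set(data)
    for s, p, o in data:
        for sup in super_properties(schema, p):
            inferred.add((s, sup, o))
    return inferred


def values(triples, subject, prop):
    return sorted(o for s, p, o in triples if s == subject and p == prop)


# Instance data: concrete, parallel properties of one (invented) taxon.
data = {
    ("taxon1", "hasSize", "2-3 cm"),
    ("taxon1", "hasDescription", "Nocturnal, solitary"),
}
# Schema only: both properties derive from a generic hasFact property.
schema = {
    ("hasSize", SUB_PROPERTY_OF, "hasFact"),
    ("hasDescription", SUB_PROPERTY_OF, "hasFact"),
}
print(values(entail(schema, data), "taxon1", "hasFact"))
```

A client that only knows hasFact can still find both values. Adding or changing the two schema triples never requires editing `data`.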

--
Markus



On 24.04.2007, at 18:54, Roger Hyam wrote:

>
> Hi Markus,
>
> I am glad you like it. Any resemblance to the Fact stuff in ABCD is 
> purely accidental ;).
>
>> (1) The first simple question is why we need a TDM class at all.  
>> Wouldn't it be sufficient to add an aboutTaxon property to the 
>> InfoItem class?
>
> This is the easy question. We need a container for sets of InfoItems
> so that they can all be tagged with the same metadata. The principal
> use case might be to get a set of info about a particular taxon from a
> single provider and render it as a web page.
>
>> (2) If I understand TDM correctly, the TDM InformationItem class is 
>> kept independent from the real "domain" ontology (the fact category), 
>> which is linked from an InfoItem instance via the category property. 
>> On the other hand the property category is not part of the OWL 
>> language, so no reasoner will understand that the hasValue/hasContent 
>> property of an InfoItem instance really belongs to the domain 
>> ontology class.
>
> This raises a general modeling point and makes me realize that we 
> haven't written it down anywhere. I spent some time talking to Rob 
> Gales about it last year. I have written a wiki page that I hope 
> explains it.
>
> http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot
>
> Please take a look, edit and feedback - perhaps to the TAG list.
>
> Subclassing properties is probably out, as we have to allow for naive
> implementations wherever possible.
>
> This may be way too techie for most of the audience. If we did change
> the way we modeled it, it might not have big implications for the
> 'domain experts' unless we ask them to produce a class hierarchy,
> which may slow them down.
>
> Hope this helps,
>
> Roger
>
>
>
> On 23 Apr 2007, at 16:52, Markus Döring wrote:
>
>> Dear all,
>> first of all I'd like to thank you guys for coming up with a nice RDF 
>> ontology for TDM!
>>
>> On first looking at it, though, I couldn't quite understand some
>> modelling decisions, so I would be happy if someone could shed light
>> on my questions below, which are based on this ontology:
>> http://rs.tdwg.org/ontology/voc/TaxonDataModel.rdf
>>
>> I am still new to OWL ontologies, so I hope the following questions 
>> do not sound totally stupid.
>>
>>
>> (1) The first simple question is why we need a TDM class at all.  
>> Wouldn't it be sufficient to add an aboutTaxon property to the 
>> InfoItem class?
>>
>>
>> (2) If I understand TDM correctly, the TDM InformationItem class is 
>> kept independent from the real "domain" ontology (the fact category), 
>> which is linked from an InfoItem instance via the category property. 
>> On the other hand the property category is not part of the OWL 
>> language, so no reasoner will understand that the hasValue/hasContent 
>> property of an InfoItem instance really belongs to the domain 
>> ontology class.
>>
>> I can see two alternatives to this, so I'm curious to know what you
>> think about them. If you have discussed them already, could
>> someone explain to me why they were considered less appropriate?
>>
>>
>> Alternative A) - InfoItem Base Class
>> Derive all domain ontology classes from an InfoItem base class with 
>> properties (context, hasValue, ...)
>>
>>
>> Alternative B) - TCS Object Properties
>> Define the domain ontology mainly as object properties (similar to
>> Dublin Core) that have an rdfs:domain=tcs:TaxonConcept and an
>> rdfs:range that points to an InfoItem-like base class that allows for
>> context. We can define different InfoItem-derived classes as ranges
>> for some properties, allowing us to enforce free text (hasValue) or
>> some specific controlled vocabulary (hasContent).
>> The properties can also use inheritance to allow for broad and
>> specialized descriptors. The TaxonConcept class already has some
>> properties, i.e. describedBy (for descriptive text) and
>> circumscribedBy (specimen). These two properties could already serve
>> as base properties to create specialised descriptors via
>> rdfs:subPropertyOf.
>>
>>
>> Well, as always there are many ways of doing the same thing.
>> Best wishes
>> Markus
>>
>>
>> --
>> Markus Döring
>>   Botanic Garden and Botanical Museum Berlin Dahlem,
>>   Dept. of Biodiversity Informatics
>>   Königin-Luise-Str. 6-8, D-14191 Berlin
>>   +49 (30) 83850-284
>>   m.doering at bgbm.org
>

________________________________________________________________
Éamonn Ó Tuama, Ph.D. (eotuama at gbif.org), 
Senior Programme Officer for Data Access & Database Interoperability (DADI),
Global Biodiversity Information Facility Secretariat,
Universitetsparken 15, DK-2100 Copenhagen Ø, DENMARK 
Phone:  +45 3532 1494; Fax:  +45 3532 1480 




More information about the tdwg-content mailing list