Hi Markus,
I am replying to this and cc'ing the TAG list because I really think we should be having the discussion there. I am sure there are other people who might like to be involved from a technical standpoint. I hope they can read this message thread backwards to catch up.
If I can summarize:
We are talking about the data models that were dreamt up at the SpeciesDataModel workshop:
http://rs.tdwg.org/ontology/voc/TaxonDataModel
http://rs.tdwg.org/ontology/voc/TDMTerm
The choice is whether to have an inherited hierarchy of classes of object to represent information items or to have a single information item and 'tag' it with categories (instances).
Having info items as different classes means that they would possibly be clearer in a straight serialization.
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdmt:Behaviour>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdmt:Behaviour>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdmt:Evolution>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdmt:Evolution>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdmt:BehaviouralEvolution>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdmt:BehaviouralEvolution>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
But taking the tagging approach:
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
Renato raised questions about serving that tagged version with TAPIR, by which I think he meant TAPIRLink, as it would not be possible to do the above example as a flat schema. This is the same problem as serving multiple identifications for a specimen, I guess. Is this right?
This reminds me of the point I think Markus raised at the beginning. Why not have InfoItem as the top level element and move the taxon into it?
<InfoItem>
  <aboutTaxon>...</aboutTaxon>
  <category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
  <hasContent>Some stuff about evolution</hasContent>
</InfoItem>
Info item is then like a DwC record and the category property is like the BasisOfRecord. (TAPIRLink couldn't do multiple category stuff).
The argument against this is that the metadata would have to be repeated for multiple InfoItems. Most requests would be for multiple InfoItems about the same species, I guess, but I really need clearer examples of what this will be applied to. Who is going to implement this in the near future? Perhaps they should have a go and decide? Isn't Wouter doing something on it? I don't have the time just now to try out some examples and I think that is what is needed.
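To make the repetition argument concrete, here is a minimal sketch of flattening a tagged TaxonDataModel document into DwC-like rows, one row per InfoItem and category. The namespace URIs, the taxon name "Example taxon" and the use of tdm:hasContent are assumptions for illustration; the thread only shows the prefixes.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the thread shows only the prefixes.
TDM = "http://rs.tdwg.org/ontology/voc/TaxonDataModel#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TERMS = "http://rs.tdwg.org/ontology/voc/TDMTerm#"

# Hypothetical instance document in the tagged style.
DOC = f"""
<tdm:TaxonDataModel xmlns:tdm="{TDM}" xmlns:rdf="{RDF}">
  <tdm:aboutTaxon>Example taxon</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="{TERMS}Behaviour"/>
      <tdm:hasContent>Some stuff about behaviour</tdm:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="{TERMS}Behaviour"/>
      <tdm:category rdf:resource="{TERMS}Evolution"/>
      <tdm:hasContent>Some stuff about evolution of behaviour</tdm:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
"""


def flatten(xml_text):
    """Return one flat row per (InfoItem, category) pair."""
    root = ET.fromstring(xml_text)
    taxon = root.findtext(f"{{{TDM}}}aboutTaxon")
    rows = []
    for item in root.iter(f"{{{TDM}}}InfoItem"):
        content = item.findtext(f"{{{TDM}}}hasContent")
        for category in item.findall(f"{{{TDM}}}category"):
            uri = category.get(f"{{{RDF}}}resource")
            # The taxon (and any other shared metadata) is copied into every row.
            rows.append({
                "taxon": taxon,
                "category": uri.split("#")[-1],
                "content": content,
            })
    return rows


ROWS = flatten(DOC)
for row in ROWS:
    print(row)
```

The item with two categories becomes two rows with identical taxon and content, which is the price of a flat, single-valued category column.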
What does everyone else think?
All the best,
Roger
On 4 May 2007, at 10:56, Markus Döring wrote:
On 03.05.2007, at 11:18, Roger Hyam wrote:
Hi Markus,
I have downsized the cc list for this discussion as I think it may be just confusing to the less technically focussed or otherwise involved people who would rather just hear the answer.
I am not sure I totally follow you. Currently the InfoItem.category property has a range of DefinedTerm, which means that it should contain an instance of DefinedTerm, i.e. the simplified controlled vocabulary things we are using. It should perhaps have a range of http://rs.tdwg.org/ontology/voc/TDMTerm#TDMTerm.
yes, so the definition of what the infoitem is about is a separate ontology; that's what I naively called the "domain" ontology before. This can be a simple list of terms or a hierarchical list. I suspect you aim at the flat list to avoid inheritance for reasons given in the TAG wiki.
Are you suggesting that we have different InfoItem child classes one for each of the categories we are talking about (as listed here http://rs.tdwg.org/ontology/voc/TDMTerm)?
So we would have
<owl:Class rdf:ID="EvolutionInfoItem">
  <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
</owl:Class>
and
<owl:Class rdf:ID="BehaviourInfoItem">
  <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
</owl:Class>
etc.
yes. As Renato has mentioned in a related parallel discussion, this also allows us to create TAPIR models, because an XML schema for these classes would have different element names and thus become easily mappable.
And that instance data would look like this:
<bii:BehaviourInfoItem>
  <ii:hasContent>Some stuff about behaviour</ii:hasContent>
</bii:BehaviourInfoItem>
yes, that's what I was thinking of
If you had evolutionary-behaviour data you might do this
<bii:BehaviourInfoItem rdf:about="1233">
  <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/EvolutionInfoItem#EvolutionInfoItem"/>
  <ii:hasContent>Some stuff about evolution of behaviour</ii:hasContent>
</bii:BehaviourInfoItem>
I don't understand your intention here. If you want a more specific infoitem about evolutionary-behaviour why not define a new class?
<eii:EvolutionInfoItem rdf:about="1233">
  <ii:hasContent>Some stuff about evolution of behaviour</ii:hasContent>
</eii:EvolutionInfoItem>
Here we say that the thing 1233 is an instance of both classes.
The same instance data the way we came up with it at the meeting would look like this:
<tdm:InfoItem>
  <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
  <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
  <tdm:hasContent>Some stuff about evolution of behaviour</tdm:hasContent>
</tdm:InfoItem>
alright, so one idea about having a category property is that it allows you to "tag" one fact (infoitem) with several categories. Is that a requirement you had in mind when designing TDM?
The attraction of doing it this way to me (and I think Donald suggested it) was that it is easy to write a client that will digest InfoItems without knowing what they are. If the client hadn't heard of Behaviour it could do nothing with the class-based examples unless it was capable of exploring the class hierarchy and finding something it did know about, and even then there may have been restrictions on the properties that it didn't understand. Effectively every client would need to know OWL.
well, not really. If all InfoItem instances are bundled through the TDMClass, you know all of the instances in there are InfoItems. And I can't see a difference for an ignorant application between not understanding the class name and not understanding the category class. For OWL-aware applications, on the other hand, this gives extra knowledge which is not as easy to get if you have to understand the categories' "domain ontology" / controlled vocabulary (because you would need to know OWL AND know how to interpret the non-OWL category property).
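A minimal sketch of such a "naive" client, assuming the tagged serialization: it groups content by category URI and never needs to understand what a category means. The namespace URIs and the term "SomethingNew" are hypothetical, chosen only to show a category the client has not heard of.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespace URIs: the thread shows only the prefixes.
TDM = "http://rs.tdwg.org/ontology/voc/TaxonDataModel#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TERMS = "http://rs.tdwg.org/ontology/voc/TDMTerm#"

# The only category this client knows a display label for.
KNOWN_LABELS = {TERMS + "Behaviour": "Behaviour"}

# Hypothetical document; "SomethingNew" stands for a term the client has never seen.
DOC = f"""
<tdm:TaxonDataModel xmlns:tdm="{TDM}" xmlns:rdf="{RDF}">
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="{TERMS}Behaviour"/>
      <tdm:hasContent>Some stuff about behaviour</tdm:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="{TERMS}SomethingNew"/>
      <tdm:hasContent>Some stuff the client cannot interpret</tdm:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
"""


def digest(xml_text):
    """Group InfoItem content by category URI without interpreting the URIs."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for item in root.iter(f"{{{TDM}}}InfoItem"):
        content = item.findtext(f"{{{TDM}}}hasContent", default="")
        for category in item.findall(f"{{{TDM}}}category"):
            grouped[category.get(f"{{{RDF}}}resource")].append(content)
    return dict(grouped)


def render(grouped):
    """Render headings, falling back to the raw URI for unknown categories."""
    lines = []
    for uri, contents in grouped.items():
        lines.append(KNOWN_LABELS.get(uri, uri))
        lines.extend(f"  {text}" for text in contents)
    return "\n".join(lines)


GROUPED = digest(DOC)
print(render(GROUPED))
```

The unknown category is still displayed, just under its URI; a class-based document would instead present an element name the client has no rule for.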
If an application (particular thematic network) really did want a BehaviourInfoItem class it could define one itself.
<owl:Class rdf:ID="BehaviourInfoItem">
  <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/Base#InfoItem"/>
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="tdm:category" />
      <owl:hasValue rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour" />
    </owl:Restriction>
  </owl:equivalentClass>
</owl:Class>
Which I believe would give the same inferences that would be found by going with subclasses (though I am no expert).
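As a rough illustration of the inference meant here, the sketch below emulates one direction of a hasValue restriction in plain Python: any subject carrying the given category value is typed as a member of the derived class. It is not an OWL reasoner, and the EvolutionInfoItem restriction and the subject name "item:1233" are assumptions added for the example.

```python
RDF_TYPE = "rdf:type"
CATEGORY = "tdm:category"
BEHAVIOUR_TERM = "http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"
EVOLUTION_TERM = "http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"

# Derived class -> (property, required value). The second entry is an
# assumed analogue of the BehaviourInfoItem definition in the thread.
RESTRICTIONS = {
    "BehaviourInfoItem": (CATEGORY, BEHAVIOUR_TERM),
    "EvolutionInfoItem": (CATEGORY, EVOLUTION_TERM),
}


def infer_types(triples, restrictions):
    """Return rdf:type triples implied by the hasValue-style restrictions.

    Only the direction "has the value, therefore is a member" is emulated.
    """
    inferred = set()
    for subject, predicate, obj in triples:
        for cls, (prop, value) in restrictions.items():
            if predicate == prop and obj == value:
                inferred.add((subject, RDF_TYPE, cls))
    return inferred


# Tagged instance data: one item carrying both categories.
TRIPLES = {
    ("item:1233", CATEGORY, BEHAVIOUR_TERM),
    ("item:1233", CATEGORY, EVOLUTION_TERM),
}

INFERRED = infer_types(TRIPLES, RESTRICTIONS)
for triple in sorted(INFERRED):
    print(triple)
```

Under these assumptions the doubly tagged item ends up as an instance of both derived classes, matching the explicit rdf:type example earlier in the thread, while the instance data itself stays unchanged.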
The important thing is that we keep the instance data as simple and stable as possible and impose meaning later.
If we were working in a pure semantic web world I would be more inclined to go down the class based route but we have to also deal with instance data as if it were plain XML documents that we can use through TAPIR, validate with XML Schema and transform with XSLT.
exactly this will be a problem if everything is an InfoItem...
We could always change the definitions of the tdm:hasValue property (and the others) so that they inherit from a high level property. This kind of change is good because it doesn't affect the instance data.
Have I understood your points correctly or have I just gone off on a circle explaining something that is completely off track?
I think we are on the same track. Thanks for the insight, Roger!
Markus
All the best,
Roger
On 2 May 2007, at 16:21, Markus Döring wrote:
Roger, thanks for this. The wiki guide really is good advice and we should probably not use inheritance to model the domain ontology. We might use it carefully for more "technical" decisions like shared "global" properties, e.g. in the case of the Base#DefinedTerm class you created to derive all terms used for a controlled vocabulary from.
But I still believe there is a difference between the Cat example in the wiki and the InfoItem class. The InfoItem class doesn't use concrete properties, like Cat::hasMarkings in your example, but rather uses a very flexible, generic property Info::hasValue or Info::hasContent. And exactly that abstraction makes me feel uncomfortable. We have to use another property "Info::category" to give semantics to the other value/content property. I doubt that any reasoner understands that (even if we don't make use of them).
The alternatives in my previous message don't have to use much inheritance in any case. Applying A (using a common InfoItem Base Class) leaves us with the same situation that we have now. Instead of deriving terms in the domain ontology from Base#DefinedTerm we derive them from Base#InfoItem. And voila, we don't need the category property anymore.
Applying "pattern B", i.e. deriving all properties from a base property, doesn't mean we have to use inheritance to model the domain. We can still derive all properties from a basic hasFact property for example. In this case (which I still feel is the most natural way of doing this) hasSize and hasDescription exist in parallel, but you would at least know they are a property of a taxon and that they have a value (either free text or a term from a list, using datatype or object property respectively). We would use inheritance mainly as a "technical" means and not to model a hierarchy of properties for a taxon.
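A small sketch of what "pattern B" buys a client, assuming a table of rdfs:subPropertyOf relations is available: a client that only knows hasFact can still pick out every fact-bearing property. The property names hasSize, hasDescription and hasFact come from the message above; the sample values and "unknownProperty" are made up.

```python
# rdfs:subPropertyOf relations, child -> parent, as described for pattern B.
SUB_PROPERTY_OF = {
    "hasSize": "hasFact",
    "hasDescription": "hasFact",
}


def is_fact_property(prop, base="hasFact"):
    """Walk up the subPropertyOf chain and report whether `base` is reached."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == base:
            return True
        seen.add(prop)  # guards against accidental cycles in the table
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def facts(statements):
    """Keep only the (property, value) pairs whose property derives from hasFact."""
    return [(prop, value) for prop, value in statements if is_fact_property(prop)]


# Hypothetical statements about one taxon.
STATEMENTS = [
    ("hasSize", "2 m"),
    ("hasDescription", "Some descriptive text"),
    ("unknownProperty", "not a fact as far as this client can tell"),
]

FACTS = facts(STATEMENTS)
print(FACTS)
```

The hierarchy here is one level deep and purely "technical": hasSize and hasDescription sit side by side under hasFact, with no domain hierarchy implied.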
-- Markus
On 24.04.2007, at 18:54, Roger Hyam wrote:
Hi Markus,
I am glad you like it. Any resemblance to the Fact stuff in ABCD is purely accidental ;).
(1) The first simple question is why we need a TDM class at all. Wouldn't it be sufficient to add an aboutTaxon property to the InfoItem class?
This is the easy question. We need a container for sets of InfoItems so that they can all be tagged with the same metadata. The principal use case might be to get a set of info about a particular taxon from a single provider and render it as a web page.
(2) If I understand TDM correctly, the TDM InformationItem class is kept independent from the real "domain" ontology (the fact category), which is linked from an InfoItem instance via the category property. On the other hand the property category is not part of the OWL language, so no reasoner will understand that the hasValue/hasContent property of an InfoItem instance really belongs to the domain ontology class.
This raises a general modeling point and makes me realize that we haven't written it down anywhere. I spent some time talking to Rob Gales about it last year. I have written a wiki page that I hope explains it.
http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot
Please take a look, edit and give feedback, perhaps to the TAG list.
Subclassing properties is probably out as we have to allow for naive implementations wherever possible.
This may be way too techie for most of the audience. If we did change the way we modeled it, it may not have big implications for the 'domain experts' unless we ask them to produce a class hierarchy, which may slow them down.
Hope this helps,
Roger
On 23 Apr 2007, at 16:52, Markus Döring wrote:
Dear all, first of all I'd like to thank you guys for coming up with a nice RDF ontology for TDM!
When taking a first look at it I couldn't exactly understand some modelling decisions though, so I would be happy if someone could shed light on my questions below that are based on this ontology: http://rs.tdwg.org/ontology/voc/TaxonDataModel.rdf
I am still new to OWL ontologies, so I hope the following questions do not sound totally stupid.
(1) The first simple question is why we need a TDM class at all. Wouldn't it be sufficient to add an aboutTaxon property to the InfoItem class?
(2) If I understand TDM correctly, the TDM InformationItem class is kept independent from the real "domain" ontology (the fact category), which is linked from an InfoItem instance via the category property. On the other hand the property category is not part of the OWL language, so no reasoner will understand that the hasValue/hasContent property of an InfoItem instance really belongs to the domain ontology class.
I can see two alternatives to this, so I'm curious to know what you think about them. In case you have discussed them already, could someone explain to me why they were considered less appropriate?
Alternative A) - InfoItem Base Class
Derive all domain ontology classes from an InfoItem base class with properties (context, hasValue, ...)
Alternative B) - TCS Object Properties
Define the domain ontology mainly as object properties (similar to Dublin Core) that have an rdfs:domain=tcs:TaxonConcept and an rdfs:range that points to an InfoItem-like base class that allows for context. We can define different InfoItem-derived classes as ranges for some properties, allowing us to enforce free text (hasValue) or some specific controlled vocabulary (hasContent). The properties can also use inheritance to allow for broad and specialized descriptors. The TaxonConcept class already has some properties, i.e. describedBy (for descriptive text) and circumscribedBy (specimen). These two properties could already serve as base properties to create specialised descriptors via rdfs:subPropertyOf.
Well, as always there are many ways of doing the same thing. Best wishes Markus
-- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin +49 (30) 83850-284 m.doering@bgbm.org
I still hope that the future model will be usable with XML Schema (W3C or whatever) as well as with OWL, so I would vote for element names. The tagging approach means that the content model cannot be elaborated in a way that allows validation (a schema will not react to values, which is what the tags are).
I believe tagging is good if we want to make sure that the content model cannot be validated, so software will not make a contract about it. UBIF/SDD had purposely avoided element names and chose a tagging approach where we were looking for simple extensibility that is not expected to be validated (e.g. the different types of object label we need to support).
So it just looks to me like the kind of high-level categories you are discussing may call for category-specific content in the future; in this case element names are safer, wherever we go in the future.
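The validation point can be sketched with a toy validator whose content models are keyed by element name only, which is roughly how a grammar-based schema assigns content models. This is an emulation in Python, not XML Schema itself; the child element hasTimeScale and the un-namespaced element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Content model per element name. hasTimeScale is a hypothetical
# category-specific child used only for this illustration.
CONTENT_MODELS = {
    "Behaviour": {"hasContent"},
    "Evolution": {"hasContent", "hasTimeScale"},
    # A generic InfoItem has one name for all categories, so its model
    # has to admit the union of everything any category might need.
    "InfoItem": {"category", "hasContent", "hasTimeScale"},
}


def validate(xml_text):
    """Return a list of violations, judged by element names alone."""
    elem = ET.fromstring(xml_text)
    allowed = CONTENT_MODELS.get(elem.tag)
    if allowed is None:
        return [f"unknown element <{elem.tag}>"]
    return [
        f"<{child.tag}> not allowed in <{elem.tag}>"
        for child in elem
        if child.tag not in allowed
    ]


# Element-name style: the misplaced child is caught.
NAMED = "<Behaviour><hasTimeScale>Pleistocene</hasTimeScale></Behaviour>"
# Tagged style: the same mistake passes, because the category is only a value.
TAGGED = (
    "<InfoItem><category resource='Behaviour'/>"
    "<hasTimeScale>Pleistocene</hasTimeScale></InfoItem>"
)

print(validate(NAMED))
print(validate(TAGGED))
```

With element names the validator can enforce category-specific content; with tags it can only check the generic InfoItem shape.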
I am not really into the discussion and would need some place to read up on what InfoItem, TaxonDataModel etc. are. (There is naught on the wiki...)
Gregor
---------------------------------------------------------- Gregor Hagedorn (G.Hagedorn@bba.de) Institute for Plant Virology, Microbiology, and Biosafety Federal Research Center for Agriculture and Forestry (BBA) Königin-Luise-Str. 19 Tel: +49-30-8304-2220 14195 Berlin, Germany Fax: +49-30-8304-2203
Hi Everyone,
Well it seems like everyone (apart from me!) thinks the subclass route is the way to go. Perhaps someone with some real data to serve and problems to solve would like to take it forward along this route. Perhaps set up a wiki page and work out what the model should look like.
I know Eamonn was looking into resources for the group but I think he is away from his desk just at the moment.
If anyone is keen to get started please use the TAG wiki. Pages can always be moved across if required.
From an architectural point of view, as long as the model is 'frame based' (i.e. we can easily identify classes and their properties) I'll be happy.
All the best,
Roger
On 4 May 2007, at 12:20, Gregor Hagedorn wrote:
I still hope that the future model will be usable with XML Schema (W3C or whatever) as well as with OWL, so I would vote for element names. The tagging approach means that the content model cannot be elaborated in a way that allows validation (a schema will not react to values, which is what the tags are).
I believe tagging is good if we want to make sure that the content model cannot be validated, so software will not make a contract about it. UBIF/SDD had purposely avoided element names and chose a tagging approach where we were looking for simple extensibility that is not expected to be validated (e.g. the different types of object label we need to support).
So it just looks to me like the kind of high-level categories you are discussing may call for category-specific content in the future; in this case element names are safer, wherever we go in the future.
I am not really into the discussion and would need some place to read up on what InfoItem, TaxonDataModel etc. are. (There is naught on the wiki...)
Gregor
Gregor Hagedorn (G.Hagedorn@bba.de) Institute for Plant Virology, Microbiology, and Biosafety Federal Research Center for Agriculture and Forestry (BBA) Königin-Luise-Str. 19 Tel: +49-30-8304-2220 14195 Berlin, Germany Fax: +49-30-8304-2203
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
Hello, some small corrections below regarding my understanding -- Markus
On 04.05.2007, at 12:58, Roger Hyam wrote:
Hi Markus,
I am replying to this and cc'ing the TAG list because I really think we should be having the discussion there. I am sure there are other people who might like to be involved from a technical standpoint. I hope they can read this message thread backwards to catch up.
If I can summarize:
We are talking about the data models that were dreamt up at the SpeciesDataModel workshop
http://rs.tdwg.org/ontology/voc/TaxonDataModel http://rs.tdwg.org/ontology/voc/TDMTerm
The choice is whether to have an inherited hierarchy of classes of object to represent information items or to have a single information item and 'tag' it with categories (instances).
The inherited hierarchy of classes is not needed for semantics in either approach. We can derive *all* classes directly from the base InfoItem class, just as you are planning to do with the controlled vocabulary terms, deriving them from BaseTerm. This kind of inheritance is more technical by nature, simply passing on all properties of InfoItems. The semantic hierarchy InfoItem->BehaviorInfoItem->EvolutionaryBehaviorInfoItem can be modelled in both approaches if we want. But in both ways we don't have to!
Having info items as different classes means that they would possibly be clearer in a straight serialization.
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdmt:Behaviour>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdmt:Behaviour>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdmt:Evolution>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdmt:Evolution>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdmt:BehaviouralEvolution>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdmt:BehaviouralEvolution>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
But taking the tagging approach:
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
Renato raised questions about serving that tagged version with TAPIR, by which I think he meant TAPIRLink, as it would not be possible to do the above example as a flat schema. This is the same problem as serving multiple identifications for a specimen, I guess. Is this right?
Not really. I think the problem has to do with creating an XML schema to be used for a TAPIR model that has *different* node paths to be mapped to. Using the generic InfoItem "metamodel" you always end up with this path that you map to:
/tdm:TaxonDataModel/tdm:hasInformation/tdm:InfoItem/tdmt:hasContent
instead of having different ones when using inheritance:
/tdm:TaxonDataModel/tdm:hasInformation/tdm:InfoItem/tdmt:Behaviour
/tdm:TaxonDataModel/tdm:hasInformation/tdm:InfoItem/tdmt:Evolution
/tdm:TaxonDataModel/tdm:hasInformation/tdm:InfoItem/tdmt:BehaviouralEvolution
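The difference in mappable node paths can be checked mechanically. The sketch below collects the paths that lead to hasContent in a small class-based document and a small tagged document. The namespace URIs are assumptions, and the class-based paths follow the serialization quoted earlier in this message, so they differ slightly from the paths listed above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the thread shows only the prefixes.
TDM = "http://rs.tdwg.org/ontology/voc/TaxonDataModel#"
TDMT = "http://rs.tdwg.org/ontology/voc/TDMTerm#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PREFIXES = {TDM: "tdm", TDMT: "tdmt", RDF: "rdf"}
NS_DECL = f'xmlns:tdm="{TDM}" xmlns:tdmt="{TDMT}" xmlns:rdf="{RDF}"'

CLASS_BASED = f"""
<tdm:TaxonDataModel {NS_DECL}>
  <tdm:hasInformation><tdmt:Behaviour>
    <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
  </tdmt:Behaviour></tdm:hasInformation>
  <tdm:hasInformation><tdmt:Evolution>
    <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
  </tdmt:Evolution></tdm:hasInformation>
</tdm:TaxonDataModel>
"""

TAGGED = f"""
<tdm:TaxonDataModel {NS_DECL}>
  <tdm:hasInformation><tdm:InfoItem>
    <tdm:category rdf:resource="{TDMT}Behaviour"/>
    <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
  </tdm:InfoItem></tdm:hasInformation>
  <tdm:hasInformation><tdm:InfoItem>
    <tdm:category rdf:resource="{TDMT}Evolution"/>
    <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
  </tdm:InfoItem></tdm:hasInformation>
</tdm:TaxonDataModel>
"""


def content_paths(xml_text):
    """Return the distinct prefixed paths that end in a hasContent element."""
    paths = set()

    def walk(elem, parent_path):
        # ElementTree tags look like "{namespace-uri}localname".
        uri, _, local = elem.tag[1:].partition("}")
        here = f"{parent_path}/{PREFIXES[uri]}:{local}"
        if local == "hasContent":
            paths.add(here)
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return paths


print(sorted(content_paths(CLASS_BASED)))
print(sorted(content_paths(TAGGED)))
```

The tagged document yields a single path for all content, so a path-based mapping cannot tell behaviour from evolution; the class-based document yields one path per category.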
This reminds me of the point I think Markus raised at the beginning. Why not have InfoItem as the top level element and move the taxon into it?
<InfoItem>
  <aboutTaxon>...</aboutTaxon>
  <category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
  <hasContent>Some stuff about evolution</hasContent>
</InfoItem>
Info item is then like a DwC record and the category property is like the BasisOfRecord. (TAPIRLink couldn't do multiple category stuff).
The argument against this is that the metadata would have to be repeated for multiple InfoItems. Most requests would be for multiple InfoItems about the same species, I guess, but I really need clearer examples of what this will be applied to. Who is going to implement this in the near future? Perhaps they should have a go and decide? Isn't Wouter doing something on it? I don't have the time just now to try out some examples and I think that is what is needed.
What does everyone else think?
All the best,
Roger
On 4 May 2007, at 10:56, Markus Döring wrote:
On 03.05.2007, at 11:18, Roger Hyam wrote:
Hi Markus,
I have downsized the cc list for this discussion as I think it may be just confusing to the less technically focussed or otherwise involved people who would rather just hear the answer.
I am not sure I totally follow you. Currently InfoItem.category property has a range of DefinedTerm which means anything that it should contain an instance of DefinedTerm - i.e. the simplified controlled vocabulary things we are using. It should perhaps have a range of http://rs.tdwg.org/ontology/voc/TDMTerm#TDMTerm.
yes, so the definition of what the infoitem is about is a separate ontology, thats what I naively called "domain" ontology before. This can be a simple list of terms or a hierarchical list. I suspect you aim at the flat list to avoid inheritance for reasons given in the TAG wiki.
Are you suggesting that we have different InfoItem child classes one for each of the categories we are talking about (as listed here http://rs.tdwg.org/ontology/voc/TDMTerm)?
So we would have
<owl:Class rdf:ID="EvolutionInfoItem"> <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/ Base#InfoItem"/> </owl:Class>
and
<owl:Class rdf:ID="BehaviourInfoItem"> <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/ Base#InfoItem"/> </owl:Class>
etc.
yes. As Renato has mentioned in a related parallel discussion this also allows us to create TAPIR models, cause an XML schema for these classes would have different element names and thus become mappable easily.
And that instance data would look like this:
bii:BehaviourInfoItem ii:hasContentSome stuff about behaviour</ii:hasContent> </bii:Behaviour>
yes, thats what I was thinking of
If you had evolutionary-behaviour data you might do this
<bii:BehaviourInfoItem rdf:about="1233"> <rdf:type rdf:resource="http://rs.tdwg.org/ontology/voc/ EvolutionInfoItem#EvolutionInfoItem"/> ii:hasContentSome stuff about evolution of behaviour</ ii:hasContent> </bii:Behaviour>
I dont understand your intention here. If you want a more specific infoitem about evolutionary-behaviour why not define a new class?
<eii:EvolutionInfoItem rdf:about="1233"> ii:hasContentSome stuff about evolution of behaviour</ ii:hasContent> </eii:EvolutionInfoItem >
Here we say that the thing 1233 is an instance of both classes.
The same instance data the way we came up with it at the meeting would look like this:
tdm:InfoItem <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/ TDMTerm#Evolution"/> <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/ TDMTerm#Behaviour"/> tdm:hasContentSome stuff about evolution of behaviour</ tdm:hasContent> </tdm:InfoItem>
alright, so one idea about having a category property is that it allows you to "tag" one fact (infoitem) with several categories. Is that a requirement you had in mind when designing TDM?
The attraction of doing it this way to me (and I think Donald suggested it) was that it is easy to write a client that will digest InfoItems without knowing what they are. If the client hadn't heard of Behaviour it could do nothing with a class based examples unless it was capable of exploring the class hierarchy and finding something it did know about and even then there may have been restrictions on the properties that it didn't understand. Effectively every client would need to know OWL.
well, not really. If all InfoItem instances are bundled through the TDMClass, you know all of the instances in there are InfoItems. And I cant see a difference for an ignorant application in not understanding the class name or not understanding the category class. For OWL aware applications on the other hand this gives extra knowledge which is not as easy to get if you have to understand the categories "domain ontology" / controlled vocabulary (Because you would need to know OWL AND know how to interpret the non-OWL category property)
If an application (particular thematic network) really did want a BehaviourInfoItem class it could define one itself.
<owl:Class rdf:ID="BehaviourInfoItem"> <rdfs:subClassOf rdf:resource="http://rs.tdwg.org/ontology/ Base#InfoItem"/> owl:equivalentClass owl:Restriction <owl:onProperty rdf:resource="tdm:category" /> <owl:hasValue rdf:resource="http://rs.tdwg.org/ontology/voc/ TDMTerm#Behaviour" /> </owl:Restriction> </owl:equivalentClass> </owl:Class>
Which I believe would give the same inferences that would be found by going with subclasses (though I am no expert).
The important thing is that we keep the instance data as simple and stable as possible and impose meaning later.
If we were working in a pure semantic web world I would be more inclined to go down the class based route but we have to also deal with instance data as if it were plain XML documents that we can use through TAPIR, validate with XML Schema and transform with XSLT.
exactly this will be a problem if everything is an InfoItem...
We could always change the definitions of the tdm:hasValue property (and the others) so that they inherit from a high level property. This kind of change is good because it doesn't affect the instance data.
Have I understood your points correctly or have I just gone off on a circle explaining something that is completely off track?
I think we are one the same track. Thanks for the insight, Roger!
Markus
All the best,
Roger
On 2 May 2007, at 16:21, Markus Döring wrote:
Roger, thanks for this. The wiki guide really is a good advice and we should probably not use inheritance to model the domain ontology. We might use it carefully for more "technical" decisions like shared "global" properties, e.g. in the case of the Base#DefinedTerm class you created to derive all terms used for a controlled vocabulary from.
But I still believe there is a difference from the Cat example in the wiki and the InfoItem class. The InfoItem class doesn't use concrete properties, like Cat::hasMarkings in your example, but rather uses a very flexible, generic property Info::hasValue or Info::hasContent. And exactly that abstraction makes me feel uncomfortable. We have to use another property "Info::category" to give semantics to the other value/content property. I doubt that any reasoner understands that (even if dont make use of them).
The alternatives in my previous message don't have to use much of inheritance in any case. Applying A (using a common InfoItem Base Class) leaves us with the same situation that we have now. Instead of deriving terms in the domain ontology from Base#DefinedTerm we derive them from Base#InfoItem. And voila, we dont need the category property anymore.
Applying "pattern B", i.e. deriving all properties from a base property, doesn't mean we have to use inheritance to model the domain. We can still derive all properties from a basic hasFact property for example. In this case (which I still feel is the most natural way of doing this) hasSize and hasDescription exist in parallel, but you would at least know they are a property of a taxon and that they have a value (either free text or a term from a list, using datatype or object property respectively). We would use inheritance mainly as a "technical" mean and not to model a hierarchy of properties for a taxon.
-- Markus
On 24.04.2007, at 18:54, Roger Hyam wrote:
Hi Markus,
I am glad you like it. Any resemblance to the Fact stuff in ABCD is purely accidental ;).
(1) The first simple question is why we need a TDM class at all. Wouldn't it be sufficient to add an aboutTaxon property to the InfoItem class?
This is the easy question. We need a container for sets of InfoItems so that they can all be tagged with the same metadata. Principle use case might be to get a set of info about a particular taxon from a single provider and render it as a web page.
(2) If I understand TDM correctly, the TDM InformationItem class is kept independent from the real "domain" ontology (the fact category), which is linked from an InfoItem instance via the category property. On the other hand the property category is not part of the OWL language, so no reasoner will understand that the hasValue/hasContent property of an InfoItem instance really belongs to the domain ontology class.
This raises a general modeling point and makes me realize that we haven't written it down anywhere. I spent some time talking to Rob Gales about it last year. I have written a wiki page that I hope explains it.
http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot
Please take a look, edit and feedback - perhaps to the TAG list.
Subclassing properties is probably out as we have to allow for naive implementations wherever possible.
This may be way too techie for most of the audience. If we did change the way we modeled it it may not have big implementations for the 'domain experts' unless we ask them to produce a class hierarchy - which may slow them down.
Hope this helps,
Roger
On 23 Apr 2007, at 16:52, Markus Döring wrote:
Dear all, first of all I'd like to thank you guys for coming up with a nice RDF ontology for TDM!
When taking a first look at it I couldn't exactly understand some modelling decisions though, so I would be happy if someone could shed light on my questions below that are based on this ontology: http://rs.tdwg.org/ontology/voc/ TaxonDataModel.rdf
I am still new to OWL ontologies, so I hope the following questions do not sound totally stupid.
(1) The first simple question is why we need a TDM class at all. Wouldn't it be sufficient to add an aboutTaxon property to the InfoItem class?
(2) If I understand TDM correctly, the TDM InformationItem class is kept independent from the real "domain" ontology (the fact category), which is linked from an InfoItem instance via the category property. On the other hand the property category is not part of the OWL language, so no reasoner will understand that the hasValue/hasContent property of an InfoItem instance really belongs to the domain ontology class.
I can see two alternatives to this, so I'm curious to know what you think about them. In case you have discussed them already, could someone explain to me why they were considered less appropriate?
Alternative A) - InfoItem Base Class Derive all domain ontology classes from an InfoItem base class with properties (context, hasValue, ...)
Alternative B) - TCS Object Properties Define the domain ontology mainly as object properties (similar to dublin core) that have a rdfs:domain=tcs:TaxonConcept and an rdfs:range that points to an InfoItem like base class that allows for context. We can define different InfoItem derived classes as ranges for some properties, allowing us to enforce free text (hasValue) or some specific controlled vocabulary (hasContent). The properties can also use inheritance to allow for broad and specialized descriptors. The TaxonConcept class already has some properties i.e. describedBy (for descriptive text) and circumscribedBy (specimen). These two properties could already serve as base properties to create specialised descriptors via rdfs:subPropertyOf.
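As a rough illustration of Alternative B, the sketch below shows how a client could answer a query on a broad base property by following rdfs:subPropertyOf links. Only describedBy and circumscribedBy come from the message; the specialised property names, the taxon identifier and the values are hypothetical, and a real implementation would leave this inference to an RDFS reasoner rather than hand-rolled code.

```python
# Hypothetical property hierarchy: specialised property -> base property.
# Only describedBy and circumscribedBy are named in the thread.
SUB_PROPERTY_OF = {
    "hasHabitatDescription": "describedBy",
    "hasBehaviourDescription": "describedBy",
    "hasTypeSpecimen": "circumscribedBy",
}

# Hypothetical statements about one taxon concept: (subject, property, value).
STATEMENTS = [
    ("taxon:1", "hasHabitatDescription", "Wet meadows"),
    ("taxon:1", "hasBehaviourDescription", "Nocturnal"),
    ("taxon:1", "hasTypeSpecimen", "specimen:42"),
]


def is_sub_property(prop, ancestor):
    """True if prop equals ancestor or reaches it via subPropertyOf links."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def values_of(subject, prop):
    """Values stated for subject under prop or any of its sub-properties."""
    return [o for s, p, o in STATEMENTS
            if s == subject and is_sub_property(p, prop)]


descriptions = values_of("taxon:1", "describedBy")
```

A client that only knows describedBy still receives both specialised descriptions, which is the appeal of property inheritance; the cost is that every consumer needs this subproperty lookup, which a naive implementation may not have.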
Well, as always there are many ways of doing the same thing. Best wishes Markus
-- Markus Döring Botanic Garden and Botanical Museum Berlin Dahlem, Dept. of Biodiversity Informatics Königin-Luise-Str. 6-8, D-14191 Berlin +49 (30) 83850-284 m.doering@bgbm.org
tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
Hello Roger,
Markus was right in his comment. I wasn't thinking about any particular implementation of TAPIR, I just wanted to warn about some implications of using generic models like TDM in the TAPIR context.
Take DarwinCore as an example:
* Most providers of specimen data use relational databases where Darwin concepts correspond to table columns, so the mapping process is easier.
* If I'm a client and I'm interested in providers that have content for lat/long, I can just inspect the capabilities response to see if they mapped the corresponding concepts.
* Since we have different concept ids for each kind of data, we have more possibilities when designing output models.
Now if I understood correctly, TDM is so generic that the same kind of model could be used for DarwinCore - just replace the TDM terms by the Darwin concepts. And although there's nothing intrinsically wrong with this approach:
* If most providers will have databases where TDM categories correspond to table columns, then they will need to prepare a super view to make all data appear under a single InfoItem column, just beside another column with the corresponding category value. It's possible, but it's more work for providers and performance will not be good.
* If I'm a client and I'm interested in providers that have content for habitat, I cannot simply inspect the capabilities response, because it will just show me that the providers have InfoItems. I'll need to send additional search/inventory requests to discover what kind of data is available.
* Since there will be only a few generic concepts, output models will be very limited in TAPIR. As you know, at the moment we cannot have conditional mappings in TAPIR, for instance: InfoItem corresponds to element habitat only when category equals habitat.
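To make the "super view" concrete, here is a minimal sketch using SQLite from Python. The table, column and taxon names are invented for illustration and do not come from any real provider; the point is only that each category column has to be folded into one generic column by a UNION ALL, and that a client must query the data itself to discover which categories exist.

```python
import sqlite3

# Hypothetical provider database: one column per TDM category.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE species (taxon TEXT, habitat TEXT, behaviour TEXT);
    INSERT INTO species VALUES ('Aus bus', 'Wet meadows', 'Nocturnal');
    INSERT INTO species VALUES ('Aus cus', 'Dry heath', NULL);

    -- The "super view": every category column is folded into a single
    -- info_item column, with the category name in a column beside it.
    CREATE VIEW info_items AS
        SELECT taxon, 'Habitat' AS category, habitat AS info_item
          FROM species WHERE habitat IS NOT NULL
        UNION ALL
        SELECT taxon, 'Behaviour' AS category, behaviour AS info_item
          FROM species WHERE behaviour IS NOT NULL;
""")


def items_for(category):
    """Return (taxon, info_item) rows for one category, ordered by taxon."""
    cur = conn.execute(
        "SELECT taxon, info_item FROM info_items "
        "WHERE category = ? ORDER BY taxon",
        (category,),
    )
    return cur.fetchall()


def available_categories():
    """What a client must ask the data, since the schema no longer says."""
    cur = conn.execute(
        "SELECT DISTINCT category FROM info_items ORDER BY category"
    )
    return [row[0] for row in cur.fetchall()]
```

Each extra category adds another branch to the view, and available_categories() stands in for the additional inventory request mentioned in the second point.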
I'm not against generic models. I also used them myself in specific circumstances like meta modelling applications, or when the application had such a mutable nature that it was better to use a more generic approach (even at the cost of performance penalties and other additional work).
Anyway, I'm not quite familiar with species-level data sources. From the previous messages, it seems that the main reason for using the generic tagging approach is that most data sources will have chunks of text including information about one or more TDM categories, and it will be impractical to separate this information in a more structured way. Did I understand the problem correctly?
In this case, then you're right that it would be interesting if someone could investigate this a bit more, make some tests and give us more practical feedback. If most participants of the species model workshop have this kind of database, maybe they could try to map their fields to the TDM categories.
Best Regards, -- Renato
Hi Markus,
I am replying to this and cc'ing the TAG list because I really think we should be having the discussion there. I am sure there are other people who might like to be involved from a technical standpoint. I hope they can read this message thread backwards to catch up.
If I can summarize:
We are talking about the data models that were dreamt up at the SpeciesDataModel workshop
http://rs.tdwg.org/ontology/voc/TaxonDataModel http://rs.tdwg.org/ontology/voc/TDMTerm
The choice is whether to have an inherited hierarchy of classes of object to represent information items or to have a single information item and 'tag' it with categories (instances).
Having info items as different classes means that they would possibly be clearer in a straight serialization.
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdmt:Behaviour>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdmt:Behaviour>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdmt:Evolution>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdmt:Evolution>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdmt:BehaviouralEvolution>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdmt:BehaviouralEvolution>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
But taking the tagging approach:
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
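To show what consuming the tagged form might look like, here is a small sketch that parses the tagged serialization above with Python's standard xml.etree.ElementTree and selects InfoItems by category. The tdm and tdmt namespace URIs are assumptions derived from the vocabulary URLs quoted earlier (the thread does not give namespace declarations), and contents_tagged is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The rdf namespace is standard; the tdm/tdmt URIs are assumed, not confirmed.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "tdm": "http://rs.tdwg.org/ontology/voc/TaxonDataModel#",
    "tdmt": "http://rs.tdwg.org/ontology/voc/TDMTerm#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

DOC = """<tdm:TaxonDataModel
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:tdm="http://rs.tdwg.org/ontology/voc/TaxonDataModel#"
    xmlns:tdmt="http://rs.tdwg.org/ontology/voc/TDMTerm#">
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation><tdm:InfoItem>
    <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
    <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
  </tdm:InfoItem></tdm:hasInformation>
  <tdm:hasInformation><tdm:InfoItem>
    <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
    <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
  </tdm:InfoItem></tdm:hasInformation>
  <tdm:hasInformation><tdm:InfoItem>
    <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
    <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
    <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
  </tdm:InfoItem></tdm:hasInformation>
</tdm:TaxonDataModel>"""


def contents_tagged(root, term):
    """Content of every InfoItem carrying the given TDMTerm category."""
    wanted = NS["tdmt"] + term
    found = []
    for item in root.iterfind(".//tdm:InfoItem", NS):
        tags = {c.get(RDF_RESOURCE) for c in item.findall("tdm:category", NS)}
        if wanted in tags:
            found.append(item.findtext("tdmt:hasContent", namespaces=NS))
    return found


root = ET.fromstring(DOC)
behaviour = contents_tagged(root, "Behaviour")
```

The item tagged with both categories is returned for either query without any BehaviouralEvolution class having to exist, which is the main attraction of tagging. With the subclass form a client would instead need to know the class hierarchy to get the same result.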
Renato raised questions about serving that tagged version with TAPIR by which I think he meant TAPIRLink as it would not be possible to do the above example as a flat schema. This is the same problem as serving multiple identifications for a specimen I guess - is this right?
This reminds me of the point I think Markus raised at the beginning. Why not have InfoItem as the top level element and move the taxon into it?
<InfoItem>
  <aboutTaxon>...</aboutTaxon>
  <category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
  <hasContent>Some stuff about evolution</hasContent>
</InfoItem>
Info item is then like a DwC record and the category property is like the BasisOfRecord. (TAPIRLink couldn't do multiple category stuff).
The argument against this is that the metadata would have to be repeated for multiple InfoItems. Most requests would be for multiple InfoItems about the same species, I guess, but I really need clearer examples of what this will be applied to. Who is going to implement this in the near future? Perhaps they should have a go and decide? Isn't Wouter doing something on it? I don't have the time just now to try out some examples and I think that is what is needed.
What does everyone else think?
All the best,
Roger
Renato De Giovanni wrote:
Anyway, I'm not quite familiar with species-level data sources. From the previous messages, it seems that the main reason for using the generic tagging approach is that most data sources will have chunks of text including information about one or more TDM categories, and it will be impractical to separate this information in a more structured way. Did I understand the problem correctly?
Yes, but it is worse. Many such sources have \both/ textual---but categorized---data and structured data. And both may need ontological mapping so that both machine integration and human display applications have a chance of putting together the right stuff and also not ignoring what the client wishes not to be ignored.
In this case, then you're right that it would be interesting if someone could investigate this a bit more, make some tests and give us more practical feedback. If most participants of the species model workshop have this kind of database, maybe they could try to map their fields to the TDM categories.
I am presently doing some of that, albeit first trying to hand code some instances with Protege and Altova SemanticWorks. I guess the interesting part will come for stuff that \doesn't/ map well. At the moment, I am somewhat at a loss for what our intent was in this case, but maybe in another few hours I will have figured that out. ...
Bob
Best Regards,
Renato
OK, so I "think" I'm starting to understand the problem that has led to the current approach taken by the taxon data model.
Trying to make a quite simplified analogy with specimen data, imagine a collection that used a simple OCR process on all labels and now has only one table with a single textual field. Since we can find different things in labels (some may have coordinates, others not, some may have collecting date, etc.) the suggestion for this kind of situation would then be to tag all records individually. For instance saying that "in this specific record you may find something about the location, in this other record you may find something about location and date" and so on.
Now back to species databases, if tagging is really something at the record level (and I suppose it is), I would be really surprised to see a species database which is ready to use some kind of tagging mechanism. Tagging at the record level would therefore require changing the data structure and revising all records.
If this kind of work is being considered, then why not restructure everything according to the new terminology that was proposed during the meeting? Unless we are talking about some kind of data that simply cannot be separated and structured according to the proposed terms...
Looking at the results of the meeting, it's really tempting to take all terms and put them into a simple conceptual schema like DarwinCore. It would not only provide a common XML vocabulary, but we would almost instantly benefit from the existing technology for sharing/accessing distributed data. From the TAPIR perspective, data exchange schemas like PlinianCore could be seen as output models. All providers from the different networks could still try to map the same agreed terms/concepts.
If tagging will not take place at the record level, but at the field level, like "I have field X which sometimes has content about behaviour, sometimes about evolution, sometimes both, so I will automatically tag all records with both terms", then I see no big difference if in the current way of using TAPIR we just take the two corresponding concepts and map them against the same local field.
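The field-level case can be sketched as a plain concept-to-field mapping in which two concepts point at the same local field. This is not TAPIR configuration syntax; the field names, the sample record and map_record are all hypothetical and only illustrate the idea.

```python
# Hypothetical mapping of agreed concepts to local database fields.
# Behaviour and Evolution deliberately share the same local field.
CONCEPT_TO_FIELD = {
    "Behaviour": "field_x",
    "Evolution": "field_x",
    "Habitat": "habitat",
}


def map_record(record):
    """Expose a local record under every concept whose field has content."""
    return {
        concept: record[field]
        for concept, field in CONCEPT_TO_FIELD.items()
        if record.get(field)
    }


# Invented sample record; the habitat field is empty and so is not exposed.
record = {
    "field_x": "Courtship displays diverged between island populations",
    "habitat": "",
}
mapped = map_record(record)
```

The same text is served under both concepts, which is equivalent to automatically tagging every record with both terms, but the capabilities response can still list Behaviour and Evolution as separately mapped concepts.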
RDF could still be one of the TAPIR outputs, but the ontology would probably need a different approach (as discussed in previous messages).
Best Regards, -- Renato
On 4 May 2007 at 13:16, Bob Morris wrote:
Anyway, I'm not quite familiar with species-level data sources. From the previous messages, it seems that the main reason for using the generic tagging approach is that most data sources will have chunks of text including information about one or more TDM categories, and it will be impractical to separate this information in a more structured way. Did I understand the problem correctly?
Yes, but it is worse. Many such sources have \both/ textual---but categorized---data and structured data. And both may need ontological mapping so that both machine integration and human display applications have a chance of putting together the right stuff and also not ignoring what the client wishes not to be ignored.
In this case, then you're right that it would be interesting if someone could investigate this a bit more, make some tests and give us more practical feedback. If most participants of the species model workshop have this kind of database, maybe they could try to map their fields to the TDM categories.
I am presently doing some of that, albeit first trying to hand code some instances with Protege and Altova SemanticWorks. I guess the interesting part will come for stuff that \doesn't/ map well. At the moment, I am somewhat at a loss for what our intent was in this case, but maybe in another few hours I will have figured that out. ...
Bob
Best Regards,
Renato
Renato,
Thanks for your comments. That is an interesting view of the problem and I think you may be correct for the supplier databases (though I don't have first hand knowledge of these database schemas). Generally the nearer the exchange format is to the supplier's schema the easier it will be for them to publish. Taking the approach Markus suggests would produce the result you are after I believe.
There is just one problem that you didn't address.
Who wants to consume the data and what do they want to do with it?
To have something that is easy to produce, easy to consume and easy to extend is more or less impossible. There has to be some pain somewhere!
What is your vision of a client application? How would it handle elements it hadn't seen before - or is this not a requirement?
All the best,
Roger
On 7 May 2007, at 17:35, Renato De Giovanni wrote:
OK, so I "think" I'm starting to understand the problem that has led to the current approach taken by the taxon data model.
Trying to make a quite simplified analogy with specimen data, imagine a collection that used a simple OCR process on all labels and now has only one table with a single textual field. Since we can find different things in labels (some may have coordinates, others not, some may have collecting date, etc.) the suggestion for this kind of situation would then be to tag all records individually. For instance saying that "in this specific record you may find something about the location, in this other record you may find something about location and date" and so on.
Now back to species databases, if tagging is really something at the record level (and I suppose it is), I would be really surprised to see a species database which is ready to use some kind of tagging mechanism. Tagging at the record level would therefore require changing the data structure and revising all records.
If this kind of work is being considered, then why not restructure everything according to the new terminology that was proposed during the meeting? Unless we are talking about some kind of data that simply cannot be separated and structured according to the proposed terms...
Looking at the results of the meeting, it's really tempting to take all terms and put them into a simple conceptual schema like DarwinCore. It would not only provide a common XML vocabulary, but we would almost instantly benefit from the existing technology for sharing/accessing distributed data. From the TAPIR perspective, data exchange schemas like PlinianCore could be seen as output models. All providers from the different networks could still try to map the same agreed terms/concepts.
If tagging will not take place at the record level, but at the field level, like "I have field X which sometimes has content about behaviour, sometimes about evolution, sometimes both, so I will automatically tag all records with both terms", then I see no big difference if in the current way of using TAPIR we just take the two corresponding concepts and map them against the same local field.
RDF could still be one of the TAPIR outputs, but the ontology would probably need a different approach (as discussed in previous messages).
Best Regards,
Renato
Hi Roger,
I'm not sure I share this vision of a "law of conservation of pain". It's true that one of the points in the other message was to ease the process of sharing data, but this doesn't mean that clients will necessarily have trouble (I hope not!).
From the TAPIR perspective, we handle extensibility by allowing providers to work with multiple conceptual schemas. If you produce a list of concepts from the TDM terms, anyone is free to produce other complementary lists in the future, without breaking compatibility.
You know that in TAPIR it's also possible to produce outputs in different XML formats, even RDF. This should facilitate the work of clients.
I suppose that clients will usually request data in formats that include elements that they know something about. But anyway, nothing prevents them from requesting things that they don't have any knowledge about. The TapirLink browser that I demonstrated during the TAPIR workshop is one of those clients: it dynamically builds an output model based on what the provider declared to have, and it simply displays this data in a tabular form.
Now let's assume that we decide to work with a generic conceptual schema with two main concepts, category of InfoItem and InfoItem value. Let's also assume that providers will be able to easily share their data according to this conceptual model. In TAPIR, the output formats will be very limited - they will need to follow this generic approach. But let's suppose that this will not be a problem. What is going to happen is that clients will get almost anything from there - basically values of things that can be categorised in many ways. If clients want to perform validation they will need to do it themselves (the output format will be too generic, so we cannot use XML validation). Perhaps RDF validation will offer more possibilities, but then you're only considering data exchange in an RDF world. The meaning of InfoItems you would get from a dictionary of categories, in the same way that you could get the meaning of elements from a dictionary (DarwinCore for instance, or some ontology).
In this case, it's not clear to me what would be the big benefits of using the generic model approach, but maybe I'm missing something. The more knowledge you have about the elements or concepts, the more interesting and powerful the applications will be. It's a philosophical issue.
If we decide to avoid the more "traditional" way of structuring and modelling data because we feel it somehow limits our applications, then I think we first need to clearly understand what these limitations are. Otherwise, by doing things in a very different way we may miss the opportunity of using existing tools and resources, while still running the risk of facing again, on a different road, the same data structuring issues that we tried to avoid.
Best Wishes, -- Renato
PS: I'm sorry for crossposting. I'll send any follow-ups only to the new taxon-model mailing list: http://lists.tdwg.org/mailman/listinfo/taxon-model
On 8 May 2007 at 9:35, Roger Hyam wrote:
Renato,
Thanks for your comments. That is an interesting view of the problem and I think you may be correct for the supplier databases (though I don't have first hand knowledge of these database schemas). Generally the nearer the exchange format is to the supplier's schema the easier it will be for them to publish. Taking the approach Markus suggests would produce the result you are after I believe.
There is just one problem that you didn't address.
Who wants to consume the data and what do they want to do with it?
To have something that is easy to produce, easy to consume and easy to extend is more or less impossible. There has to be some pain somewhere!
What is your vision of a client application? How would it handle elements it hadn't seen before - or is this not a requirement?
All the best,
Roger
participants (5): Bob Morris, Gregor Hagedorn, Markus Döring, Renato De Giovanni, Roger Hyam