Hello Roger,

Markus was right in his comment. I wasn't thinking about any particular implementation of TAPIR; I just wanted to warn about some implications of using generic models like TDM in the TAPIR context.

Take DarwinCore as an example:

* Most providers of specimen data use relational databases where Darwin concepts correspond to table columns, so the mapping process is easier.
* If I'm a client and I'm interested in providers that have content for lat/long, I can just inspect the capabilities response to see if they mapped the corresponding concepts.
* Since we have different concept ids for each kind of data, we have more possibilities when designing output models.

Now if I understood correctly, TDM is so generic that the same kind of model could be used for DarwinCore - just replace the TDM terms with the Darwin concepts. And although there's nothing intrinsically wrong with this approach:

* If most providers have databases where TDM categories correspond to table columns, then they will need to prepare a super view to make all data appear under a single InfoItem column, right beside another column with the corresponding category value. It's possible, but it's more work for providers and performance will not be good.
* If I'm a client and I'm interested in providers that have content for habitat, I cannot simply inspect the capabilities response, because it will just show me that the providers have InfoItems. I'll need to send additional search/inventory requests to discover what kind of data is available.
* Since there will be only a few generic concepts, output models will be very limited in TAPIR. As you know, at the moment we cannot have conditional mappings in TAPIR, for instance: InfoItem corresponds to element habitat only when category equals habitat.

I'm not against generic models. I have also used them myself in specific circumstances, such as meta-modelling applications, or when the application had such a mutable nature that it was better to use a more generic approach (even at the cost of performance penalties and other additional work).

Anyway, I'm not very familiar with species-level data sources. From the previous messages, it seems that the main reason for using the generic tagging approach is that most data sources will have chunks of text including information about one or more TDM categories, and it will be impractical to separate this information in a more structured way. Did I understand the problem correctly?

In that case, you're right that it would be interesting if someone could investigate this a bit more, run some tests and give us more practical feedback. If most participants of the species model workshop have this kind of database, maybe they could try to map their fields to the TDM categories.

Best Regards,
--
Renato
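
For reference, the "super view" described above could look roughly like the following. This is only a sketch under assumptions: the table name (species), its columns, the view name (info_items) and the sample values are all made up for illustration, and SQLite stands in for whatever database a provider actually runs.

```python
import sqlite3

# Hypothetical provider table: one column per TDM category (names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (taxon TEXT, habitat TEXT, behaviour TEXT)")
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?)",
    [
        ("taxon-1", "sample habitat text", "sample behaviour text"),
        ("taxon-2", "another habitat text", None),
    ],
)

# The "super view": one SELECT per category column, glued with UNION ALL,
# so every value lands in a single info_item column beside its category.
conn.execute(
    """
    CREATE VIEW info_items AS
    SELECT taxon, 'Habitat' AS category, habitat AS info_item
      FROM species WHERE habitat IS NOT NULL
    UNION ALL
    SELECT taxon, 'Behaviour' AS category, behaviour AS info_item
      FROM species WHERE behaviour IS NOT NULL
    """
)

rows = conn.execute(
    "SELECT taxon, category, info_item FROM info_items ORDER BY taxon, category"
).fetchall()
for row in rows:
    print(row)
```

Each additional category column needs another SELECT branch in the view, which is where the extra provider work comes from.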
Hi Markus,
I am replying to this and cc'ing the TAG list because I really think we should be having the discussion there. I am sure there are other people who might like to be involved from a technical standpoint. I hope they can read this message thread backwards to catch up.
If I can summarize:
We are talking about the data models that were dreamt up at the SpeciesDataModel workshop:
http://rs.tdwg.org/ontology/voc/TaxonDataModel
http://rs.tdwg.org/ontology/voc/TDMTerm
The choice is whether to have an inherited hierarchy of classes of object to represent information items or to have a single information item and 'tag' it with categories (instances).
Having info items as different classes means that they would possibly be clearer in a straight serialization:
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdmt:Behaviour>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdmt:Behaviour>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdmt:Evolution>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdmt:Evolution>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdmt:BehaviouralEvolution>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdmt:BehaviouralEvolution>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
But taking the tagging approach:
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
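
To make the consequence for a client concrete, here is a rough sketch of picking items out of the tagged form. With tagging, the element name is always InfoItem, so the client has to look at the category values in the data itself. The namespace URIs bound to tdm and tdmt are assumptions (the vocabulary URLs with a trailing #), and items_in_category is a hypothetical helper, not part of any TAPIR software.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace URIs: the thread gives only the vocabulary URLs.
TDM = "http://rs.tdwg.org/ontology/voc/TaxonDataModel#"
TDMT = "http://rs.tdwg.org/ontology/voc/TDMTerm#"

DOC = f"""
<tdm:TaxonDataModel xmlns:tdm="{TDM}" xmlns:tdmt="{TDMT}" xmlns:rdf="{RDF}">
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="{TDMT}Behaviour"/>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="{TDMT}Evolution"/>
      <tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="{TDMT}Behaviour"/>
      <tdm:category rdf:resource="{TDMT}Evolution"/>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
"""


def items_in_category(xml_text, term):
    """Return the content of every InfoItem tagged with the given TDM term."""
    root = ET.fromstring(xml_text)
    wanted = TDMT + term
    found = []
    for item in root.iter(f"{{{TDM}}}InfoItem"):
        # An InfoItem may carry several category tags; collect them all.
        categories = {
            cat.get(f"{{{RDF}}}resource")
            for cat in item.findall(f"{{{TDM}}}category")
        }
        if wanted in categories:
            found.append(item.findtext(f"{{{TDMT}}}hasContent"))
    return found


print(items_in_category(DOC, "Behaviour"))
print(items_in_category(DOC, "Evolution"))
```

The item tagged with both terms turns up in both results, which is the part a single-valued flat field cannot express.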
Renato raised questions about serving that tagged version with TAPIR, by which I think he meant TAPIRLink, as it would not be possible to serve the above example as a flat schema. I guess this is the same problem as serving multiple identifications for a specimen - is this right?
This reminds me of a point I think Markus raised at the beginning. Why not have InfoItem as the top-level element and move the taxon into it?
<InfoItem>
  <aboutTaxon>...</aboutTaxon>
  <category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
  <hasContent>Some stuff about evolution</hasContent>
</InfoItem>
An InfoItem is then like a DwC record, and the category property is like BasisOfRecord. (TAPIRLink couldn't handle multiple categories, though.)
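
As a rough sketch of what such flat, DwC-like InfoItem records might look like: FlatInfoItem and flatten are hypothetical names, "taxon-1" is a placeholder identifier, and emitting one row per category for a multi-category item is only one assumed way of coping with a single-valued category field.

```python
from dataclasses import dataclass

TERMS = "http://rs.tdwg.org/ontology/voc/TDMTerm#"


@dataclass(frozen=True)
class FlatInfoItem:
    """One flat record, comparable to a single DwC record."""
    about_taxon: str  # taxon reference, repeated on every row
    category: str     # single-valued, like BasisOfRecord
    has_content: str


def flatten(about_taxon, items):
    """Turn (categories, content) pairs for one taxon into flat rows.

    An item with several categories becomes one row per category
    (an assumption; a flat schema has no slot for a second value).
    """
    rows = []
    for categories, content in items:
        for term in categories:
            rows.append(FlatInfoItem(about_taxon, TERMS + term, content))
    return rows


rows = flatten(
    "taxon-1",  # placeholder taxon identifier
    [
        (["Behaviour"], "Some stuff about behaviour"),
        (["Evolution"], "Some stuff about evolution"),
        (["Behaviour", "Evolution"], "Some stuff about evolution of behaviour"),
    ],
)
for row in rows:
    print(row.about_taxon, row.category.rsplit("#", 1)[1], row.has_content, sep=" | ")
```

Three information items become four rows, and the taxon reference is carried on every one of them.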
The argument against this is that the metadata would have to be repeated for multiple InfoItems. I guess most requests would be for multiple InfoItems about the same species, but I really need clearer examples of what this will be applied to. Who is going to implement this in the near future? Perhaps they should have a go and decide? Isn't Wouter doing something on it? I don't have the time just now to try out some examples, and I think that is what is needed.
What does everyone else think?
All the best,
Roger