[tdwg-tag] Re: TDM Ontology

4 May 2007

Hello Roger,

Markus was right in his comment. I wasn't thinking about any particular
implementation of TAPIR, I just wanted to warn about some implications of
using generic models like TDM in the TAPIR context.

Take DarwinCore as an example:

* Most providers of specimen data use relational databases where Darwin
concepts correspond to table columns, so the mapping process is easier.
* If I'm a client and I'm interested in providers that have content for
lat/long, I can just inspect the capabilities response to see if they
mapped the corresponding concepts.
* Since we have different concept ids for each kind of data, we have more
possibilities when designing output models.

Now if I understood correctly, TDM is so generic that the same kind of
model could be used for DarwinCore - just replace the TDM terms by the
Darwin concepts. And although there's nothing intrinsically wrong with
this approach:

* If most providers will have databases where TDM categories correspond to
table columns, then they will need to prepare a super view to make all
data appear under a single InfoItem column, just beside another column
with the corresponding category value. It's possible, but it's more work
for providers and performance will not be good.
* If I'm a client and I'm interested in providers that have content for
habitat, I cannot simply inspect the capabilities response, because it
will just show me that the providers have InfoItems. I'll need to send
additional search/inventory requests to discover what kind of data is
available.
* Since there will be only a few generic concepts, output models will be
very limited in TAPIR. As you know, at the moment we cannot have
conditional mappings in TAPIR, for instance: InfoItem corresponds to
element habitat only when category equals habitat.
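
As a rough illustration of the "super view" in the first point, here is a
minimal sketch using Python's sqlite3 module. The table, the column names and
the sample rows are hypothetical and not taken from any real provider
database; the view simply turns each category column into rows with the text
under one column and the category name beside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical provider table: one column per TDM category.
    CREATE TABLE species (
        taxon     TEXT PRIMARY KEY,
        habitat   TEXT,
        behaviour TEXT
    );
    INSERT INTO species VALUES
        ('Puma concolor', 'Forests and mountains', 'Solitary, mostly nocturnal'),
        ('Felis catus', NULL, 'Often crepuscular');

    -- The "super view": each category column becomes its own set of rows,
    -- with the text under info_item and the category name beside it.
    CREATE VIEW info_items AS
        SELECT taxon, 'Habitat' AS category, habitat AS info_item
          FROM species WHERE habitat IS NOT NULL
        UNION ALL
        SELECT taxon, 'Behaviour' AS category, behaviour AS info_item
          FROM species WHERE behaviour IS NOT NULL;
""")

rows = conn.execute(
    "SELECT taxon, category, info_item FROM info_items ORDER BY taxon, category"
).fetchall()
for row in rows:
    print(row)

# Finding out which categories actually have content takes a query against
# the data itself; the list of mapped columns alone no longer tells us.
categories = [r[0] for r in conn.execute(
    "SELECT DISTINCT category FROM info_items ORDER BY category")]
print(categories)
```

Every extra category adds another branch to the UNION ALL, which is where the
extra work for providers comes from in this sketch.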

I'm not against generic models. I have also used them myself in specific
circumstances like meta modelling applications, or when the application
had such a mutable nature that it was better to use a more generic
approach (even at the cost of performance penalties and other additional
work).

Anyway, I'm not quite familiar with species-level data sources. From the
previous messages, it seems that the main reason for using the generic
tagging approach is that most data sources will have chunks of text
including information about one or more TDM categories, and it will be
impractical to separate this information in a more structured way. Did I
understand the problem correctly?

In that case, you're right that it would be interesting if someone
could investigate this a bit more, run some tests and give us more
practical feedback. If most participants of the species model workshop
have this kind of database, maybe they could try to map their fields to
the TDM categories.

Best Regards,
--
Renato
...
Hi Markus,
I am replying to this and cc'ing the TAG list because I really think
we should be having the discussion there. I am sure there are other
people who might like to be involved from a technical standpoint. I
hope they can read this message thread backwards to catch up.
If I can summarize:
We are talking about the data models that were dreamt up at the
SpeciesDataModel workshop
http://rs.tdwg.org/ontology/voc/TaxonDataModel
http://rs.tdwg.org/ontology/voc/TDMTerm
The choice is whether to have an inherited hierarchy of classes of
object to represent information items or to have a single information
item and 'tag' it with categories (instances).
Having info items as different classes means that they would
possibly be clearer in a straight serialization.
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
      <tdmt:Behaviour>
      	<tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
      </tdmt:Behaviour>
  </tdm:hasInformation>
  <tdm:hasInformation>
      <tdmt:Evolution>
      	<tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
      </tdmt:Evolution>
  </tdm:hasInformation>
  <tdm:hasInformation>
      <tdmt:BehaviouralEvolution>
      	<tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
      </tdmt:BehaviouralEvolution>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
But taking the tagging approach:
<tdm:TaxonDataModel>
  <tdm:aboutTaxon>.....</tdm:aboutTaxon>
  <tdm:hasInformation>
      <tdm:InfoItem>
      	<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      	<tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
      </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
      <tdm:InfoItem>
      	<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      	<tdmt:hasContent>Some stuff about evolution</tdmt:hasContent>
      </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
      <tdm:InfoItem>
      	<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      	<tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      	<tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
      </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
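To make the tagged form concrete, here is a minimal sketch of how a consumer
might group InfoItems by category using Python's ElementTree. The namespace
URIs bound to tdm and tdmt are assumptions derived from the vocabulary URLs
above (the fragments in this message do not declare them), and the function
name is purely illustrative.

```python
import xml.etree.ElementTree as ET

# The tdm/tdmt namespace URIs are assumptions based on the vocabulary URLs;
# only the rdf namespace is the standard one.
NS = {
    "tdm": "http://rs.tdwg.org/ontology/voc/TaxonDataModel#",
    "tdmt": "http://rs.tdwg.org/ontology/voc/TDMTerm#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

DOC = """
<tdm:TaxonDataModel
    xmlns:tdm="http://rs.tdwg.org/ontology/voc/TaxonDataModel#"
    xmlns:tdmt="http://rs.tdwg.org/ontology/voc/TDMTerm#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdmt:hasContent>Some stuff about behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
  <tdm:hasInformation>
    <tdm:InfoItem>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Behaviour"/>
      <tdm:category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
      <tdmt:hasContent>Some stuff about evolution of behaviour</tdmt:hasContent>
    </tdm:InfoItem>
  </tdm:hasInformation>
</tdm:TaxonDataModel>
""".strip()


def items_by_category(xml_text):
    """Map each category term (the part after '#') to the contents tagged with it."""
    root = ET.fromstring(xml_text)
    index = {}
    for item in root.iterfind(".//tdm:InfoItem", NS):
        content = item.findtext("tdmt:hasContent", default="", namespaces=NS)
        # An InfoItem may carry several category tags, so it can appear
        # under more than one key.
        for cat in item.iterfind("tdm:category", NS):
            term = cat.get(RDF_RESOURCE).rsplit("#", 1)[-1]
            index.setdefault(term, []).append(content)
    return index


index = items_by_category(DOC)
for term, contents in sorted(index.items()):
    print(term, contents)
```

The element names are the same for every item here, so the category has to be
read from the attribute value rather than from the element name as in the
class-based serialization.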
Renato raised questions about serving that tagged version with TAPIR
by which I think he meant TAPIRLink as it would not be possible to do
the above example as a flat schema. This is the same problem as
serving multiple identifications for a specimen I guess - is this right?
This reminds me of the point that I think Markus raised at the beginning.
Why not have InfoItem as the top level element and move the taxon
into it?
<InfoItem>
  <aboutTaxon>...</aboutTaxon>
  <category rdf:resource="http://rs.tdwg.org/ontology/voc/TDMTerm#Evolution"/>
  <hasContent>Some stuff about evolution</hasContent>
</InfoItem>
Info item is then like a DwC record and the category property is like
the BasisOfRecord. (TAPIRLink couldn't do multiple category stuff).
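As a small sketch of what such flat records could look like, the Python below
turns tagged items into one record per category. Splitting a multi-category
item into several records is only one possible workaround and is an
assumption here, not something proposed in this thread; the class name, the
function name and the taxon name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatInfoItem:
    """One flat record: a taxon, a single category and the content."""
    about_taxon: str
    category: str
    content: str


def flatten(about_taxon, items):
    """Turn (categories, content) pairs into one flat record per category."""
    return [
        FlatInfoItem(about_taxon, category, content)
        for categories, content in items
        for category in categories
    ]


# Hypothetical taxon name, used only for illustration.
records = flatten("Puma concolor", [
    (["Behaviour"], "Some stuff about behaviour"),
    (["Behaviour", "Evolution"], "Some stuff about evolution of behaviour"),
])
for record in records:
    print(record)
```

In this sketch the taxon value is carried on every record, and the item with
two categories appears twice, once under each category.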
The argument against this is that the metadata would have to be
repeated for multiple InfoItems. Most requests would be for multiple
InfoItems about the same species, I guess, but I really need clearer
examples of what this will be applied to. Who is going to
implement this in the near future? Perhaps they should have a go and
decide? Isn't Wouter doing something on it? I don't have the time
just now to try out some examples and I think that is what is needed.
What does everyone else think?
All the best,
Roger

Renato De Giovanni