[tdwg-tag] Re: TDM Ontology

7 May 2007

OK, so I "think" I'm starting to understand the problem that has led 
to the current approach taken by the taxon data model.

To make a much simplified analogy with specimen data, imagine a 
collection that ran a simple OCR process on all its labels and now 
has only one table with a single textual field. Since we can find 
different things in labels (some may have coordinates, others not, 
some may have a collecting date, etc.) the suggestion for this kind 
of situation would then be to tag all records individually. For 
instance, saying that "in this specific record you may find something 
about the location, in this other record you may find something about 
location and date" and so on.
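
A minimal sketch of what record-level tagging amounts to in this 
analogy, written in Python. The class, the tag names and the label 
texts are all invented for illustration and come from no existing 
schema:

```python
from dataclasses import dataclass, field


@dataclass
class LabelRecord:
    """One OCR'd label: a single free-text field plus per-record tags."""
    text: str
    tags: set = field(default_factory=set)  # hypothetical tag names


# Invented label texts, only to illustrate the idea.
records = [
    LabelRecord("Campinas, SP, Brazil", {"location"}),
    LabelRecord("Campinas, SP, Brazil. 12 Mar 1998", {"location", "date"}),
]


def records_tagged(tag, recs):
    """Return the records whose tags say they contain this kind of content."""
    return [r for r in recs if tag in r.tags]
```

Here records_tagged("date", records) would return only the second 
record. The point is that the tags live on each record, so someone 
has to assign them record by record.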

Now back to species databases, if tagging is really something at the 
record level (and I suppose it is), I would be really surprised to 
see a species database which is ready to use some kind of tagging 
mechanism. Tagging at the record level would therefore require 
changing the data structure and revising all records. 

If this kind of work is being considered, then why not restructure 
everything according to the new terminology that was proposed during 
the meeting? Unless we are talking about some kind of data that 
simply cannot be separated and structured according to the proposed 
terms...

Looking at the results of the meeting, it's really tempting to take 
all terms and put them into a simple conceptual schema like 
DarwinCore. It would not only provide a common XML vocabulary, but we 
would almost instantly benefit from the existing technology for 
sharing/accessing distributed data. From the TAPIR perspective, data 
exchange schemas like PlinianCore could be seen as output models. 
All providers from the different networks could still try to map the 
same agreed terms/concepts.

If tagging will not take place at the record level, but at the field 
level, like "I have field X which sometimes has content about 
behaviour, sometimes about evolution, sometimes both, so I will 
automatically tag all records with both terms", then I see no big 
difference from what we can already do with TAPIR today: just take 
the two corresponding concepts and map them to the same local field.
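
A minimal sketch of that equivalence, in Python rather than actual 
TAPIR configuration. The concept names, the local field "notes" and 
the sample record are assumptions made up for illustration:

```python
# Hypothetical mapping of agreed concepts to local database fields.
# Two concepts point at the same free-text field.
CONCEPT_TO_FIELD = {
    "Behaviour": "notes",
    "Evolution": "notes",
}


def concepts_for_field(local_field, mapping=CONCEPT_TO_FIELD):
    """Concepts mapped to a local field, i.e. its field-level 'tags'."""
    return sorted(c for c, f in mapping.items() if f == local_field)


def resolve(record, concept, mapping=CONCEPT_TO_FIELD):
    """Return the local content served when a client asks for a concept."""
    return record.get(mapping[concept])


# Invented record, only to show that both concepts yield the same text.
record = {"id": 1, "notes": "Free text about behaviour and/or evolution."}
```

With this mapping, concepts_for_field("notes") lists both concepts, 
and resolve(record, "Behaviour") and resolve(record, "Evolution") 
return the same text. That is the same information a field-level tag 
would carry, with no change to the records themselves.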

RDF could still be one of the TAPIR outputs, but the ontology would 
probably need a different approach (as discussed in previous 
messages).

Best Regards,
--
Renato

On 4 May 2007 at 13:16, Bob Morris wrote:
...
...
>> Anyway, I'm not quite familiar with species-level data sources. From the
>> previous messages, it seems that the main reason for using the generic
>> tagging approach is that most data sources will have chunks of text
>> including information about one or more TDM categories, and it will be
>> impractical to separate this information in a more structured way. Did I
>> understand the problem correctly?
> Yes, but it is worse. Many such sources have \both/ textual---but 
> categorized---data and structured data. And both may need ontological 
> mapping so that both machine integration and human display applications 
> have a chance of putting together the right stuff and also not ignoring 
> what the client wishes not be ignored.
...
>> In this case, then you're right that it would be interesting if someone
>> could investigate this a bit more, make some tests and give us a more
>> practical feedback. If most participants of the species model workshop
>> have this kind of database, maybe they could try to map their fields to
>> the TDM categories.
> I am presently doing some of that, albeit first trying to hand code some 
> instances with Protege and Altova SemanticWorks. I guess the interesting 
> part will come for stuff that \doesn't/ map well. At the moment, I am 
> somewhat at a loss for what our intent was in this case, but maybe in 
> another few hours I will have figured that out. ...
> Bob
...
>> Best Regards,
>> --
>> Renato

Renato De Giovanni