OK, so I "think" I'm starting to understand the problem that has led to the current approach taken by the taxon data model.
To make a much simplified analogy with specimen data, imagine a collection that ran a simple OCR process on all its labels and now has only one table with a single textual field. Since labels can contain different things (some may have coordinates, others not, some may have a collecting date, etc.), the suggestion for this kind of situation would then be to tag all records individually. For instance, saying that "in this specific record you may find something about the location, in this other record you may find something about location and date" and so on.
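To make the analogy concrete, here is a minimal sketch of what record-level tagging could look like. Everything in it (the LabelRecord class, the tag names, the sample label texts) is a hypothetical illustration, not taken from any real collection or from the taxon data model itself.

```python
# Hypothetical sketch of record-level tagging: each record keeps its raw
# OCR text in one field, plus its own set of tags saying what the text covers.
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class LabelRecord:
    text: str                                    # the single textual field
    tags: Set[str] = field(default_factory=set)  # assigned record by record


# Invented sample data: tags differ from one record to the next.
records = [
    LabelRecord("Campinas, near the river", {"location"}),
    LabelRecord("Campinas, 12 Mar 1998", {"location", "date"}),
]


def records_about(tag: str, recs: List[LabelRecord]) -> List[LabelRecord]:
    """Return the records whose own tag set contains the given tag."""
    return [r for r in recs if tag in r.tags]
```

The point of the sketch is only that the tags live on each record, so every record has to be visited (and the structure changed) before anything can be served.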
Now back to species databases, if tagging is really something at the record level (and I suppose it is), I would be really surprised to see a species database which is ready to use some kind of tagging mechanism. Tagging at the record level would therefore require changing the data structure and revising all records.
If this kind of work is being considered, then why not restructure everything according to the new terminology that was proposed during the meeting? Unless we are talking about some kind of data that simply cannot be separated and structured according to the proposed terms...
Looking at the results of the meeting, it's really tempting to take all terms and put them into a simple conceptual schema like DarwinCore. It would not only provide a common XML vocabulary, but we would almost instantly benefit from the existing technology for sharing/accessing distributed data. From the TAPIR perspective, data exchange schemas like PlinianCore could be seen as output models. All providers from the different networks could still try to map the same agreed terms/concepts.
If tagging will not take place at the record level, but at the field level, like "I have field X which sometimes has content about behaviour, sometimes about evolution, sometimes both, so I will automatically tag all records with both terms", then I see no big difference from simply taking, in the current way of using TAPIR, the two corresponding concepts and mapping them against the same local field.
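A rough sketch of that field-level alternative, for comparison. The concept names, local field names and the resolve() helper are all assumptions made up for illustration; this is not TAPIR's actual configuration format, just the idea of two concepts pointing at one local field.

```python
# Hypothetical sketch: concepts are mapped to local fields once, at the
# field level, so two concepts may share the same local field.
from typing import Dict, Optional

# Invented concept and field names, for illustration only.
CONCEPT_TO_FIELD: Dict[str, str] = {
    "behaviour": "notes",
    "evolution": "notes",        # same local field as "behaviour"
    "distribution": "range_text",
}


def resolve(concept: str, record: Dict[str, str]) -> Optional[str]:
    """Return the content of the local field mapped to a concept, if any."""
    local_field = CONCEPT_TO_FIELD.get(concept)
    if local_field is None:
        return None
    return record.get(local_field)


# Invented sample record with unstructured text in "notes".
record = {
    "notes": "Mostly nocturnal. Thought to have diverged recently.",
    "range_text": "Southeastern Brazil",
}
```

Here a request for either "behaviour" or "evolution" returns the same chunk of text, which is the same result as tagging every record with both terms, without touching the records themselves.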
RDF could still be one of the TAPIR outputs, but the ontology would probably need a different approach (as discussed in previous messages).
Best Regards, -- Renato
On 4 May 2007 at 13:16, Bob Morris wrote:
Anyway, I'm not quite familiar with species-level data sources. From the previous messages, it seems that the main reason for using the generic tagging approach is that most data sources will have chunks of text including information about one or more TDM categories, and it will be impractical to separate this information in a more structured way. Did I understand the problem correctly?
Yes, but it is worse. Many such sources have \both/ textual---but categorized---data and structured data. And both may need ontological mapping so that both machine integration and human display applications have a chance of putting together the right stuff and also not ignoring what the client wishes not be ignored.
In this case, you're right that it would be interesting if someone could investigate this a bit more, run some tests and give us more practical feedback. If most participants of the species model workshop have this kind of database, maybe they could try to map their fields to the TDM categories.
I am presently doing some of that, albeit first trying to hand code some instances with Protege and Altova SemanticWorks. I guess the interesting part will come for stuff that \doesn't/ map well. At the moment, I am somewhat at a loss for what our intent was in this case, but maybe in another few hours I will have figured that out. ...
Bob
Best Regards,
Renato