Hi Roger,
I'm not sure I share this vision of a "law of conservation of pain". It's true that one of the points in the other message was to ease the process of sharing data, but this doesn't mean that clients will necessarily have trouble (I hope not!).
From the TAPIR perspective, we handle extensibility by allowing providers to work with multiple conceptual schemas. If you produce a list of concepts from the TDM terms, anyone is free to produce other complementary lists in the future, without breaking compatibility.
You know that in TAPIR it's also possible to produce outputs in different XML formats, even RDF. This should make things easier for clients.
I suppose that clients will usually request data in formats whose elements they know something about. But anyway, nothing prevents them from requesting things they have no knowledge about. The TapirLink browser that I demonstrated during the TAPIR workshop is one of those clients: it dynamically builds an output model based on what the provider declares to have mapped, and simply displays the data in tabular form.
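To make this concrete, here is a minimal sketch of how such a schema-agnostic client could work. It is only an illustration under assumptions of my own: the operation parameters (op=capabilities, op=search), the mappedConcept and record element names, and the URL layout are all hypothetical, not the actual TAPIR protocol syntax.

    import urllib.request
    import xml.etree.ElementTree as ET

    def show_provider_data(base_url):
        # Ask the provider which concepts it has mapped (hypothetical
        # 'capabilities' operation; the real parameter names differ).
        with urllib.request.urlopen(base_url + "?op=capabilities") as resp:
            concepts = [c.get("id") for c in ET.parse(resp).iter()
                        if c.tag.endswith("mappedConcept")]

        # Request every declared concept and print the records as a
        # plain table, with no prior knowledge of what they mean.
        with urllib.request.urlopen(base_url + "?op=search&concept="
                                    + ",".join(concepts)) as resp:
            records = ET.parse(resp)
        print("\t".join(concepts))                 # header row
        for record in records.iter("record"):      # hypothetical element name
            print("\t".join(child.text or "" for child in record))

The point is simply that everything the client displays comes from what the provider declared, not from any knowledge built into the client itself.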
Now let's assume that we decide to work with a generic conceptual schema with two main concepts: category of InfoItem and InfoItem value. Let's also assume that providers will be able to easily share their data according to this conceptual model. In TAPIR, the output formats will be very limited - they will need to follow this generic approach. But let's suppose that this will not be a problem. What is going to happen is that clients will get almost anything from there - basically values of things that can be categorised in many ways. If clients want to perform validation they will need to do it themselves (the output format will be too generic for XML Schema validation to be of any use). Perhaps RDF validation will offer more possibilities, but then you're only considering data exchange in an RDF world. You would get the meaning of InfoItems from a dictionary of categories, in the same way that you could get the meaning of elements from a dictionary (DarwinCore for instance, or some ontology).
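Just to illustrate the difference (the term URIs and the validation rule below are made up by me, not part of any real schema):

    # A record under a specific schema: the element names carry the
    # semantics, so an XML Schema can already say that 'latitude' must
    # be a decimal between -90 and 90.
    specific = {"scientificName": "Puma concolor", "latitude": "-23.5"}

    # The same record under the generic model: just category/value
    # pairs. To a schema validator every entry looks identical, so any
    # real checking has to move into client code plus a dictionary of
    # categories.
    generic = [
        {"category": "http://example.org/terms#scientificName",
         "value": "Puma concolor"},
        {"category": "http://example.org/terms#latitude",
         "value": "-23.5"},
    ]

    # Hypothetical client-side validation driven by such a dictionary:
    rules = {"http://example.org/terms#latitude":
             lambda v: -90 <= float(v) <= 90}
    for item in generic:
        check = rules.get(item["category"])
        if check and not check(item["value"]):
            raise ValueError("invalid value for " + item["category"])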
In this case, it's not clear to me what the big benefits of the generic model approach would be, but maybe I'm missing something. The more knowledge you have about the elements or concepts, the more interesting and powerful the applications will be. It's a philosophical issue.
If we decide to avoid the more "traditional" way of structuring and modelling data because we feel it somehow limits our applications, then I think we first need to clearly understand what these limitations are. Otherwise, by doing things in a very different way we may miss the opportunity of using existing tools and resources - while still running the risk of facing, on a different road, the same data structuring issues that we tried to avoid.
Best Wishes, -- Renato
PS: I'm sorry for crossposting. I'll send any follow-ups only to the new taxon-model mailing list: http://lists.tdwg.org/mailman/listinfo/taxon-model
On 8 May 2007 at 9:35, Roger Hyam wrote:
Renato,
Thanks for your comments. That is an interesting view of the problem, and I think you may be correct for the supplier databases (though I don't have first-hand knowledge of these database schemas). Generally, the nearer the exchange format is to the supplier's schema, the easier it will be for them to publish. I believe taking the approach Markus suggests would produce the result you are after.
There is just one problem that you didn't address.
Who wants to consume the data and what do they want to do with it?
To have something that is easy to produce, easy to consume and easy to extend is more or less impossible. There has to be some pain somewhere!
What is your vision of a client application? How would it handle elements it hadn't seen before - or is this not a requirement?
All the best,
Roger