Re: It's How the Data will be Used that Counts
Kevin wrote:
I seem to be in a minority of one here again, but I'll continue to argue my case for a bit longer.
Why not... we have been doing this for several years now, a few more email iterations won't hurt... :)
- Exactly, it's easier to go from data to +/- natural language, which is
precisely why we need to try hard to facilitate the reverse.
Which is not all that difficult... to mark up a single description manually is a relatively trivial task... but to process a lot automatically is another matter... and to do it in a way that the whole community accepts is another matter entirely... :)
- If we can effectively embed fully parsable data in a natural-language
paragraph, why not?
Because it creates mixed content (legal, but evil) where the data is parsable, but not structured. If we cannot represent our data universe in a database in an elegant and easily understood way we may not necessarily have failed, but we will have fallen short of the target by a large margin.
In fact, DELTA already does this sort of thing by allowing liberal appending and prepending of <freeform comments> all over the place. While this makes for quasi-readable descriptions, authors often embed interesting character data in the comments, making it unusable by other applications, even within the DELTA suite. The 'rarely' and 'misinterpreted' scoring options in Lucid might be seen as an attempt to capture some of this information.
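To make the mixed-content worry concrete, here is a rough sketch in Python. The element and attribute names (description, char, name, unit) are made up for illustration only; they come from no existing standard and are not DELTA or Lucid syntax. It shows that the marked-up values parse easily, while a qualifier such as "rarely" stays stranded in the surrounding prose:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are invented for this sketch.
mixed = (
    '<description>Leaves <char name="leaf_shape">ovate</char>, rarely '
    '<char name="leaf_shape">lanceolate</char>, to '
    '<char name="leaf_length" unit="mm">50</char> mm long.</description>'
)

root = ET.fromstring(mixed)

# The marked-up values are easy to pull out as (character, value) pairs...
values = [(c.get("name"), c.text) for c in root.iter("char")]

# ...but the free text between the elements is only reachable as the
# description's leading text and each element's tail. The qualifier
# "rarely" lives there, outside any structure.
prose = [root.text] + [c.tail for c in root.iter("char")]

print(values)
print(prose)
```

An application that reads only the char elements would record ovate and lanceolate as equally valid leaf shapes; the "rarely" is parsable as text but not as data, which is much the same fate as character data buried in a DELTA comment.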
- If a structured data document based on our standard is a subset of a
marked-up description based on the standard, then creating a standard that can support the latter gives us the best of both worlds. If it can be done, why not?
I agree... this is a laudable aim... but I must admit to never thinking of a structured document as a subset of a marked-up description - oftentimes the reverse may be the case... Isn't it better to think of both structured documents and marked-up descriptions as being subsets of the standard we are trying to create?
Personally I think that creating an XML representation of structured data would be a doddle.
But as we have seen, getting everyone to agree that one person's way of doing it is the one true and proper path to descriptive enlightenment is no easy task...
Creating a fully parsable but lossless XML representation of a natural language description (which hence can also handle the degraded case of structured data) - now that would really be something to write home about!
Well, dreaming about it at least... I think we are dealing with two different, and I fear irreconcilable, things here... descriptions by their very nature are lossy - they are abstract representations of the gestalt of a sample of a taxon, often with an arbitrary word limit, attempting to portray in a familiar format what an author thinks a taxon looks like. Structured documents such as DELTA and Lucid at least have the potential to store everything that is remotely interesting about every taxon in the set and often come close to achieving it in reality. So what if the resulting descriptions do not have the poetic beauty of a Shakespearian sonnet; at least the information will be there and retrievable... In the case of biological description, beauty is not necessarily truth, or at least the whole truth...
Anyone else out there +/- agree with me, or should I give up now?
Don't do that... if you do, you will never have anything to write home about...
jim
participants (1)
-
Jim Croft