----- Original Message -----
From: "Steve Shattuck" Steve.Shattuck@CSIRO.AU
To: TDWG-SDD@USOBI.ORG
Sent: Tuesday, December 04, 2001 11:31 AM
Subject: It's How the Data will be Used that Counts
| Kevin's representation is too focused on text descriptions. A more complete
| representation might be:
|
| <character name="leaf">
|   <state>present</state>
|   <character name="leaf margin">
|     <state>serrate</state>
|     <character name = "tooth orientation">
|       <state>forward-pointing</state>
|     </character>
|   </character>
| </character>
I would argue that this is a less complete representation, because you've abstracted the original data further than I have.
I won't use the "markup" from my last email for "Leaf margins serrate with forward-pointing teeth", as this was designed to exemplify one limited problem (how to expand a description if characters are nested) and wasn't actually marked up in the way I proposed for challenge case 1. Two possible ways I'd do this example are:
"Leaf margins serrate with forward-pointing teeth"
{Using the rule "<state>s cannot have <character>s as siblings"}
<Feature><Name>Leaf</Name>
  <Feature name = "Presence" value = "Present"/>
  <Feature Name="marginal toothing">margins <Value>serrate</Value></Feature>
  with
  <Feature Name = "tooth orientation"><State>forward-pointing</State> teeth</Feature>
</Feature>
{Relaxing the rule so that <state>s can have <character>s as siblings}
<Feature><Name>Leaf</Name>
  <Feature name = "Presence" value = "Present"/>
  <Feature Name="marginal toothing">margins <Value>serrate</Value> with
    <Feature Name = "tooth orientation"><State>forward-pointing</State> teeth</Feature>
  </Feature>
</Feature>
Your proposal is:
<character name="leaf">
  <state>present</state>
  <character name="leaf margin">
    <state>serrate</state>
    <character name = "tooth orientation">
      <state>forward-pointing</state>
    </character>
  </character>
</character>
The score:
1. Can we parse from these the data atoms "leaf = present", "leaf margin = serrate" and "tooth orientation = forward-pointing"?
   Kevin's = Yes
   Steve's = Yes
2. Can we easily retrieve from these the original natural language string?
   Kevin's = Yes
   Steve's = No
On this scoring I'm one up. Then again, yours would be slightly easier to parse than mine, so we're probably equal. What's most important here? Dunno.
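The two score questions can be checked mechanically. What follows is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function names (kevin_atoms, steve_atoms, plain_text) are made up for illustration, and the "Presence" feature is written as an empty element (an assumption) so that the fragment is well-formed. It pulls the data atoms out of both representations and then tries to rebuild the original sentence from the text content.

```python
import xml.etree.ElementTree as ET

# Relaxed-rule Feature markup. Assumption: "Presence" is an empty element.
KEVIN = (
    '<Feature><Name>Leaf</Name> '
    '<Feature name = "Presence" value = "Present"/>'
    '<Feature Name="marginal toothing">margins <Value>serrate</Value> with '
    '<Feature Name = "tooth orientation"><State>forward-pointing</State> teeth'
    '</Feature></Feature></Feature>'
)

# Nested character/state markup.
STEVE = (
    '<character name="leaf"> <state>present</state> '
    '<character name="leaf margin"> <state>serrate</state> '
    '<character name="tooth orientation"> <state>forward-pointing</state> '
    '</character> </character> </character>'
)


def kevin_atoms(feature, path=()):
    """Collect (feature path, value) pairs from nested <Feature> elements."""
    # The name may be an attribute (either spelling) or a <Name> child.
    name = feature.get("Name") or feature.get("name") or feature.findtext("Name")
    path = path + (name.strip(),)
    # The value may be an attribute or a direct <Value>/<State> child.
    value = (feature.get("value") or feature.findtext("Value")
             or feature.findtext("State"))
    atoms = [("/".join(path), value.strip())] if value else []
    for child in feature.findall("Feature"):
        atoms.extend(kevin_atoms(child, path))
    return atoms


def steve_atoms(root):
    """Collect (character name, state) pairs in document order."""
    return [(c.get("name"), c.findtext("state")) for c in root.iter("character")]


def plain_text(root):
    """Concatenate all text content and normalise the whitespace."""
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    for label, source, atoms in (("Kevin's", KEVIN, kevin_atoms),
                                 ("Steve's", STEVE, steve_atoms)):
        root = ET.fromstring(source)
        print(label, atoms(root), repr(plain_text(root)))
```

Under these assumptions both fragments yield three atoms, but only the Feature markup gives back "Leaf margins serrate with forward-pointing teeth"; the character/state markup gives back just the three state words.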
Further, it seems to me that yours is a subset of mine: a Schema that allowed mine would also allow yours, but not vice versa.
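One rough way to see the subset claim is that the character/state markup can be rewritten mechanically into the Feature vocabulary (under the relaxed rule) without inventing anything, whereas going the other way would have to drop the free text. The sketch below is illustrative only: it assumes Python's xml.etree.ElementTree, and to_feature_markup is a hypothetical helper, not part of any proposed Schema.

```python
import xml.etree.ElementTree as ET

STEVE = (
    '<character name="leaf"><state>present</state>'
    '<character name="leaf margin"><state>serrate</state>'
    '<character name="tooth orientation"><state>forward-pointing</state>'
    '</character></character></character>'
)


def to_feature_markup(character):
    """Rewrite a <character>/<state> tree using <Feature>/<State> names."""
    feature = ET.Element("Feature", {"Name": character.get("name", "")})
    for child in character:
        if child.tag == "state":
            # A state becomes a <State> child of the enclosing Feature.
            ET.SubElement(feature, "State").text = child.text
        elif child.tag == "character":
            # Nested characters become nested Features (relaxed rule:
            # a State may have a Feature as its sibling).
            feature.append(to_feature_markup(child))
    return feature


if __name__ == "__main__":
    converted = to_feature_markup(ET.fromstring(STEVE))
    print(ET.tostring(converted, encoding="unicode"))
```

The result is Feature markup that simply carries no free-form text between the tags.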
| In my original, DELTA-centric model I used a <description> tag to try and
| capture the text description information separate from the <state>
| information. My thinking was that these two
| requirements/approaches/viewpoints are too distinct to cram together without
| falling into the same trap as the current DELTA Standard (which is a
| least-common denominator approach).
Yes, we could tag the bits of free-form text. But is there any need? They will (by definition) be ignored by all processors except natural-language ones. Since the NL is fully retrievable from my model, why not leave them untagged? In your model, they would need to be tagged, since the model does not represent a natural description: it represents abstracted data from which a description can be more or less created.
| The problem here is that the phrase "Leaf margins serrate with
| forward-pointing teeth" concerns 3 characters (leaf, margin and teeth) and 1
| implied and 2 expressed states (present, serrate and forward pointing) with
| the characters being dependent (and therefore the context containing
| significant information - we know that 'teeth' have something to do with
| 'serrate' which has something to do with 'leaf margins' - the leaves being
| present because we're describing them). There's a lot of logic involved in
| parsing this. I can't think of a simple way of representing all this
| complex information without separating it at some level. Kevin's suggestion
| represents the text description and mine the underlying data, but neither
| works well for the other.
I agree - there's still too much complex logic even in the very simple types of examples we're using so far. We need somehow to step back further to even more basic examples to tease these issues out.
Cheers - k