Re: It's How the Data will be Used that Counts
Thanks Steve - at last we have some alternatives we can sink our teeth into. Comments below.
----- Original Message -----
From: "Steve Shattuck" Steve.Shattuck@CSIRO.AU
To: TDWG-SDD@USOBI.ORG
Sent: Tuesday, December 04, 2001 11:31 AM
Subject: It's How the Data will be Used that Counts

| I've been giving Kevin's approach some thought and have the following
| comments:
|
| Kevin's original information flow model is too simplistic. A more realistic
| model would be something like:
|
| Text Descriptions |                         | Text Descriptions
| Phylogenetics     |---> Structured Data --->| Phylogenetics
| Specimens         |                         | Identification Tools
| (many)
Yes, of course. There are varied sources for the structured data. It still seems to me that what's needed to capture the non-text sources will probably be a subset of what's needed to capture the text sources. This is because textual descriptions are probably the least formally structured data we need to deal with as input (with the exception of original observations which, in some taxonomists' minds at least, are highly structureless but are readily structurable).
| The sources are much more varied and are often group-specific. For example,
| invertebrates have very few good quality text descriptions (most are old,
| are in a range of languages (English, French, German, etc), vary greatly in
| style, quality, etc. etc) and the majority of invertebrates are currently
| undescribed (having 80% new taxa during a revision is common).
Yes, I agree; a botany bias is showing through here.
| Similarly, the outputs required vary greatly and in ways hard to predict.
| While text descriptions would seem to be a common requirement, they are in
| some ways "legacy" and may become less important in the future as
| applications (and users) become more sophisticated. We need to make sure we
| keep this range of uses in mind at all times.
Yes, but see comment above.
| Because of this I don't really think the details of the model
| matter too much, more that it is rich enough to represent all data of
| interest.
Exactly my point - it needs to be rich enough to capture and express a textual description, hopefully losslessly!
| ********
|
| I've also been thinking about Kevin's latest example:
|
| "Leaf margins serrate with forward-pointing teeth"
| <feature name="leaf">
|   <feature name="margin">
|     <feature name = "teething shape">
|       <value>serrate</value>
|     </feature>
|     <feature name = "teeth orientation">with
|       <value>forward-pointing</value>teeth
|     </feature>
|   </feature>
| </feature>
|
| First, it seems to me that "feature" is what taxonomists call "character"
| and "value" is "state". Being a traditionalist I'll switch back to this
| common terminology:
The terminology is +/- trivial at this stage, but I'll explain that I chose something different from character/state simply to break with tradition for a while. Traditionally, a character has states and that's it - a 2-level tree. In the example above one character (leaf) has as child another character (margin). This seems odd to many people thinking traditionally about characters/states. Let's agree that we'll use them interchangeably for now.
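To make the nesting point concrete, here is a rough sketch (Python, standard library only) of how the nested feature/value example quoted above might be flattened back into traditional character/state pairs. The function names flatten and words, and the " / " separator for character paths, are my own assumptions for illustration and are not part of any proposed standard.

```python
import xml.etree.ElementTree as ET

# The example markup quoted above, unchanged apart from indentation.
SNIPPET = """
<feature name="leaf">
  <feature name="margin">
    <feature name = "teething shape">
      <value>serrate</value>
    </feature>
    <feature name = "teeth orientation">with
      <value>forward-pointing</value>teeth
    </feature>
  </feature>
</feature>
"""

def flatten(feature, path=()):
    """Yield one (character, state) pair per <value> in the nested tree.

    The chain of enclosing feature names becomes a single character name
    (hypothetical convention: names joined with " / ").
    """
    path = path + (feature.get("name"),)
    for child in feature:
        if child.tag == "value":
            yield " / ".join(path), child.text.strip()
        elif child.tag == "feature":
            yield from flatten(child, path)

def words(feature):
    """Return the text held inside the markup, in document order."""
    return " ".join(" ".join(feature.itertext()).split())

root = ET.fromstring(SNIPPET.strip())
pairs = list(flatten(root))
for character, state in pairs:
    print(f"{character}: {state}")
print(words(root))
```

Flattening gives two 2-level pairs, "leaf / margin / teething shape" with state "serrate" and "leaf / margin / teeth orientation" with state "forward-pointing", so the nested tree and the traditional view can be read off the same markup. Note that words() recovers only "serrate with forward-pointing teeth": in this particular example the words "Leaf margins" survive only as feature names, not as text.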
(Other points have been split into separate emails)
Cheers - k
Kevin Thiele