It's How the Data will be Used that Counts

Kevin Thiele kevin.thiele at BIGPOND.COM
Tue Dec 4 17:42:57 CET 2001


Thanks Steve - at last we have some alternatives we can sink our teeth into.
Comments below.

----- Original Message -----
From: "Steve Shattuck" <Steve.Shattuck at CSIRO.AU>
To: <TDWG-SDD at USOBI.ORG>
Sent: Tuesday, December 04, 2001 11:31 AM
Subject: It's How the Data will be Used that Counts


| I've been giving Kevin's approach some thought and have the following
| comments:
|
| Kevin's original information flow model is too simplistic.  A more
| realistic model would be something like:
|
|     Text Descriptions |                         | Text Descriptions
|         Phylogenetics |---> Structured Data --->| Phylogenetics
|             Specimens |                         | Identification Tools
| (many)

Yes, of course. There are varied sources for the structured data. It still
seems to me that what's needed to capture the non-text sources will probably
be a subset of what's needed to capture the text sources. This is because
textual descriptions are probably the least formally structured data we need
to deal with as input (with the exception of original observations which, in
some taxonomists' minds at least, are highly unstructured but readily
structurable).

| The sources are much more varied and are often group-specific.  For
| example, invertebrates have very few good quality text descriptions
| (most are old, are in a range of languages (English, French, German,
| etc), vary greatly in style, quality, etc. etc) and the majority of
| invertebrates are currently undescribed (having 80% new taxa during a
| revision is common).

Yes, I agree; a botany bias is showing through here.

| Similarly, the outputs required vary greatly and in ways hard to predict.
| While text descriptions would seem to be a common requirement, they are in
| some ways "legacy" and may become less important in the future as
| applications (and users) become more sophisticated.  We need to make sure
| we keep this range of uses in mind at all times.

Yes, but see comment above.


| Because of this I don't really think the details of the model
| matter too much, more that it is rich enough to represent all data of
| interest.

Exactly my point - it needs to be rich enough to capture and express a
textual description, hopefully losslessly!

| ********
|
| I've also been thinking about Kevin's latest example:
|
| "Leaf margins serrate with forward-pointing teeth"
| <feature name="leaf">
|   <feature name="margin">
|     <feature name = "teething shape">
|         <value>serrate</value>
|     </feature>
|     <feature name = "teeth orientation">with
|         <value>forward-pointing</value>teeth
|     </feature>
|   </feature>
| </feature>
|
| First, it seems to me that "feature" is what taxonomists call "character"
| and "value" is "state".  Being a traditionalist I'll switch back to this
| common terminology:

The terminology is +/- trivial at this stage, but I'll explain that I chose
something different from character/state simply to break with tradition for
a while. Traditionally, a character has states and that's it - a 2-level
tree. In the example above one character (leaf) has as child another
character (margin). This seems odd to many people thinking traditionally
about characters/states. Let's agree that we'll use them interchangeably for
now.
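
To make the nesting concrete, here is a minimal Python sketch (not part of
any proposed standard) that reads the example above with the standard
library's xml.etree.ElementTree. The helper names feature_paths and
running_text are hypothetical, and the element names simply follow the
example. It lists each value with its full feature path, which is where the
markup departs from a 2-level character/state tree, and then recovers the
words of the original phrase from the mixed content.

```python
import xml.etree.ElementTree as ET

# The example from the message, copied without the quote markers.
DESCRIPTION = """
<feature name="leaf">
  <feature name="margin">
    <feature name = "teething shape">
        <value>serrate</value>
    </feature>
    <feature name = "teeth orientation">with
        <value>forward-pointing</value>teeth
    </feature>
  </feature>
</feature>
"""


def feature_paths(node, path=()):
    """Yield (feature path, value) for every <value> below node.

    Hypothetical helper: a feature may hold values, further features,
    or both, so the tree can be deeper than character -> state.
    """
    path = path + (node.get("name"),)
    for child in node:
        if child.tag == "value":
            yield path, (child.text or "").strip()
        elif child.tag == "feature":
            yield from feature_paths(child, path)


def running_text(node):
    """Return all text in document order, pieces joined by single spaces."""
    pieces = (piece.strip() for piece in node.itertext())
    return " ".join(piece for piece in pieces if piece)


root = ET.fromstring(DESCRIPTION.strip())
pairs = list(feature_paths(root))

for path, value in pairs:
    print(" > ".join(path), "=", value)
print(running_text(root))
```

Note that "Leaf margins" lives in the name attributes rather than in the
running text, and the pieces are rejoined with single spaces, so the
original spacing and punctuation are not preserved exactly.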

(Other points have been split into separate emails)

Cheers - k



