It's How the Data will be Used that Counts

Tue Dec 4 17:42:57 CET 2001

Thanks Steve - at last we have some alternatives we can sink our teeth into.
Comments below.

----- Original Message -----
From: "Steve Shattuck" <Steve.Shattuck at CSIRO.AU>
To: <TDWG-SDD at USOBI.ORG>
Sent: Tuesday, December 04, 2001 11:31 AM
Subject: It's How the Data will be Used that Counts

Yes, of course. There are varied sources for the structured data. It still
seems to me that capturing the non-text sources will probably be a subset of
what's needed to capture the text sources. This is because textual
descriptions are probably the least formally structured data we need to deal
with as input (with the exception of original observations which, in some
taxonomists' minds at least, are highly unstructured but are readily
structurable).

| The sources are much more varied and are often group-specific.  For
| example, invertebrates have very few good quality text descriptions (most
| are old, are in a range of languages (English, French, German, etc), vary
| greatly in style, quality, etc. etc) and the majority of invertebrates are
| currently undescribed (having 80% new taxa during a revision is common).

Yes, I agree; a botany bias is showing through here.

| Similarly, the outputs required vary greatly and in ways hard to predict.
| While text descriptions would seem to be a common requirement, they are in
| some ways "legacy" and may become less important in the future as
| applications (and users) become more sophisticated.  We need to make sure
| we keep this range of uses in mind at all times.

Yes, but see comment above.

| Because of this I don't really think the details of the model
| matter too much, more that it is rich enough to represent all data of
| interest.

Exactly my point - it needs to be rich enough to capture and express a
textual description, hopefully losslessly!

The terminology is +/- trivial at this stage, but I'll explain that I chose
something different from character/state simply to break with tradition for
a while. Traditionally, a character has states and that's it - a 2-level
tree. In the example above one character (leaf) has as child another
character (margin). This seems odd to many people thinking traditionally
about characters/states. Let's agree that we'll use them interchangeably for
now.
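
To make the 2-level versus nested distinction concrete, here is a rough
sketch in Python. It is only an illustration, not a proposal for the
standard: the Node class, its methods, and the state names "entire" and
"serrate" are all made up for the example. Only "leaf" and "margin" come
from the discussion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a character, a sub-character, or a state."""
    name: str
    children: list[Node] = field(default_factory=list)

    def depth(self) -> int:
        # A node with no children counts as one level.
        return 1 + max((c.depth() for c in self.children), default=0)

    def paths(self, prefix: tuple[str, ...] = ()):
        # Yield every root-to-tip path as a tuple of names.
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


# Traditional view: one character with its states, a 2-level tree.
# The state names are assumed for illustration.
traditional = Node("leaf margin", [Node("entire"), Node("serrate")])

# Nested view: the character "leaf" has the character "margin" as a child,
# and the states hang off "margin".
nested = Node("leaf", [Node("margin", [Node("entire"), Node("serrate")])])

if __name__ == "__main__":
    print(traditional.depth(), nested.depth())
    for path in nested.paths():
        print(" / ".join(path))
```

The same node type serves for both views, which is roughly why the choice
of words for the levels matters less than allowing the tree to go deeper
than two levels.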

(Other points have been split into separate emails)

Cheers - k