Character lists as default

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Mon Sep 11 14:40:43 CEST 2000


>I'd prefer to see highly structured information as the "default", with
>loosely structured "blobs" being optional, rather than the other way round.

This preference has been endorsed by several people, and seems to come down
to whether a character list is required or optional.

In my draft model, a character list is optional (though obviously all but
essential if anyone wants to parse the description). If a character list is
present, it can be used for validation; if one is absent, no validation can
occur. Similarly, if a part of a description is tagged as a defined
character, it is usable as such; if another part is not so tagged, it is not.
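To make the optional-list behaviour concrete, here is a rough sketch in Python (standard-library ElementTree only). It assumes a hypothetical well-formed layout: an optional CHARACTER_LIST whose CHARACTER entries carry a Name attribute, and DESCRIPTION elements whose tagged parts are CHARACTER elements with a Character_Name attribute. Untagged text is simply ignored by the checker, and a document with no list comes back as "not validated" (None) rather than "valid".

```python
import xml.etree.ElementTree as ET


def validate(doc_xml):
    """Check tagged characters against an optional character list.

    Element/attribute names (CHARACTER_LIST, CHARACTER, Name,
    Character_Name, DESCRIPTION) are hypothetical, for illustration only.
    Returns None when there is no list (no validation possible), otherwise
    a list of problems (empty if every tagged character is defined).
    """
    root = ET.fromstring(doc_xml)
    char_list = root.find("CHARACTER_LIST")
    if char_list is None:
        # No character list: nothing to validate against.
        return None
    defined = {c.get("Name") for c in char_list.findall("CHARACTER")}
    problems = []
    for desc in root.findall("DESCRIPTION"):
        # Only parts tagged as CHARACTER are checked; untagged text is ignored.
        for ch in desc.findall("CHARACTER"):
            name = ch.get("Character_Name")
            if name not in defined:
                problems.append(f"{desc.get('Name')}: undefined character {name!r}")
    return problems
```

With a list defining only "Leaves", a description that also tags "Scent" yields one problem for "Scent"; the same description without a list is returned as None, i.e. unvalidated.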

Eric offered:

<DOCUMENT>
    <DESCRIPTION Taxon_Name = "Viola odorata">
        <CHARACTER type="defined" Character_Name = "Leaves">
            <STATE State_Name = "present"/>
        </CHARACTER>
        <CHARACTER type="arbitrary" Character_Name = "scent">
           a marvelous perfume on a perfect spring day
        </CHARACTER>
    </DESCRIPTION>
</DOCUMENT>

But this still requires atomisation of the description (the character
"scent" has been atomised out of the description 'ground matrix'). Since
we're agreed that an 'arbitrary' character is not usefully parseable, what's
the point of atomising it?

How would this model accommodate my example:

<DOCUMENT>
<DESCRIPTION Name = "Viola eminens">
Perennial herb spreading by stolons; rootstock sometimes somewhat swollen
and bulbous at the stem bases. Stems contracted so that the leaves form
rosettes, never elongate with caulescent leaves. Leaves broad-reniform, the
largest (10-)12-15(-25) mm long, (20-)25-35(-45) mm wide, 1.5-3.2 times
wider than long, usually with a broad basal sinus; lamina with 9-20 +/-
prominent teeth, glabrous or with scattered unicellular hairs on the upper
surface, +/- concolorous bright green; petioles 2-8 cm long; stipules
narrowly triangular, usually with several small, glandular teeth on each
side. Flowers ... etc
</DESCRIPTION>
</DOCUMENT>

if Thiele and Prober have not yet got as far as atomising the
description? Under Eric's model, it seems to me that this would need to be:

<DOCUMENT>
    <CHARACTER_LIST>
        <CHARACTER Name = "Free text" Type = "arbitrary"/>
    </CHARACTER_LIST>
<DESCRIPTION Name = "Viola eminens">
<CHARACTER Character_Name = "Free text">
Perennial herb spreading by stolons; rootstock sometimes somewhat swollen
and bulbous at the stem bases. Stems contracted so that the leaves form
rosettes, never elongate with caulescent leaves. Leaves broad-reniform, the
largest (10-)12-15(-25) mm long, (20-)25-35(-45) mm wide, 1.5-3.2 times
wider than long, usually with a broad basal sinus; lamina with 9-20 +/-
prominent teeth, glabrous or with scattered unicellular hairs on the upper
surface, +/- concolorous bright green; petioles 2-8 cm long; stipules
narrowly triangular, usually with several small, glandular teeth on each
side. Flowers ... etc
</CHARACTER>
</DESCRIPTION>
</DOCUMENT>

This amounts to tagging the blob as untagged.

In terms of a data model, tagging data adds information to it, so surely the
untagged state is more basic (fundamental) than the tagged one. Whether our
default behaviour should be to tag data is, to me, a sociological question,
and should be handled outside the data model.
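A second rough sketch (Python, same hypothetical element and attribute names as above; "Petiole length" is an invented character name) of the point that tagging adds information rather than being the base state: reading a description returns its full text whether or not anything is tagged, and any tagged parts are available as named characters in addition.

```python
import xml.etree.ElementTree as ET


def read_description(desc_xml):
    """Return (full_text, tagged_characters) for one DESCRIPTION element.

    CHARACTER / Character_Name are hypothetical names for illustration.
    """
    desc = ET.fromstring(desc_xml)
    # The complete description text, with any tags stripped out.
    text = "".join(desc.itertext())
    # Only the parts someone chose to tag are available as named characters.
    tagged = {
        ch.get("Character_Name"): "".join(ch.itertext())
        for ch in desc.iter("CHARACTER")
    }
    return text, tagged
```

For the Viola eminens text, the untagged version gives the full text and an empty set of characters; wrapping "petioles 2-8 cm long" in a CHARACTER tag leaves the text unchanged and adds one named character.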

Cheers - k



