Response to Jim

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Mon Sep 4 09:06:02 CEST 2000


Jim wrote

| >3. As I see it the spec itself would define a set of allowable qualifiers
| >(such as "rare" (= "rarely"?), "by misinterpretation", "uncertain" etc). I
| >think we could probably agree on a limited set of qualifiers, and stick to
| >that (with allowance for extension). If we do this, then "anchovies,
| >please!" will be out for several reasons.

| It would be nice to put a cap on the number/type of 'qualifiers' allowed,
| but you can bet that there will always be something on a list that people
| will want/need to use, but at this stage we probaly should confine ouselve
| to the need for the list itself, not to the content of the list?
|
| in theory yes, but in practice, if you open this can of worms, the
| discussion will probably go on for weeks...

I think there's a difference between the spec defining allowable qualifiers
and defining allowable values for allowable characters (the old name-space
or lexicon chestnut). I've argued strongly in the past against the latter,
but I do think the former is tractable.
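
To make the distinction concrete, here is a minimal sketch in Python of what
"a limited set of qualifiers, with allowance for extension" might look like.
It is purely illustrative: the names CORE_QUALIFIERS and is_allowed_qualifier
are hypothetical and not part of any draft spec, and only the three qualifiers
listed come from this thread.

```python
# Hypothetical core list; the three entries are the examples quoted above.
CORE_QUALIFIERS = frozenset({"rare", "by misinterpretation", "uncertain"})


def is_allowed_qualifier(qualifier, extensions=frozenset()):
    """Return True if the qualifier is in the core set or a declared extension.

    `extensions` stands in for whatever mechanism a document would use to
    declare extra qualifiers; that mechanism is an assumption here.
    """
    return qualifier in CORE_QUALIFIERS or qualifier in extensions


if __name__ == "__main__":
    print(is_allowed_qualifier("rare"))
    print(is_allowed_qualifier("anchovies, please!"))
    print(is_allowed_qualifier("doubtful", extensions={"doubtful"}))
```

Note that nothing here constrains the values of the characters themselves,
only the qualifiers attached to them.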

| >Two notes:
| >* I'd like to allow but not enforce that a valid document have the
| >allowable_values property filled. By not enforcing it, a simply marked-up
| >natural-language description could be a valid document. This would perhaps
| >mean that the spec could meet half-way the automated markup of legacy
| >descriptions, and I'm keen to do this. Of course, a document without this
| >property specified would not be able to be validated in the way you suggest
| >and hence may not be very useful for some purposes, but this may be a price
| >one is willing to pay for other benefits, and I think we need to keep this
| >open.
|
| It would be pretty difficult to validate a blob of free text...  but I can
| see a case for wanting to include them in the exercise...  so perhaps there
| should be a deprecated class of elements that are essentially unable to be
| validated but perhaps of interest in some instances?

We need to remember that 99.99999% of all the descriptive data in the world
has not been validated except by error-prone humans. I think the bulk of the
endeavour of the last few centuries is still a pretty valuable resource.

| >* I'm using "allowable values" rather than "states" as this seems to me to
| >be more general, and subsumes the requirement to distinguish between, for
| >instance, multistate and numeric "characters". A numeric "character" of
| >course, doesn't have "states", but it does have allowable values (integers
| >in the case of an integer numeric, real numbers in the case of a real
| >numeric).
|
| This would cover the use of text rather than various numeric values - there
| is nothing that should say everything as to be represented numerically. It
| should be possible to use text *and* to have it validated in some way.

Yes, but I don't mean using a character of type TEXT. I don't think this is
necessary.

| >1. How strong is the requirement for this type of validation? Enforcing this
| >seems to me to be like requiring that all word documents carry in their
| >header a dictionary of the english language to allow validation of
| >spellings. It seems to me that providing tools that allow people to check
| >these strings against a predefined list (defined either within the document
| >or in an external resource) would be useful, but not absolutely necessary. A
| >document that is not or cannot be validated in this way would not be
| >useless, and would perhaps be more free.
|
| Documents do not have to carry their validation with them - they could
| refer to an external source of validation (or dictionary in your above
| example - this is what is implied and expected in Word documents at the
| moment)

Exactly. A document that is validated against an external resource is more
valuable than one that is not, but that doesn't make the non-validated one
valueless (if that were the case, Jim, all your emails, with their liberal
sprinkling of typos, would be worthless!).
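
As a rough sketch of "allow but not enforce", the check below treats a
missing list of allowable values as "not validated" rather than as an error.
It is an illustration only: check_description, its return labels, and the
sample values ("ovate", "lanceolate") are all hypothetical, and the list could
equally come from inside the document or from an external resource.

```python
def check_description(values, allowable_values=None):
    """Check recorded values against an optional list of allowable values.

    Returns a (status, unknown_values) pair. With no list supplied the
    description is still accepted; it is simply reported as not validated.
    """
    if allowable_values is None:
        return ("not validated", [])
    unknown = [v for v in values if v not in allowable_values]
    if unknown:
        return ("failed", unknown)
    return ("validated", [])


if __name__ == "__main__":
    allowed = {"ovate", "lanceolate"}  # hypothetical allowable values
    print(check_description(["ovate"]))
    print(check_description(["ovate"], allowed))
    print(check_description(["ovat"], allowed))
```

The design choice is that the absence of a list never makes a document
invalid; it only limits what can be said about it.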

| >Note that the spec as I see it would allow (but again, not enforce as DELTA
| >and Lucid do) the encoding of descriptions. Thus, a valid document may be d1
| >as below. This would preempt the need for typographic validation, and allow
| >allowable-values validation. But for some reason I don't want to disallow d2
| >as a valid document also.
|
| This may be a non threatening approach to gradually introducing validity
| and rigour into descriptive data and is probably worth exploring some
| more...

A non-threatening approach ... gun for it!

Cheers - k



