[Tdwg-obs] Controlled vocabularies are a bad thing!

tdwg at achapman.org tdwg at achapman.org
Sun Feb 12 22:00:41 CET 2006


I couldn't agree with you more!

The first question that should always be asked is "why do I want a controlled vocabulary?"  Often the answer will be "so that I know what terms to use when I am searching."

A static vocabulary seldom works for long (other than in a small closed community), so you need an updateable vocabulary.  I think it is often better to use some form of linked thesaurus that can be added to and expanded, and whose definitions can be modified if necessary.  This allows the user to get a definition, and to look for possible synonyms among the terms if they need to. I agree with your suggestion, Roger, of following a Dublin Core type process, if necessary with separate vocabularies for different groups/classes of users.
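
As a rough illustration of what I mean by an updateable, linked thesaurus, here is a minimal Python sketch. The class names, the example terms and their definitions are all hypothetical, made up for this message and not taken from any TDWG standard. It only shows the idea: terms carry a definition and synonyms, new terms can be added at any time, and an unknown term is reported as unknown rather than rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry: preferred label, definition, known synonyms."""
    label: str
    definition: str
    synonyms: set[str] = field(default_factory=set)


class Thesaurus:
    """A vocabulary that can be extended or revised after it is published."""

    def __init__(self) -> None:
        self._terms: dict[str, Term] = {}
        # Maps a normalised label or synonym to the preferred label.
        self._index: dict[str, str] = {}

    @staticmethod
    def _key(text: str) -> str:
        # Ignore case and surrounding whitespace when matching terms.
        return text.strip().casefold()

    def add(self, label: str, definition: str, synonyms=()) -> None:
        """Add a new term, or update the definition and synonyms of an old one."""
        synonyms = tuple(synonyms)
        term = self._terms.get(label)
        if term is None:
            term = Term(label, definition)
            self._terms[label] = term
        else:
            term.definition = definition
        term.synonyms.update(synonyms)
        for name in (label, *synonyms):
            self._index[self._key(name)] = label

    def lookup(self, text: str):
        """Return the matching Term, or None if the word is not (yet) known."""
        label = self._index.get(self._key(text))
        return self._terms[label] if label is not None else None


# Hypothetical example terms, for illustration only.
vocab = Thesaurus()
vocab.add("HumanObservation",
          "Organism seen or heard directly by a person.",
          ["sighting", "visual record"])
vocab.add("AnimalTrace",
          "Indirect evidence of an organism, such as tracks.",
          ["spoor"])

hit = vocab.lookup("Sighting")      # resolves through a synonym
miss = vocab.lookup("camera trap")  # unknown term: None, not an error
```

A searcher can then ask for a definition or follow a synonym to the preferred term, and a term that turns up in real use can simply be added later with another call to add().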

There are several environmental thesauri around the world, some better than others, and most overlapping in their terms.  Again off topic, but one of the things we need is a global, multilingual environmental thesaurus. There have been several attempts; some got close but then went their separate ways. There are standards available for preparing them, but let's leave that for another discussion another day.


Arthur D. Chapman
Toowoomba, Australia

>>>From Roger Hyam <roger at tdwg.org> on 12 Feb 2006:

> Hi,
> I am splitting this out of the thread on observation definition as it is
> somewhat off that topic but I am quoting from something Denis said.
> Denis said: "The main issue here will be to *standardize how people are
> allowed to use the BasisOfObservation field* (through controlled
> vocabulary), and to find ways to know whether you are reporting
> measurements (eg, mass) about the sample only or the full organism." (My
> emphasis)
> I am interested in the notion of controlled vocabularies. We don't have
> them in normal text communication. Although I am picking words from a
> dictionary as I type I am free to make up my own words at any time. You
> may be able to understand their meaning from context or you may simply
> fail to understand what I am talking about. Words are normally added to
> dictionaries *after* they are in use, not before.
> If we invent controlled vocabularies for communication then we have to
> guess all the terms that anyone is likely to use ahead of time. A little
> like compiling a dictionary before we can use a language. This is:
>     * Difficult - we have to have some person or group of people with
>       100% knowledge of the domain.
>     * Time Consuming - because people don't know everything they have to
>       go away and learn and debate etc etc - possibly for years.
>     * Error Prone - Once a list is settled on it will contain errors
>       (everything does). People will also use the existing terms to mean
>       something different from their official definition because they
>       can't invent their own terms - the purpose of having the
>       controlled vocabulary will be lost.
>     * Unresponsive - once we have a list we have to stick with it till
>       we update the whole thing.
>     * Potentially difficult to manage - if the list is embedded in an
>       XML Schema or imported into such a schema then changing it to add
>       a new kind of animal trace for example will involve updating
>       multiple schemas and possibly software etc etc.
> I therefore suggest that it is far better to have 'recommended' 
> vocabularies similar to Dublin Core where terms can be added and mixed 
> with those from other vocabularies rather than 'hard-coded' vocabularies
> as occur in many XML Schema implementations.
> People should be allowed to put what they like in BasisOfObservation 
> field but should only expect to be understood if they use something from
> a common vocabulary. (The vocabulary has the chance to be self 
> explaining if it uses URI/GUIDs - but that is another topic.)
> I appreciate that this may be a sea change for some people in the way we
> think about transfer standards but believe it is the only way forwards 
> if we want to handle the complexity of our data.
> Thoughts and observations(!) much appreciated.
> Roger
> -- 
> -------------------------------------
>  Roger Hyam
>  Technical Architect
>  Taxonomic Databases Working Group
> -------------------------------------
>  http://www.tdwg.org
>  roger at tdwg.org
>  +44 1578 722782
> -------------------------------------
> _______________________________________________
> Tdwg-obs mailing list
> Tdwg-obs at lists.tdwg.org
> http://lists.tdwg.org/mailman/listinfo/tdwg-obs_lists.tdwg.org
