[Tdwg-obs] Controlled vocabularies are a bad thing!

Roger Hyam roger at tdwg.org
Sun Feb 12 15:53:56 CET 2006


I am splitting this out of the thread on observation definition as it is 
somewhat off that topic but I am quoting from something Denis said.

Denis said: "The main issue here will be to *standardize how people are allowed to use the BasisOfObservation field* (through controlled vocabulary), and to find ways to know whether you are reporting measurements (eg, mass) about the sample only or the full organism." (My emphasis)

I am interested in the notion of controlled vocabularies. We don't have them in normal text communication. Although I am picking words from a dictionary as I type, I am free to make up my own words at any time. You may be able to understand their meaning from context, or you may simply fail to understand what I am talking about. Words are normally added to dictionaries *after* they are in use, not before.

If we invent controlled vocabularies for communication then we have to guess, ahead of time, all the terms that anyone is likely to use. It is a little like compiling a dictionary before we can use the language. This is:

    * Difficult - we have to have some person or group of people with
      100% knowledge of the domain.
    * Time Consuming - because people don't know everything, they have
      to go away and learn and debate etc etc - possibly for years.
    * Error Prone - Once a list is settled on it will contain errors
      (everything does). People will also use the existing terms to mean
      something different from their official definition because they
      can't invent their own terms - the purpose of having the
      controlled vocabulary will be lost.
    * Unresponsive - once we have a list we have to stick with it till
      we update the whole thing.
    * Potentially difficult to manage - if the list is embedded in an
      XML Schema or imported into such a schema then changing it to add
      a new kind of animal trace for example will involve updating
      multiple schemas and possibly software etc etc.
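
To make the last two points concrete, here is a rough sketch of what a hard-coded list looks like to the software consuming it. It is only an illustration: the class, the function names and the two terms are made up for this example and are not taken from any TDWG schema.

```python
from enum import Enum


# Hypothetical closed list. The terms are illustrative assumptions only.
class BasisOfObservation(Enum):
    SPECIMEN = "Specimen"
    SIGHTING = "Sighting"


def parse_closed(value: str) -> BasisOfObservation:
    # Looking an Enum up by value raises ValueError for anything not listed.
    return BasisOfObservation(value)


def is_accepted(value: str) -> bool:
    # A record using a term outside the list is rejected outright,
    # however sensible the term may be.
    try:
        parse_closed(value)
        return True
    except ValueError:
        return False
```

With a list like this, a record carrying a new term such as "AnimalTrace" is rejected until the list itself, and everything built against it, has been updated.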

I therefore suggest that it is far better to have 'recommended' 
vocabularies similar to Dublin Core where terms can be added and mixed 
with those from other vocabularies rather than 'hard-coded' vocabularies 
as occur in many XML Schema implementations.

People should be allowed to put what they like in the BasisOfObservation 
field but should only expect to be understood if they use something from 
a common vocabulary. (The vocabulary has the chance to be 
self-explaining if it uses URIs/GUIDs - but that is another topic.)
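
As a minimal sketch of this idea (an assumption about how a consumer might behave, not a proposal for any particular implementation): the receiving software keeps whatever value it is given and simply reports whether it recognises it. The URIs, definitions and function name below are invented for the example.

```python
from typing import Optional, Tuple

# Hypothetical recommended vocabulary, keyed by made-up example URIs.
RECOMMENDED = {
    "http://example.org/basis#Specimen": "A preserved specimen",
    "http://example.org/basis#Sighting": "An organism seen in the field",
}


def interpret(value: str) -> Tuple[str, Optional[str]]:
    """Return the value unchanged, plus its definition if it is recognised.

    Unrecognised values are kept rather than rejected; the definition is
    None, meaning the sender should not expect to be understood.
    """
    return value, RECOMMENDED.get(value)
```

Adding a new term then means adding an entry to a list that can be published and extended separately, rather than changing a schema.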

I appreciate that this may be a sea change for some people in the way we 
think about transfer standards, but I believe it is the only way forwards 
if we want to handle the complexity of our data.

Thoughts and observations(!) much appreciated.



 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
 roger at tdwg.org
 +44 1578 722782
