Roger
I couldn't agree with you more!
The first question that should always be asked is "why do I want a controlled vocabulary?" Often the answer will be "so that I know what terms to use when I am searching".
A static vocabulary seldom works for long (other than in a small closed community), so you need an updateable vocabulary. I think it is often better to use some form of linked thesaurus or similar that can be added to and expanded, and whose definitions can be modified when necessary. This allows users to get a definition and to look for possible synonyms if they need to. I agree with your suggestion, Roger, of following a Dublin Core type process for different groups or classes of users where necessary.
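To make the idea of an updateable, linked thesaurus a little more concrete, here is a minimal Python sketch. The class and method names (`Thesaurus`, `add`, `add_synonym`, `revise_definition`, `lookup`) are hypothetical and not taken from any existing thesaurus or standard, and the example term and definition are placeholders. It only shows terms being added, definitions being revised, and synonyms being resolved to a preferred term.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One entry in a hypothetical, updateable thesaurus."""
    preferred_label: str
    definition: str
    synonyms: set[str] = field(default_factory=set)


class Thesaurus:
    """In-memory sketch: terms can be added, definitions revised and
    synonyms resolved to a preferred term. Not modelled on any real
    environmental thesaurus or thesaurus standard."""

    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}  # preferred label -> concept
        self._index: dict[str, str] = {}         # any label, lowercased -> preferred label

    def add(self, label: str, definition: str, synonyms: tuple[str, ...] = ()) -> None:
        """Add a new preferred term, optionally with synonyms."""
        self._concepts[label] = Concept(label, definition, set(synonyms))
        for name in (label, *synonyms):
            self._index[name.lower()] = label

    def add_synonym(self, label: str, synonym: str) -> None:
        """Extend an existing term with another synonym."""
        self._concepts[label].synonyms.add(synonym)
        self._index[synonym.lower()] = label

    def revise_definition(self, label: str, definition: str) -> None:
        """Modify a definition in place; the term itself is unchanged."""
        self._concepts[label].definition = definition

    def lookup(self, term: str) -> Concept | None:
        """Find a concept by its preferred label or any synonym (case-insensitive)."""
        preferred = self._index.get(term.lower())
        return self._concepts.get(preferred) if preferred is not None else None


thesaurus = Thesaurus()
thesaurus.add("wetland", "Placeholder definition.", synonyms=("marsh",))
thesaurus.add_synonym("wetland", "swamp")
print(thesaurus.lookup("Swamp").preferred_label)  # wetland
```

In a shared, distributed setting each concept would more likely be identified by a URI than by a plain string key; the sketch uses strings only for brevity.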
There are several environmental thesauri around the world, some better than others, and most overlapping in their terms. Again off topic: one of the things we need is a global, multilingual environmental thesaurus. There have been several attempts; some got close but then went their separate ways. There are standards available for preparing them, but let's leave that for another discussion on another day.
Cheers
Arthur D. Chapman
Toowoomba, Australia
From Roger Hyam roger@tdwg.org on 12 Feb 2006:
Hi,
I am splitting this out of the thread on observation definition as it is somewhat off that topic, but I am quoting from something Denis said.
Denis said: "The main issue here will be to *standardize how people are allowed to use the BasisOfObservation field* (through controlled vocabulary), and to find ways to know whether you are reporting measurements (e.g., mass) about the sample only or the full organism." (My emphasis)
I am interested in the notion of controlled vocabularies. We don't have them in normal text communication. Although I am picking words from a dictionary as I type, I am free to make up my own words at any time. You may be able to understand their meaning from context, or you may simply fail to understand what I am talking about. Words are normally added to dictionaries *after* they are in use, not before.
If we invent controlled vocabularies for communication, then we have to guess all the terms that anyone is likely to use ahead of time. It is a little like compiling a dictionary before we can use a language. This is:
* Difficult - we have to have some person or group of people with 100% knowledge of the domain.
* Time Consuming - because people don't know everything, they have to go away and learn and debate etc. etc., possibly for years.
* Error Prone - once a list is settled on, it will contain errors (everything does). People will also use the existing terms to mean something different from their official definitions because they can't invent their own terms, and the purpose of having the controlled vocabulary will be lost.
* Unresponsive - once we have a list, we have to stick with it until we update the whole thing.
* Potentially difficult to manage - if the list is embedded in an XML Schema or imported into such a schema, then changing it (to add a new kind of animal trace, for example) will involve updating multiple schemas and possibly software etc. etc.
I therefore suggest that it is far better to have 'recommended' vocabularies similar to Dublin Core, where terms can be added and mixed with those from other vocabularies, rather than the 'hard-coded' vocabularies that occur in many XML Schema implementations.
People should be allowed to put what they like in the BasisOfObservation field, but should only expect to be understood if they use something from a common vocabulary. (The vocabulary has the chance to be self-explaining if it uses URIs/GUIDs, but that is another topic.)
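A minimal Python sketch of the difference between a 'recommended' and a 'hard-coded' vocabulary, assuming a hypothetical set of recommended BasisOfObservation terms (the values below are placeholders, not an agreed vocabulary). Any value is accepted, unlike an enumeration fixed in an XML Schema; a value is merely flagged as likely to be understood when it comes from the shared list.

```python
from __future__ import annotations

# Hypothetical recommended terms; placeholders, not an agreed vocabulary.
RECOMMENDED_BASIS: frozenset[str] = frozenset({"specimen", "sighting", "animal trace"})


def classify_basis(value: str, vocabulary: frozenset[str] = RECOMMENDED_BASIS) -> str:
    """Never reject a value; only report whether others can expect to understand it."""
    if value in vocabulary:
        return "recommended"
    return "free text"


for basis in ["specimen", "footprint cast", "sighting"]:
    print(f"{basis!r}: {classify_basis(basis)}")
# 'specimen': recommended
# 'footprint cast': free text
# 'sighting': recommended

# Adding a term is a data change, not a schema change.
extended = RECOMMENDED_BASIS | {"footprint cast"}
print(classify_basis("footprint cast", extended))  # recommended
```

Because the vocabulary here is plain data, adding a new kind of animal trace means editing one set rather than every schema and program that imports it. A real system might identify terms by URI so that they can be mixed with terms from other vocabularies; this sketch does not attempt that.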
I appreciate that this may be a sea change for some people in the way we think about transfer standards, but I believe it is the only way forward if we want to handle the complexity of our data.
Thoughts and observations(!) much appreciated.
Roger
--
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
http://www.tdwg.org roger@tdwg.org
+44 1578 722782
Tdwg-obs mailing list
Tdwg-obs@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-obs_lists.tdwg.org