Well, I get your point, and it certainly wouldn't be a huge burden on developers to check both the Latin and English names. But I thought the whole point of suggesting a controlled vocabulary was so that people would have one recommended value to use for each state they want to represent. Why do we have ISO 639-1 language codes? A good developer could just check for English, english, EN, en, eng., Eng., Engl, engl., etc., etc.

Steve
Bob Morris wrote:
On Fri, Nov 19, 2010 at 8:56 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
[consensus discussion] ... Why make all software check for two alternatives when a consensus would fix the problem? (Consensus... did I say that word in a tdwg-content email????)
Ummm, because plenty of data will not meet the consensus? Because robust software checks for things that may occur even if they violate expectations, rules, standards, recommendations, conventions, or consensus?
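To make the trade-off concrete, here is a minimal, purely illustrative sketch of what "checking for alternatives" might look like on the consuming side, using the language-name variants from Steve's message. The synonym table and the names normalize_language and check_values are hypothetical, not part of any TDWG vocabulary or existing tool. The assumed idea is that a tolerant client maps known variants onto one recommended value (here the ISO 639-1 code "en") and reports anything it does not recognize rather than silently dropping it.

```python
from __future__ import annotations

# Hypothetical synonym table: known variant spellings mapped to one
# recommended value (the ISO 639-1 code "en"). Keys are already case-folded.
LANGUAGE_SYNONYMS = {
    "english": "en",
    "en": "en",
    "eng": "en",
    "engl": "en",
}


def normalize_language(raw: str) -> str | None:
    """Return the recommended code for a known variant, or None if unrecognized."""
    # Trim whitespace, drop trailing periods ("Eng." -> "Eng"), fold case.
    key = raw.strip().rstrip(".").lower()
    return LANGUAGE_SYNONYMS.get(key)


def check_values(values: list[str]) -> tuple[list[str], list[str]]:
    """Split values into normalized codes and the original unrecognized strings."""
    normalized: list[str] = []
    unrecognized: list[str] = []
    for value in values:
        code = normalize_language(value)
        if code is None:
            # Keep the raw value so the data provider can be told about it.
            unrecognized.append(value)
        else:
            normalized.append(code)
    return normalized, unrecognized


if __name__ == "__main__":
    ok, bad = check_values(["English", "EN", "eng.", "Engl", "Englsh"])
    print(ok)   # ['en', 'en', 'en', 'en']
    print(bad)  # ['Englsh']
```

The table only helps for variants someone thought to list; a single recommended value shrinks the table every client has to maintain, while the unrecognized list is what lets a robust client cope with data that ignores the recommendation.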
One model of consensus might be "what most of the real current data does". Since so much biodiversity data is dirty, I'm pretty sure that would bring howls from taxonomists following this list. But it would also have a shot at exposing more data, if one also believes that most consuming applications are not robust. It is better for the community to make recommendations (by consensus or some other mechanism) and to let developers of non-robust applications accept responsibility for their non-robustness.
My self-serving(*) position is that a huge amount of dirty current data is being served by organizations/people who have no idea that it is dirty, and no systematic way to find out that it is. By contrast, the relatively small number of client developers are likely to have a good idea of where the dirt is and often can deal with it. The well-known social problem with that arises in circumstances such as yours, where domain scientists are writing software out of necessity or urgency, and rightfully want to get on with their science. They then have to make choices about where to spend their time: on software engineering or science. Many have little choice, since they are paid to be scientists, not software engineers. Nor are lay software engineers the only authors of non-robust software. Analogous time-constraints imposed on professionals often result in the same kinds of problems. Alas, there is no single solution to this conundrum. But my (not biologically informed) guess is that for the problem in this thread, supporting both alternatives does not impose a big burden on developers. In which case there is no need for consensus. :-)
Bob Morris

(*) http://etaxonomy.org/mw/FP2010:_Continuous_Quality_Control