Re: [tdwg-content] [tdwg-tag] Inclusion of authorship in DwC scientificName: good or bad?

19 Nov 2010

On Fri, Nov 19, 2010 at 8:56 AM, Steve Baskauf
<steve.baskauf@vanderbilt.edu> wrote:
> ...
> [consensus discussion]
> ...  Why make all software check for two alternatives
> when a consensus would fix the problem?  (Consensus... did I say that
> word in a tdwg-content email????)
Ummm, because plenty of data will not meet the consensus? Because
robust software checks for things that may occur even if they violate
expectations, rules, standards, recommendations, conventions, or
consensus?

One model of consensus might be "what most of the real current data
does". Since so much biodiversity data is dirty, I'm pretty sure that
would bring howls from taxonomists following this list. But it would
also have a shot at exposing more data, if one also believes that most
consuming applications are not robust. Better is for the community to
make recommendations (by consensus or some other mechanism) and let
developers of non-robust applications accept responsibility for their
non-robustness.

My self-serving(*) position is that a huge amount of dirty current
data is being served by organizations/people who have no idea that it
is dirty, and no systematic way to find out that it is. By contrast,
the relatively small number of client developers are likely to have a
good idea of where the dirt is and often can deal with it. The
well-known social problem with that arises in circumstances such as
yours, where domain scientists are writing software out of necessity
or urgency, and rightfully want to get on with their science. They
then have to make choices about where to spend their time: on software
engineering or science. Many have little choice, since they are paid
to be scientists, not software engineers.  Nor are lay software
engineers the only authors of non-robust software. Analogous
time-constraints imposed on professionals often result in the same
kinds of problems.  Alas, there is no single solution to this
conundrum. But my (not biologically informed) guess is that for the
problem in this thread, supporting both alternatives does not impose a
big burden on developers.  In which case there is no need for
consensus. :-)
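To make the "not a big burden" claim concrete, here is a minimal sketch (an illustration, not anyone's published code) of a consumer that accepts both conventions: scientificName with the authorship appended, or scientificName without it and the authorship carried separately in scientificNameAuthorship. The function name split_name_and_authorship is hypothetical. It assumes the authorship, when present in both fields, appears verbatim at the end of scientificName; when scientificNameAuthorship is absent it leaves the name untouched, since splitting a bare name string reliably would need a real name parser.

```python
from typing import Optional, Tuple


def split_name_and_authorship(
    scientific_name: str, authorship: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (name without authorship, authorship).

    Accepts either convention for Darwin Core scientificName. Assumes that
    when scientificNameAuthorship is supplied, any authorship embedded in
    scientificName matches it verbatim (up to whitespace) at the end.
    """
    # Collapse runs of whitespace so "Quercus  alba" and "Quercus alba" compare equal.
    name = " ".join(scientific_name.split())
    if not authorship:
        # No separate authorship: we cannot tell without a name parser, so pass through.
        return name, None
    auth = " ".join(authorship.split())
    # Strip the authorship only when it is a whole trailing token sequence,
    # not merely a suffix glued onto the last epithet.
    if name.endswith(" " + auth):
        name = name[: -len(auth)].rstrip()
    return name, auth
```

Both "Quercus alba L." and "Quercus alba" paired with authorship "L." normalize to ("Quercus alba", "L."), so downstream code sees one shape regardless of which convention the provider followed.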

Bob Morris
(*) http://etaxonomy.org/mw/FP2010:_Continuous_Quality_Control
-- 
Robert A. Morris
Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390
Associate, Harvard University Herbaria
email: morris.bob@gmail.com
web: http://bdei.cs.umb.edu/
web: http://etaxonomy.org/mw/FilteredPush
http://www.cs.umb.edu/~ram
phone (+1) 857 222 7992 (mobile)