SEEK Project and TDWG-SDD

Thu Apr 15 10:44:14 CEST 2004

Well, I hesitate to stick my neck out and into the space of this august,
expert group, but there hasn't been enough news from Iraq recently, so
here's something to ponder.

Bryan Heidorn visited Kansas this week and we had a discussion about
where TDWG-SDD is going, and the history of your deliberations.  We at
Kansas have not been involved with character data (although I have the
original interactive key program for plants (IDENT4) archived on paper
tape for the TDWG-SDD museum), but our recent SEEK project
(http://seek.ecoinformatics.org), with which I see there has been some
interaction with SDD, is attempting to build a global infrastructure for
handling, resolving and communicating taxonomic concepts (as opposed to
taxonomic names).  

Bryan described to us how your schema is intended to include all of the
data and metadata associated with a diagnostic character set for a group
of taxa. My understanding is thus incomplete and based only on our
single discussion, but it struck me that TDWG-SDD has an opportunity to
have much broader acceptance and support if your schema was not designed
as a single data object--to contain both the metadata about the package
(or work or whatever you refer to it as) *and* the descriptive data that
describe the individual concepts.

If the taxa/concepts had their own schemas and were linked to the
package metadata with a GUID, maybe a DOI or some other globally unique
identifier, then the XML concept data sets could be used for other
systems like concept-based classification or database management
systems.  This would in theory, and in my view, give the work of your
group much more leverage, exposure and relevance to a broader group of
scientists and users of names and concepts.

The overhead for the traditional diagnostic identification software
makers would be that the XML parts would need to be assembled for the
various applications that use the data and there would be the potential
risk that SDD data sets would be incomplete, if there were some careless
file management.  But presumably you guys are thinking about a registry
or distributed federation of these data sets anyway, where they would be
archived and served intact from a trusted source.
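
A minimal sketch of what that assembly step might look like, assuming the
package metadata simply lists its concept records by GUID. The element
names (package, conceptRef, concept), the urn:example identifiers and the
in-memory registry are all hypothetical illustrations, not part of the SDD
schema or of SEEK. The sketch also reports references that cannot be
resolved, which is the incomplete-data-set risk mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical package document: metadata plus GUID references to concepts.
PACKAGE_XML = """
<package guid="urn:example:package:1">
  <title>Example key</title>
  <conceptRef guid="urn:example:concept:a"/>
  <conceptRef guid="urn:example:concept:b"/>
  <conceptRef guid="urn:example:concept:c"/>
</package>
"""

# Hypothetical standalone concept documents, as a registry might serve them.
# Concept "c" is deliberately absent to simulate careless file management.
CONCEPT_DOCS = [
    '<concept guid="urn:example:concept:a"><name>Taxon A</name></concept>',
    '<concept guid="urn:example:concept:b"><name>Taxon B</name></concept>',
]


def build_registry(docs):
    """Index standalone concept documents by their GUID attribute."""
    registry = {}
    for doc in docs:
        elem = ET.fromstring(doc)
        registry[elem.get("guid")] = elem
    return registry


def assemble(package_xml, registry):
    """Resolve each conceptRef in the package; collect unresolved GUIDs."""
    package = ET.fromstring(package_xml.strip())
    resolved, missing = [], []
    for ref in package.findall("conceptRef"):
        guid = ref.get("guid")
        if guid in registry:
            resolved.append(registry[guid])
        else:
            missing.append(guid)
    return package, resolved, missing


package, resolved, missing = assemble(PACKAGE_XML, build_registry(CONCEPT_DOCS))
names = [c.findtext("name") for c in resolved]
print(package.findtext("title"), names, missing)
```

The same concept documents could be read by a classification or database
system that never sees the package at all, which is the point of keeping
them separate.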

I also understand that data sets of diagnostic identification
information are far from complete descriptions of concepts in either a
taxonomic or phylogenetic sense, but if the SDD concept schema could
accommodate additional characters, then the opportunity would be there
for other people to use SDD for other kinds of systems.  The UI of
diagnostic key programs would likely not need to use or display DNA
sequences for interactive identification, but no harm done, they could
just ignore fields of no use to the program at hand.

Another objection I anticipate would be that broadening SDD to
accommodate the functional requirements of the broader taxon concept
management objectives would make the SDD schema too complex and
difficult for anyone to work with.

Overall, it seems to this SDD-novice, that the mechanical overhead of
applications having to assemble the taxon/concept pieces with the
character definition and package metadata, would not be so onerous that it
would outweigh the benefit of separating out the taxon/concept schema as
data sets that *could* stand alone if someone wanted to use them that
way.

Respond if you dare.

Jim B.

--------------------------------
James H. Beach
Biodiversity Research Center
University of Kansas
1345 Jayhawk Boulevard
Lawrence, KS 66045, USA
Tel: 785 864-4645, Fax: 785 864-5335
Televideocon: (H.323): 129.237.201.102