SEEK Project and TDWG-SDD

Gregor Hagedorn G.Hagedorn at BBA.DE
Thu Apr 15 19:41:52 CEST 2004

Hi Jim!

> Well, I hesitate to stick my neck out and into the space of this
> august, expert group

Actually this expert group is small and struggling with a complex

Too few people have the time either to penetrate it and find the
critical points still lacking, or to write comprehensive

Any contributions are most welcome!

> (, with which I see there has been some
> interaction with SDD, is attempting to build a global infrastructure
> for handling, resolving and communicating taxonomic concepts (as
> opposed to taxonomic names).

.... and as I see it, the most valuable taxonomic concepts are not
based on synonymy or specimen lists, but on comprehensive descriptions.

.... which makes me critical of the current failure to provide name
identifiers separately from taxon concept identifiers. To describe a
new concept in SDD we need name IDs, not concept IDs... But this is a
GBIF ECAT issue...

> If the taxa/concepts had their own schemas and were linked to the
> package metadata with a GUID, maybe a DOI or some other globally
> unique identifier, then the XML concept data sets could be used for
> other systems like concept based classification or database management
> systems.  This would in theory, and in my view, give the work of your
> group much more leverage, exposure and relevance to a broader group of
> scientists and users of names and concepts.

This is right into the heart of our discussions. We absolutely do NOT
want to define names, we want to use them. You may say link to them.
The current schema contains a general attempt to link to external
resources, including specimens and taxonomic names. See
Entities/Objects and Entities/Classes in the schema.

We call these objects ResourceConnectors. Each connector has an
external ID, a provider, and a human-readable text representation. In
addition, the derived connector types may add additional information.

The name "ResourceConnector" may be vague or confusing. In a way, it
is intended as a proxy design pattern: each connector stands as a
proxy object for an external object, the link to which may be
temporarily unavailable. Any suggestions what better name to call
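
To make the proxy idea concrete, here is a minimal Python sketch. It
is not taken from the SDD schema: the field names (external_id,
provider, label), the derived TaxonNameConnector with its rank field,
and the fetch callbacks are all hypothetical. It only shows how a
connector could fall back to its stored text when the external object
cannot be reached.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ResourceConnector:
    """Proxy for an external object (all field names are hypothetical)."""
    external_id: str   # ID assigned by the external provider
    provider: str      # who serves the external object
    label: str         # human-readable text stored with the connector

    def resolve(self, fetch: Callable[[str, str], dict]) -> Optional[dict]:
        """Try to fetch the external record; return None if unreachable."""
        try:
            return fetch(self.provider, self.external_id)
        except ConnectionError:
            return None

    def display(self, fetch: Callable[[str, str], dict]) -> str:
        """Prefer the external record, fall back to the stored label."""
        record = self.resolve(fetch)
        if record is None:
            return self.label
        return record.get("full_name", self.label)


@dataclass
class TaxonNameConnector(ResourceConnector):
    """Derived connector adding extra information (here: a rank)."""
    rank: str = ""


def offline_fetch(provider: str, external_id: str) -> dict:
    # Stand-in for a provider that is temporarily unavailable.
    raise ConnectionError("provider unreachable")


def online_fetch(provider: str, external_id: str) -> dict:
    # Stand-in for a provider that answers with a richer record.
    return {"full_name": "Puccinia graminis Pers."}


name = TaxonNameConnector("name-123", "example-name-service",
                          "Puccinia graminis", rank="species")
print(name.display(offline_fetch))  # falls back to the stored label
print(name.display(online_fetch))   # uses the external record
```

The point of the design is that a description stays readable from
the stored text alone, while a live link can supply more detail when
it happens to be available.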

> some careless file management.  But presumably you guys are thinking
> about a registry or distributed federation of these data sets anyway,
> where they would be archived and served intact from a trusted source.

Rather struggling with UDDI... I am still confused about what, besides
a UDDI URL, should be stored to reproducibly retrieve a service WSDL
URI (that may change over time, thus the use of a UDDI). UDDI is
defined in extremely general terms, too general for me.

Also, we would need to develop interface definitions containing a set
of standard operations to retrieve lists of external objects to make
a selection from, or to retrieve rich information about an external
object (i.e. more than the proxy inside SDD saves). I believe this is
essential for the future of GBIF but wonder where the resources are
to tackle these problems...
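
As a rough illustration of the two operations meant here, the
following Python sketch defines an abstract provider interface. The
names ExternalObjectProvider, list_objects and get_details are
assumptions made up for this example, not an existing GBIF or TDWG
interface, and the in-memory provider with its placeholder specimen
records only stands in for a real remote service.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ExternalObjectProvider(ABC):
    """Hypothetical interface a provider of external objects could offer."""

    @abstractmethod
    def list_objects(self, query: str) -> List[Dict[str, str]]:
        """Return lightweight entries (id and label) to select from."""

    @abstractmethod
    def get_details(self, external_id: str) -> Dict[str, str]:
        """Return the rich record for one external object."""


class InMemoryProvider(ExternalObjectProvider):
    """Toy implementation backed by a dict, standing in for a service."""

    def __init__(self, records: Dict[str, Dict[str, str]]) -> None:
        self._records = records

    def list_objects(self, query: str) -> List[Dict[str, str]]:
        # Case-insensitive substring match on the label, sorted by id.
        q = query.lower()
        return [
            {"id": key, "label": rec["label"]}
            for key, rec in sorted(self._records.items())
            if q in rec["label"].lower()
        ]

    def get_details(self, external_id: str) -> Dict[str, str]:
        # Return a copy so callers cannot modify the stored record.
        return dict(self._records[external_id])


# Placeholder data, invented for this example only.
provider = InMemoryProvider({
    "s1": {"label": "Specimen A", "locality": "placeholder locality 1"},
    "s2": {"label": "Specimen B", "locality": "placeholder locality 2"},
})

choices = provider.list_objects("specimen")  # selection list for a UI
details = provider.get_details("s1")         # more than a proxy would store
print(choices)
print(details)
```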

> I also understand that data sets of diagnostic identification
> information are far from complete descriptions of concepts in either a
> taxonomic or phylogenetic sense, but if the SDD concept schema could
> accommodate additional characters, then the opportunity would be there
> for other people to use SDD for other kinds of systems.  The UI of
> diagnostic key programs would likely not need to use or display DNA
> sequences for interactive identification, but no harm done, they could
> just ignore fields of no use to the program at hand.

I believe this is a misunderstanding. SDD is in no way confined to
diagnostic sets, if diagnostic is understood as a minimized set of
field-observable characters. That may work for plants, it does not
work for fungi or other microorganisms.

We do attempt to provide means to differentiate between different
kinds of characters. One method is that concept trees point to the
characters, and the method tree contains different methods. Another
is an attempt to allow ratings to rate characters and concepts (the
characters would inherit from concepts) for
  identification convenience,
  identification availability,
  identification reliability of feature
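
A minimal Python sketch of how such inherited ratings could behave.
This is only one possible reading: the class names, the integer
rating values, and the rule that a rating set directly on a character
overrides the inherited one are assumptions of this example, not
something the schema defines.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The three rating kinds from the list above (short names are assumed).
RATING_KINDS = ("convenience", "availability", "reliability")


@dataclass
class Concept:
    """Node in a concept tree; may carry ratings of its own."""
    name: str
    parent: Optional["Concept"] = None
    ratings: Dict[str, int] = field(default_factory=dict)

    def effective_rating(self, kind: str) -> Optional[int]:
        """Walk up the tree until some ancestor defines the rating."""
        node: Optional[Concept] = self
        while node is not None:
            if kind in node.ratings:
                return node.ratings[kind]
            node = node.parent
        return None


@dataclass
class Character:
    """A character attached to a concept; inherits ratings from it."""
    name: str
    concept: Concept
    ratings: Dict[str, int] = field(default_factory=dict)

    def effective_rating(self, kind: str) -> Optional[int]:
        # Assumption: a rating on the character itself takes precedence.
        if kind in self.ratings:
            return self.ratings[kind]
        return self.concept.effective_rating(kind)


# Illustrative values on an assumed 1 (low) to 5 (high) scale.
microscopy = Concept("spore microscopy", ratings={
    "convenience": 1, "availability": 2, "reliability": 5})
spore_wall = Concept("spore wall", parent=microscopy)
ornament = Character("wall ornamentation", spore_wall,
                     ratings={"reliability": 3})

for kind in RATING_KINDS:
    print(kind, ornament.effective_rating(kind))
```

Here the character takes convenience and availability from the
concept two levels up, and only reliability is set on the character
itself.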

DELTA had a simple character weight. The problem with that is that it
had no semantics other than "change it until your selected program
produces a desired result".

Any comments or experience with this?

> Another objection I anticipate would be that broadening SDD to
> accommodate the functional requirements of the broader taxon concept
> management objectives would make the SDD schema too complex and
> difficult for anyone to work with.

No, I think that is where SDD has to go. I guess difficult is where
we are already... We try to break it up into modular components to
show that SDD tackles a range of problems, and that smaller subsets
of SDD are available for individual problems.

Can you provide some ideas about how you would modularize the data,
with just a keyword/example list of things in each module? I think
even without you looking in detail at the schema, such a list might
be very valuable to the development of SDD!

Thanks a lot!

Gregor Hagedorn (G.Hagedorn at
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203



