Re: SEEK Project and TDWG-SDD

15 Apr 2004

      ...
No, I was thinking of separating the metadata about the package (e.g. "This
is a data set of Magnolias from FNA, it was assembled by, organized
by, dates, etc.") from the data describing the character states of the
individual taxa.  So a good question is what to do with the
character definitions! It seems the character state values without the
character definitions would not be of much use for any system to
interpret the meaning of the states.  Two options: de-normalize the
character definitions and put them in each concept schema, or have
a separate server holding the character definitions, with an external
reference to it in the data schema. Not sure how that choice would play
out.
This is on our agenda, but I am struggling with how to do it.
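To make the first option (de-normalizing) concrete, here is a minimal Python sketch that copies every character definition a concept refers to into the concept document itself. The element and attribute names (Terminology, Character/@id, Concept, State/@charRef, CharacterDefinitions) are hypothetical and NOT the actual SDD schema.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical element names for illustration; NOT the actual SDD schema.
TERMINOLOGY = ET.fromstring("""<Terminology>
  <Character id="c1"><Label>Leaf shape</Label></Character>
  <Character id="c2"><Label>Petal colour</Label></Character>
</Terminology>""")

CONCEPT = ET.fromstring("""<Concept taxon="Magnolia grandiflora">
  <State charRef="c1">elliptic</State>
</Concept>""")


def denormalize(concept, terminology):
    """Copy every character definition the concept refers to into the concept itself."""
    by_id = {ch.get("id"): ch for ch in terminology.iter("Character")}
    used = sorted({st.get("charRef") for st in concept.iter("State")})
    block = ET.SubElement(concept, "CharacterDefinitions")
    for ref in used:
        # Deep copy, so the concept no longer depends on the terminology document.
        block.append(copy.deepcopy(by_id[ref]))
    return concept


denormalize(CONCEPT, TERMINOLOGY)
print([ch.get("id") for ch in CONCEPT.iter("Character")])  # ['c1']
```

The trade-off is visible in the sketch: each concept file becomes self-contained, but a change to a character definition then has to be repeated in every file that carries a copy.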

What we already know is: SDD can have a project that only defines
terminology. We anticipate that other projects containing
descriptions would like to use a common terminology project. Perhaps
even better: select blocks from multiple such projects.

The problem is how to organize this. One problem is technical: the
xml schema identity constraints only work within a single file, which
I think is a stupid design bug in xml schema.
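Because xs:key/xs:keyref constraints are evaluated only within a single instance document, one workaround is to check cross-file references outside XML Schema with a small script. A minimal Python sketch, assuming hypothetical element names (Terminology, Character/@id, Description, State/@charRef) that are not the actual SDD schema:

```python
import xml.etree.ElementTree as ET

# Two separate "files" (held as strings here). Element names are hypothetical.
TERMINOLOGY = """<Terminology>
  <Character id="c1"><Label>Leaf shape</Label></Character>
  <Character id="c2"><Label>Petal colour</Label></Character>
</Terminology>"""

DESCRIPTIONS = """<Descriptions>
  <Description taxon="Magnolia grandiflora">
    <State charRef="c1">elliptic</State>
    <State charRef="c2">white</State>
  </Description>
  <Description taxon="Magnolia acuminata">
    <State charRef="c3">yellowish</State>
  </Description>
</Descriptions>"""


def collect_keys(terminology_root):
    """Gather the 'id' values that play the role xs:key would declare."""
    return {ch.get("id") for ch in terminology_root.iter("Character")}


def dangling_refs(description_root, keys):
    """Return (taxon, charRef) pairs whose charRef has no matching key."""
    missing = []
    for desc in description_root.iter("Description"):
        for state in desc.iter("State"):
            ref = state.get("charRef")
            if ref not in keys:
                missing.append((desc.get("taxon"), ref))
    return missing


keys = collect_keys(ET.fromstring(TERMINOLOGY))
problems = dangling_refs(ET.fromstring(DESCRIPTIONS), keys)
print(problems)  # [('Magnolia acuminata', 'c3')]
```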

The second problem is that the obvious solution would be to use some
kind of XInclude that inserts one file into another. We would then need
either to insert fragments, or to split the terminology into files
exactly at some "block level" (modules of several characters or concepts).

My experience is that xml tools do not support XInclude. I mostly
work with XMLSpy, and I could not get it to work. Does anybody have
experience with how this modular design can be solved with current tools,
so that we can do some testing?
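For testing, Python's standard library has limited XInclude support in xml.etree.ElementInclude. It includes whole documents only (it does not evaluate xpointer), so it fits the "split the terminology into files at block level" variant. A minimal sketch; the file names and element names are hypothetical, and the files are held in a dict via a custom loader instead of being read from disk:

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# In-memory stand-ins for separate block files (hypothetical names and content).
FILES = {
    "leaf-characters.xml":
        '<CharacterBlock name="leaf"><Character id="c1"/><Character id="c2"/></CharacterBlock>',
    "flower-characters.xml":
        '<CharacterBlock name="flower"><Character id="c10"/></CharacterBlock>',
}

MAIN = """<Terminology xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="leaf-characters.xml"/>
  <xi:include href="flower-characters.xml"/>
</Terminology>"""


def loader(href, parse, encoding=None):
    """Resolve an xi:include href from FILES instead of the file system."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]  # parse="text"


root = ET.fromstring(MAIN)
ElementInclude.include(root, loader=loader)  # replaces each xi:include in place
ids = [ch.get("id") for ch in root.iter("Character")]
print(ids)  # ['c1', 'c2', 'c10']
```

After inclusion the assembled tree is a single document, so identity constraints could then be checked on it as usual.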

Which examples exist that have a similar modular structure, but still
use identifiers whose referential integrity should be validated?
...
...
...
If the taxa/concepts had their own schemas and were linked to the
package metadata with a GUID, maybe a DOI or some other globally
unique identifier, then the XML concept data sets could be used for
other systems like concept based classification or database
management systems.
What is a DOI?
Wouldn't I need more than the GUID information?

Currently we achieve GUIDs by combining a Project GUID with
local 32-bit integer IDs. I personally am a fan of the long GUIDs for
such purposes, but the majority prefers URI-type IDs.

The Project GUID is planned either to be a self-controlled URL, or a
registered Project name, e.g. registered through a GBIF UDDI.
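A minimal Python sketch of the combination described above, assuming a URL as the Project GUID, '#' as the separator, and unsigned 32-bit local IDs. All three are assumptions for illustration, and the URL is a placeholder. It also shows how a long 128-bit GUID could be derived deterministically from the URI-type ID as a name-based UUID (version 5).

```python
import uuid
from typing import Tuple

MAX_LOCAL_ID = 2**32 - 1  # assumes unsigned 32-bit local IDs


def make_global_id(project_id: str, local_id: int) -> str:
    """Combine a project identifier (e.g. a self-controlled URL) with a local ID."""
    if not 0 <= local_id <= MAX_LOCAL_ID:
        raise ValueError(f"local id {local_id} does not fit in 32 bits")
    return f"{project_id}#{local_id}"  # '#' separator is an assumption


def split_global_id(global_id: str) -> Tuple[str, int]:
    """Inverse of make_global_id: split at the last '#'."""
    project_id, _, local = global_id.rpartition("#")
    return project_id, int(local)


def as_long_guid(global_id: str) -> uuid.UUID:
    """Derive a 128-bit name-based UUID (version 5) from the URI-type ID."""
    return uuid.uuid5(uuid.NAMESPACE_URL, global_id)


gid = make_global_id("http://example.org/projects/magnolia-fna", 42)
print(gid)                   # http://example.org/projects/magnolia-fna#42
print(split_global_id(gid))  # ('http://example.org/projects/magnolia-fna', 42)
print(as_long_guid(gid))     # same UUID every time for the same URI
```

Because the version 5 UUID is computed from the URI, both ID styles can coexist without a separate mapping table.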
...
...
The overhead for the traditional diagnostic identification software
...
makers would be that the XML parts would need to be assembled for the
various applications that use the data, and there would be the
potential risk that SDD data sets would be incomplete if there were
some careless file management.
What parts could get lost? The taxonomic parts?
Yes, if you had multiple XML docs for the same 'diagnostic package',
they would be managed as distinct files.
Already asked above: should it really be files? Can we not have
fragment identifiers to point within files? What are the techniques
to do this?
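One simple technique is a URI-style fragment identifier, as in HTML's `page.html#anchor`: a reference names a file plus the id of an element inside it, so several blocks can share one file. A minimal Python sketch, assuming hypothetical 'file#id' references and @id attributes; it does not implement XPointer, and the document is held in a dict instead of read from disk:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# In-memory stand-in for a document on disk or on a server (hypothetical content).
DOCUMENTS: Dict[str, str] = {
    "terminology.xml": """<Terminology>
      <Character id="c1"><Label>Leaf shape</Label></Character>
      <Character id="c2"><Label>Petal colour</Label></Character>
    </Terminology>""",
}


def resolve(ref: str) -> ET.Element:
    """Resolve 'file#id' to the element carrying that id inside the file."""
    doc_name, sep, fragment = ref.partition("#")
    if not sep or not fragment:
        raise ValueError(f"no fragment identifier in {ref!r}")
    root = ET.fromstring(DOCUMENTS[doc_name])
    # ElementTree's limited XPath supports [@attr='value'] predicates.
    hit = root.find(f".//*[@id='{fragment}']")
    if hit is None:
        raise KeyError(f"{fragment!r} not found in {doc_name!r}")
    return hit


print(resolve("terminology.xml#c2").findtext("Label"))  # Petal colour
```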

Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Koenigin-Luise-Str. 19          Tel: +49-30-8304-2220
14195 Berlin, Germany           Fax: +49-30-8304-2203

Often wrong but never in doubt!