No, I was thinking of separating the metadata about the package ("This is a data set of Magnolias from FNA, it was assembled by, organized by, dates, etc.") from the data describing the character states of the individual taxa. So a good question is: what do you do with the character definitions? It seems the character state values without the character definitions would not be of much use for any system trying to interpret the meaning of the states. There are two options: de-normalize the character definitions and put them in each concept schema, or have a separate server that holds the character definitions, with an external reference to it in the data schema. Not sure how that choice would play out.
This is on our agenda, but I am struggling with how to do it.
What we already know is: SDD can have a project that only defines terminology. We anticipate that other projects containing descriptions would like to use a common terminology project. Perhaps even better: select blocks from multiple such projects.
The problem is how to organize this. One problem is technical: the XML Schema identity constraints (key/keyref) only work within a single file, which I think is a stupid design bug in XML Schema.
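To make the single-file limitation concrete, here is a minimal sketch of checking referential integrity across two documents in application code instead of in the schema. The element and attribute names (Terminology, Character, State, StateRef, id, ref) are invented for illustration and are not taken from the SDD schema; the two documents are inline strings standing in for separate files.

```python
import xml.etree.ElementTree as ET

# Hypothetical terminology "project": defines characters and states.
TERMINOLOGY_XML = """
<Terminology>
  <Character id="c1" label="Petal colour">
    <State id="s1" label="white"/>
    <State id="s2" label="pink"/>
  </Character>
</Terminology>
"""

# Hypothetical description document: refers to states defined elsewhere.
DESCRIPTIONS_XML = """
<Descriptions>
  <Taxon name="Magnolia grandiflora">
    <StateRef ref="s1"/>
  </Taxon>
  <Taxon name="Magnolia liliiflora">
    <StateRef ref="s9"/>
  </Taxon>
</Descriptions>
"""


def defined_ids(terminology_root):
    """Collect every id attribute declared in the terminology document."""
    return {el.get("id") for el in terminology_root.iter() if el.get("id")}


def dangling_refs(descriptions_root, known_ids):
    """Return (taxon name, ref) pairs whose ref is not a known id."""
    missing = []
    for taxon in descriptions_root.iter("Taxon"):
        for ref in taxon.iter("StateRef"):
            if ref.get("ref") not in known_ids:
                missing.append((taxon.get("name"), ref.get("ref")))
    return missing


known = defined_ids(ET.fromstring(TERMINOLOGY_XML))
problems = dangling_refs(ET.fromstring(DESCRIPTIONS_XML), known)
print(problems)  # [('Magnolia liliiflora', 's9')]
```

This does by hand what a key/keyref pair would do inside one file, so it is a workaround rather than a schema-level answer.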
The second problem is that the obvious solution would be to use some kind of XInclude that inserts one file into another. We would need either to insert fragments, or to split the terminology into files exactly at some "block level" (modules of several characters or concepts).
My experience is that XML tools do not support XInclude. I mostly work with XML Spy, and I could not get it to work there. Does anybody have experience with how this modular design can be handled with current tools, so that we can do some testing?
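For testing outside an XML editor, one possibility is the XInclude helper in the Python standard library (xml.etree.ElementInclude). The sketch below is only an illustration under stated assumptions: the element names and the file name flower_block.xml are invented, and the "files" live in a dictionary so the example runs on its own. It includes a whole file, so it corresponds to the variant where the terminology is split at block level.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical terminology block kept in its own "file" (here: in memory).
FILES = {
    "flower_block.xml": """
<Block id="flowers">
  <Character id="c1" label="Petal colour"/>
</Block>
""",
}

# Hypothetical main project that pulls the block in via XInclude.
MAIN_XML = """
<Project xmlns:xi="http://www.w3.org/2001/XInclude">
  <Terminology>
    <xi:include href="flower_block.xml"/>
  </Terminology>
</Project>
"""


def memory_loader(href, parse, encoding=None):
    """Loader for ElementInclude that reads from FILES instead of disk."""
    text = FILES[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text


root = ET.fromstring(MAIN_XML)
# Replaces each xi:include element in place with the loaded subtree.
ElementInclude.include(root, loader=memory_loader)

character = root.find("Terminology/Block/Character")
print(character.get("label"))
```

After the include step the tree is a single document, so identifiers can be checked across what were originally separate files.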
Which examples exist that have a similar modular structure, but still use identifiers whose referential integrity should be validated?
If the taxa/concepts had their own schemas and were linked to the package metadata with a GUID, maybe a DOI or some other globally unique identifier, then the XML concept data sets could be used for other systems like concept based classification or database management systems.
What is a DOI? Wouldn't I need more than the GUID information?
Currently we achieve GUIDs by combining a Project GUID with local 32-bit integer IDs. I personally am a fan of the long GUIDs for such purposes, but the majority prefers URI-type IDs.
The Project GUID is planned either to be a self-controlled URL, or a registered Project name, e.g. registered through a GBIF UDDI.
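As a rough illustration of the combination described above, the following sketch joins a project identifier with a local integer ID into one URI-style string. The class name, the "#" separator, the example project URL and the choice of an unsigned 32-bit range are all assumptions made for the example, not part of any SDD specification.

```python
from dataclasses import dataclass

MAX_LOCAL_ID = 2**32 - 1  # assumption: local IDs are unsigned 32-bit values


@dataclass(frozen=True)
class GlobalId:
    project: str   # project identifier, e.g. a self-controlled URL
    local_id: int  # integer that is unique only within the project

    def __post_init__(self):
        if not 0 <= self.local_id <= MAX_LOCAL_ID:
            raise ValueError(f"local_id out of 32-bit range: {self.local_id}")

    def as_uri(self):
        """Render as 'project#local_id' (the separator is an assumption)."""
        return f"{self.project}#{self.local_id}"

    @classmethod
    def parse(cls, text):
        """Split at the last '#' and rebuild the identifier."""
        project, _, local = text.rpartition("#")
        return cls(project, int(local))


gid = GlobalId("http://example.org/sdd/magnolia", 42)
print(gid.as_uri())  # http://example.org/sdd/magnolia#42
```

Two projects can then reuse the same local number without collision, because equality compares both parts.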
The overhead for the traditional diagnostic identification software makers would be that the XML parts would need to be assembled for the various applications that use the data, and there would be the potential risk that SDD data sets would be incomplete if there were some careless file management.
What parts could get lost? The taxonomic parts?
Yes, if you had multiple XML docs for the same 'diagnostic package' they would be managed as distinct files.
Already asked above: should it really be files? Can we not have fragment identifiers to point within files? What are the techniques to do this?
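One simple technique is a URI reference of the form "file#id", where the part after "#" names an element inside the target document. The sketch below resolves such a reference by hand; the document name terminology.xml, the element names and the convention that the fragment matches an id attribute are assumptions for illustration, and the document is held in memory so the example is self-contained.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical store of documents, keyed by the URL part of a reference.
DOCUMENTS = {
    "terminology.xml": """
<Terminology>
  <Character id="c1" label="Petal colour">
    <State id="s1" label="white"/>
    <State id="s2" label="pink"/>
  </Character>
</Terminology>
""",
}


def resolve(reference):
    """Return the element addressed by 'document#id', or None if absent."""
    url, fragment = urldefrag(reference)
    root = ET.fromstring(DOCUMENTS[url])
    for el in root.iter():
        if el.get("id") == fragment:
            return el
    return None


state = resolve("terminology.xml#s2")
print(state.get("label"))  # pink
```

With this convention a description can point at a single state or character without the terminology having to be split into one file per block.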
Gregor
----------------------------------------------------------
Gregor Hagedorn (G.Hagedorn@bba.de)
Institute for Plant Virology, Microbiology, and Biosafety
Federal Research Center for Agriculture and Forestry (BBA)
Koenigin-Luise-Str. 19           Tel: +49-30-8304-2220
14195 Berlin, Germany            Fax: +49-30-8304-2203
Often wrong but never in doubt!