At 12:57 PM 4/15/2004, you wrote:
No, I was thinking of separating the metadata about the package ("This is a data set of Magnolias from FNA; it was assembled by, organized by, dates, etc.") from the data describing the character states of the
... The problem is how to organize this. One problem is technical: XML Schema identity constraints (xs:key, xs:keyref, xs:unique) only work within a single document, which I think is a stupid design bug in XML Schema.
The second problem is that the obvious solution would be to use some kind of XInclude mechanism that inserts one file into another. We would then need either to include fragments of files or to split the terminology into files exactly at some "block level" (modules of several characters or concepts).
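A minimal sketch of the XInclude route, assuming the terminology is split into block-level module files: once the includes are expanded, the whole dataset is a single tree, so a key/keyref-style check can see every character definition. The element names (dataset, character, state), the file names, and the helpers merge and check_references are all hypothetical; the modules are held in memory so the example runs without files. Python's standard library does not validate against XML Schema, so check_references is a hand-written stand-in for an xs:key/xs:keyref pair, not a schema validator.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical block-level terminology modules, kept in memory so the
# sketch is self-contained (in practice these would be separate files).
MODULES = {
    "flower-characters.xml": """
        <characterBlock>
          <character id="flower_color"/>
          <character id="petal_number"/>
        </characterBlock>""",
    "leaf-characters.xml": """
        <characterBlock>
          <character id="leaf_shape"/>
        </characterBlock>""",
}

# Hypothetical master document: terminology assembled from modules via
# XInclude, plus descriptions whose states reference character ids.
MASTER = """
<dataset xmlns:xi="http://www.w3.org/2001/XInclude">
  <terminology>
    <xi:include href="flower-characters.xml"/>
    <xi:include href="leaf-characters.xml"/>
  </terminology>
  <descriptions>
    <state character="flower_color" value="white"/>
    <state character="leaf_shape" value="elliptic"/>
    <state character="fruit_type" value="follicle"/>
  </descriptions>
</dataset>"""


def load_module(href, parse, encoding=None):
    """Loader for ElementInclude.include: resolve href from MODULES, not disk."""
    if parse != "xml":
        raise ValueError("only parse='xml' is supported in this sketch")
    return ET.fromstring(MODULES[href].strip())


def merge(master_text):
    """Expand every xi:include so the dataset becomes one element tree."""
    root = ET.fromstring(master_text.strip())
    ElementInclude.include(root, loader=load_module)
    return root


def check_references(root):
    """Hand-written stand-in for xs:key/xs:keyref on the merged tree."""
    problems = []
    defined = set()
    # "Key": every character id must be unique across all included modules.
    for char in root.iter("character"):
        cid = char.get("id")
        if cid in defined:
            problems.append("duplicate character id: " + cid)
        defined.add(cid)
    # "Keyref": every state must point at a character defined somewhere.
    for state in root.iter("state"):
        ref = state.get("character")
        if ref not in defined:
            problems.append("undefined character: " + ref)
    return problems


if __name__ == "__main__":
    merged = merge(MASTER)
    for problem in check_references(merged):
        print(problem)
```

Run as-is, it reports the one state that points at a character no module defines (fruit_type). The check only works because the includes were expanded first; run against the master file alone, none of the character ids would be visible.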
XLink?
Won't most of this happen at some kind of web service level? Application builders will need to know which schemas to draw on for the different components of their tools, but the complexity of the underlying documents shouldn't be much of a barrier. On the other hand, maintaining referential integrity across multiple standards (and resolving conflicting standards for similar concepts) may be a real challenge.
Fundamentally, Jim's question about where to define "characters" addresses this in a real-world way. For truly simple, widely accepted character concepts, either solution works, but where definitions diverge, both solutions have problems. I think the data will have to be denormalized, relying on data-mining tools to establish equivalence between concepts across multiple definitions of "flower color."
BTW, there is an interesting article on bioinformatic (my spell checker suggests 'bondwoman, bookworm and bonbons' for bioinformatic!) databases (focusing on PDB) in this week's Nature. It at least mentions some of these issues.
Julian
Julian Humphries
DigiMorph.Org
Geological Sciences
University of Texas at Austin
Austin, TX 78712
512-471-3275