Re: SEEK Project and TDWG-SDD

15 Apr 2004

      At 12:57 PM 4/15/2004, you wrote:
...
...
No, I was thinking of separating the metadata about the package ("This
is a data set of Magnolias from FNA, it was assembled by, organized
by, dates, etc.") from the data describing the character states of the
...
The problem is how to organize this. One problem is technical: XML
Schema identity constraints only work within a single file, which
I think is a stupid design bug in XML Schema.
The second problem is that the obvious solution would be to use some
kind of XInclude that inserts one file into another. We would need
either to insert fragments, or to split the terminology exactly at some
"block level" (modules of several characters or concepts) into files.
XLink?

Won't most of this happen at some kind of web service level?  Application
builders will need to know what schemas to draw on for different components
of their tools, but the complexity of the underlying documents shouldn't be
too much of a barrier.  On the other hand, maintaining referential
integrity (and resolving conflicting standards for similar concepts) across
multiple standards may be a real challenge.
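To make the single-file scoping problem concrete, here is a minimal sketch in Python using only the standard library. It does not run an XML Schema validator: `dangling_refs` is a hypothetical stand-in for an xs:key/xs:keyref pair, and the element and attribute names (`characters`, `character/@id`, `state/@ref`) are invented for illustration, not taken from SDD. It shows that a reference check confined to one document fails when the character definitions live in another file, and passes once XInclude has merged the files into a single tree.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; names are illustrative, not from the SDD schema.
FILES = {
    "characters.xml": """<characters>
  <character id="c1"><label>flower color</label></character>
  <character id="c2"><label>leaf shape</label></character>
</characters>""",
    "dataset.xml": """<dataset xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="characters.xml"/>
  <description taxon="Magnolia sp.">
    <state ref="c1">white</state>
    <state ref="c2">elliptic</state>
  </description>
</dataset>""",
}


def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude.include that reads from FILES instead of disk."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


def dangling_refs(root):
    """Emulate a key/keyref pair over ONE document tree: every state/@ref
    must match some character/@id somewhere in the same tree."""
    keys = {c.get("id") for c in root.iter("character")}
    return sorted(s.get("ref") for s in root.iter("state") if s.get("ref") not in keys)


# Checked as a single file, the references cannot be resolved...
unassembled = ET.fromstring(FILES["dataset.xml"])
print(dangling_refs(unassembled))  # ['c1', 'c2']

# ...but after XInclude merges the files into one tree, they can.
assembled = ET.fromstring(FILES["dataset.xml"])
ElementInclude.include(assembled, loader=load_from_memory)
print(dangling_refs(assembled))  # []
```

The design choice mirrors the trade-off in the thread: the check itself stays simple because it only ever sees one tree, and the cost moves to whatever assembles the fragments (here XInclude) before checking.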

Fundamentally, Jim's question about where to define "characters" addresses
this in a real-world way.  For truly simple, widely accepted character
concepts either solution works, but where definitions diverge, both
solutions have problems.  I think this has to be denormalized, relying on
data-mining tools to rationalize the equivalence of concepts across multiple
definitions of "flower color."

BTW, there is an interesting article on bioinformatic (my spell checker
suggests 'bondwoman, bookworm and bonbons' for bioinformatic!) databases
(focusing on PDB) in this week's Nature.  It at least mentions some of
these issues.

Julian

Julian Humphries
DigiMorph.Org
Geological Sciences
University of Texas at Austin
Austin, TX 78712
512-471-3275