SEEK Project and TDWG-SDD

Julian H humphries at MAIL.UTEXAS.EDU
Thu Apr 15 13:25:34 CEST 2004

At 12:57 PM 4/15/2004, you wrote:
> > No, I was thinking of separating the metadata about the package ("This
> > is a data set of Magnolias from FNA, it was assembled by, organized
> > by, dates, etc.") from the data describing the character states of the
>The problem is how to organize this. One problem is technical: the
>XML Schema identity constraints only work within a single file, which
>I think is a stupid design bug in XML Schema.
>The second problem is that the obvious solution would be to use some
>kind of XInclude that inserts one file into another. We would need
>to insert fragments, or split the terminology exactly at some "block
>level" (modules of several characters or concepts) into files.


Won't most of this happen at some kind of web service level?  Application
builders will need to know what schemas to draw on for different components
of their tools, but the complexity of the underlying documents shouldn't be
too much of a barrier.  On the other hand, maintaining referential
integrity (and resolving conflicting standards for similar concepts) across
multiple standards may be a real challenge.
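
To make the referential integrity point concrete, here is a rough sketch
(in Python) of the kind of cross-file check an application or web service
would have to do itself, since schema identity constraints stop at the file
boundary.  The element and attribute names (Terminology, Character, Coded,
id, ref) are made up for illustration and are not taken from the SDD
schema; the two documents are inlined as strings only to keep the example
self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical terminology document: defines characters by id.
TERMINOLOGY_XML = """
<Terminology>
  <Character id="c1" label="flower color"/>
  <Character id="c2" label="leaf shape"/>
</Terminology>
"""

# Hypothetical description document: refers to characters by id.
DESCRIPTIONS_XML = """
<Descriptions>
  <Description taxon="Magnolia grandiflora">
    <Coded ref="c1" state="white"/>
    <Coded ref="c3" state="entire"/>
  </Description>
</Descriptions>
"""


def dangling_refs(terminology_xml, descriptions_xml):
    """Return the refs used in the descriptions that no Character defines."""
    term_root = ET.fromstring(terminology_xml)
    desc_root = ET.fromstring(descriptions_xml)
    # Ids defined in the terminology document.
    defined = {c.get("id") for c in term_root.iter("Character")}
    # Ids referenced from the description document.
    used = {c.get("ref") for c in desc_root.iter("Coded")}
    return sorted(used - defined)


if __name__ == "__main__":
    # Lists every reference that points at an undefined character.
    print(dangling_refs(TERMINOLOGY_XML, DESCRIPTIONS_XML))
```

This only catches references to ids that are not defined anywhere; it says
nothing about the harder problem of two standards defining the same concept
differently.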

Fundamentally, Jim's question about where to define "characters" addresses
this in a real-world way.  For truly simple, widely accepted character
concepts either solution works, but where definitions diverge, both
solutions have problems.  I think this has to be denormalized and rely on
data mining tools to rationalize equivalence of concepts across multiple
definitions of "flower color."

BTW, there is an interesting article on bioinformatic (my spell checker
suggests 'bondwoman, bookworm and bonbons' for bioinformatic!) databases
(focusing on PDB) in this week's Nature.  They at least mention some of
these issues.


Julian Humphries
Geological Sciences
University of Texas at Austin
Austin, TX 78712
