Re: SEEK Project and TDWG-SDD

15 Apr 2004

On Thu, 15 Apr 2004 10:54:36 -0500, Julian H <humphries@MAIL.UTEXAS.EDU> wrote:
...
At 10:44 AM 4/15/2004, you wrote:
...
single discussion, but it struck me that TDWG-SDD has an opportunity to
have much broader acceptance and support if your schema was not designed
as a single data object--to contain both the metadata about the package
(or work or whatever you refer to it as) *and* the descriptive data that
describe the individual concepts.
Another novice (pre-novice) here, are you specifically referring to
separating out taxonomic concept information (metadata) from the
descriptive data?
No, I was thinking of separating the metadata about the package ("This is a
data set of Magnolias from FNA, it was assembled by, organized by, dates,
etc.") from the data describing the character states of the individual
taxa.  So a good question is what do you do with the character
definitions!  It seems the character state values without the character
definitions would not be of much use for any system to interpret the
meaning of the states.  Two options: de-normalize the character
definitions and put them in each concept schema, or have a separate
server that holds the character definitions, with an external reference
to it in the data schema.  Not sure how that choice would play out.
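A minimal sketch of the two options just described, in Python. Every element
and attribute name here (Taxon, Character, Definition, State, ref) and the
example.org URL are made up for illustration and are not taken from the SDD
schema.

```python
import xml.etree.ElementTree as ET

# All element/attribute names below are hypothetical, not from the SDD schema.

def taxon_with_inline_definitions():
    """Option 1: de-normalize, copying the character definition into the taxon document."""
    taxon = ET.Element("Taxon", name="taxon1")
    char = ET.SubElement(taxon, "Character", id="c1")
    ET.SubElement(char, "Definition").text = "definition of character 1"
    ET.SubElement(char, "State").text = "char-val-1"
    return taxon

def taxon_with_external_reference(definitions_url):
    """Option 2: keep definitions on a separate server and only point at them."""
    taxon = ET.Element("Taxon", name="taxon1")
    char = ET.SubElement(taxon, "Character", ref=f"{definitions_url}#c1")
    ET.SubElement(char, "State").text = "char-val-1"
    return taxon

inline = taxon_with_inline_definitions()
linked = taxon_with_external_reference("http://example.org/characters.xml")
print(ET.tostring(inline, encoding="unicode"))
print(ET.tostring(linked, encoding="unicode"))
```

The first form can be interpreted on its own but repeats the definitions in
every taxon document; the second stays small but is only meaningful while the
referenced definitions remain reachable.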
...
...
If the taxa/concepts had their own schemas and were linked to the
package metadata with a GUID, maybe a DOI or some other globally unique
identifier, then the XML concept data sets could be used for other
systems like concept based classification or database management
systems.
Could you write this sentence with a few more words?  I want to be sure I
get the concept.
How about an ASCII graphic?  I'm on thin ice, but if the metadata for the
package is this:

MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

and the individual taxon character state sets are this:

taxon1       taxon2      taxon3
char-val-1   char-val-1  etc.
char-val-2   etc.
char-val-3

If the taxon data sets (and maybe also their character definitions) were in
separate XML documents, then we could use them as fodder for other concept
systems.
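As a rough illustration of that layout, the sketch below writes the package
metadata and each taxon's character values as separate XML documents that
share one identifier. The function name split_package, the element names, and
the use of a random UUID as a stand-in for a DOI or other GUID are all
assumptions, not part of any SDD draft.

```python
import uuid
import xml.etree.ElementTree as ET

def split_package(title, taxa):
    """Return (metadata_doc, taxon_docs) that share one package identifier.

    `taxa` maps a taxon name to its list of character-state values.
    Element names are hypothetical, not from the SDD schema.
    """
    package_id = f"urn:uuid:{uuid.uuid4()}"  # stand-in for a DOI or other GUID
    metadata = ET.Element("PackageMetadata", id=package_id)
    ET.SubElement(metadata, "Title").text = title

    taxon_docs = []
    for name, values in taxa.items():
        # Each taxon document carries the package id so it can be traced back.
        doc = ET.Element("TaxonData", name=name, package=package_id)
        for value in values:
            ET.SubElement(doc, "CharacterValue").text = value
        taxon_docs.append(doc)
    return metadata, taxon_docs

metadata, taxon_docs = split_package(
    "Magnolias from FNA",
    {"taxon1": ["char-val-1", "char-val-2", "char-val-3"],
     "taxon2": ["char-val-1"]},
)
print(ET.tostring(metadata, encoding="unicode"))
for doc in taxon_docs:
    print(ET.tostring(doc, encoding="unicode"))
```

Because each taxon document only carries a reference to the package, another
system could consume the taxon documents without reading the package
metadata at all.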
...
The overhead for the traditional diagnostic identification software
makers would be that the XML parts would need to be assembled for the
various applications that use the data, and there would be the potential
risk that SDD data sets would be incomplete if there were some careless
file management.
What parts could get lost? The taxonomic parts?
Yes, if you had multiple XML docs for the same 'diagnostic package' they
would be managed as distinct files.
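One way an application could guard against that risk is to check, before
assembling a package, that every taxon the package lists actually has a
document. The sketch below assumes the package metadata records the names of
its member taxa, which is not something stated in this thread; missing_parts
and both data structures are hypothetical.

```python
def missing_parts(expected_taxa, available_docs):
    """Return the taxon names the package lists but no document supplies.

    `expected_taxa`: names recorded in the package metadata (an assumption).
    `available_docs`: maps taxon name -> the document found for it.
    """
    return sorted(set(expected_taxa) - set(available_docs))

expected = ["taxon1", "taxon2", "taxon3"]
# Placeholder strings stand in for parsed XML documents.
available = {"taxon1": "doc for taxon1", "taxon3": "doc for taxon3"}

missing = missing_parts(expected, available)
if missing:
    print("Incomplete package, missing:", ", ".join(missing))
```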
...
...
But presumably you guys are thinking about a registry
or distributed federation of these data sets anyway, where they would be
archived and served intact from a trusted source.
Um, now I am really lost, amplify please? What does this have to do with
incomplete SDD data sets?  More on dataset archives in the next email.
People serving SDD data sets through the web would presumably be aware of
data set integrity issues and make sure their SDD packages were complete.
...
...
I also understand that data sets of diagnostic identification
information are far from complete descriptions of concepts in either a
taxonomic or phylogenetic sense, but if the SDD concept schema could
accommodate additional characters, then the opportunity would be there
for other people to use SDD for other kinds of systems.  The UI of
diagnostic key programs would likely not need to use or display DNA
sequences for interactive identification, but no harm done, they could
just ignore fields of no use to the program at hand.
Ok, now we are getting to something I know about.  See the next email for
some comments on this...
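The point above about key programs ignoring fields they have no use for can
be sketched as a reader that picks out only the characters it knows and
skips everything else. The document layout, the DNASequence element, and the
key_characters helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxon document; element names are not from the SDD schema.
TAXON_XML = """
<TaxonData name="taxon1">
  <CharacterValue character="c1">char-val-1</CharacterValue>
  <CharacterValue character="c2">char-val-2</CharacterValue>
  <DNASequence>ACGT</DNASequence>
</TaxonData>
"""

def key_characters(xml_text, wanted=("c1", "c2")):
    """Collect only the characters an identification key uses; skip the rest."""
    root = ET.fromstring(xml_text)
    return {
        el.get("character"): el.text
        for el in root.findall("CharacterValue")
        if el.get("character") in wanted
    }

states = key_characters(TAXON_XML)
print(states)
```

The DNASequence element is never touched, so adding such fields to a taxon
document would not disturb a reader written this way.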
Julian
Julian Humphries
DigiMorph.Org
Geological Sciences
University of Texas at Austin
Austin, TX 78712
512-471-3275
