Date: Thu, 15 Apr 2004 20:03:07 -0500
To: TDWG - Structure of Descriptive Data <TDWG-SDD@LISTSERV.NHM.KU.EDU>
From: "P. Bryan Heidorn" <pheidorn@uiuc.edu>
Subject: Re: SEEK Project and TDWG-SDD
Cc: pheidorn@uiuc.edu

My notes below are pushing it a bit far, but I had time to think about this on the plane back from Kansas and the SEEK project today.

At 11:44 AM 4/15/2004, Jim Beach wrote:
On Thu, 15 Apr 2004 10:54:36 -0500, Julian H <humphries@MAIL.UTEXAS.EDU>
wrote:

>At 10:44 AM 4/15/2004, you wrote:
>>single discussion, but it struck me that TDWG-SDD has an opportunity to
>>have much broader acceptance and support if your schema was not designed
>>as a single data object--to contain both the metadata about the package
>>(or work or whatever you refer to it as) *and* the descriptive data that
>>describe the individual concepts.
>
>Another novice (pre-novice) here, are you specifically referring to
>separating out taxonomic concept information (metadata) from the
>descriptive data?

No, I was thinking of separating the metadata about the package ("This is a
data set of Magnolias from FNA, it was assembled by, organized by, dates,
etc.") from the data describing the character states of the individual
taxa.  So a good question is what do you do with the character
definitions! It seems the character state values without the character
definitions would not be of much use for any system to interpret the
meaning of the states.  Two options: de-normalize the character
definitions and put them in each concept schema, or have a separate
server that has the character definitions, with an external reference to
it in the data schema. Not sure how that choice would play out.

To this point we have avoided the character naming issue, I think in part because of the controversy that is associated with this issue. The general external reference mechanism described in Gregor's later post is an attempt to push that controversy off, in part, to a system outside of SDD. That does have the advantage of simplifying SDD a bit, which is greatly needed.

It might be worth considering how the character definition service might actually work. Building on Jim's ideas, we could imagine a mechanism that would allow anyone (with authority) to add definitions and globally unique identifiers for the characters. It may be beyond our financial means now, but there could be a service where one could send a character or state definition, including a current context of application. It would be date stamped and open to some very minimal revision permission. New global IDs would need to be created if the definition were changed or expanded. Of course, the definition of a character might need to refer to other registered character groups, characters, or states.
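A minimal sketch of how such a registration service might behave, written in Python. Everything here is an assumption for illustration: the class and method names are hypothetical, the registry lives in memory rather than behind a web service, and deriving the ID from a hash of the content is just one convenient way to guarantee that a changed or expanded definition gets a new ID. Authority checks and revision permissions are not modeled.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CharacterRecord:
    """One registered character or state definition (hypothetical shape)."""
    gucid: str
    name: str
    definition: str
    context: str
    registered_at: datetime


class CharacterRegistry:
    """Hypothetical in-memory stand-in for a global character registry."""

    def __init__(self):
        self._records = {}

    def register(self, name, definition, context):
        # Assumption: the ID is derived from the submitted content, so any
        # change to the name, definition, or context yields a different GUCID.
        key = "\x1f".join((name, definition, context)).encode("utf-8")
        gucid = hashlib.sha256(key).hexdigest()[:16]
        if gucid not in self._records:
            # Date stamp the record the first time it is registered.
            self._records[gucid] = CharacterRecord(
                gucid, name, definition, context, datetime.now(timezone.utc)
            )
        return gucid

    def lookup(self, gucid):
        """Return the stored record for a GUCID (raises KeyError if unknown)."""
        return self._records[gucid]
```

Registering the same definition twice returns the same ID, while an expanded definition of "flowers" comes back with a different one, which is the behavior described above.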

If GUCID is a globally unique character ID, you might send the registration service the following XML document and get a GUCID back in return. Now any reference to this group in any species description on earth could cite this GUCID as part of a description. In this definition we could include GUCID references to the necessary components.

<CharacterGroup>
  <CharacterGroupName name="flowers" GUCID="f???">
    <Context>angiosperms</Context>
    flowers
  </CharacterGroupName>
  <LegalValue name="inflorescence_position" GUCID="1ej48dhk"></LegalValue>
  <LegalValue name="inflorescence_type" GUCID="NSNJKNDJBY248N"></LegalValue>
  … Many more here …
  <Definition>Sexual reproduction apparatus of a plant</Definition>
  <Synonym></Synonym>
  <BroaderTerm></BroaderTerm>
  <NarrowerTerm></NarrowerTerm>
  <RelatedTerm></RelatedTerm>
</CharacterGroup>
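To show that a document like this is easy for a program to consume, here is a rough Python sketch that reads a well-formed copy of the example and pulls out the GUCIDs it cites. It uses only the standard library's xml.etree.ElementTree. The element and attribute names are taken from the example above, not from any agreed schema, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the example document; the elided entries are left out.
DOC = """<CharacterGroup>
  <CharacterGroupName name="flowers" GUCID="f???">
    <Context>angiosperms</Context>
    flowers
  </CharacterGroupName>
  <LegalValue name="inflorescence_position" GUCID="1ej48dhk"></LegalValue>
  <LegalValue name="inflorescence_type" GUCID="NSNJKNDJBY248N"></LegalValue>
  <Definition>Sexual reproduction apparatus of a plant</Definition>
</CharacterGroup>"""


def referenced_gucids(xml_text):
    """Map each LegalValue name to the GUCID it cites."""
    root = ET.fromstring(xml_text)
    return {lv.get("name"): lv.get("GUCID") for lv in root.iter("LegalValue")}


def group_summary(xml_text):
    """Collect the group's name, context of application, and definition."""
    root = ET.fromstring(xml_text)
    name_el = root.find("CharacterGroupName")
    return {
        "name": name_el.get("name"),
        "context": name_el.findtext("Context"),
        "definition": root.findtext("Definition"),
    }
```

A registration service could run something like referenced_gucids on each submission to check that every cited component is already registered before issuing a new GUCID.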

Using a collection of character definitions defined outside, you could have a standalone description. Given two taxonomic descriptions, you could decide if they are using the same definitions for their characters by looking at the GUCIDs. A well-formed character matrix could be constructed from the intersection of like characters in a collection of taxonomic descriptions. A well-organized project might want to decide which characters to use ahead of building the descriptions, to make sure there are not too many conflicts in the definitions. Perhaps context would help with some of the conflicts.

Note that two projects could use the word "flower" to mean two completely different things, and a computer program could know this because they would have different GUCIDs.
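The comparison described above can be sketched in a few lines of Python. This assumes each taxonomic description has already been reduced to a mapping from GUCID to recorded state; that representation and the function names are hypothetical, chosen only to illustrate the intersection idea.

```python
def shared_characters(descriptions):
    """Return the GUCIDs used by every description, sorted for a stable column order."""
    gucid_sets = [set(states) for states in descriptions.values()]
    if not gucid_sets:
        return []
    return sorted(set.intersection(*gucid_sets))


def character_matrix(descriptions):
    """Build a taxon-by-character matrix from the characters all descriptions share."""
    columns = shared_characters(descriptions)
    rows = {
        taxon: [states[gucid] for gucid in columns]
        for taxon, states in descriptions.items()
    }
    return columns, rows
```

Because characters are matched on GUCID rather than on name, two descriptions that both say "flower" but cite different GUCIDs simply do not share that column, so the conflict shows up as a missing character instead of a silent mismatch.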

There is no reason not to create character types and registration to handle gene sequences or whatever. I think this is already covered very well in SDD. In fact, I think the character definition section of SDD could be used almost exactly the way it is now, except that we would rely on the existence of the global character registry.


I know this is a bit radical and maybe heavy handed, but I do not think it is all that difficult given the structures already in SDD.

I think we would still need the certainty mechanisms in SDD to express the certainty or prevalence of a character/state within an individual species or taxon (for example, "almost always red flowers").

>
>>If the taxa/concepts had their own schemas and were linked to the
>>package metadata with a GUID, maybe a DOI or some other globally unique
>>identifier, then the XML concept data sets could be used for other
>>systems like concept based classification or database management
>>systems.
....snip.....
If the taxon data sets (and maybe also their character definitions) were in
separate XML documents, then we could use them as fodder for other concept
systems.


... snip .....
>incomplete SDD data sets?  More on dataset archives in the next email.

People serving SDD data sets through the web would presumably be aware of
data set integrity issues and make sure their SDD packages were complete.

I think the registration of the individual taxonomic descriptions is another very parallel issue. It would be relatively easy to do, either as collections of treatments as they are now in SDD or as standalone treatments with globally defined character sets.

Regards,
Bryan

--------------------------------------------------------------------
  P. Bryan Heidorn    Graduate School of Library and Information Science
  pheidorn@uiuc.edu   University of Illinois at Urbana-Champaign MC-493
  (V)217/ 244-7792    Rm. 221, 501 East Daniel St., Champaign, IL  61820-6212
  (F)217/ 244-3302    http://alexia.lis.uiuc.edu/~heidorn
  Calendar: http://calendar.yahoo.com/pbheidorn
  Visit the Biobrowser Web site at http://www.biobrowser.org, http://www.isrl.uiuc.edu/~telenature, http://www.isrl.uiuc.edu/~openkey