Fwd: Re: SEEK Project and TDWG-SDD
Date: Thu, 15 Apr 2004 20:03:07 -0500
To: TDWG - Structure of Descriptive Data TDWG-SDD@LISTSERV.NHM.KU.EDU
From: "P. Bryan Heidorn" pheidorn@uiuc.edu
Subject: Re: SEEK Project and TDWG-SDD
Cc: pheidorn@uiuc.edu
My notes below are pushing it a bit far, but I had time to think about this and the SEEK project on the plane back from Kansas today.
At 11:44 AM 4/15/2004, Jim Beach wrote:
On Thu, 15 Apr 2004 10:54:36 -0500, Julian H humphries@MAIL.UTEXAS.EDU wrote:
At 10:44 AM 4/15/2004, you wrote:
single discussion, but it struck me that TDWG-SDD has an opportunity to have much broader acceptance and support if your schema was not designed as a single data object--to contain both the metadata about the package (or work or whatever you refer to it as) *and* the descriptive data that describe the individual concepts.
Another novice (pre-novice) here, are you specifically referring to separating out taxonomic concept information (metadata) from the descriptive data?
No, I was thinking of separating the metadata about the package ("This is a data set of Magnolias from FNA, it was assembled by, organized by, dates, etc.") from the data describing the character states of the individual taxa. So a good question is what do you do with the character definitions! It seems the character state values without the character definitions would not be of much use for any system to interpret the meaning of the states. Two options: de-normalize the character definitions and put them in each concept schema, or have a separate server that holds the character definitions, with an external reference to it in the data schema. Not sure how that choice would play out.
To this point we have avoided the character naming issue, I think in part because of the controversy that is associated with this issue. The general external reference mechanism described in Gregor's later post is an attempt to push off that controversy, in part to a system outside of SDD. That does have the advantage of simplifying SDD a bit, which is greatly needed.
It might be worth considering how the character definition service might actually work. Building on Jim's ideas, we could imagine a mechanism that would allow anyone (with authority) to add definitions and globally unique identifiers for the characters. It may be beyond our financial means now, but there could be a service where one could send a character or state definition, including a current context of application. It would be date stamped. It would be open to some very minimal revision permission. New global IDs would need to be created if the definition were changed or expanded. Of course, the definition of a character might need to refer to other registered character groups, characters or states.
If GUCID is a globally unique character ID, you might send the registration service the following XML document and get a GUCID back in return. Now any reference to this group in any species description on earth could cite this GUCID as part of a description. In this definition we could include GUCID references to the necessary components.
<CharacterGroup>
  <CharacterGroupName name="flowers" GUCID="f???">
    <Context>angiosperms</Context>
    flowers
  </CharacterGroupName>
  <LegalValue name="inflorescence_position" GUCID="1ej48dhk"></LegalValue>
  <LegalValue name="inflorescence_type" GUCID="NSNJKNDJBY248N"></LegalValue>
  <!-- Many more here -->
  <Definition>Sexual reproduction apparatus of a plant</Definition>
  <Synonym></Synonym>
  <BroaderTerm></BroaderTerm>
  <NarrowerTerm></NarrowerTerm>
  <RelatedTerm></RelatedTerm>
</CharacterGroup>
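To make the registration idea a little more concrete, here is a rough Python sketch of how such a service might behave: a definition goes in, a GUCID comes back, the entry is date stamped, and a changed definition gets a new ID. Everything in it is an assumption made for illustration, not part of SDD: the class and field names are hypothetical, the registry lives in memory rather than on a server, and deriving the GUCID from a hash of the content is just one convenient way to guarantee that a changed definition cannot keep its old ID.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CharacterDefinition:
    """Hypothetical record for a character group, character or state."""
    name: str
    context: str
    definition: str
    components: tuple = ()  # GUCIDs of other registered entries it refers to


class CharacterRegistry:
    """Hypothetical in-memory stand-in for the registration service."""

    def __init__(self):
        self._entries = {}  # GUCID -> (definition, registration timestamp)

    def register(self, entry):
        # A definition may only refer to components that are already registered.
        for component in entry.components:
            if component not in self._entries:
                raise KeyError(f"unregistered component GUCID: {component}")
        # Assumption of this sketch: the ID is derived from the content,
        # so any change to the definition yields a different GUCID.
        content = "\x1f".join(
            [entry.name, entry.context, entry.definition, *entry.components]
        )
        gucid = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        # Date-stamp the first registration only; re-sending changes nothing.
        self._entries.setdefault(gucid, (entry, datetime.now(timezone.utc)))
        return gucid

    def lookup(self, gucid):
        """Return the (definition, timestamp) pair stored for a GUCID."""
        return self._entries[gucid]


registry = CharacterRegistry()
flowers = CharacterDefinition(
    name="flowers",
    context="angiosperms",
    definition="Sexual reproduction apparatus of a plant",
)
flowers_gucid = registry.register(flowers)
```

A real service would also need the authority checks and the minimal revision permissions mentioned above, which this sketch leaves out.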
Using a collection of character definitions defined outside, you could have a standalone description. Given two taxonomic descriptions, you could decide if they are using the same definitions for their characters by looking at the GUCIDs. A well-formed character matrix could be constructed from the intersection of like characters in a collection of taxonomic descriptions. A well-organized project might want to decide which characters to use ahead of building the descriptions to make sure there are not too many conflicts in the definitions. Perhaps context would help with some of the conflicts.
Note that two projects could use the word "flower" to mean two completely different things, and a computer program could know this because they would have different GUCIDs.
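The comparison by GUCID and the matrix built from the intersection of like characters can be sketched in a few lines of Python. This is only an illustration under assumptions: each description is modeled as a plain mapping from GUCID to state value, the function names are hypothetical, and the taxon names, state values and the two "hyp-" identifiers are made up (only "1ej48dhk" comes from the example above).

```python
def shared_characters(descriptions):
    """Return the set of GUCIDs that are scored in every description."""
    gucid_sets = [set(states) for states in descriptions.values()]
    return set.intersection(*gucid_sets) if gucid_sets else set()


def character_matrix(descriptions):
    """Build a taxon-by-character matrix restricted to the shared GUCIDs."""
    columns = sorted(shared_characters(descriptions))
    rows = {
        taxon: [states[gucid] for gucid in columns]
        for taxon, states in descriptions.items()
    }
    return columns, rows


# Hypothetical descriptions: GUCID -> state value for each taxon.
descriptions = {
    "taxon_a": {"1ej48dhk": "terminal", "hyp-petal-color": "white"},
    "taxon_b": {"1ej48dhk": "axillary", "hyp-leaf-shape": "ovate"},
}
columns, rows = character_matrix(descriptions)
```

Because the comparison is on identifiers rather than on names, two characters that are both called "flower" but carry different GUCIDs never end up in the same column.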
There is no reason not to create character types and registration to handle gene sequences or whatever. I think this is already covered very well in SDD. In fact, I think the character definition section of SDD could be used almost exactly the way it is now except that we would rely on the existence of the global character registry.
I know this is a bit radical and maybe heavy-handed, but I do not think it is all that difficult given the structures already in SDD.
I think we would still need the certainty mechanisms in SDD to support the certainty or prevalence of a character/state within an individual species or taxon. (almost always red flowers.)
If the taxa/concepts had their own schemas and were linked to the package metadata with a GUID, maybe a DOI or some other globally unique identifier, then the XML concept data sets could be used for other systems like concept based classification or database management systems.
....snip..... if the taxon data sets (and maybe also their character definitions) were in separate XML documents, then we could use them as fodder for other concept systems.
... snip .....
incomplete SDD data sets? More on dataset archives in the next email.
People serving SDD data sets through the web would presumably be aware of data set integrity issues and make sure their SDD packages were complete.
I think the registration of the individual taxonomic descriptions is another very parallel issue. Relatively easy to do either as collections of treatments as are now in SDD or as standalone treatments with globally defined character sets.
Regards, Bryan
--------------------------------------------------------------------
P. Bryan Heidorn
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign MC-493
Rm. 221, 501 East Daniel St., Champaign, IL 61820-6212
pheidorn@uiuc.edu
(V) 217/244-7792  (F) 217/244-3302
http://alexia.lis.uiuc.edu/~heidorn
Calendar: http://calendar.yahoo.com/pbheidorn
Visit the Biobrowser Web site at http://www.biobrowser.org, http://www.isrl.uiuc.edu/~telenature, http://www.isrl.uiuc.edu/~openkey