Fwd: Re: SEEK Project and TDWG-SDD

P. Bryan Heidorn heidorn at ALEXIA.LIS.UIUC.EDU
Thu Apr 15 20:12:26 CEST 2004


Date: Thu, 15 Apr 2004 20:03:07 -0500
To: TDWG - Structure of Descriptive Data <TDWG-SDD at LISTSERV.NHM.KU.EDU>
From: "P. Bryan Heidorn" <pheidorn at uiuc.edu>
Subject: Re: SEEK Project and TDWG-SDD
Cc: pheidorn at uiuc.edu

My notes below are pushing it a bit far, but I had time to think about this
on the plane back from Kansas and the SEEK project today.

At 11:44 AM 4/15/2004, Jim Beach wrote:
>On Thu, 15 Apr 2004 10:54:36 -0500, Julian H <humphries at MAIL.UTEXAS.EDU>
>wrote:
>
> >At 10:44 AM 4/15/2004, you wrote:
> >>single discussion, but it struck me that TDWG-SDD has an opportunity to
> >>have much broader acceptance and support if your schema was not designed
> >>as a single data object--to contain both the metadata about the package
> >>(or work or whatever you refer to it as) *and* the descriptive data that
> >>describe the individual concepts.
> >
> >Another novice (pre-novice) here, are you specifically referring to
> >separating out taxonomic concept information (metadata) from the
> >descriptive data?
>
>No, I was thinking of separating the metadata about the package ("This is a
>data set of Magnolias from FNA, it was assembled by, organized by, dates,
>etc.") from the data describing the character states of the individual
>taxa.  So a good question is what do you do with the character
>definitions! It seems the character state values without the character
>definitions would not be of much use for any system to interpret the
>meaning of the states.  Two options, de-normalize the character
>definitions and put them in each concept schema, or two have a separate
>server, and an external reference in the data schema that, has the
>character definitions. Not sure how that choice would play out.

To this point we have avoided the character naming issue, I think in part
because of the controversy associated with it. The general external
reference mechanism described in Gregor's later post is an attempt to push
that controversy off, in part, to a system outside of SDD. That does have
the advantage of simplifying SDD a bit, which is greatly needed.

It might be worth considering how the character definition service might
actually work. Building on Jim's ideas, we could imagine a mechanism that
would allow anyone (with authority) to add definitions and globally unique
identifiers for the characters. It may be beyond our financial means now,
but there could be a service to which one could send a character or state
definition, including its current context of application. The definition
would be date stamped and open to only very minimal revision. New global
IDs would need to be created if the definition were changed or expanded.
Of course, the definition of a character might need to refer to other
registered character groups, characters or states.
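
To make the registration idea concrete, here is a minimal sketch in Python
of how such a service might behave. Everything in it is an assumption for
illustration: the class and method names are hypothetical, the store is an
in-memory dictionary rather than a real service, and random UUIDs stand in
for whatever identifier scheme a registry would actually use.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CharacterRecord:
    """One registered definition; frozen so it cannot be edited in place."""
    gucid: str
    name: str
    definition: str
    context: str
    registered_at: datetime


class CharacterRegistry:
    """Hypothetical in-memory stand-in for the registration service."""

    def __init__(self):
        self._records = {}

    def register(self, name, definition, context):
        # Every registration gets a fresh identifier and a date stamp.
        gucid = uuid.uuid4().hex  # assumed ID scheme, not part of SDD
        self._records[gucid] = CharacterRecord(
            gucid, name, definition, context, datetime.now(timezone.utc)
        )
        return gucid

    def revise(self, gucid, definition):
        # A changed or expanded definition never overwrites the old record;
        # it is registered again and receives a new GUCID.
        old = self._records[gucid]
        return self.register(old.name, definition, old.context)

    def lookup(self, gucid):
        return self._records[gucid]


registry = CharacterRegistry()
first = registry.register(
    "flowers", "Sexual reproduction apparatus of a plant", "angiosperms"
)
second = registry.revise(first, "Reproductive structure of an angiosperm")
```

The point of the sketch is only that the old GUCID keeps resolving to the
old definition, so descriptions that cited it earlier do not silently
change meaning.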

Suppose GUCID is a globally unique character ID. You might send the
registration service the following XML document and get a GUCID back in
return. Now any reference to this group in any species description on
earth could cite this GUCID as part of a description. In this definition
we could include GUCID references to the necessary components.

<CharacterGroup>
  <CharacterGroupName name="flowers" GUCID="f???">
    <Context>angiosperms</Context>
    flowers
  </CharacterGroupName>
  <LegalValue name="inflorescence_position" GUCID="1ej48dhk"></LegalValue>
  <LegalValue name="inflorescence_type" GUCID="NSNJKNDJBY248N"></LegalValue>
  <!-- Many more here -->
  <Definition>Sexual reproduction apparatus of a plant</Definition>
  <Synonym></Synonym>
  <BroaderTerm></BroaderTerm>
  <NarrowerTerm></NarrowerTerm>
  <RelatedTerm></RelatedTerm>
</CharacterGroup>
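
As a rough illustration of how a program could pick the component
references out of such a document, the Python sketch below parses a
well-formed copy of the example with the standard library. The function
name is hypothetical, and the "f???" placeholder is kept exactly as in the
example, since the group's own GUCID would only exist once the registry
has returned it.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the example document, trimmed to the parts used here.
DOC = """
<CharacterGroup>
  <CharacterGroupName name="flowers" GUCID="f???">
    <Context>angiosperms</Context>
  </CharacterGroupName>
  <LegalValue name="inflorescence_position" GUCID="1ej48dhk"></LegalValue>
  <LegalValue name="inflorescence_type" GUCID="NSNJKNDJBY248N"></LegalValue>
  <Definition>Sexual reproduction apparatus of a plant</Definition>
</CharacterGroup>
"""


def referenced_gucids(xml_text):
    """Map each LegalValue name to the GUCID it cites (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): el.get("GUCID") for el in root.iter("LegalValue")}


components = referenced_gucids(DOC)
```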

Using a collection of character definitions defined outside, you could
have a standalone description. Given two taxonomic descriptions, you could
decide whether they use the same definitions for their characters by
looking at the GUCIDs. A well-formed character matrix could be constructed
from the intersection of like characters in a collection of taxonomic
descriptions. A well-organized project might want to decide which
characters to use ahead of building the descriptions, to make sure there
are not too many conflicts in the definitions. Perhaps context would help
with some of the conflicts.
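
The intersection step could look roughly like the Python sketch below. It
assumes each description has already been reduced to a mapping from GUCID
to a recorded state. The taxon names, the state values and the extra ID
"x1" are invented for illustration, and the function names are
hypothetical.

```python
def shared_characters(descriptions):
    """Return the GUCIDs that are scored in every description."""
    gucid_sets = [set(states) for states in descriptions.values()]
    return set.intersection(*gucid_sets) if gucid_sets else set()


def character_matrix(descriptions):
    """Build a taxon-by-character matrix over the shared GUCIDs only."""
    columns = sorted(shared_characters(descriptions))
    rows = {
        taxon: [states[gucid] for gucid in columns]
        for taxon, states in descriptions.items()
    }
    return columns, rows


# Invented example data: two descriptions keyed by GUCID.
descriptions = {
    "Taxon A": {"1ej48dhk": "terminal", "NSNJKNDJBY248N": "solitary", "x1": "red"},
    "Taxon B": {"1ej48dhk": "axillary", "NSNJKNDJBY248N": "cyme"},
}
columns, rows = character_matrix(descriptions)
```

Because the comparison is on GUCIDs rather than on names, a character that
only one description scores (here "x1") simply drops out of the matrix.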

Note that two projects could use the word "flower" to mean two completely
different things, and a computer program could know this because the two
uses would have different GUCIDs.

There is no reason not to create character types and registration to handle 
gene sequences or whatever. I think this is already covered very well in
SDD. In fact, I think the character definition section of SDD could be used 
almost exactly the way it is now except that we would rely on the existence 
of the global character registry.


I know this is a bit radical and maybe heavy handed, but I do not think it
is all that difficult given the structures already in SDD.

I think we would still need the certainty mechanisms in SDD to express the
certainty or prevalence of a character/state within an individual species
or taxon (for example, "almost always red flowers").

> >
> >>If the taxa/concepts had their own schemas and were linked to the
> >>package metadata with a GUID, maybe a DOI or some other globally unique
> >>identifier, then the XML concept data sets could be used for other
> >>systems like concept based classification or database management
> >>systems.
>....snip.....
>if the taxon data sets (and maybe also their character definitions) were in
>separate XML documents, then we could use them as fodder for other concept
>systems.
>
>
>... snip .....
> >incomplete SDD data sets?  More on dataset archives in the next email.
>
>People serving SDD data sets through the web would presumably be aware of
>data set integrity issues and make sure their SDD packages were complete.

I think the registration of the individual taxonomic descriptions is
another, closely parallel issue. It would be relatively easy to do, either
as collections of treatments, as is now the case in SDD, or as standalone
treatments with globally defined character sets.

Regards,
Bryan

--------------------------------------------------------------------
   P. Bryan Heidorn    Graduate School of Library and Information Science
   pheidorn at uiuc.edu   University of Illinois at Urbana-Champaign MC-493
   (V)217/ 244-7792    Rm. 221, 501 East Daniel St., Champaign, IL  61820-6212
   (F)217/ 244-3302    http://alexia.lis.uiuc.edu/~heidorn
   Calendar: http://calendar.yahoo.com/pbheidorn
   Visit the Biobrowser Web site at http://www.biobrowser.org, 
http://www.isrl.uiuc.edu/~telenature, http://www.isrl.uiuc.edu/~openkey


