Trying it out

Fri Aug 11 08:20:59 CEST 2000

Bryan wrote:
>My research group has finally been able to devote a couple of hours
>together discussing the DDST V1.1 as it relates to the published treatments
>of the Flora of North America and records that we are creating for the
>EcoWatch Project. First, we quickly dismissed trying to fit FNA into the
>spec since we ran into too many issues. We'll return to FNA another day but
>we did decide that the PrairieWatch Butterflies of Illinois for EcoWatch
>were much easier to deal with since we have full editorial control. So over
>the next two weeks we'll be modifying the DTD for that project and working
>up 20 or so example treatments (all at the species level).  We had some
>immediate questions and comments even before we convert one treatment.

I am very interested in this approach, Bryan, as we are designing what might be
a similar thing in association with the Flora of Australia and the Virtual
Australian Herbarium for handling taxon-based descriptive information as
published without the character level atomization of Delta, LucID and
similar approaches.   I had not mentioned it before because I thought it
was a bit off topic for where this list is going.  But since you raise it...

The atomization of descriptive data elements is definitely the way to go,
but attempts to start databasing our Flora in the past have foundered
because the whole exercise gets too complicated too quickly.  Much like on
this list, people developed clinical universal standard character list
fixation syndrome and lost sight of the forest, not for the trees, but for the
primary anadromy of subjuvenile leaf segments in taxon X  :)

We are now pursuing a compromise but hopefully achievable approach of
handling flora treatments as partially parsed 'blobs' at the taxon level
(where the most common levels are family, genus, species, infraspecies,
etc), with an XML schema that breaks things up into readily identifiable
elements from the published work - name, author, reference, publication,
synonymy, description, figures, distribution, ecology, notes, keys, etc. -
without the need for huge amounts of manual intervention, interpretation,
rescoring, etc.
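
To make the idea of a partially parsed 'blob' a little more concrete, here is a
rough sketch in Python. It is only an illustration, not our actual data
definition: the element names (treatment, name, author, description,
distribution), the rank attribute and the placeholder text are all assumptions
made up for this example. The point is that the description stays as one block
of published text rather than being split into scored characters.

```python
import xml.etree.ElementTree as ET


def build_treatment(rank, name, author, description, distribution):
    """Wrap the readily identifiable parts of a published treatment in XML.

    All element and attribute names here are hypothetical, chosen only to
    illustrate the approach.
    """
    treatment = ET.Element("treatment", {"rank": rank})
    ET.SubElement(treatment, "name").text = name
    ET.SubElement(treatment, "author").text = author
    # The description is kept whole, as published: no character-level
    # atomization, no rescoring.
    ET.SubElement(treatment, "description").text = description
    ET.SubElement(treatment, "distribution").text = distribution
    return treatment


def treatment_to_xml(treatment):
    """Serialize a treatment element to an XML string."""
    return ET.tostring(treatment, encoding="unicode")


if __name__ == "__main__":
    # Placeholder content only, not real taxonomic data.
    example = build_treatment(
        rank="species",
        name="Genus species",
        author="Auth.",
        description="Placeholder description text, exactly as published.",
        distribution="Placeholder distribution statement.",
    )
    print(treatment_to_xml(example))
```

Other elements from the list above (synonymy, figures, ecology, notes, keys)
would presumably be added in the same way.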

Admittedly there are serious limitations to this approach, but we believe
it will serve a useful purpose while the world and SDD get their act together
on a universal approach to descriptive data.  It looks like we will be
storing the data in our Oracle database and spewing it out via XML for
internet delivery and various publication options.  And a range of XML
tools are arriving that will enable us to pour marked up treatments
straight into the database.
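
A rough sketch of that round trip, for what it is worth: marked-up treatment
in, database row, XML back out. It uses Python's built-in sqlite3 module purely
as a stand-in for Oracle, and the table layout, column names and element names
are assumptions invented for the illustration, not our actual design.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical element names, one database column per element.
FIELDS = ("name", "author", "description", "distribution")


def create_store():
    """Create an in-memory database with one row per treatment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE treatment ("
        "id INTEGER PRIMARY KEY, rank TEXT, name TEXT, author TEXT, "
        "description TEXT, distribution TEXT)"
    )
    return conn


def load_treatment(conn, xml_text):
    """Parse one marked-up treatment and insert it; return the new row id."""
    root = ET.fromstring(xml_text)
    values = [root.get("rank")] + [root.findtext(f, default="") for f in FIELDS]
    cursor = conn.execute(
        "INSERT INTO treatment (rank, name, author, description, distribution) "
        "VALUES (?, ?, ?, ?, ?)",
        values,
    )
    return cursor.lastrowid


def export_treatment(conn, treatment_id):
    """Rebuild the XML for one stored treatment, e.g. for web delivery."""
    row = conn.execute(
        "SELECT rank, name, author, description, distribution "
        "FROM treatment WHERE id = ?",
        (treatment_id,),
    ).fetchone()
    root = ET.Element("treatment", {"rank": row[0]})
    for field, value in zip(FIELDS, row[1:]):
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder content only, not real taxonomic data.
    sample = (
        '<treatment rank="species"><name>Genus species</name>'
        "<author>Auth.</author>"
        "<description>Placeholder description.</description>"
        "<distribution>Placeholder distribution.</distribution></treatment>"
    )
    store = create_store()
    new_id = load_treatment(store, sample)
    print(export_treatment(store, new_id))
```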

Have not got time to reply in detail right now, but this weekend I will try
and tidy up some of our attempts at a data definition for the project and
post it to the list.

As indicated by Bob's response, we are going to have to pay particular
attention not to reinvent standards and definitions that already exist out
there...  it will be helpful for us to be able to piggyback on what your
group is doing...

cheers

jim