Bryan wrote:
My research group has finally been able to devote a couple of hours together to discussing the DDST V1.1 as it relates to the published treatments of the Flora of North America and the records that we are creating for the EcoWatch Project. First, we quickly dismissed trying to fit FNA into the spec since we ran into too many issues. We'll return to FNA another day, but we did decide that the PrairieWatch Butterflies of Illinois for EcoWatch were much easier to deal with since we have full editorial control. So over the next two weeks we'll be modifying the DTD for that project and working up 20 or so example treatments (all at the species level). We had some immediate questions and comments even before we convert one treatment.
I am very interested in this approach, Bryan, as we are designing what might be a similar thing in association with the Flora of Australia and the Virtual Australian Herbarium, for handling taxon-based descriptive information as published, without the character-level atomization of Delta, LucID and similar approaches. I had not mentioned it before because I thought it was a bit off topic for where this list is going. But since you raise it...
The atomization of descriptive data elements is definitely the way to go, but past attempts to start databasing our Flora have foundered because the whole exercise gets too complicated too quickly. Much like on this list, people developed clinical universal standard character list fixation syndrome and lost sight of the forest, not for the trees, but for the primary anadromy of subjuvenile leaf segments in taxon X :)
We are now pursuing a compromise, but hopefully achievable, approach of handling flora treatments as partially parsed 'blobs' at the taxon level (where the most common levels are family, genus, species, infraspecies, etc.), with an XML schema that breaks things up into readily identifiable elements from the published work - name, author, reference, publication, synonymy, description, figures, distribution, ecology, notes, keys, etc. - without the need for huge amounts of manual intervention, interpretation, rescoring, etc.
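As a rough illustration only of what a partially parsed 'blob' might look like (this is not the project's data definition, which is not given here): every element name, the rank attribute, and the placeholder taxon below are assumptions made up for the sketch. The description stays as running text as published; only the top-level pieces are tagged and pulled apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and all content are placeholders,
# not taken from any published schema or treatment.
TREATMENT_XML = """\
<treatment rank="species">
  <name>Examplea placeholderi</name>
  <author>A.N.Author</author>
  <publication>Placeholder J. Bot. 1: 1 (1900)</publication>
  <synonymy>
    <synonym>Oldgenus placeholderi</synonym>
  </synonymy>
  <description>Shrub to 2 m. Leaves alternate, ovate, 3-5 cm long.</description>
  <distribution>Placeholder region.</distribution>
  <ecology>Placeholder habitat.</ecology>
  <notes>Placeholder notes.</notes>
</treatment>
"""


def parse_treatment(xml_text):
    """Split a treatment into its top-level elements, leaving the text unatomized."""
    root = ET.fromstring(xml_text)
    record = {"rank": root.get("rank")}
    for child in root:
        if len(child):
            # Container element such as <synonymy>: keep a list of its entries.
            record[child.tag] = [(sub.text or "").strip() for sub in child]
        else:
            # Leaf element: keep the published text as one block.
            record[child.tag] = (child.text or "").strip()
    return record


record = parse_treatment(TREATMENT_XML)
print(record["name"])
print(record["description"])
```

Nothing inside the description is scored or interpreted; a character-level breakdown could be layered on later without changing the outer structure.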
Admittedly there are serious limitations to this approach, but we believe it will serve a useful purpose while the world and SDD get their act together on a universal approach to descriptive data. It looks like we will be storing the data in our Oracle database and spewing it out via XML for internet delivery and various publication options. And a range of XML tools are arriving that will enable us to pour marked-up treatments straight into the database.
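The round trip described above (marked-up treatment into the database, XML back out) can be sketched roughly as follows. This is an assumption-laden toy: Python's built-in sqlite3 stands in for Oracle, and the table, column, and element names are all hypothetical, as is the placeholder taxon.

```python
import sqlite3
import xml.etree.ElementTree as ET

# sqlite3 stands in for Oracle here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE treatment_part ("
    " taxon TEXT, part TEXT, seq INTEGER, body TEXT)"
)


def load_treatment(conn, xml_text):
    """Pour one marked-up treatment into the table, one row per top-level element."""
    root = ET.fromstring(xml_text)
    taxon = root.findtext("name")
    for seq, child in enumerate(root):
        # Store each element's text as an unparsed blob.
        body = "".join(child.itertext()).strip()
        conn.execute(
            "INSERT INTO treatment_part VALUES (?, ?, ?, ?)",
            (taxon, child.tag, seq, body),
        )


def export_treatment(conn, taxon):
    """Rebuild an XML document for one taxon from the stored rows, in original order."""
    root = ET.Element("treatment")
    rows = conn.execute(
        "SELECT part, body FROM treatment_part WHERE taxon = ? ORDER BY seq",
        (taxon,),
    )
    for part, body in rows:
        ET.SubElement(root, part).text = body
    return ET.tostring(root, encoding="unicode")


# Placeholder treatment; the element names are illustrative only.
SAMPLE = (
    "<treatment><name>Examplea placeholderi</name>"
    "<description>Shrub to 2 m.</description>"
    "<distribution>Placeholder region.</distribution></treatment>"
)
load_treatment(conn, SAMPLE)
xml_out = export_treatment(conn, "Examplea placeholderi")
print(xml_out)
```

Keeping one row per tagged element means the same stored text can feed a web page, a printed treatment, or any other output without re-keying.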
Have not got time to reply in detail right now, but this weekend I will try and tidy up some of our attempts at a data definition for the project and post it to the list.
As indicated by Bob's response, we are going to have to pay particular attention not to reinvent standards and definitions that already exist out there... it will be helpful for us to be able to piggyback on what your group is doing...
cheers
jim
participants (1)
-
Jim Croft