Trying it out

Thu Aug 10 10:59:50 CEST 2000

My research group has finally been able to devote a couple of hours
together discussing the DDST V1.1 as it relates to the published treatments
of the Flora of North America and records that we are creating for the
EcoWatch Project. First, we quickly dismissed trying to fit FNA into the
spec since we ran into too many issues. We'll return to FNA another day, but
we did decide that the PrairieWatch Butterflies of Illinois for EcoWatch
were much easier to deal with since we have full editorial control. So over
the next two weeks we'll be modifying the DTD for that project and working
up 20 or so example treatments (all at the species level). We had some
immediate questions and comments even before converting a single treatment.

Treatment not equal to file:
The second paragraph of the General Requirements of the DDST v1.1 states
that "One file will comprise one treatment...". There is no reason to
require in the specification that a treatment = a file. This is because
there is good reason to synthesize XML treatments on the fly from databases
and allow users to query the XML structure to extract just the elements
that they need. Most of the proposed spec works fine if we say that "A
treatment is a structured presentation of information. The DDST provides a
uniform view of descriptive data for taxonomy but it does not specify the
underlying storage and representation of the information." We'll give
examples of why I say this below.
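
As a rough sketch of what synthesizing a treatment "on the fly" might look
like, the Python below builds a treatment element from a row in an in-memory
SQLite database and then pulls out a single element. The table layout, the
column names and the underscore element names (XML names cannot contain
spaces) are assumptions made for illustration; none of them come from the
DDST.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_demo_db():
    # Hypothetical storage layout; the DDST does not prescribe one.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE treatment (name TEXT, revision TEXT, description TEXT)"
    )
    db.execute(
        "INSERT INTO treatment VALUES (?, ?, ?)",
        ("Papilio polyxene", "0.1", "Created for the PrairieWatch project."),
    )
    return db


def build_treatment(db, name):
    # Assemble the XML view at request time; no treatment file exists on disk.
    row = db.execute(
        "SELECT name, revision, description FROM treatment WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    treatment = ET.Element("TREATMENT")
    tags = ("TREATMENT_NAME", "TREATMENT_REVISION", "DESCRIPTION")
    for tag, value in zip(tags, row):
        ET.SubElement(treatment, tag).text = value
    return treatment


db = make_demo_db()
treatment = build_treatment(db, "Papilio polyxene")
# A user who only needs the revision can query just that element.
print(treatment.findtext("TREATMENT_REVISION"))
```

The point is only that the treatment is a view: the same rows could just as
well be served as a file, a query result or a fragment of a larger document.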

Taxon Name:
We start right off with the Treatment Name. For the butterfly project at
least, the treatments are all about species. Is it reasonable to give each
treatment a name that equals the Taxon Name | Taxon ID? That is what we're
doing for now. In general, should the DDST include treatment naming
conventions linked to the Taxon ID? Of course you get into trouble if the
Taxon Name changes but you would be OK if the ID were kept constant.
So for now our treatments will have names like <Treatment Name>Papilio
polyxene</Treatment Name> but I am not happy with the decision. Any better
ideas out there? If so, it would be helpful to have those ideas in the spec.
In our case the treatment name will also be a file name for now, but
eventually parts will be in a database and we'll need a unique <Treatment ID>.

Description:
I think it is fine that there is a description element in the treatment, but
we were confused about what should be described: the taxon or really
the treatment. We decided to take the spec at face value and
describe the treatment itself, which is fairly boring. All our treatment
descriptions will be pretty much the same. <DESCRIPTION>This treatment was
created as part of the Biological Information Browsing Environment project
funded by NSF, UIUC... Information selected for the treatment is intended
to support tasks associated with the PrairieWatch project....</DESCRIPTION>
This uniformity makes me believe that many aspects in this description
should be removed from the treatment and moved "up" one level to a higher
order entity such as a collection of treatments or project. Several other
elements of the treatment format support that notion. It means that DDST
v1.1 is really a Treatment Spec. We also need a Collection Spec. This means
that the treatments should all have an element called the Project ID.

Treatment build/revision number:
Treatment build/revision number is a really good idea and we all agreed
that we should have had it in our treatments before.
<TREATMENT REVISION>0.1</TREATMENT REVISION>
We use the 0 when the treatment is under development and not released yet.
I do not know how to deal with dynamic treatment build revision numbers.
Suggestions are welcome.

Contributors List:
Notice that if information about the contributors in the list, such as
contact information, is repeated in each treatment, there is a data
integrity problem. People move. Some treatments could be left with old
contact information while other treatments get the updated information.

When a person requests a treatment in DDST v1.1 format, they should get
back a contributor list with ID, Name and Contact Details, but in most cases
the system providing the information would insert the contact information
"on the fly". There does not seem to be a reason to force the contributor
IDs to be numeric as stated in the spec. Alphanumeric IDs can carry at least
a little information to make the codes readable. The current standard sets
a relatively weak requirement: contributor codes need only be unique within
the treatment. I think we need a broader definition. They should be unique
to a collection at least. That way it would be easy to find all treatments
in a collection to which a particular person contributed. Better if they
were unique for the world, but that is more than this group can bite off, I
think. DDST should at least say that the Contributor ID is unique to a
Collection.
For us we have
<CONTRIBUTOR LIST>
        <CONTRIBUTOR>
                <CONTRIBUTOR ID>ML1</CONTRIBUTOR ID>
                <CONTRIBUTOR NAME>Mary L.</CONTRIBUTOR NAME>
                <CONTACT DETAILS>
                        <EMAIL>maryl at uiuc.edu</EMAIL>
                        <MAILING ADDRESS>
                        Address here
                        </MAILING ADDRESS>
                </CONTACT DETAILS>
        </CONTRIBUTOR>
        <CONTRIBUTOR>
                <CONTRIBUTOR ID>PBH1</CONTRIBUTOR ID>
                <CONTRIBUTOR NAME>P. Bryan Heidorn</CONTRIBUTOR NAME>
                <CONTACT DETAILS>
                        <EMAIL>PHEIDORN at UIUC.EDU</EMAIL>
                        <MAILING ADDRESS>
                        Graduate School of Library and Information Science,
                        University of Illinois at Urbana-Champaign,
                        501 East Daniel St., Champaign, IL  61820-6212
                        </MAILING ADDRESS>
                </CONTACT DETAILS>
        </CONTRIBUTOR>
</CONTRIBUTOR LIST>
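
To make the "on the fly" idea concrete, here is a minimal Python sketch in
which each stored treatment keeps only contributor IDs and the contact
details are filled in from one shared directory when the treatment is
requested. The directory structure, the second treatment name and the
underscore element names are hypothetical and are not taken from the DDST.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection-wide directory: one record per contributor ID,
# so a change of address is made in exactly one place.
DIRECTORY = {
    "ML1": {"name": "Mary L.", "email": "maryl at uiuc.edu"},
    "PBH1": {"name": "P. Bryan Heidorn", "email": "PHEIDORN at UIUC.EDU"},
}

# Each stored treatment keeps only contributor IDs, never contact details.
# "Example species B" is a placeholder, not a real treatment.
TREATMENTS = {
    "Papilio polyxene": ["ML1", "PBH1"],
    "Example species B": ["PBH1"],
}


def contributor_list(treatment_name):
    # Build the contributor list at request time from the shared directory.
    root = ET.Element("CONTRIBUTOR_LIST")
    for contributor_id in TREATMENTS[treatment_name]:
        person = DIRECTORY[contributor_id]
        node = ET.SubElement(root, "CONTRIBUTOR")
        ET.SubElement(node, "CONTRIBUTOR_ID").text = contributor_id
        ET.SubElement(node, "CONTRIBUTOR_NAME").text = person["name"]
        details = ET.SubElement(node, "CONTACT_DETAILS")
        ET.SubElement(details, "EMAIL").text = person["email"]
    return root


def treatments_by(contributor_id):
    # Only meaningful if IDs are unique across the whole collection.
    return sorted(
        name for name, ids in TREATMENTS.items() if contributor_id in ids
    )


print(ET.tostring(contributor_list("Papilio polyxene"), encoding="unicode"))
print(treatments_by("PBH1"))
```

With treatment-local IDs, the second function would have no reliable way to
tell whether "PBH1" in two treatments refers to the same person.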

Attribution:
Must this be a contributor? If so, this information should be handled as a
property of <CONTRIBUTOR ROLE=PRINCIPAL|COPRINCIPAL> or as another tag of
<CONTRIBUTOR>
    .... <ROLE>PRINCIPAL</ROLE>....
Suggestions?

List of Sources:
Again, the underlying information might usually be kept in different files
or databases, but the view of a treatment would include the fields of the
spec. For us the sources will include bibliographic references and "personal
observations". Again, it seems that the source ID should at least be unique
to an entire collection, not just to the treatment.
<LIST OF SOURCES>
        <SOURCE>
                <SOURCE ID>KL2000</SOURCE ID>
                <DESCRIPTION>Koller and Smith (2000). Butterflies of the Great West.
                Biology Press: New York.
                </DESCRIPTION>
        </SOURCE>
        <SOURCE>
                <SOURCE ID>MJ2000</SOURCE ID>
                <DESCRIPTION>Mike Jacobs, personal communication August 10, 2000.
                </DESCRIPTION>
        </SOURCE>
</LIST OF SOURCES>

Actually in XML we don't need the <LIST OF SOURCES> wrapper but it is OK for
now.
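
If source IDs are meant to be unique across a collection, a check along the
following lines could flag IDs that point at different sources in different
treatments. This is only a sketch: the data structure (treatment name mapped
to a list of ID/description pairs) and the conflicting second entry are made
up for the example.

```python
from collections import defaultdict


def find_conflicting_source_ids(collection):
    """Return source IDs that carry more than one distinct description.

    `collection` maps a treatment name to a list of
    (source_id, description) pairs. This layout is an assumption.
    """
    seen = defaultdict(set)
    for sources in collection.values():
        for source_id, description in sources:
            seen[source_id].add(description.strip())
    return {sid: descs for sid, descs in seen.items() if len(descs) > 1}


# Hypothetical collection: the second treatment reuses KL2000 for a
# different source, which a collection-wide rule should reject.
collection = {
    "Papilio polyxene": [
        ("KL2000", "Koller and Smith (2000). Butterflies of the Great West."),
        ("MJ2000", "Mike Jacobs, personal communication August 10, 2000."),
    ],
    "Example species B": [
        ("KL2000", "A different source that happens to reuse the same ID."),
    ],
}

conflicts = find_conflicting_source_ids(collection)
print(sorted(conflicts))
```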

Sorry, I'm way out of time. I'll have to get back to our other comments
later. This is enough for one message in any case.

Cheers,
Bryan
--
--------------------------------------------------------------------
  P. Bryan Heidorn    Graduate School of Library and Information Science
  pheidorn at uiuc.edu   University of Illinois at Urbana-Champaign
  (V)217/ 244-7792    501 East Daniel St., Champaign, IL  61820-6212
  (F)217/ 244-3302    http://alexia.lis.uiuc.edu/~heidorn