My research group has finally been able to devote a couple of hours to discussing the DDST v1.1 as it relates to the published treatments of the Flora of North America and to the records that we are creating for the EcoWatch Project. First, we quickly gave up on trying to fit FNA into the spec, since we ran into too many issues. We'll return to FNA another day, but we did decide that the PrairieWatch Butterflies of Illinois for EcoWatch were much easier to deal with, since we have full editorial control. So over the next two weeks we'll be modifying the DTD for that project and working up 20 or so example treatments (all at the species level). We had some immediate questions and comments even before converting a single treatment.
Treatment not equal to file: The second paragraph of the General Requirements of the DDST v1.1 states that "One file will comprise one treatment...". There is no reason for the specification to require that a treatment = a file. There is good reason to synthesize XML treatments on the fly from databases and to allow users to query the XML structure to extract just the elements that they need. Most of the proposed spec works fine if we say that "A treatment is a structured presentation of information. The DDST provides a uniform view of descriptive data for taxonomy but it does not specify the underlying storage and representation of the information." I'll give examples of why I say this below.
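To make the "treatment is a view, not a file" point concrete, here is a minimal sketch in Python of a treatment assembled from database rows at request time and then queried for a single element. Everything about the storage side is an assumption made for illustration: the table layout, the taxon ID "PW-0001" and the function name build_treatment are hypothetical, and the tag names use underscores only because XML element names cannot contain spaces.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_treatment(conn, taxon_id):
    """Assemble a treatment element from database rows at request time."""
    row = conn.execute(
        "SELECT name, revision, description FROM treatment WHERE taxon_id = ?",
        (taxon_id,),
    ).fetchone()
    if row is None:
        raise KeyError(taxon_id)
    name, revision, description = row
    treatment = ET.Element("TREATMENT")
    ET.SubElement(treatment, "TREATMENT_NAME").text = name
    ET.SubElement(treatment, "TREATMENT_REVISION").text = revision
    ET.SubElement(treatment, "DESCRIPTION").text = description
    return treatment


# Hypothetical storage: one row per treatment in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE treatment "
    "(taxon_id TEXT PRIMARY KEY, name TEXT, revision TEXT, description TEXT)"
)
conn.execute(
    "INSERT INTO treatment VALUES (?, ?, ?, ?)",
    ("PW-0001", "Papilio polyxene", "0.1", "Example description."),
)

treatment = build_treatment(conn, "PW-0001")

# A user who needs only one element can query the structure for it...
name_only = treatment.findtext("TREATMENT_NAME")

# ...while another user can still ask for the whole treatment as XML text.
xml_text = ET.tostring(treatment, encoding="unicode")
```

No file ever exists for this treatment, yet what the requester receives has the same shape as a file-based one.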
Taxon Name: We start right off with the Treatment Name. For the butterfly project at least, the treatments are all about species. Is it reasonable to give each treatment a name that equals the Taxon Name | Taxon ID? That is what we're doing for now. In general, should the DDST include treatment naming conventions linked to the Taxon ID? Of course you get into trouble if the Taxon Name changes, but you would be OK if the ID were kept constant. So for now our treatments will have names like <Treatment Name>Papilio polyxene</Treatment Name>, but I am not happy with the decision. Any better ideas out there? If so, it would be helpful if those ideas were in the spec. In our case the treatment name will also be a file name for now, but eventually parts will be in a database and we'll need a unique <Treatment ID>.
Description: I think it is fine that the treatment has a description element, but we were confused about what should be described: the taxon or the treatment itself. We decided to take the spec at face value and describe the treatment, which is fairly boring. All our treatment descriptions will be pretty much the same. <DESCRIPTION>This treatment was created as part of the Biological Information Browsing Environment project funded by NSF, UIUC... Information selected for the treatment is intended to support tasks associated with the PrairieWatch project....</DESCRIPTION> This uniformity makes me believe that many aspects of this description should be removed from the treatment and moved "up" one level to a higher-order entity such as a collection of treatments or a project. Several other elements of the treatment format support that notion. It means that DDST v1.1 is really a Treatment Spec. We also need a Collection Spec. This means that the treatments should all have an element called the Project ID.
Treatment build/revision number: A treatment build/revision number is a really good idea, and we all agreed that we should have had it in our treatments before. <TREATMENT REVISION>0.1</TREATMENT REVISION> We use the 0 while the treatment is under development and not yet released. I do not know how to deal with build/revision numbers for treatments that are generated dynamically. Suggestions are welcome.
Contributors List: Notice that if information about the contributors in the list, such as contact information, is repeated in each treatment, there is a data integrity problem. People move. Some treatments could be left with old contact information while others get updated.
When a person requests a treatment in DDST v1.1 format, they should get back a contributor list with ID, Name and Contact Details, but in most cases the system providing the information would insert the contact information "on the fly". There does not seem to be a reason to force the contributor IDs to be numeric as stated in the spec. Alphanumeric codes can carry at least a little information to make them readable. The current standard sets a relatively weak requirement: that contributor codes are unique to the treatment. I think we need a broader definition. They should be unique to a collection at least. That way it would be easy to find all treatments in a collection to which a particular person contributed. It would be better if they were unique for the world, but that is more than this group can bite off, I think. DDST should at least say that the Contributor ID is unique to a Collection. For us we have
<CONTRIBUTOR LIST>
  <CONTRIBUTOR>
    <CONTRIBUTOR ID>ML1</CONTRIBUTOR ID>
    <CONTRIBUTOR NAME>Mary L.</CONTRIBUTOR NAME>
    <CONTACT DETAILS>
      <EMAIL>maryl@uiuc.edu</EMAIL>
      <MAILING ADDRESS> Address here </MAILING ADDRESS>
    </CONTACT DETAILS>
  </CONTRIBUTOR>
  <CONTRIBUTOR>
    <CONTRIBUTOR ID>PBH1</CONTRIBUTOR ID>
    <CONTRIBUTOR NAME>P. Bryan Heidorn</CONTRIBUTOR NAME>
    <CONTACT DETAILS>
      <EMAIL>PHEIDORN@UIUC.EDU</EMAIL>
      <MAILING ADDRESS> Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 501 East Daniel St., Champaign, IL 61820-6212 </MAILING ADDRESS>
    </CONTACT DETAILS>
  </CONTRIBUTOR>
</CONTRIBUTOR LIST>
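The on-the-fly contributor list described above could be assembled along these lines. This is only a sketch under assumptions: the registry and the treatment-to-contributor mapping are hypothetical in-memory stand-ins for whatever database a project actually uses, "Danaus plexippus" is an illustrative second treatment that is not part of our data, and the tag names use underscores because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Collection-level registry: exactly one record per contributor ID.
# (Hypothetical stand-in for a real contributor database.)
CONTRIBUTORS = {
    "ML1": {"name": "Mary L.", "email": "maryl@uiuc.edu"},
    "PBH1": {"name": "P. Bryan Heidorn", "email": "PHEIDORN@UIUC.EDU"},
}

# Each treatment stores only contributor IDs, never contact details.
TREATMENT_CONTRIBUTORS = {
    "Papilio polyxene": ["ML1", "PBH1"],
    "Danaus plexippus": ["ML1"],  # illustrative second treatment
}


def contributor_list(treatment_name):
    """Build the contributor list, filling in contact details at request time."""
    root = ET.Element("CONTRIBUTOR_LIST")
    for contributor_id in TREATMENT_CONTRIBUTORS[treatment_name]:
        person = CONTRIBUTORS[contributor_id]
        contributor = ET.SubElement(root, "CONTRIBUTOR")
        ET.SubElement(contributor, "CONTRIBUTOR_ID").text = contributor_id
        ET.SubElement(contributor, "CONTRIBUTOR_NAME").text = person["name"]
        details = ET.SubElement(contributor, "CONTACT_DETAILS")
        ET.SubElement(details, "EMAIL").text = person["email"]
    return root


def treatments_by(contributor_id):
    """Find every treatment in the collection that lists this contributor."""
    return sorted(
        name
        for name, ids in TREATMENT_CONTRIBUTORS.items()
        if contributor_id in ids
    )
```

Because contact details live in one place, a change of address is made once and shows up in every treatment the next time it is requested. The lookup in treatments_by only works because the IDs are unique across the collection rather than per treatment.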
Attribution: Must this be a contributor? If so, this information should be handled as an attribute of the contributor, <CONTRIBUTOR ROLE=PRINCIPAL|COPRINCIPAL>, or as another tag within <CONTRIBUTOR> .... <ROLE>PRINCIPAL</ROLE> .... Suggestions?
List of Sources: Again, the underlying information might usually be kept in different files or databases, but the view of a treatment would include the fields of the spec. For us the list will include bibliographic references and "personal observations". Again, it seems that the source ID should at least be unique to an entire collection, not just to the treatment.
<SOURCE LIST>
  <SOURCE>
    <SOURCE ID>KL2000</SOURCE ID>
    <DESCRIPTION>Koller and Smith (2000). Butterflies of the Great West. Biology Press: New York. </DESCRIPTION>
  </SOURCE>
  <SOURCE>
    <SOURCE ID>MJ2000</SOURCE ID>
    <DESCRIPTION>Mike Jacobs, personal communication August 10, 2000. </DESCRIPTION>
  </SOURCE>
</SOURCE LIST>
Actually, in XML we don't need the <SOURCE LIST>, but it is OK for now.
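If source IDs are meant to be unique across a collection, that is easy to check mechanically. The sketch below, in Python, scans the source lists of several treatments and reports any ID that is attached to more than one description. The function name, the data layout and the conflicting placeholder entry are all hypothetical and only serve to show the check firing.

```python
from collections import defaultdict


def conflicting_source_ids(treatments):
    """Return source IDs that map to more than one description in a collection.

    `treatments` maps a treatment name to a list of
    (source_id, description) pairs.
    """
    seen = defaultdict(set)
    for sources in treatments.values():
        for source_id, description in sources:
            seen[source_id].add(description.strip())
    return {sid: descs for sid, descs in seen.items() if len(descs) > 1}


collection = {
    "Papilio polyxene": [
        ("KL2000", "Koller and Smith (2000). Butterflies of the Great West. "
                   "Biology Press: New York."),
        ("MJ2000", "Mike Jacobs, personal communication August 10, 2000."),
    ],
    # Hypothetical second treatment that reuses KL2000 for something else.
    "Another treatment": [
        ("KL2000", "A different work (placeholder for illustration)."),
    ],
}

conflicts = conflicting_source_ids(collection)
```

With IDs scoped only to the treatment, the reuse of KL2000 above would be legal, and nobody could tell from the ID alone which source was meant.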
Sorry, I'm way out of time. I'll have to get back to our other comments later. This is enough for one message in any case.
Cheers,
Bryan
--
--------------------------------------------------------------------
P. Bryan Heidorn
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 East Daniel St., Champaign, IL 61820-6212
pheidorn@uiuc.edu
(V) 217/244-7792
(F) 217/244-3302
http://alexia.lis.uiuc.edu/~heidorn