Trying it out

Kevin Thiele kevin.thiele at PI.CSIRO.AU
Mon Aug 14 15:28:20 CEST 2000


Bryan - great!

| Treatment not equal to file:
| The second paragraph of the General Requirements of the DDST v1.1 states
| that "One file will comprise one treatment...". There is no reason to
| require in the specification that a treatment = a file, because there is
| good reason to synthesize XML treatments on the fly from databases and
| allow users to query the XML structure to extract just the elements that
| they need. Most of the proposed spec works fine if we say that "A
| treatment is a structured presentation of information. The DDST provides a
| uniform view of descriptive data for taxonomy but it does not specify the
| underlying storage and representation of the information." We'll give
| examples of why I say this below.
|
| Taxon Name:
| We start right off with the Treatment Name. For the butterfly project at
| least, the treatments are all about species. Is it reasonable to give each
| treatment a name that equals the Taxon Name | Taxon ID? That is what we're
| doing for now. In general, should the DDST include treatment naming
| conventions linked to the Taxon ID? Of course you get into trouble if the
| Taxon Name changes, but you would be OK if the ID were kept constant.
| So for now our treatments will have names like <Treatment Name>Papilio
| polyxene</Treatment Name>, but I am not happy with the decision. Any better
| ideas out there? If so, it would be helpful if those ideas were in the
| spec.
| In our case the treatment name will also be a file name for now, but
| eventually parts will be in a database and we'll need a unique
| <Treatment ID>.
|
| Description:
| I think it is fine that there is a description element to the treatment,
| but we were confused about what should be described: the taxon or really
| the treatment. We decided to take the spec at face value and really
| describe the treatment, which is fairly boring. All our treatment
| descriptions will be pretty much the same: <DESCRIPTION>This treatment was
| created as part of the Biological Information Browsing Environment project
| funded by NSF, UIUC... Information selected for the treatment is intended
| to support tasks associated with the PrairieWatch project....</DESCRIPTION>
| This uniformity makes me believe that many aspects of this description
| should be removed from the treatment and moved "up" one level to a
| higher-order entity such as a collection of treatments or a project.
| Several other elements of the treatment format support that notion. It
| means that DDST v1.1 is really a Treatment Spec. We also need a Collection
| Spec. This means that the treatments should all have an element called the
| Project ID.

This is getting places.

Perhaps I'm still stuck in the way Lucid and DELTA do things. There, a
treatment is not a description of one taxon, but a set of descriptions for a
set of taxa, e.g. all the species in genus X, or all the trees of Victoria.
This is because the treatment is viewed as having a definite objective or
product: storing data that will become an interactive key to the trees of
Victoria, or a set of natural-language descriptions of the species in genus
X. You're handling it very differently, with each treatment the description
of one taxon, and the treatments then bundled into a project defined
(presumably) by the objective. The advantage of this is being able to bundle
individual treatments in different ways (for different projects); the
disadvantage is having to make sure that such cross-bundling works, i.e. you
may end up bundling together several taxon treatments with different
character lists. Perhaps what you're calling a 'treatment' is actually the
Item Data block of the DDST, and what I call a treatment is what you're
suggesting as a project. We need to sort this out.
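To make the cross-bundling risk concrete, here is a minimal Python sketch, assuming each single-taxon treatment can be reduced to a mapping from character names to states. The function `bundle_gaps`, the taxa and the character names are hypothetical and not part of DDST v1.1; the sketch only reports, for each character used anywhere in a bundle, which taxa do not score it.

```python
from typing import Dict, List

# Hypothetical model: one single-taxon treatment = {character name: state}.
Treatment = Dict[str, str]


def bundle_gaps(treatments: Dict[str, Treatment]) -> Dict[str, List[str]]:
    """Report which taxa lack each character when treatments are bundled.

    Returns {character: sorted list of taxa that do not score it}, covering
    every character scored by at least one taxon in the bundle. Characters
    scored by all taxa are left out, so an empty result means the bundle
    shares a single character list.
    """
    all_characters = set()
    for chars in treatments.values():
        all_characters.update(chars)  # adds the character names (dict keys)

    gaps: Dict[str, List[str]] = {}
    for character in sorted(all_characters):
        missing = sorted(taxon for taxon, chars in treatments.items()
                         if character not in chars)
        if missing:
            gaps[character] = missing
    return gaps


# Usage with two hypothetical taxa scored against different character lists.
example = {
    "Taxon A": {"wing colour": "black", "hindwing tail": "present"},
    "Taxon B": {"wing colour": "yellow"},
}
print(bundle_gaps(example))  # {'hindwing tail': ['Taxon B']}
```

In the Lucid/DELTA model the character list belongs to the whole treatment, so this check is unnecessary; in the one-taxon-per-treatment model it has to be run every time treatments are re-bundled.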

| Treatment build/revision number:
| Treatment build/revision number is a really good idea and we all agreed
| that we should have had it in our treatments before.
| <TREATMENT REVISION>0.1</TREATMENT REVISION>
| We use the 0 when the treatment is under development and not released yet.
| I do not know how to deal with dynamic treatment build revision numbers.
| Suggestions are welcome.

I think this should be up to the contributor, not the spec. The general
guideline that <1 is work-in-progress doesn't really work, since all
treatments are and always will be works in progress. Surely the only
requirement is that revision numbers should increment, but perhaps then we
should just use the arrow of time and date-stamp?
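A date-stamp revision could look like the minimal Python sketch below. It assumes, as a hypothetical convention not found in DDST v1.1, that the stamp is a fixed-width ISO 8601 UTC string; with that format, ordinary string comparison matches chronological order, so checking that revisions increment is a simple comparison.

```python
from datetime import datetime, timezone
from typing import List


def new_revision() -> str:
    """Return a revision stamp for the current moment, e.g. '2000-08-14T15:28:20Z'."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_increasing(revisions: List[str]) -> bool:
    """True if each stamp is strictly later than the one before it.

    Works because the fixed-width, zero-padded UTC format sorts
    lexicographically in the same order as chronologically.
    """
    return all(a < b for a, b in zip(revisions, revisions[1:]))


print(is_increasing(["2000-08-10T09:00:00Z", "2000-08-14T15:28:20Z"]))  # True
```

A stamp also needs no central authority to hand out the next number, which suits treatments assembled dynamically from several databases.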

| Contributors List:
| Notice that if information about collectors in the list, such as contact
| information, is repeated in each treatment, there is a data integrity
| problem. People move. Some treatments could keep an old set of contact
| information that never gets updated, while other treatments get the
| updated information.

This will always be a problem if you try to attribute data to a moving object
such as a person. I see no way of getting around this. But presumably a
combination of the contact details and date stamps on treatments could be
used to track our wandering contributor?
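One way to combine contact details with treatment date stamps, sketched in Python under the assumption (hypothetical, not in the spec) that each contributor's contact history is stored as a list of (date the details took effect, details) pairs sorted by date. The lookup returns the details that were current when a given treatment was stamped.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical contact history for one contributor, sorted by effective date.
ContactHistory = List[Tuple[date, str]]


def contact_as_of(history: ContactHistory, stamp: date) -> Optional[str]:
    """Return the contact details in force on `stamp`.

    Details take effect on their own date. Returns None if the treatment
    predates every recorded set of details.
    """
    dates = [effective for effective, _ in history]
    i = bisect_right(dates, stamp)  # number of entries effective on or before stamp
    return history[i - 1][1] if i > 0 else None


history = [(date(1998, 1, 1), "address A"), (date(2000, 6, 1), "address B")]
print(contact_as_of(history, date(1999, 5, 5)))   # address A
print(contact_as_of(history, date(2000, 8, 14)))  # address B
```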

| It should be that when a person requests a treatment in DDST v1.1 format
| they get back a contributor list with ID, Name and Contact Details, but in
| most cases the system providing the information would insert the contact
| information "on the fly". There does not seem to be a reason to force the
| contributor IDs to be numeric as stated in the spec. Alphanumeric IDs can
| carry at least a little information to make the codes readable.

True.
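The "on the fly" insertion described above can be sketched in a few lines of Python. The registry layout, the field names and the contributor ID are hypothetical; the point is that a stored treatment keeps only readable alphanumeric IDs and the contact details are filled in from a single record per contributor when the treatment is requested.

```python
from typing import Dict, List

# Hypothetical collection-wide registry keyed by readable alphanumeric IDs.
# Dictionary keys are unique, so each ID is unique across the collection.
Registry = Dict[str, Dict[str, str]]


def contributor_list(ids: List[str], registry: Registry) -> List[Dict[str, str]]:
    """Build a treatment's contributor list at request time.

    The stored treatment holds only contributor IDs; name and contact
    details are looked up here, so a change of address is recorded once.
    Raises KeyError for an ID missing from the registry.
    """
    return [
        {"id": cid, "name": registry[cid]["name"], "contact": registry[cid]["contact"]}
        for cid in ids
    ]


registry: Registry = {"JSmith": {"name": "Jane Smith", "contact": "old address"}}
print(contributor_list(["JSmith"], registry))
registry["JSmith"]["contact"] = "new address"   # one update...
print(contributor_list(["JSmith"], registry))   # ...seen by every treatment
```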

| The current standard sets a relatively weak requirement that the
| contributor codes are unique to the treatment. I think we need a broader
| definition. They should be unique to a collection at least.

Again we have a definitional problem and I think my treatment = your
collection.

| Attribution:
| Must this be a contributor? If so, this information should be handled as a
| property of <CONTRIBUTOR ROLE=PRINCIPLE|COPRINCIPLE> or as another tag of
| <CONTRIBUTOR>
|     .... <ROLE>PRINCIPLE</ROLE>....
| Suggestions?

Yes, it could be done like this, but what would be wrong with doing it the
other way? It seems somehow neater to me, and I can't see much inefficiency.

| List of Sources:
| Again, usually the underlying information might be kept in different files
| or databases, but the view of a treatment would include the fields of the
| spec. For us the spec will include bibliographic references and "personal
| observations". Again it seems that source ID should at least be unique to
| an entire collection, not just to the treatment.
| <LIST OF SOURCES>
|         <SOURCE>
|                 <SOURCE ID>KL2000</SOURCE ID>
|                 <DESCRIPTION>Koller and Smith (2000). Butterflies of the
|                 Great West. Biology Press: New York.
|                 </DESCRIPTION>
|         </SOURCE>
|         <SOURCE>
|                 <SOURCE ID>MJ2000</SOURCE ID>
|                 <DESCRIPTION>Mike Jacobs, personal communication August
|                 10, 2000.
|                 </DESCRIPTION>
|         </SOURCE>
| </LIST OF SOURCES>

Sources and attributions need to be thought through more. I see attribution
as a reference to the person who actually encoded the data, and source as
reference to the data source as far back as it can be pushed. Source may
thus be a reference to a printed work, and attribution a reference to the
person who interpreted the meaning of that description into the format; or
source may be a reference to a specimen (or collection of specimens) and
attribution be a reference to the person who observed that specimen. The
critical thing is to have a hierarchy of ownership for both source and
attribution; thus, for one character the (default) source may be a printed
work, but for individual taxa the contributor may have referred back to
specimens and this should be an overriding source.
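The ownership hierarchy can be sketched as a most-specific-wins lookup. This minimal Python version assumes a hypothetical layout: one default source per character, plus optional overrides keyed by (character, taxon) for cases where a contributor went back to specimens. The source IDs echo the example above; "SPEC-001" is an invented specimen reference. The same function would serve for attribution.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: character-level defaults and (character, taxon) overrides.
Defaults = Dict[str, str]
Overrides = Dict[Tuple[str, str], str]


def resolve_source(character: str, taxon: str,
                   defaults: Defaults, overrides: Overrides) -> Optional[str]:
    """Return the most specific source for one character of one taxon.

    A taxon-level override wins over the character-level default;
    None means neither level records a source.
    """
    return overrides.get((character, taxon), defaults.get(character))


defaults = {"wing colour": "KL2000"}                   # printed work
overrides = {("wing colour", "Taxon B"): "SPEC-001"}   # hypothetical specimen ID
print(resolve_source("wing colour", "Taxon A", defaults, overrides))  # KL2000
print(resolve_source("wing colour", "Taxon B", defaults, overrides))  # SPEC-001
```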

Cheers - k



