Re: TDWG-SDD XML proposals of Kevin Thiele

7 Nov 2000

      ...
...
<taxonomy>
     <rank="species" value="alfari"/>
     <rank="genus" value="Azteca"/>
     <author value="Emery"/>
...
What is referred to as "species" value="alfari" is actually the specific
epithet. The "species" value in this instance would actually be "Azteca
alfari". You're probably familiar with this protocol of referring to the
scientific name of an organism, which implies citing the genus name & the
specific epithet, which together comprise the species name, as per the
Linnean system of binomial nomenclature. Should I have the 'wrong end of
the stick' of the discourse at this stage, please advise me.

Well, the above used to be true, but not any more.  The latest version of the
Zoological Code drops all reference to "epithet" and now calls it the
"specific name" (Article 5 and the Glossary under "Name"; "epithet" doesn't
even appear in the Index).

More to the point, to be absolutely proper (under the current ICZN) the
above would look like this:

<taxonomy>
    <rank="species" specific_name="alfari"/>
    <rank="genus" generic_name="Azteca"/>
    <author value="Emery"/>

But this would be a particularly unclever way of doing it.  If you wanted to
change it, I would suggest de-generalising it a bit to:

<taxonomy>
    <specific_name="alfari"/>
    <generic_name="Azteca"/>
    <author value="Emery"/>

The rank element doesn't really contribute much, since you'll need to parse
the "value" attribute to figure out what's going on.  Might as well just get
the text of the "specific_name" element directly rather than searching for
the "rank" element that equals "species" and then getting the text of its
"value" attribute.  Yes, this change makes it more specific and less general
(and is a bad thing in strict IT terms), but why make life harder than it
has to be?  This way of doing it might make it harder to develop a Schema
definition, but I would argue strongly that the data model should come after
we know what we want to do, not before (<soap_box_on> and that's partly why
this discussion has dragged on for sooooo long - the BioLink team built a
complete XML representation of the BioLink database, some 400 fields in 50
tables, in 3 weeks and are now using it to move data between BioLink
databases and between Platypus and BioLink - and, surprise surprise, we
don't have a DTD or Schema because you don't actually need one to make this
stuff work <soap_box_off/>).
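A minimal Python sketch of the two lookups, using the standard library's xml.etree.ElementTree. The snippets above are pseudo-XML (an element can't literally be called rank="species"), so the sketch assumes well-formed stand-ins: a hypothetical "name" attribute on <rank>, and the specific name carried as the text of <specific_name>. Neither rendering comes from the proposal itself.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed stand-in for the rank-based proposal; the "name"
# attribute is hypothetical (the original pseudo-XML has no such attribute).
RANK_STYLE = """<taxonomy>
  <rank name="species" value="alfari"/>
  <rank name="genus" value="Azteca"/>
  <author value="Emery"/>
</taxonomy>"""

# Assumed well-formed stand-in for the named-element proposal, with the
# name carried as element text.
ELEMENT_STYLE = """<taxonomy>
  <specific_name>alfari</specific_name>
  <generic_name>Azteca</generic_name>
  <author value="Emery"/>
</taxonomy>"""


def specific_name_from_ranks(xml_text):
    """Search every <rank> for name="species", then read its value attribute."""
    root = ET.fromstring(xml_text)
    for rank in root.findall("rank"):
        if rank.get("name") == "species":
            return rank.get("value")
    return None  # no species rank present


def specific_name_from_element(xml_text):
    """Read the text of <specific_name> directly (None if it is absent)."""
    root = ET.fromstring(xml_text)
    return root.findtext("specific_name")


print(specific_name_from_ranks(RANK_STYLE))       # alfari
print(specific_name_from_element(ELEMENT_STYLE))  # alfari
```

Both functions return "alfari"; the rank version just has to loop over every <rank> and test one attribute before it can read the one it actually wants.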

If you need to add a subgenus to the name, just add a new element:

<taxonomy>
    <specific_name="alfari"/>
    <subgeneric_name="Alfaridris"/>
    <generic_name="Azteca"/>
    <author value="Emery"/>

I don't think this would cause any more confusion during processing or
parsing than:

<taxonomy>
   <rank="species" value="alfari"/>
   <rank="subgenus" value="Alfaridris"/>
   <rank="genus" value="Azteca"/>
   <author value="Emery"/>

If the software can't handle the first, then it's unlikely to be able to do
much with the second either.  Yes, an existing Schema definition would be
broken by the first method (a new element name) and not by the second (just
a new attribute value), but this is a secondary consideration to someone who
wants to actually process this bit of pseudo-XML.  If I don't know how to
handle the element called "subgeneric_name", then I won't know how to handle
a rank with a value of "subgenus" either.  The point is that the application
and the data are tightly integrated (more than we would like them to be), and
if the two get out of synch, things won't go smoothly.
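The same argument as a rough sketch, again assuming well-formed stand-ins for the pseudo-XML (a hypothetical "name" attribute on <rank>, element text for the named elements). The hypothetical RANK_TO_ELEMENT table is the knowledge the application has to carry either way: both readers give the same result for the subgenus example, and both fail in the same way on a term the table doesn't list.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of the ranks this application understands. The same
# knowledge is needed whichever encoding the data uses.
RANK_TO_ELEMENT = {
    "species": "specific_name",
    "subgenus": "subgeneric_name",
    "genus": "generic_name",
}
ELEMENT_TO_RANK = {element: rank for rank, element in RANK_TO_ELEMENT.items()}

# Assumed well-formed versions of the two subgenus examples.
RANK_STYLE = """<taxonomy>
  <rank name="species" value="alfari"/>
  <rank name="subgenus" value="Alfaridris"/>
  <rank name="genus" value="Azteca"/>
  <author value="Emery"/>
</taxonomy>"""

ELEMENT_STYLE = """<taxonomy>
  <specific_name>alfari</specific_name>
  <subgeneric_name>Alfaridris</subgeneric_name>
  <generic_name>Azteca</generic_name>
  <author value="Emery"/>
</taxonomy>"""


def names_from_ranks(xml_text):
    """Map rank -> name from <rank> elements; an unknown rank is an error."""
    result = {}
    for rank in ET.fromstring(xml_text).findall("rank"):
        rank_name = rank.get("name")
        if rank_name not in RANK_TO_ELEMENT:
            raise ValueError(f"unknown rank: {rank_name}")
        result[rank_name] = rank.get("value")
    return result


def names_from_elements(xml_text):
    """Map rank -> name from named elements; an unknown element is an error."""
    result = {}
    for child in ET.fromstring(xml_text):
        if child.tag == "author":
            continue  # the author is not a rank, so skip it
        if child.tag not in ELEMENT_TO_RANK:
            raise ValueError(f"unknown element: {child.tag}")
        result[ELEMENT_TO_RANK[child.tag]] = child.text
    return result


# Both encodings yield the same rank -> name mapping.
print(names_from_ranks(RANK_STYLE) == names_from_elements(ELEMENT_STYLE))  # True

# A term missing from the table fails in both encodings.
for reader, text in [
    (names_from_ranks, '<taxonomy><rank name="subspecies" value="x"/></taxonomy>'),
    (names_from_elements, "<taxonomy><subspecific_name>x</subspecific_name></taxonomy>"),
]:
    try:
        reader(text)
    except ValueError as err:
        print(err)  # unknown rank: subspecies / unknown element: subspecific_name
```

Extending either reader to a new rank means editing the same table, which is the tight coupling between application and data described above.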

Yours in confusion, Steve Shattuck
