Re: TDWG-SDD XML proposals of Kevin Thiele
<taxonomy> <rank="species" value="alfari"/> <rank="genus" value="Azteca"/> <author value="Emery"/>
What is referred to as "species" value="alfari"/ is actually the specific epithet. The "species" value in this instance would actually be "Azteca alfari". You're probably familiar with this protocol of referring to the scientific name of an organism, which implies citing the genus name & the specific epithet which together comprise the species name - as per the Linnean system of binomial nomenclature. Should I have the 'wrong end of the stick' of the discourse at this stage then please advise me.
Well, the above used to be true but not any more. The latest version of the Zoological Code drops all reference to "epithet" and now calls it the "specific name" (Article 5 and Glossary under "Name" - epithet doesn't even appear in the Index).
More to the point, to be absolutely proper (under the current ICZN) the above would look like this:
<taxonomy> <rank="species" specific_name="alfari"/> <rank="genus" generic_name="Azteca"/> <author value="Emery"/>
But this would be a particularly unclever way of doing it. If you wanted to change it I would suggest de-generalising it a bit to:
<taxonomy> <specific_name="alfari"/> <generic_name="Azteca"/> <author value="Emery"/>
The rank element doesn't really contribute much since you'll need to parse the "value" attribute to figure out what's going on. Might as well just get the text of the "specific_name" element directly rather than searching for the "rank" element that equals "species" and then getting the text of its "value" attribute. Yes, this change makes it more specific and less general (and is a bad thing in strict IT terms), but why make life harder than it has to be? This way of doing it might make it harder to develop a Schema definition, but I would argue strongly that the data model should come after we know what we want to do, not before (<soap_box_on> and that's partly why this discussion has drawn out for sooooo long - the BioLink team built a complete XML representation of the BioLink database, some 400 fields in 50 tables, in 3 weeks and are now using it to move data between BioLink databases and between Platypus and BioLink - and, surprise surprise, we don't have a DTD or Schema because you don't actually need one to make this stuff work <soap_box_off/>).
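To make the comparison concrete, here is a rough Python sketch of the two lookups. The snippets in this message are pseudo-XML, so the well-formed stand-ins below are an assumption: the rank is carried in a hypothetical "name" attribute, and the specific name in a "value" attribute. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-ins for the pseudo-XML (attribute names are assumed).
GENERIC = """
<taxonomy>
  <rank name="species" value="alfari"/>
  <rank name="genus" value="Azteca"/>
  <author value="Emery"/>
</taxonomy>
"""

SPECIFIC = """
<taxonomy>
  <specific_name value="alfari"/>
  <generic_name value="Azteca"/>
  <author value="Emery"/>
</taxonomy>
"""


def specific_name_from_rank(xml_text):
    """Generic form: scan every rank element for the one marked 'species'."""
    root = ET.fromstring(xml_text)
    for rank in root.findall("rank"):
        if rank.get("name") == "species":
            return rank.get("value")
    return None


def specific_name_direct(xml_text):
    """De-generalised form: ask for the specific_name element by its tag."""
    root = ET.fromstring(xml_text)
    element = root.find("specific_name")
    return element.get("value") if element is not None else None
```

Both functions return the same string for their respective inputs; the second simply skips the search-and-compare step.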
If you need to add a subgenus to the name just add a new element:
<taxonomy> <specific_name="alfari"/> <subgeneric_name="Alfaridris"/> <generic_name="Azteca"/> <author value="Emery"/>
I don't think this would cause any more confusion during processing or parsing than:
<taxonomy> <rank="species" value="alfari"/> <rank="subgenus" value="Alfaridris"/> <rank="genus" value="Azteca"/> <author value="Emery"/>
If the software can't handle the first then it's unlikely to be able to do much with the second either. Yes, the Schema definition would be broken by the first method and not the second, but this is a secondary consideration to someone who wants to actually process this bit of pseudo-XML. If I don't know how to handle the element called "subgeneric_name" then I won't know how to handle a rank with a value of "subgenus" either. The point is that the application and the data are tightly integrated (more than we would like them to be) and if the two get out of synch things won't go smoothly.
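A small Python sketch of that last point, under the same caveat that the snippets above are pseudo-XML: the well-formed inputs, the "name" attribute on rank, and the KNOWN_ELEMENTS / KNOWN_RANKS sets are all assumptions standing in for whatever a real application understands. Either way, an application written before subgenera were considered ends up with one piece of data it cannot use.

```python
import xml.etree.ElementTree as ET

# What this hypothetical application was written to understand.
KNOWN_ELEMENTS = {"specific_name", "generic_name", "author"}
KNOWN_RANKS = {"species", "genus"}

WITH_ELEMENT = (
    '<taxonomy><specific_name value="alfari"/>'
    '<subgeneric_name value="Alfaridris"/>'
    '<generic_name value="Azteca"/><author value="Emery"/></taxonomy>'
)

WITH_RANK = (
    '<taxonomy><rank name="species" value="alfari"/>'
    '<rank name="subgenus" value="Alfaridris"/>'
    '<rank name="genus" value="Azteca"/><author value="Emery"/></taxonomy>'
)


def unhandled_parts(xml_text):
    """Return the element tags or rank names the application cannot handle."""
    root = ET.fromstring(xml_text)
    unhandled = []
    for child in root:
        if child.tag == "rank":
            # Generic form: the tag is known, but the rank value may not be.
            if child.get("name") not in KNOWN_RANKS:
                unhandled.append(child.get("name"))
        elif child.tag not in KNOWN_ELEMENTS:
            # Specific form: the tag itself is unknown.
            unhandled.append(child.tag)
    return unhandled
```

Calling unhandled_parts on either input reports exactly one unknown item, which is the sense in which the two encodings fail in the same place.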
Yours in confusion, Steve Shattuck