Some quick additions to my previous mail in haste:
I am referring to the new Darwin Core as referred to by Tony, not the ontology/TDWG vocabulary, which predates the latest Darwin Core.
When dealing with hybrid formulas and informal conceptual hints like sensu stricto/lato, a full name string is also useful. For determination-derived artifacts like cf. or aff., Darwin Core has an identificationQualifier term: http://rs.tdwg.org/dwc/terms/index.htm#identificationQualifier
And there really is no need for a canonicalName term as I suggested below, since we already have the 3 parts (Genus, specificEpithet, infraspecificEpithet) as atoms.
This is an issue we are struggling with now. Getting from the data items and flags to a correctly laid out name string is not at all trivial.
For botanical names, if the third term of the name is not a ssp, then you need the rank: A-us b-us var. d-us
There may also be a hybrid mark, which may appear .... actually, I need to confirm this: I think it may appear right at the front, or it may appear in front of the terminal epithet. I'm not sure whether it replaces the rank code or has to appear on one side of it:
    X A-us b-us var. d-us
    A-us b-us var. X d-us
    A-us b-us X var. d-us
To correctly compose botanical names, there is a rule that is different from the zoological rule: for autonyms, the botanists prefer that the authority string appear after the "root" name, not after the whole name:
    zoological - Vombatus ursinus ursinus Mike
    botanical - Pinus L. pinus
And so you need to know a) is the name an autonym? and b) is it botanical?
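To make those two questions concrete, here is a rough sketch in Python of composing a name string from the atoms. It is only an illustration of the rules as described above, not a complete implementation: the NameParts class, its field names and the "Auth." authority string are made up for the example, and hybrid marks, cultivars and nomenclatural status are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameParts:
    # Hypothetical container for the name atoms; field names are illustrative.
    genus: str
    specific_epithet: str
    infraspecific_epithet: Optional[str] = None
    rank_marker: Optional[str] = None  # e.g. "var."; None when no marker is shown
    authorship: Optional[str] = None
    botanical: bool = False


def compose_name(p: NameParts) -> str:
    """Lay out a name string following the rules sketched in this mail."""
    # a) autonym: the infraspecific epithet repeats the specific epithet
    is_autonym = (
        p.infraspecific_epithet is not None
        and p.infraspecific_epithet == p.specific_epithet
    )
    tokens = [p.genus, p.specific_epithet]
    author_placed = False
    # b) botanical autonyms: authority goes after the "root" name
    if p.botanical and is_autonym and p.authorship:
        tokens.append(p.authorship)
        author_placed = True
    if p.infraspecific_epithet:
        if p.rank_marker:
            tokens.append(p.rank_marker)
        tokens.append(p.infraspecific_epithet)
    # everything else: authority goes after the whole name
    if p.authorship and not author_placed:
        tokens.append(p.authorship)
    return " ".join(tokens)


zoo = NameParts("Vombatus", "ursinus", "ursinus", authorship="Mike")
variety = NameParts("A-us", "b-us", "d-us", rank_marker="var.", botanical=True)
autonym = NameParts("A-us", "b-us", "b-us", rank_marker="var.",
                    authorship="Auth.", botanical=True)
print(compose_name(zoo))
print(compose_name(variety))
print(compose_name(autonym))
```

Even this small case needs two flags beyond the atoms themselves, which is the point: the atoms alone do not tell a consumer how to lay the name out.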
Cultivar names may be introduced with a pseudo-rank of "cv." or by putting the cultivar name in quotes. Cultivar names are not italicised. And this is not even to begin discussing hybrids and grafts. Oh, and I believe that sometimes zoologists like the family name in square brackets in front of the name. And there's also nomenclatural status/qualifier: "nom. cons." etc.
And so on and so forth. Lord only knows how virologists name their taxa.
The difficulty is: we want our data to be usable by web applications, which is why we produce JSON. It's not sensible to expect that every JavaScript programmer is going to get this stuff right. We cannot simply give enough data that, if you know all the rules, you can get it correct. What we have concluded is that our data needs to have an item in it that will permit a programmer to easily render the name correctly, and that this needs to be separate from the fields as data.
There are a couple of options so far:
* an array or RDF "list" of components, each component being an object with a string and some sort of indicator as to whether it should be italicised or not
* a format string into which the components of the name are substituted. For instance: the format string for a subspecies might be "{G} {s} {e}" (e for epithet), whereas that for a form or variety would be "{G} {s} {r} {e}". We would wind up with, hopefully, a manageable list of formats.
* an XHTML literal (rdf:parseType="Literal"), making use of span elements and CSS classes to permit finer control over formatting. XHTML is the applicable standard for formatted text. We would use <i> tags where appropriate, so that with no CSS at all the scientific name still comes out correctly. Thus:

    <span class="boaContent name scientificName trinomialName">
      <i class="G scientificNameComponent">Vombatus</i>
      <i class="S scientificNameComponent">ursinus</i>
      <i class="SSP epithet scientificNameComponent">ursinus</i>
    </span>
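
For comparison, the first two options could look roughly like this on the consuming side. This is a sketch under assumptions, not a proposed format: the "text" and "italic" keys, the function names, and the choice to leave the rank marker unitalicised are illustrative only.

```python
from html import escape


def render_components(components):
    """Option 1: a list of components, each flagged as italic or not."""
    rendered = []
    for c in components:
        text = escape(c["text"])  # escape so the string is safe inside markup
        rendered.append(f"<i>{text}</i>" if c["italic"] else text)
    return " ".join(rendered)


def fill_format(fmt, parts):
    """Option 2: substitute name parts into a format string like "{G} {s} {r} {e}"."""
    # str.format ignores keys the format string does not mention
    return fmt.format(**parts)


components = [
    {"text": "A-us", "italic": True},
    {"text": "b-us", "italic": True},
    {"text": "var.", "italic": False},  # assumption: rank marker not italicised
    {"text": "d-us", "italic": True},
]
parts = {"G": "A-us", "s": "b-us", "r": "var.", "e": "d-us"}

print(render_components(components))
print(fill_format("{G} {s} {r} {e}", parts))  # form or variety
print(fill_format("{G} {s} {e}", parts))      # subspecies
```

In both cases the consumer needs no knowledge of the nomenclatural rules. The component list carries the italicisation with it, while the format string only fixes the order and leaves the formatting to the consumer.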