Rich
Your two statements below don't jibe well in this case. Putting random concatenations of higher taxa into dwc:higherClassification would make for a real mess. Having only the basic named Linnaean ranks does ignore all of the intermediate ranks, but it supports conformity, at least for those ranks, in a way that higherClassification cannot, because there you lose the associated rank term. It also supports what I think is a fairly substantial bloc of data that exists in denormalized form with only (or nearly only) the basic Linnaean ranks in named rank columns. Concatenating these into dwc:higherClassification would be lossy in this case.
My real concern, however, would be in trying to subsequently line up multiple datasets where some higher ranks were omitted, so that the concatenations were abbreviated. In other words:

Bivalvia:Mytilidae:Mytilus edulis
Mollusca:Mytiloidea:Mytilus:Mytilus edulis
Animalia:Mollusca:Mytiloidea:Mytilidae:Mytilus:Mytilus edulis
See http://code.google.com/p/gbif-ecat/wiki/Nom5ExampleMytilusedulis for a real-world example and, ignoring the other inherent messes, imagine trying to deal with this with no higher rank columns for context and all those nulls removed (no fair keeping the delimiters for them either).
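
To make the alignment problem concrete, here is a rough Python sketch (not anything from Darwin Core itself) that splits the three example strings above, with spelling and whitespace normalized, and collects the names found at each position. The function names `split_classification` and `names_by_position` are made up for illustration, and the ":" delimiter is just the one used in the examples.

```python
# Three concatenated classifications for the same species,
# each with different higher ranks omitted.
records = [
    "Bivalvia:Mytilidae:Mytilus edulis",
    "Mollusca:Mytiloidea:Mytilus:Mytilus edulis",
    "Animalia:Mollusca:Mytiloidea:Mytilidae:Mytilus:Mytilus edulis",
]


def split_classification(value, delimiter=":"):
    """Split a concatenated classification into its names (hypothetical helper)."""
    return [part.strip() for part in value.split(delimiter)]


def names_by_position(rows):
    """Collect the set of names found at each position across all rows."""
    width = max(len(row) for row in rows)
    return [{row[i] for row in rows if i < len(row)} for i in range(width)]


rows = [split_classification(record) for record in records]
columns = names_by_position(rows)

# With no rank labels, position is the only handle for lining records up,
# and each position mixes names of different ranks.
for index, names in enumerate(columns):
    print(index, sorted(names))
```

Position 0 ends up holding a kingdom, a phylum and a class, and position 1 a phylum, a superfamily and a family, which is the mess I mean: without named rank columns there is nothing to say which is which.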
DR
On 25/11/2010, at 11:56 AM, Richard Pyle wrote:
One golden rule of data management that I often tell people is that it's often better to be consistent than correct. That is, something that's consistently incorrect can be corrected easily.
Right -- you mean in the sense of Family, Order, Class, etc. Personally, I think it would be "ideal" to eliminate these individual fields and just use dwc:higherClassification for this purpose. People with normalised data can represent it properly via parentNameUsage[ID] -- with the understanding that all names with a rank lower than genus would include the genus name as a uninomial.