On 29/10/2010, at 2:58 AM, joel sachs wrote:
Hey Rich, Hilmar, Paul, and everyone - I liked the definition from a couple of weeks ago: "An occurrence is a tuple consisting of time, place, individual, and some optional properties." What's that lacking?
I joined this list just recently, and missed that post. I like 'Tuple'.
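For what it's worth, that definition maps quite directly onto a typed tuple. A rough Python sketch follows; the class name, field names and sample values are all placeholders of my own, not DwC terms or real records:

```python
from typing import NamedTuple, Tuple


class Occurrence(NamedTuple):
    """Hypothetical reading of 'time, place, individual, optional properties'."""
    time: str          # when (an ISO 8601 date string in this sketch)
    place: str         # where
    individual: str    # identifier of the individual organism
    properties: Tuple[Tuple[str, str], ...] = ()  # optional (key, value) pairs


# Placeholder values, for illustration only.
sighting = Occurrence("2010-10-29", "some locality", "individual-001")
annotated = sighting._replace(properties=(("remarks", "flowering"),))
```

The three mandatory fields have no defaults, so a record missing any of them cannot be built, whereas the properties can be left off.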
Actually, I should get on with what I am actually supposed to be doing, and add the DwC predicates to the data at biodiversity.org.au .
Speaking of which - (Looking back on what I have written below, it's very disorganised. Just a brain dump, really. ) :
Currently, I have used the TDWG rdf vocabulary as far as I am able to work it out. For instance: http://biodiversity.org.au/apni.name/33407.rdf (aka: http://biodiversity.org.au/apni.name/33407 , urn:lsid:biodiversity.org.au:apni.name:33407 )
Of course, not having an rdfs:domain predicate does make things difficult to untangle: when I read the DwC vocabulary in with Protégé, I just have a list of predicate names. Luckily, the quick reference guide (http://rs.tdwg.org/dwc/terms/index.htm) does organise the properties into the classes they apply to. The only DwC classes that our data involves at this stage would be Taxon and ResourceRelationship.
----------------------
As per the TDWG vocabulary, we make a fairly strong distinction between taxonomic and nomenclatural components. A TaxonName is not a TaxonConcept. I'm finding that the Taxon predicates in the DwC vocabulary seem to be a mix of things that variously belong to names and taxa. My impression is that the distinction is there, in fact - it is modelled by a DwC taxon having or not having a nameAccordingTo rather than by an explicit class. If there is no AccordingTo, then we are discussing the "nominal taxon" - what the name means in the absence of any specific information about what it means.
But as we are so careful to distinguish between name and taxon, I think I will take the (safer) position that a Name is not the same thing as its nominal taxon. That is, I will not declare that biodiversity.org.au names are DwC taxa, even though they have properties from DwC.
(Perhaps our data should generate an id for these nominal taxa - it's easy enough, just use the name objectid as the taxon objectid and "[afd|apni].taxon.nominal" as the LSID namespace. In principle, everyone who uses a name is also asserting that their taxon "is congruent to" the nominal taxon. Every synonym relationship is also an assertion of synonymy to the nominal taxon. But that's an awful lot of unnecessary detail to make explicit - over-engineering things is one of my failings. Forget I said it.)
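Purely to show how mechanical that id generation would be, here is a small Python sketch. It assumes name LSIDs always have the five-part shape of the example earlier in this message (urn:lsid:biodiversity.org.au:apni.name:33407); the function name is made up, and nothing like this exists at biodiversity.org.au:

```python
def nominal_taxon_lsid(name_lsid: str) -> str:
    """Derive a hypothetical nominal-taxon LSID from a name LSID.

    Assumes the form urn:lsid:<authority>:<afd|apni>.name:<objectid>.
    """
    urn, scheme, authority, namespace, object_id = name_lsid.split(":")
    dataset, kind = namespace.split(".", 1)
    if dataset not in ("afd", "apni") or kind != "name":
        raise ValueError(f"not a name LSID: {name_lsid}")
    # Same object id, different namespace.
    return ":".join([urn, scheme, authority, f"{dataset}.taxon.nominal", object_id])


example = nominal_taxon_lsid("urn:lsid:biodiversity.org.au:apni.name:33407")
```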
----------------------
DwC properties variously use "taxonID" and also "nameUsageID". Now, I believe I understand the distinction: not all usages of a name are of taxonomic interest (my favourite example is a bottle of weedkiller that happens to mention a scientific name.) Our databases only contain name usages that are taxa, so the distinction does not arise - a name usage is simply a taxon.
However, not all of our names are scientific names. We have cultivar names, and we have vernacular names. All usages of these are TDWG TaxonConcepts - they have synonymy relationships and so on. However, the DwC property for declaring that a taxon record has a name seems to be "scientificNameID". This would seem to be inappropriate for taxa that don't have scientific names. I think that the correct way for me to go is to not declare these taxa as DwC taxa at all. That is, the absence of a "nameID" property seems to indicate that DwC is only "interested" in scientific names - scientific taxa if you will.
To continue: These properties apply to our taxa (TaxonConcepts) without difficulty: scientificNameID, parentNameUsageID, nameAccordingToID
These apply to our taxon names: acceptedNameUsageID, originalNameUsageID, namePublishedInID, scientificNameAuthorship
One of the wiki pages seemed to indicate that Taxa would have both a nameAccordingToID and also the namePublishedInID (the two being equal indicating that the taxon is the original one), but I think we will continue to not do this on the grounds that it's best to assert things only once to avoid data inconsistencies.
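The split described above is easy to write down as a lookup. A Python sketch, where the two sets hold exactly the DwC terms listed above, while the function name and the sample record values are placeholders of my own:

```python
# DwC terms we would put on taxon (TaxonConcept) records.
TAXON_TERMS = {"scientificNameID", "parentNameUsageID", "nameAccordingToID"}

# DwC terms we would put on TaxonName records.
NAME_TERMS = {
    "acceptedNameUsageID",
    "originalNameUsageID",
    "namePublishedInID",
    "scientificNameAuthorship",
}


def split_terms(record):
    """Partition a flat DwC-style record into (taxon, name, unassigned) dicts."""
    taxon = {k: v for k, v in record.items() if k in TAXON_TERMS}
    name = {k: v for k, v in record.items() if k in NAME_TERMS}
    rest = {k: v for k, v in record.items() if k not in TAXON_TERMS | NAME_TERMS}
    return taxon, name, rest


# Placeholder values, not real identifiers.
taxon_part, name_part, rest = split_terms({
    "nameAccordingToID": "ref-1",
    "namePublishedInID": "ref-2",
    "taxonRank": "species",
})
```

Since each term appears in only one set, nameAccordingToID and namePublishedInID can never both end up on the same record, which is the "assert things only once" rule.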
----------------------
scientificName higherClassification kingdom | phylum | class | order | family | genus | subgenus| specificEpithet | infraspecificEpithet
The various properties for name parts are ... problematic from the point of view of our data. These properties sort of do double duty: they are places for putting parts of names (ie, strictly nomenclatural), and they also are places to put taxonomy.
With respect to holding name parts, there seems to be no property in which to put - for instance - a subfamily name. The closest thing is "infraspecificEpithet", which contains the terminal epithet, but obviously that's not right for supergeneric names. TCS and the TDWG vocabulary have "uninomial". It might be nice to have this property, and to declare the other bits as being subproperties.
With respect to taxonomy, if you want to use these for holding taxonomic relationships, then you don't need "order", you need "orderNameUsageID" or "orderTaxonID".
Of course, what's really going on here is that these fields are simply a denormalisation of the data. Let's face it: in my data, I do indeed have the scientific name string in the taxon record even though *technically* it's duplicating the data. So I think the conclusion is that these properties *on taxon records* are denormalisation, whereas these properties *on name records* are primary data. This is fine for me, but only because I have a separate TaxonName class.
----------------------
taxonRank | verbatimTaxonRank
Simple enough - "taxonRank" is controlled, "verbatim" is not. It's yet another mapping exercise for me, but them's the breaks. The whole "rank" issue is so fraught that one of our datasets here uses numeric codes. Which is fine, until you fill up all of the slots. What the world really needs is a dotted decimal notation, where negative numbers are allowed. Family, subfamily, and superfamily would be "5", "5.1", "5.-1". If you ever need a sub-superfamily, then it's "5.-1.1" . But maybe that's over-engineering things again.
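The dotted notation does sort correctly, provided missing positions are treated as zero (so "5" compares as "5.0.0"). A quick Python sketch, with function names of my own invention and only the four codes from the paragraph above:

```python
from functools import cmp_to_key
from itertools import zip_longest


def parse_rank(code):
    """'5.-1.1' -> (5, -1, 1); negative components are allowed."""
    return tuple(int(part) for part in code.split("."))


def compare_ranks(a, b):
    """Compare two codes position by position, padding the shorter with 0."""
    for x, y in zip_longest(parse_rank(a), parse_rank(b), fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


RANKS = {
    "5": "family",
    "5.1": "subfamily",
    "5.-1": "superfamily",
    "5.-1.1": "sub-superfamily",
}

# Highest rank first: superfamily, sub-superfamily, family, subfamily.
ordered = [RANKS[code] for code in sorted(RANKS, key=cmp_to_key(compare_ranks))]
```

The zero padding matters: a plain tuple comparison would put (5,) before (5, -1), which ranks family above superfamily.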
In any case. According to the wiki page, the controlled vocabulary seems to be just a list of strings. I would have expected them to be typed named individuals, permitting you to have an abbreviation, and the English and Latin name. A difficulty is that in order to render a botanical name correctly, you need the rank abbreviation string: "Evolvulus alsinoides var. sericeus". At present, there is no DwC property for that.
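To show why the abbreviation is needed, here is a minimal Python sketch of the rendering. The function and its rank_abbreviation parameter are hypothetical; there is no such DwC term, which is the point:

```python
def render_botanical_name(genus, specific_epithet,
                          rank_abbreviation=None, infraspecific_epithet=None):
    """Join name parts, inserting the rank abbreviation before the
    infraspecific epithet when there is one."""
    parts = [genus, specific_epithet]
    if infraspecific_epithet:
        if not rank_abbreviation:
            # Without the abbreviation a botanical trinomial cannot be rendered.
            raise ValueError("rank abbreviation required for infraspecific names")
        parts += [rank_abbreviation, infraspecific_epithet]
    return " ".join(parts)


rendered = render_botanical_name("Evolvulus", "alsinoides", "var.", "sericeus")
```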
----------------------
In summary - shouldn't be too difficult. At least, to get the basics up.