Roger writes:
Why is region/geography a special case and not covered like any other kind of context with a subjectTag? It could point to a polygon or TDWG geographic region or whatever.
The principal question to me is whether we simply want to have

  tag = distribution
  text = occurring in higher altitude
  value = archibiota
  value = visible-only-in-summer
  value = high-altitude-distribution
  value = italy

that is, the consumer has to figure out what the categorical data mean and how they are relevant to distribution, or whether specific attributes structure this information so that a consumer can easily find the information it is interested in, like

  tag = distribution
  text = occurring in higher altitude
  status = archibiota
  geoArea = italy
  modifier = visible-only-in-summer
  altitude = high-altitude-distribution

Structurally the first is clearly possible, but it seems to require a lot of semantic analysis to interpret, and my feeling is that it is liable to misinterpretation.
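As a rough illustration of the two options above, here is a minimal Python sketch. It is not taken from any SPM draft: the class names FlatItem and StructuredItem, the snake_case attribute names, and the helper guess_geo_areas are all hypothetical and exist only to make the contrast concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class FlatItem:
    """First option: a tag, free text, and an undifferentiated list of values."""
    tag: str
    text: str
    values: List[str] = field(default_factory=list)


@dataclass
class StructuredItem:
    """Second option: each category sits in an attribute that names its role."""
    tag: str
    text: str
    status: Optional[str] = None
    geo_area: Optional[str] = None
    modifier: Optional[str] = None
    altitude: Optional[str] = None


flat = FlatItem(
    tag="distribution",
    text="occurring in higher altitude",
    values=["archibiota", "visible-only-in-summer",
            "high-altitude-distribution", "italy"],
)

structured = StructuredItem(
    tag="distribution",
    text="occurring in higher altitude",
    status="archibiota",
    geo_area="italy",
    modifier="visible-only-in-summer",
    altitude="high-altitude-distribution",
)


def guess_geo_areas(item: FlatItem, known_areas: Set[str]) -> List[str]:
    """Hypothetical consumer-side helper: with flat values, the only way to
    find the geographic area is to match every value against an external list."""
    return [v for v in item.values if v in known_areas]
```

With the structured form the consumer reads structured.geo_area directly. With the flat form the result depends entirely on which external list of areas the consumer happens to hold, which is where the risk of misinterpretation comes in.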
The same could be argued for associatedTaxon. (I prefer associatedTaxon to organismInteraction - the two taxa concerned may not interact; we may just be saying they occur in the same habitat or that taxon A has some shared characteristic with taxon B - though of course every atom in the universe does interact with every other in some way, I suppose.)
There could be a taxon association, and that is interesting raw information at the specimen level. However, when talking about properties/traits of organisms, we are talking about knowledge, and the fact that somewhere, sometime on earth two organisms have been seen together is rather uninteresting. I want to learn about pathogens and pollinators, not about the fox that happened to be present while the plant was pollinated. That is exactly what I would hope to express by using organismInteraction instead.
What is the difference between tagging something with a geographic region and tagging it with a taxon?
Or tagging it with a color code, or tagging it with a nomenclatural code, or tagging it with a museum name? Clearly all are categories, but it seems to me they are of a different kind, i.e. the role they play and the reason they are added are different.
If this is boiled down far enough we don't need the InfoItem as a container and can use common vocabularies (DublinCore) for most of this stuff.
<tdwg:SpeciesProfile rdf:about="http://my.guid.could.be.lsid.org">
  <tdwg:aboutTaxon rdf:resource="urn:lsid:of:some:taxon" />
  <tdwg:associatedTaxon rdf:resource="urn:lsid:of:some:other:taxon" />
  <dc:description>
    This is some text about how good this is to eat and other stuff.
  </dc:description>
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#cooking" />
  <dc:subject rdf:resource="http://my.controlled.list.of.terms#Brazil" />
</tdwg:SpeciesProfile>
My problem with this is that it is unclear whether the taxon occurs in Brazil, occurs in Argentina and is imported and cooked in Brazil, or whether it occurs and is cooked in Germany, but the cooking recipe originates from Brazil. In contrast, having <dc:distribution rdf:resource="http://my.controlled.list.of.terms#Brazil" /> seems to make this clear, provided semantics are defined for distribution. I am mostly interested in analyzing taxon-specific information for identification and phylogenetics, and it seems to me that the first kind of communication would be worthless for such purposes.
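The Brazil ambiguity can be sketched with plain subject/predicate/object tuples. This is only an illustration under stated assumptions: the triples are hand-written stand-ins for the RDF discussed above, the objects helper is hypothetical, and the "dc:distribution" predicate is just the one proposed in this message, with semantics that would still have to be defined.

```python
PROFILE = "http://my.guid.could.be.lsid.org"
COOKING = "http://my.controlled.list.of.terms#cooking"
BRAZIL = "http://my.controlled.list.of.terms#Brazil"

# Variant 1: everything is a dc:subject, so Brazil has no stated role.
subject_only = {
    (PROFILE, "dc:subject", COOKING),
    (PROFILE, "dc:subject", BRAZIL),
}

# Variant 2: Brazil is attached through a predicate that names its role.
with_distribution = {
    (PROFILE, "dc:subject", COOKING),
    (PROFILE, "dc:distribution", BRAZIL),
}


def objects(triples, subject, predicate):
    """Return, sorted, all objects stated for the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# "Where does the taxon occur?" has no answer in variant 1 ...
no_answer = objects(subject_only, PROFILE, "dc:distribution")
# ... and a direct answer in variant 2.
answer = objects(with_distribution, PROFILE, "dc:distribution")
```

In variant 1 a consumer can only list the subjects and guess which of them, if any, is a distribution area.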
If we want to express Taxon-Taxon interactions, Kevin Richards and I already came up with something to use for the HerbIMI LSID Authority:
http://rs.tdwg.org/ontology/voc/TaxonOccurrenceInteraction
Note that this defines interactions between *occurrences* not taxa and the occurrences provide the context. I am not sure that this is the place to get into defining interactions other than in the most general way.
The values of interaction kind should be defined elsewhere. However, I would prefer that such values be visible and defined. I am considering placing them as tags on the interaction class, but other solutions are possible. I propose to have a special type for it because we have an interaction of

I don't want to reject Roger's idea of remerging classes. Separating out datatypes is, to me, a vehicle for coming up with sharper definitions of the semantics of the various class attributes: expressing which is a measurement and which an aggregation, what is aggregable, what is a context and what is the scope under which aggregation was performed, what is a subclassing of the aggregation concept, what is a subclassing of the value concept, what is a frequency, and what is a probability of a statement.

This is the major concern I have about the generality of SPM 0.2. The list was full of ideas about the purposes for which value and context could be used, but when receiving the data no generic decoding seemed possible to me.

In SDD we distinguished between original measurements (SampleData) and aggregations (SummaryData - which often already occur at the single-specimen level), and between an aggregation Scope and a Modification (subclassing) of values and characters. The solution chosen is heavily skewed towards acceptance. Originally we had frequency, probability, value modification and character modification, all as values and text. However, this was considered too complex, so that now the modifier is overloaded with all of these (but the modifier concepts carry a classification that allows making these distinctions in the end). The solutions in SDD are particular, and it would be good to make them more general as a result of the current discussion - but I do think the issues we tried to solve are real. All this is largely irrelevant for free-form text, but what we are discussing here is simply not free-form text, but exactly this.
If you look at the DublinCore definition of "subject" it says: "The topic of the resource. Typically, the topic will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element." The coverage element says "Spatial characteristics of the intellectual content of the resource."
Could SPM just boil down to a "controlled vocabulary" for DublinCore metadata tags on chunks of text plus a predicate to indicate the taxon we are talking about? We could just do an applicability statement on how to tag them in HTML!
In DublinCore the taxon would simply be a value of subject. The problem with this is that we would end up with:

  subject = pathogen, pollinator, taxon1, taxon2, taxon3
  coverage = Germany, Italy, UK, summer, 1950-2007

which is ok for roughly finding something that might be interesting to read (which is what DC is good for) but almost worthless if you want to figure out what pathogens taxon2 has in Germany. That is what we need the container / envelope for: keeping things together.

Furthermore, my intuition is that it is significantly easier to process data if I know in advance that something is a status value, a geographic area, a taxon, a tag - all of which may come from an external rather than a TDWG vocabulary - rather than having to figure this out using OWL. But that is a question of principle.

In current SPM I found it very hard to figure out which is the context and what context means. Context in an actual observation / specimen is quite clear, but I find it difficult to have "invasive" as value and "Germany" as context. Others might want to have "frequently" and "rarely" as values and "Germany" as context. Or both...

I cannot say whether in the future software will simply effortlessly figure out what kind of category a value is (taxon, geoarea, frequency, distributionstatus, conservationstatus, etc.) and analyse the implications. But the Brazilian cooking example indicates to me that, without some guidance, drastically different interpretations are possible. What I am after with defining multiple classes like FreeFormText, Markuptext, Distribution, Interaction, QuantitativeMeasurement, CategoricalMeasurement, MolecularSequence is to give enough guidance to explain how, in a particular case, the attributes relate to each other - and to provide an appropriate context for extension with further attributes.
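The "pathogens of taxon2 in Germany" problem mentioned above can be sketched as follows. This is a hedged illustration, not a proposed schema: the Interaction class, its field names, and the pathogens_of helper are hypothetical, and the particular pairings of taxa and countries are invented purely so that the query has something to return.

```python
from dataclasses import dataclass
from typing import List

# Flat DublinCore-style record: all keywords pooled, no link between them.
flat_record = {
    "subject": ["pathogen", "pollinator", "taxon1", "taxon2", "taxon3"],
    "coverage": ["Germany", "Italy", "UK", "summer", "1950-2007"],
}


@dataclass(frozen=True)
class Interaction:
    """Hypothetical container that keeps the related pieces together."""
    kind: str       # e.g. "pathogen" or "pollinator"
    host: str       # the taxon the statement is about
    partner: str    # the interacting taxon
    geo_area: str   # where the statement applies


# Invented example data, for illustration only.
interactions = [
    Interaction("pathogen", "taxon2", "taxon1", "Germany"),
    Interaction("pollinator", "taxon2", "taxon3", "Italy"),
]


def pathogens_of(items: List[Interaction], host: str, geo_area: str) -> List[str]:
    """Answer 'which pathogens does this host have in this area?'."""
    return [i.partner for i in items
            if i.kind == "pathogen" and i.host == host and i.geo_area == geo_area]
```

The flat record contains the same words, but nothing in it says which taxon is the pathogen, which is the host, or which country goes with which statement, so the same query cannot be answered from it.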
My current understanding is that if we do not explain this outside of RDF/OWL, we would be forced to model it through reification, which we all seem to strive to avoid...

Gregor