[Re: [tdwg-tag] class design, generalization, L(O)D]
I hit reply-all and my response also went to the wrong list.
Because I am an RDF/OWL novice, I will not say much about how I think Jonathan/Bob's comments apply to the discussion about the scope of Individual. But I think that the suggestion that there may be stricter and broader uses of an Individual class probably describes the root of the disagreement pretty well. I have a particular, narrow use-case in mind (facilitating resampling and inferring duplicates using some kind of taxonomically homogeneous entity). Rich wants a broader interpretation based on ideas of what an individual means in various contexts. I think this difference in outlook is reflected in this statement from Rich's last response:
'We can argue about the properties and tokens later; first we need to nail down the "essence" of an Individual.'
I actually disagree strongly with this statement. I have tried to stay out of the current thread about the future course of the TDWG ontology because it isn't something that I know much about. But I think I am leaning to the side of those who suggest that we create use cases first and then see how the ontology can be developed to facilitate those use cases. I am actively using the class Individual in RDF the way I have defined it in my proposal. I know of at least one other person who plans to do so as well. It is not clear to me what the use case is for doing what Rich wants: combining what I've called the "token" aspect of individuals with the "resampling" aspect. It may be that there is such a use, but I'd like to see how it will work - in particular how it will work in RDF without "breaking" the use that I need for the Individual class. If it is possible to combine the two aspects of individuals, perhaps that might be done in a "lax" definition (using Paul's term) that reflects the "essence" of an individual. Unfortunately, figuring out what the "essence" of an individual is is more difficult than showing how one plans to use the term.
Steve
Bob Morris wrote:
Jonathan Rees, an employee of the Science Commons and TDWG member (and who knows way more about semantic web than I do) recently sent me this. I copy it here with his permission. Each of the paragraphs seems to me to be germane in different ways to the discussions about what should be an Individual. For those not deep into RDF, for the word "axiom", you could loosely understand "rule", although that term also has technical meaning that is sometimes a little different. Jonathan raises an important use case in the second paragraph, which is data quality control. That's a topic of interest to many, but especially those following the new Annotation Interest Group. Originally, this was part of a discussion we had about my favorite hobby horse, rdfs:domain. He is not on my side. When people who know more than I do about something are skeptical of my arguments about it, I usually suspend disbelief and temporarily adopt their position.
Jonathan's first point is pretty much what Paul Murray observed yesterday in response to a question of Kevin Richards.
"(a) subclassing is the way in RDFS or OWL you would connect the more specific to the less specific, so that you can apply general theorems to a more specific entity. That is, a well-documented data set would be rendered using classes and properties that were very specific so as to not lose information, and then could be merged with a badly-documented data set by relaxing to more general classes and properties using subclass and subproperty knowledge.
(b) axioms (i.e. specificity) are valuable not only for expressing operational and inferential semantics, but also for "sanity checking" e.g. consistency, satisfiability, Clark/Parsia integrity checks ( http://clarkparsia.com/pellet/icv/ ), and similar. Being able to detect ill-formed inputs is incredibly valuable.
People talk past one another because there are many distinct use cases for RDF and assumptions are rarely surfaced. For L(O)D, you're interested in making lots of links with little effort. Semantics is the enemy because it drives up costs. For semantic web, on the other hand, you're interested in semantics, i.e. understanding and documenting the import of what's asserted and making a best effort to only assert things that are true, even in the presence of open world assumption and data set extensibility. Semantics is expensive because it requires real thought and often a lot of reverse engineering. People coming from these two places will never be able to get along."
---Jonathan Rees in email to Bob Morris
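Point (a) above can be illustrated with a minimal sketch in plain Python rather than an RDF toolkit. The class names and the subclass table are hypothetical, chosen only for illustration; they are not an agreed TDWG vocabulary. The idea is that records typed with specific classes can be relaxed to a more general class using subclass knowledge, and then merged with records that were only ever typed generically.

```python
# Hypothetical subclass table (child class -> parent class); names are illustrative only.
SUBCLASS_OF = {
    "ex:CoralColony": "ex:CompositeIndividual",
    "ex:CompositeIndividual": "ex:Individual",
    "ex:SingleIndividual": "ex:Individual",
}


def superclasses(cls):
    """Return cls followed by all of its ancestors, most specific first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def relax(records, target):
    """Keep records whose class is target or a subclass of it, retyped as target."""
    return [(rid, target) for rid, cls in records if target in superclasses(cls)]


# A specifically typed data set and a generically typed one.
well_documented = [("a1", "ex:CoralColony"), ("a2", "ex:SingleIndividual")]
badly_documented = [("b1", "ex:Individual")]

# Relaxing to the general class lets the two data sets be merged.
merged = relax(well_documented + badly_documented, "ex:Individual")
print(merged)
```

The specific typing is not lost in the source data; the relaxation is applied only at merge time.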
Bob Morris
On 16/11/2010, at 9:05 AM, Steve Baskauf wrote:
Again, my coming from a computing background rather than a scientific one:
'We can argue about the properties and tokens later; first we need to nail down the "essence" of an Individual.'
I actually disagree strongly with this statement. I have tried to stay out of the current thread about the future course of the TDWG ontology because it isn't something that I know much about. But I think I am leaning to the side of those who suggest that we create use cases first and then see how the ontology can be developed to facilitate those use cases.
I suspect that a useful question is not so much "what is an individual", but "what kinds of thing might we want to treat as an individual".
It seems clear that you have a SingleIndividual - a "monogenetic" (whatever the correct term is) free-living multicellular organism: a tree, an ant. And you have CompositeIndividual - a colony of spiders, a bee's nest or anthill, a breeding pair, a family of humans. You also have things like an algal bloom or a disease outbreak, where the individuals are single-celled but you sample populations of them.
* Individuals can be part-of composite individuals - either lifelong or not.
** Tokens taken from an individual are also tokens of any individuals it is part-of (?).
** An individual can serve as its own token - a living or preserved whole specimen - much as a word like 'foo' can serve as its own name.
* Individuals can be known to be genetically related to other individuals. (parent/child/hybrid, colonies being split)
* Individuals can be found in association with other individuals (aphids in an ants nest, parasites, a pride of lions that follows some particular herd).
* Composite individuals are not necessarily single species: aphids in an ant's nest again
* Taxa, perhaps, are not the same as clades. Again: an ants nest with aphids - we can identify an ants nest as a nest of western red ants without thereby saying that all individuals in the nest are of the same species.
* An individual is usually bounded by place and (almost always) time - exceptions include things like permanent bird colonies
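The part-of and token points in the list above could be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not a proposed implementation: the `Individual` dataclass, its `part_of` and `tokens` fields, and the helper functions are all hypothetical names, and the rule that tokens propagate upward to composites is the one marked with "(?)" in the list, so it is an open suggestion rather than settled.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: two Individuals are the same only if they are the same object
class Individual:
    name: str
    part_of: list = field(default_factory=list)  # composite individuals this one belongs to
    tokens: list = field(default_factory=list)   # evidence taken directly from this individual


def containing_individuals(ind):
    """Return ind plus every composite it is (transitively) part of."""
    found, stack = [], [ind]
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(current.part_of)
    return found


def tokens_of(ind, all_individuals):
    """Tokens taken from ind itself or from any individual that is part of it."""
    return [
        token
        for other in all_individuals
        if ind in containing_individuals(other)
        for token in other.tokens
    ]


# An ants' nest with aphids: a composite individual that is not a single species.
nest = Individual("ant nest")
ant = Individual("worker ant", part_of=[nest], tokens=["pinned specimen"])
aphid = Individual("aphid", part_of=[nest], tokens=["photo"])

print(tokens_of(nest, [nest, ant, aphid]))
```

Note that the tokens of the nest come from members of different species, which is why an identification of the nest need not say anything about every individual in it.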
So, a taxonomy of "Individual". Your use cases seem (to me) to be:
* An individual is a thing that may have several specimens (tokens) taken from it, potentially from several different CollectionEvents.
* Individuals may be identified (Actually ... it's the tokens that are identified.)
* Some kinds of individuals are "monogenetic" (or whatever the correct term is) and can be identified as belonging to a taxon (clade?) with a scientific name
You could treat a coral outcrop as a single composite individual, even having a taxon named "warm south-pacific nodular atolls" - assuming that that kind of coral outcrop is a common one. You could treat a jar of coral fragments as coming from that individual, and to treat the individual fragments as specimens of sub-individuals that you can identify to species. That seems reasonable.
The problem would seem to be that you could also do the same trick with a jar of seashells collected at some beach - to treat "the population of sea-shells on shelley beach" as an individual. That seems a little ... illegitimate, as sea-shells are free living in a way that coral is not. I'd suggest that it's probably not worth trying to stop people abusing the notation in this way. Some things will have to remain judgment calls on the part of the dataset curator.
Paul,
What you have described here is actually very close to what I have asked for as the definition of Individual (http://bioimages.vanderbilt.edu/pages/full-model.jpg for the diagrammatic view, defined at http://code.google.com/p/darwincore/issues/detail?id=69 , comment 10).
Paul Murray wrote:
... So, a taxonomy of "Individual". Your use cases seem (to me) to be:
- An individual is a thing that may have several specimens (tokens) taken from it, potentially from several different CollectionEvents.
Except that tokens need not only be specimens, they may be anything that provides evidence that the Occurrence happened. There may also be no tokens if the Occurrence is an observation. The Darwin Core class would be Event rather than CollectingEvent.
- Individuals may be identified (Actually ... it's the tokens that are identified.)
Yes, exactly! As I am defining Individuals, one learns about them through Occurrences. You infer the individual's taxonomic identity either through examination of the evidence ("tokens" such as specimens, images, DNA sequences) or through examination of the organism (colony, etc.) itself (in which case it has an Occurrence record that is an observation with no token). As I have suggested earlier, a property of Identification that would be very useful would be one that links the Identification to the evidence (tokens) on which it is based. We infer taxonomic identity through tokens, knowing that if the several tokens are from the same Individual, all Identifications based on any of these tokens apply to the Individual.
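As a rough sketch of that inference (plain Python, not RDF): assume each token records the Individual it came from, and each Identification carries a hypothetical `based_on` property listing the tokens used as evidence. The class names, field names, identifiers, and taxon names below are all assumptions made for illustration. An Identification then applies to an Individual whenever at least one of its evidence tokens came from that Individual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    token_id: str
    individual_id: str  # the Individual this piece of evidence was taken from


@dataclass(frozen=True)
class Identification:
    taxon: str
    based_on: tuple  # token_ids used as evidence (hypothetical linking property)


def identifications_for(individual_id, tokens, identifications):
    """Identifications whose evidence includes at least one token of the individual."""
    own_tokens = {t.token_id for t in tokens if t.individual_id == individual_id}
    return [i for i in identifications if own_tokens & set(i.based_on)]


tokens = [
    Token("image1", "ind1"),
    Token("specimen1", "ind1"),
    Token("specimen2", "ind2"),
]
identifications = [
    Identification("Quercus alba", ("image1",)),
    Identification("Quercus rubra", ("specimen2",)),
]

# The identification made from image1 also covers specimen1's Individual, ind1.
print(identifications_for("ind1", tokens, identifications))
```

An observation with no token would need a direct link from the Identification to the Individual instead; that case is left out of this sketch.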
- Some kinds of individuals are "monogenetic" (or whatever the correct term is) and can be identified as belonging to a taxon (clade?) with a scientific name
I'm not sure what "monogenetic" means, but the definition simply says a single taxon and does not specify what that is. One does not need to know what the taxon is, as one or more Identifications can be applied at a later time.
You could treat a coral outcrop as a single composite individual, even having a taxon named "warm south-pacific nodular atolls" - assuming that that kind of coral outcrop is a common one. You could treat a jar of coral fragments as coming from that individual, and to treat the individual fragments as specimens of sub-individuals that you can identify to species. That seems reasonable.
We have agreed that this is allowable under the definition under discussion. If one discovers that the individual is composed of multiple taxa at a lower level (such as species), those "sub-individuals" can be given separate identifiers and assigned Identifications at that taxonomic level.
The problem would seem to be that you could also do the same trick with a jar of seashells collected at some beach - to treat "the population of sea-shells on shelley beach" as an individual. That seems a little ... illegitimate, as sea-shells are free living in a way that coral is not. I'd suggest that it's probably not worth trying to stop people abusing the notation in this way. Some things will have to remain judgment calls on the part of the dataset curator.
Well, this gets at the difference between what I wanted originally, and what we have now in allowing Individuals to be at higher taxonomic levels. In order to allow for reasoning that asserts that an Identification which is applied to one Individual also applies to an Individual which is discovered to be a duplicate, one has to have a way to know that the Individual is taxonomically homogeneous at a low enough taxonomic level for taxonomists to consider them "duplicates". This is a bit hard to define, but as a practical matter taxonomists create "duplicates" and distribute them to other herbaria and museums as such routinely. It is a judgement call they make all the time. Whatever criteria they use would be my criteria for what an Individual would be allowed to be. However, I have relented on this point so that Individuals can be defined at higher taxonomic levels as long as there is some way (like a term such as individualScope) that can be used to indicate when an Individual is scoped at the level where a taxonomist would call it a "duplicate". Such Individuals would probably include coral colonies but not jars of sea shells.
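To make the duplicate reasoning concrete, here is a minimal sketch in plain Python under stated assumptions. The boolean `scope` flag stands in for whatever a term like individualScope would actually express; the function name, the identifiers, and the taxon names are hypothetical. Identifications are shared between two Individuals asserted to be duplicates only when both are scoped at the level where a taxonomist would call them "duplicates".

```python
def propagate(identifications, duplicates, scope):
    """Share identifications between duplicate Individuals that are suitably scoped.

    identifications: dict of individual_id -> set of taxon names
    duplicates: list of (id_a, id_b) pairs asserted to be duplicates
    scope: dict of individual_id -> True when scoped at the "duplicate" level
    """
    result = {key: set(value) for key, value in identifications.items()}
    changed = True
    while changed:  # repeat until no pair gains a new identification
        changed = False
        for a, b in duplicates:
            if scope.get(a) and scope.get(b):
                union = result.get(a, set()) | result.get(b, set())
                for key in (a, b):
                    if result.get(key, set()) != union:
                        result[key] = union
                        changed = True
    return result


identifications = {"coral1": {"Acropora"}, "jar1": {"Mollusca"}}
duplicates = [("coral1", "coral2"), ("jar1", "jar2")]
scope = {"coral1": True, "coral2": True, "jar1": False, "jar2": False}

shared = propagate(identifications, duplicates, scope)
print(shared)
```

In this example the coral colony pair shares its identification, while the jars of sea shells are left alone because they are not scoped at the "duplicate" level.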
Steve