Fwd: [Fwd: Re: If you need something for referring to a population, then it is probably best to do it as a related class]
Hi Pete,
Hi Steve,
I may have overloaded the term specimen to make the explanation easier to follow. A specimen could be an individual or it could be part of an individual. To some extent you need to think about how these models will be used. If you subscribe to the model that a species is whatever a taxonomist says it is, then it is difficult to make statements like "X% of the world's species will be extinct by 2050". If you mean a species as defined by the concept documented at this URI, which is supported by these specimens, images, and DNA, then you are on firmer ground.
I would have stopped here! If someone wants to point to a series of objects to indicate his own concept of a "species", then that's ok. That's a convention we can work with.
Everything else you talk about below seems to be a confused mashup of "species" as a taxon and "species" as a rank.
Species in the natural world do a pretty good job recognizing those individuals that are appropriate mates. In other words members of their own species.
Really? Maybe in animals but everywhere else I don't think so! And what about asexual organisms? I guess they do not deserve the status of "species". So, the biological species concept you are referring to can't be universally applied = not a good concept to use for ALL organisms, right?
Are we modeling species or variations in human conceptualizations of species?
Is there anything else out there besides the human conceptualization of species? Do we have an absolute concept of what a species represents? Last time I checked we have been arguing about this for the past century and more. Species defined by the 25+ species concepts out there can't possibly be real. They are defined by human conceptualization = artificial. The only thing we can clearly try to discern are clades, and often those clades may correspond with someone's concept of a species. In other words, you can take the rank of species and point it at a node that defines a taxon. Or you can take the rank of species and point it to an assemblage of lineages, regardless of whether that assemblage represents a monophyletic group or not. Traditionally recognized species are most likely polyphyletic anyway.
I stick with this. Assuming you don't have a hybrid individual, that individual is one species. The fact that humans may disagree on what species it is is a human issue.
What do you really mean by individuals? Are species created by the gods and we just can't figure out what they meant? Human issue? We humans came up with this impossible-to-define concept. Individuals are just that: individuals. The individuals = species hypothesis has been argued in the literature and is not an accepted notion by far (note: I am not saying I agree or disagree, this is just another non-absolute concept). But maybe you don't mean it that way.
Again, are we modeling species or variations in human conceptualizations of species?
Your statement is recursive. You have an idea of species that is above all of the other variations on the theme. I don't understand what you mean and that's why I think there is a confusion between the usage of "species" as a taxon and "species" as an arbitrary rank.
Which of these is of primary importance to decision makers and non-taxonomist biologists? Part of the problem with various publications relating to ontologies and taxonomy is that their species models entail a specific phylogenetic hypothesis.
Even in the lack of a phylogenetic framework, every species is a hypothesis. When we lump and split we generate hypotheses. A new name points to a new hypothesis. Adding a new genus to a family or a new species to a genus refines and modifies the original hypothesis.
In the real world taxa are not as clean as some would like to make them out to be.
In fact, the only way we can discover a little more clarity is through phylogenies. When it comes to low level taxa, population studies help a lot too.
Each individual is a unique combination of thousands of separate gene lineages which often do not follow clean monophyletic paths.
But as you stated above, you equate individuals with species. If that's correct, then you admit that individuals can be polyphyletic, and do we really care about polyphyletic entities? In that case you also admit that species as you define them are artificial.
I would argue that most of those who work with species related data see them as useful typological constructs which in general follow the biological species model.
I actually don't and I can name many others who don't. The rank of species may be useful to many as a communication tool, but the way it is applied varies a lot. The biological species concept is not widely accepted and certainly not followed by the vast majority as you indicate.
The only thing we can really achieve from an informatics standpoint is to reconcile objects (specimens, images, sequences, descriptions, etc.) with the many names that may have been associated with them and leave the rest (philosophical masturbation) to the users. Reconciling names with concepts, given the nature of the concepts and the idiosyncratic way that names have been historically used, is a quixotic challenge.
Cheers, Nico
Aedes triseriatus owl:sameAs Ochlerotatus triseriatus
Others seem to see them as phylogenetic end nodes which entail a specific phylogenetic history.
Aedes triseriatus distinctFrom Ochlerotatus triseriatus
If you are primarily interested in understanding issues of ecology, disease, diversity and conservation, the former model is more appropriate than the latter.
Respectfully,
- Pete
On Wed, May 4, 2011 at 8:35 AM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
I beg to differ.
Peter DeVries wrote:
About occurrences.
It is the specimen (individual) that gets identified and it can have several identifications.
I do not believe it is a good idea to equate "specimen" with individual. I have reached the conclusion that a specimen CAN be an individual if one chooses to type a particular "thing" as both a Specimen and an Individual, but I'm not comfortable with saying that they are owl:sameAs. It is my opinion, and I think one that was supported to some extent in the discussion of last Oct/Nov, that when we "identify" some kind of evidence like a branch cut from a tree we are really making a statement about what taxon we think the tree represents, not the branch per se. This is an important distinction, because if we are identifying the tree, then any branch that is dcterms:isPartOf the tree shares that identification. This may seem like an esoteric point, but it is one that has a fairly significant bearing on how one chooses to construct a model. I will stop there because I don't want to re-plough the same ground again. The relevant posts are summarized and referenced on the DSW pages explaining the IndividualOrganism and Specimen classes.
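To make the tree/branch point concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the identifiers, the taxon label and the helper names are invented for the example and are not DSW terms. The idea is that an Identification is attached to the whole organism, and any part reaches it by following its part-of links upward.

```python
# Hypothetical part-of links: each key is a part, each value is what it is part of.
PART_OF = {"leaf-7": "branch-1", "branch-1": "tree-A"}

# Hypothetical Identifications, attached to the whole organism only.
IDENTIFICATIONS = {"tree-A": ["Quercus alba, determined by det-A"]}


def whole_organism(resource, part_of=PART_OF):
    """Follow part-of links upward until reaching something that is not a part."""
    seen = set()
    while resource in part_of and resource not in seen:
        seen.add(resource)  # guard against accidental cycles in the data
        resource = part_of[resource]
    return resource


def identifications_for(resource):
    """Return the Identifications a part shares with its whole organism."""
    return list(IDENTIFICATIONS.get(whole_organism(resource), []))


if __name__ == "__main__":
    print(identifications_for("leaf-7"))
    print(identifications_for("branch-1"))
```

Under this arrangement nobody has to identify the branch separately; identifying the tree is enough.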
There is an issue with occurrence records that we saw with one of the initial TDWG BioBlitz RDF Versions.
There were multiple identifications but there was no way of indicating which was the preferred.
Again, I beg to differ. It was my understanding (again from the extensive conversation about this in Oct/Nov) that the creation of multiple Identifications was "preference agnostic". Each Identification represented an expressed opinion about the concept/TNU that the determiner asserted that the individual represented. They were not flagged as "right" or "wrong". In fact, it is possible that there could be several Identifications by different determiners that asserted the same concept/TNU, indicating that the determiners agreed with each other.
Assuming that there was only one individual organism identified, there is really only one species (or hybrid).
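As a rough illustration of what "preference agnostic" could look like in data, here is a small Python sketch. The record fields, determiner names and concept labels are made up for the example. Each Identification is stored as an independent opinion with no "preferred" flag, and agreement between determiners is something one computes afterwards.

```python
from collections import defaultdict

# Hypothetical Identification records: one opinion per record, no "preferred" flag.
IDENTIFICATIONS = [
    {"individual": "ind-1", "determiner": "det-A", "concept": "concept-X"},
    {"individual": "ind-1", "determiner": "det-B", "concept": "concept-X"},
    {"individual": "ind-1", "determiner": "det-C", "concept": "concept-Y"},
    {"individual": "ind-2", "determiner": "det-A", "concept": "concept-Y"},
]


def determiners_by_concept(records, individual):
    """Group the determiners of one individual by the concept/TNU they asserted."""
    grouped = defaultdict(list)
    for record in records:
        if record["individual"] == individual:
            grouped[record["concept"]].append(record["determiner"])
    return dict(grouped)


if __name__ == "__main__":
    # det-A and det-B agree on concept-X; det-C asserts concept-Y.
    print(determiners_by_concept(IDENTIFICATIONS, "ind-1"))
```

A consuming application that wants a "preferred" identification can apply its own rule to this grouping without the records themselves taking a side.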
Aaaaaaaaaaaaaah! Please Nico C., don't take this one up!
Different human identifiers make assertions about what that species is, those different identifications are then tied to the individual organism.
Yes I agree entirely.
In other words it is the specimen that is identified not the occurrence.
Yes, the occurrence was not identified. But I would say the individual, not the specimen (as explained above).
If your additional identification RDF links to the specimen then it will be linked to the occurrence so someone can browse from the occurrence and look at the identification history of the specimen.
Does this answer your question?
Not really, but I think the issue here is that the way you are conceptualizing specimens, individuals, and identifications is NOT the same as DSW defines them. So I retract my earlier statement that the taxonconcept.org ontology and DSW are really the same thing with different term names.
Steve
- Pete
On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. This email relates to the actual suggestion that you made in the email, that is, to use URIs of the form "http://lod.taxonconcept.org/ses/mCcSp#Occurrence". In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. In the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag. So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (Boloria selene). But I'm not sure what the advantage of that is.
The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species. I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is Boloria selene. But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it, and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf), which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept).
If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"?
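The one-step versus two-step comparison can be sketched in a few lines of Python. This is only an illustration: the triples are held in a plain set rather than a real triple store, the occurrence identifier "ex:occurrence-1" and the data property "ex:speciesCode" are hypothetical, and only the tag URI, the species URI and the three txn/dcterms predicates come from the RDF discussed here.

```python
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
SPECIES = "http://lod.taxonconcept.org/ses/ICmLC#Species"
OCC = "ex:occurrence-1"  # hypothetical occurrence identifier

TRIPLES = {
    (OCC, "txn:occurrenceHasSpeciesConcept", SPECIES),
    (OCC, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
    (TAG, "dcterms:isPartOf", SPECIES),
    (OCC, "ex:speciesCode", "ICmLC"),  # hypothetical literal-valued alternative
}


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def species_direct(occurrence):
    """One step: occurrence -> species concept."""
    return objects(occurrence, "txn:occurrenceHasSpeciesConcept")


def species_via_tag(occurrence):
    """Two steps: occurrence -> tag -> species concept."""
    found = set()
    for tag in objects(occurrence, "txn:occurrenceHasSpeciesOccurrenceTag"):
        found |= objects(tag, "dcterms:isPartOf")
    return found


def occurrences_with_code(code):
    """Search by a plain string, as a literal-valued property would allow."""
    return {s for s, p, o in TRIPLES if p == "ex:speciesCode" and o == code}


if __name__ == "__main__":
    print(species_direct(OCC) == species_via_tag(OCC))
    print(occurrences_with_code("ICmLC"))
```

Both routes end at the same species concept URI; the tag route just needs an extra hop, and the literal gives the same search handle without minting a resource.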
I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species Boloria selene. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept), there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and (in your suggestion below) populations. There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce.
I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for Boloria selene (http://lod.taxonconcept.org/ses/ICmLC#Occurrence) the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring. When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurre... is described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOccurrenceTag, as would be the case (I think) if the lightweight tag URI were the object of a rdf:type property. Anyway, I'm confused about this.
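The difference between pointing at the tag and being typed by it can be shown with a small Python sketch. It assumes a plain set of triples with no inference at all, and the two occurrence identifiers are hypothetical; the second one shows what typing an Occurrence with the tag would look like, which is not what the published RDF does.

```python
RDF_TYPE = "rdf:type"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
OCC_TAGGED = "ex:occurrence-1"  # hypothetical: points at the tag via a property
OCC_TYPED = "ex:occurrence-2"   # hypothetical: typed with the tag

TRIPLES = {
    # Implied by the txn:SpeciesOccurrenceTag container element in the RDF/XML.
    (TAG, RDF_TYPE, "txn:SpeciesOccurrenceTag"),
    # What the Occurrence RDF currently asserts.
    (OCC_TAGGED, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
    # What it would take for an Occurrence to be an instance of the tag.
    (OCC_TYPED, RDF_TYPE, TAG),
}


def instances_of(cls):
    """Subjects explicitly typed as cls (no reasoning applied)."""
    return {s for s, p, o in TRIPLES if p == RDF_TYPE and o == cls}


if __name__ == "__main__":
    print(instances_of(TAG))
    print(instances_of("txn:SpeciesOccurrenceTag"))
```

Only the typed occurrence shows up as an instance of the tag, and the only instance of txn:SpeciesOccurrenceTag is the tag itself.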
The other issue that I would raise with this approach is the same one that I raised in the other email that I wrote. It essentially puts a burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to be somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked, rather than expecting a centralized authority to do everything.
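A small Python sketch of that burden, under stated assumptions: all "ex:" identifiers and predicates are hypothetical stand-ins, and the two sets stand for RDF published by the original provider and by an unrelated third party. A query on the tag only sees what the provider wrote, while a query that walks through Identification instances picks up the third party's opinion once the two graphs are merged.

```python
TAG_PREDICATE = "txn:occurrenceHasSpeciesOccurrenceTag"

# Published by the original metadata provider.
PROVIDER = {
    ("ex:occurrence-1", TAG_PREDICATE, "ex:tag-species-A"),
    ("ex:occurrence-1", "ex:ofIndividual", "ex:individual-1"),
    ("ex:identification-1", "ex:identifies", "ex:individual-1"),
    ("ex:identification-1", "ex:toConcept", "ex:species-A"),
}

# Published later by someone else, somewhere else in the cloud.
THIRD_PARTY = {
    ("ex:identification-2", "ex:identifies", "ex:individual-1"),
    ("ex:identification-2", "ex:toConcept", "ex:species-B"),
}


def occurrences_by_tag(graph, tag):
    """Occurrences carrying the given tag."""
    return {s for s, p, o in graph if p == TAG_PREDICATE and o == tag}


def occurrences_by_identification(graph, concept):
    """Occurrences whose individual has any Identification to the concept."""
    idents = {s for s, p, o in graph if p == "ex:toConcept" and o == concept}
    individuals = {o for s, p, o in graph if p == "ex:identifies" and s in idents}
    return {s for s, p, o in graph if p == "ex:ofIndividual" and o in individuals}


if __name__ == "__main__":
    merged = PROVIDER | THIRD_PARTY
    print(occurrences_by_tag(merged, "ex:tag-species-B"))
    print(occurrences_by_identification(merged, "ex:species-B"))
```

The tag query comes back empty for species B unless the provider learns of the new Identification and adds a second tag; the Identification route needs no such coordination.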
Anyway, maybe you can explain what is going on so that I can understand it better and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties.
Steve
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities:
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
See:
http://lod.taxonconcept.org/ses/v6n7p.html (HTML)
http://lod.taxonconcept.org/ses/v6n7p.rdf (RDF)
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... (Knowledge Base View, bit.ly/gMFqR1)
The model mints URIs for the following related entities. See the RDF or the KB View.
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag, defined in the species concept RDF, is used for an individual organism that is an instance of the species concept. This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesPopulationTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>
This is the RDF for a population. It has an individual organism as one of its parts. It is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7 </skos:prefLabel>
  <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individ...
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>
Respectfully,
- Pete
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept & GeoSpecies Knowledge Bases
A Semantic Web, Linked Open Data Project
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
Well no,
we don't have to grudgingly choose between artificial and natural. One can allow for the obvious, namely that species names and definitions are formulated and written down by humans (though other animals and machines are sometimes capable of this too) and vary over time, while at the same time nature has a strong corrective word in shaping these definitions, and only if they are close enough to some real natural order are our inferences also reliable enough. By modeling species as human conceptualizations, we are not necessarily throwing out the baby with the bathwater, i.e. saying they are not real. See pp. 66-67 here:
http://docs.google.com/a/richard-boyd.net/viewer?a=v&pid=sites&srcid...
Regards,
Nico
On 5/4/2011 12:58 PM, Nico Cellinese wrote:
[...]
Hi Nico,
I would have stopped here! If someone wants to point to a series of objects to indicate his own concept of a "species", then that's ok. That's a convention we can work with.
You might be right about the point above.
I will repeat the earlier sentence with added emphasis, since you seem to have misinterpreted my meaning.
I would argue that most of those who work with species related data see them as *useful typological constructs* which *in general* follow the biological species model.
As a practical matter the issue is this:
Are we modeling this because the main queries will be to find variations in identifications, or are we modeling this so that facts that are tied to *useful shared typological constructs* are findable? The model supports links to alternative concepts. The UniProt, Bio2RDF, and DBpedia URIs can be considered closely related concepts.
The way this works ideally is that the identifier of this insect (from TDWG) makes the assertion that
this observation http://ocs.taxonconcept.org/ocs/0da685c9-9cdc-4dff-baf3-38d1bdbc6552.html
represents an instance of this concept http://lod.taxonconcept.org/ses/z9oqP#Species
Now that data can be combined with other data sets making related assertions, and the role that this shared typological construct plays in the environment and its relationship to other shared typological constructs can be analyzed.
It can be analyzed in a much more reliable and repeatable way than when data is assigned to one of a number of different name strings which may or may not identify the same concept.
Under the alternative model, are identifications of *Aedes triseriatus* and *Ochlerotatus triseriatus* identifications of the same thing? Who makes that determination?
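Here is a minimal Python sketch of the difference this makes to an analysis. It rests on assumptions: the observation records are invented, "ex:concept-1" is a hypothetical concept identifier, and the name-to-concept mapping is exactly the kind of assertion that somebody has to make and publish.

```python
from collections import Counter

# Hypothetical assertion that both name strings refer to one shared concept.
NAME_TO_CONCEPT = {
    "Aedes triseriatus": "ex:concept-1",
    "Ochlerotatus triseriatus": "ex:concept-1",
}

# Hypothetical observation records labelled only with name strings.
OBSERVATIONS = [
    {"id": "obs-1", "name": "Aedes triseriatus"},
    {"id": "obs-2", "name": "Ochlerotatus triseriatus"},
    {"id": "obs-3", "name": "Aedes triseriatus"},
]


def count_by_name(observations):
    """Tally observations by raw name string."""
    return Counter(obs["name"] for obs in observations)


def count_by_concept(observations, mapping):
    """Tally observations by the concept each name string is mapped to."""
    return Counter(mapping[obs["name"]] for obs in observations)


if __name__ == "__main__":
    print(count_by_name(OBSERVATIONS))
    print(count_by_concept(OBSERVATIONS, NAME_TO_CONCEPT))
```

Counting by name string splits the three records across two labels, while counting by concept puts all three together, but only because the mapping says so.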
Respectfully,
- Pete
Everything else you talk about below seem to be a confused mashup of
"species" as a taxon and "species" as ranks.
Species in the natural world do a pretty good job recognizing those individuals that are appropriate mates. In other words members of their own species.
Really? Maybe in animals but everywhere else I don't think so! And what about asexual organisms? I guess they do not deserve the status of "species". So, the biological species concept you are referring to can't be universally applied = not a good concept to use for ALL organisms, right?
Are we modeling species or variations in human conceptualizations of species?
Is there anything else out there besides the human conceptualization of species? Do we have an absolute concept of what a species represent? Last time I checked we have been arguing about this for the past century and more. Species defined by the 25+ species concepts out there can't possibly be real. They are defined by human conceptualization = artificial. The only think we can clearly try to discern are clades and often those clades may correspond with someone's concept of a species. In other words, you can take the rank of species and point it at a node that define a taxon. Or you can take the rank of species and point it to an assemblage of lineages, regardless if that assemblage represents a monophyletic group or not. Traditionally recognized species are most likely polyphyletic anyways.
I stick with this. Assuming you don't have a hybrid individual. That individual is one species. The fact that human may disagree on what species it is a human issue.
What do you really mean by individuals? Are species created by the gods and we just can't figure out what they meant? Human issue? We, humans came up with this impossible to define concept. Individuals are just that: individuals. The individuals = species hypothesis has been argued in the literature and is not an accepted notion by far (note: I am not saying I agree or disagree, this is just another non-absolute concept). But maybe you don't mean it that way.
Again, Are we modeling *species* or variations in human conceptualizations of *species*?
Your statement is recursive. You have an idea of species that is above all of the other variations on the theme. I don't understand what you mean and that's why I think there is a confusion between the usage of "species" as a taxon and "species" as an arbitrary rank.
Which of these is of primary importance to decision makers and non-taxonomist biologists? Part of the problem with various publications relating to ontologies and taxonomy is that their species models entail a specific phylogenetic hypothesis.
Even in the lack of a phylogenetic framework, every species is a hypothesis. When we lump and split we generate hypotheses. A new name points to a new hypothesis. Adding a new *genus* to a *family* or a new *species *to a *genus* refines and modifies the original hypothesis.
In the real world taxa are not as clean as some would like to make them out to be.
In fact, the only way we can discover a little more clarity is through phylogenies. When it comes to low level taxa, population studies help a lot too.
Each individual is a unique combination of thousands of separate gene lineages which often do not follow clean monophyletic paths.
But as you stated above you equate individuals with species. if that's correct, then you admit that individuals can be polyphyletic, and as such do we really care about polyphyletic entities? As such you admit that species as you define them are artificial.
I would argue that most of those who work with species related data see them as useful typological constructs which in general follow the biological species model.
I actually don't, and I can name many others who don't. The rank of species may be useful to many as a communication tool, but the way it is applied varies a lot. The biological species concept is not widely accepted and certainly not followed by the vast majority as you indicate.
The only thing we can really achieve from an informatics standpoint is to reconcile objects (specimens, images, sequences, descriptions, etc.) with the many names that may have been associated with them and leave the rest (philosophical masturbation) to the users. Reconciling names with concepts, given the nature of the concepts and the idiosyncratic way that names have been historically used, is a quixotic challenge.
Cheers, Nico
*Aedes triseriatus* owl:sameAs *Ochlerotatus triseriatus*
Others seem to see them as phylogenetic end nodes which entail a specific phylogenetic history.
*Aedes triseriatus* distinctFrom *Ochlerotatus triseriatus*
If you are primarily interested in understanding issues of ecology, disease, diversity, and conservation, the former model is more appropriate than the latter.
Respectfully,
- Pete
On Wed, May 4, 2011 at 8:35 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
I beg to differ.
Peter DeVries wrote:
About occurrences.
It is the specimen (individual) that gets identified and it can have several identifications.
I do not believe it is a good idea to equate "specimen" with individual. I have reached the conclusion that a specimen CAN be an individual if one chooses to type a particular "thing" as both a Specimen and an Individual, but I'm not comfortable with saying that they are owl:sameAs. It is my opinion, and I think one that was supported to some extent in the discussion of last Oct/Nov, that when we "identify" some kind of evidence like a branch cut from a tree we are really making a statement about what taxon we think the tree represents, not the branch per se. This is an important distinction, because if we are identifying the tree, then any branch that is dcterms:partOf the tree shares that identification. This may seem like an esoteric point, but it is one that has a fairly significant bearing on how one chooses to construct a model. I will stop there because I don't want to re-plough the same ground again. The relevant posts are summarized and referenced on the DSW pages explaining the IndividualOrganism and Specimen classes.
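The distinction can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Individual` and `Specimen` classes, their fields, and the example.org URIs are hypothetical and are not DSW or Darwin Core terms. The identification is attached to the individual, and any specimen that is part of that individual reports the same identifications.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """An individual organism (e.g. a tree); identifications attach here."""
    uri: str
    identifications: list = field(default_factory=list)


@dataclass
class Specimen:
    """A piece of evidence (e.g. a branch) that is part of an individual."""
    uri: str
    part_of: Individual

    def identifications(self):
        # A specimen carries no identifications of its own in this sketch;
        # it shares whatever has been asserted about the whole individual.
        return self.part_of.identifications


# Hypothetical URIs, for illustration only.
tree = Individual("http://example.org/tree1")
branch_a = Specimen("http://example.org/branchA", part_of=tree)
branch_b = Specimen("http://example.org/branchB", part_of=tree)

# Identify the tree once; every branch of the tree shares the result.
tree.identifications.append("http://example.org/concept/someTaxon")
```

If the identification were attached to `branch_a` instead, `branch_b` would know nothing about it, which is the modeling consequence being argued over here.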
There is an issue with occurrence records that we saw with one of the initial TDWG BioBlitz RDF Versions.
There were multiple identifications but there was no way of indicating which was the preferred.
Again, I beg to differ. It was my understanding (again from the extensive conversation about this in Oct/Nov) that the creation of multiple Identifications was "preference agnostic". Each Identification represented an expressed opinion about the concept/TNU that the determiner asserted that the individual represented. They were not flagged as "right" or "wrong". In fact, it is possible that there could be several Identifications by different determiners that asserted the same concept/TNU, indicating that the determiners agreed with each other.
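A minimal sketch of what "preference agnostic" could look like in practice, assuming a hypothetical `Identification` record and made-up `ex:` identifiers (none of this is taken from the BioBlitz RDF): no identification is flagged as preferred, and agreement between determiners simply falls out of counting.

```python
from collections import Counter
from typing import NamedTuple


class Identification(NamedTuple):
    individual: str  # identifier of the individual organism
    determiner: str  # who expressed the opinion
    concept: str     # identifier of the concept/TNU asserted


def agreement(identifications, individual):
    """Count how many determiners asserted each concept for one individual."""
    return Counter(
        i.concept for i in identifications if i.individual == individual
    )


# Hypothetical data: three opinions about the same individual.
IDS = [
    Identification("ex:ind1", "determiner A", "ex:conceptX"),
    Identification("ex:ind1", "determiner B", "ex:conceptX"),
    Identification("ex:ind1", "determiner C", "ex:conceptY"),
]

counts = agreement(IDS, "ex:ind1")
```

A consumer who wants a "preferred" identification can apply its own rule (most recent, most agreed upon, a trusted determiner) at query time without the provider having to pick one.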
Assuming that there was only one individual organism identified, there is really only one species (or hybrid).
Aaaaaaaaaaaaaah! Please Nico C., don't take this one up!
Different human identifiers make assertions about what that species is, those different identifications are then tied to the individual organism.
Yes I agree entirely.
In other words it is the specimen that is identified not the occurrence.
Yes, the occurrence was not identified. But I would say the individual, not the specimen (as explained above).
If your additional identification RDF links to the specimen then it will be linked to the occurrence so someone can browse from the occurrence and look at the identification history of the specimen.
Does this answer your question?
Not really, but I think the issue here is that the way you are conceptualizing specimens, individuals, and identifications is NOT the same as the way DSW defines them. So I retract my earlier statement that the taxonconcept.org ontology and DSW are really the same thing with different term names.
Steve
- Pete
On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. This email relates to the actual suggestion that you made in the email, that is, to use URIs of the form http://lod.taxonconcept.org/ses/mCcSp#Occurrence. In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. In the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag. So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (*Boloria selene*). But I'm not sure what the advantage of that is. The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species. I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is *Boloria selene*. But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it, and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf), which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept).
If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"?
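The one-step versus two-step comparison above can be made concrete with a toy triple store. This is a sketch only: the graph is a plain Python set of (subject, predicate, object) tuples rather than real RDF tooling, and `ex:occurrence` is a placeholder for the Occurrence URI. The predicates and the two ICmLC URIs are the ones quoted in the email.

```python
SPECIES = "http://lod.taxonconcept.org/ses/ICmLC#Species"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
OCC = "ex:occurrence"  # placeholder for the Occurrence URI

GRAPH = {
    (OCC, "txn:occurrenceHasSpeciesConcept", SPECIES),
    (OCC, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
    (TAG, "dcterms:isPartOf", SPECIES),
}


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def species_direct(graph, occurrence):
    # One hop: Occurrence -> species concept.
    return objects(graph, occurrence, "txn:occurrenceHasSpeciesConcept")


def species_via_tag(graph, occurrence):
    # Two hops: Occurrence -> tag -> species concept.
    found = set()
    for tag in objects(graph, occurrence, "txn:occurrenceHasSpeciesOccurrenceTag"):
        found |= objects(graph, tag, "dcterms:isPartOf")
    return found
```

Both functions return the same species concept for this graph; the tag route just takes an extra lookup to get there.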
I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species *Boloria selene*. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept) there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and (in your suggestion below) populations. There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce.
I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for *Boloria selene* (http://lod.taxonconcept.org/ses/ICmLC#Occurrence) the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring. When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurre... is described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOccurrenceTag, as would be the case (I think) if the lightweight tag URI were the object of an rdf:type property. Anyway, I'm confused about this.
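The typing question can be illustrated with the same kind of toy triple set. This is a sketch that applies no inference at all, which is an assumption (a reasoner with extra axioms could conclude more), and `ex:occurrence` is a placeholder URI. It shows that being the object of txn:occurrenceHasSpeciesOccurrenceTag does not, by itself, make anything an instance of anything.

```python
RDF_TYPE = "rdf:type"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
OCC = "ex:occurrence"  # placeholder for the Occurrence URI


def instances_of(graph, cls):
    """Subjects explicitly typed as cls (no inference applied)."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == cls}


# What the email describes: the tag is typed, the Occurrence only links to it.
as_published = {
    (TAG, RDF_TYPE, "txn:SpeciesOccurrenceTag"),
    (OCC, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
}

# What "modeled as instances of" would seem to require: an rdf:type triple
# whose object is the tag URI.
as_described = as_published | {(OCC, RDF_TYPE, TAG)}
```

With `as_published`, `instances_of(as_published, TAG)` is empty; only with the extra rdf:type triple in `as_described` does the Occurrence show up as an instance of the tag.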
The other issue that I would raise with this approach is the same one that I raised in the other email that I wrote. It essentially puts a burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to be somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked, rather than expecting a centralized authority to do everything.
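A rough sketch of the burden being described, using two hypothetical triple sets that stand in for data published by the original provider and by someone else. Every `ex:` name here is invented for illustration; only txn:occurrenceHasSpeciesOccurrenceTag comes from the thread.

```python
PROVIDER = {
    ("ex:occ1", "ex:occurrenceOf", "ex:ind1"),
    ("ex:occ1", "txn:occurrenceHasSpeciesOccurrenceTag", "ex:tagA"),
    ("ex:id1", "ex:identifies", "ex:ind1"),
    ("ex:id1", "ex:toConcept", "ex:conceptA"),
}

# A second opinion published elsewhere, pointing at the same individual.
THIRD_PARTY = {
    ("ex:id2", "ex:identifies", "ex:ind1"),
    ("ex:id2", "ex:toConcept", "ex:conceptB"),
}


def concepts_for_occurrence(graph, occurrence):
    """Follow Occurrence -> individual <- Identification -> concept."""
    individuals = {
        o for s, p, o in graph if s == occurrence and p == "ex:occurrenceOf"
    }
    idents = {
        s for s, p, o in graph if p == "ex:identifies" and o in individuals
    }
    return {o for s, p, o in graph if s in idents and p == "ex:toConcept"}


def tags_for_occurrence(graph, occurrence):
    """Tags the provider attached directly to the Occurrence."""
    return {
        o
        for s, p, o in graph
        if s == occurrence and p == "txn:occurrenceHasSpeciesOccurrenceTag"
    }


merged = PROVIDER | THIRD_PARTY
```

After merging, the identification route finds both concepts with no change to the provider's data, while the tag route still returns only the tag the provider wrote.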
Anyway, maybe you can explain what is going on so that I can understand it better and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties.
Steve
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities:
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
See http://lod.taxonconcept.org/ses/v6n7p.html HTML http://lod.taxonconcept.org/ses/v6n7p.rdf RDF
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... Base View (http://bit.ly/gMFqR1)
The model mints URIs for the following related entities. See RDF or KB View.
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag, from the species concept RDF, is used for an individual organism that is an instance of the species concept. This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>
This is the RDF for a population. It has an individual organism as one of its parts, and it is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7</skos:prefLabel>
  <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individ..."/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>
Respectfully,
- Pete
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://lod.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Alas, I don't have time to dive-in to this conversation in full (I still owe too many things to too many people), though I have been very tempted!
Very quickly:
The model supports links to alternative concepts. The uniprot and bio2rdf, and DBpedia URI's can be considered closely related concepts. The way this works ideally is that the identifier of this insect (from TDWG) makes the assertion that this observation http://ocs.taxonconcept.org/ocs/0da685c9-9cdc-4dff-baf3-38d1bdbc6552.html represents an instance of this concept http://lod.taxonconcept.org/ses/z9oqP#Species
But if I understand you correctly, alternate concepts don't exist within taxonconcept.org; but only as links to other repositories of concepts, that may or may not be congruent with those represented in taxonconcept.org. If that's the case, then what happens when the person who identifies the observation [http://ocs.taxonconcept.org/ocs/0da685c9-9cdc-4dff-baf3-38d1bdbc6552.html] doesn't agree with the concept represented in [http://lod.taxonconcept.org/ses/z9oqP#Species] -- or any other concept represented in taxonconcept.org? Do they have to hunt around through the other repositories to find the right one?
Let me give an example. The type specimen of Centropyge fisheri was collected in Hawaii (e.g., http://pbs.bishopmuseum.org/images/JER/detail.asp?ID=-1377454029 ). The type specimen of C. flavicauda was collected in the South China Sea, and is known throughout the rest of the tropical Pacific (e.g., http://pbs.bishopmuseum.org/images/JER/detail.asp?ID=-1339602635).
Many taxonomists have treated these two species as distinct and valid; and hence two separate taxon concepts representing populations in Hawaii, and in the broader Pacific, respectively. Other taxonomists have considered them to be conspecific, and thus only one species throughout the tropical Pacific, including Hawaii. The name "fisheri" has priority, so the concept labeled as "Centropyge fisheri, sensu stricto" refers to the species concept consisting of individuals from Hawaii, and the concept labeled as "Centropyge fisheri, sensu lato" refers to the species concept consisting of individuals throughout the tropical Pacific (including Hawaii).
If I understand you correctly, there would be only one of these two concepts represented in taxonconcept.org. For the sake of argument, let's say it was the sensu lato concept (which is the more modern interpretation, lumping the two historically distinct species). What if someone made an observation in Johnston Atoll, and they are a splitter (i.e. recognizing Hawaii C. fisheri as a distinct species from Pacific C. flavicauda), and wanted to identify their specimen to the concept that *excludes* the Hawaii population (i.e., C. flavicauda)? Would they be able to do so? Or would they have to look through uniprot and bio2rdf, DBpedia, etc. to find a species-level concept that matches the one they want to represent the observation as?
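To make the example concrete, here is a small sketch that treats each concept as a circumscription, reduced (as a deliberate simplification) to a set of coarse regions. The `Concept` type, the region labels, and the `relation` vocabulary are all assumptions made for illustration; real concepts would be circumscribed by specimens, not by two region names, and nothing here reflects how taxonconcept.org actually stores concepts.

```python
from typing import NamedTuple


class Concept(NamedTuple):
    label: str
    regions: frozenset  # simplified stand-in for the circumscription


HAWAII = "Hawaii"
PACIFIC = "tropical Pacific excluding Hawaii"

FISHERI_SS = Concept("Centropyge fisheri sensu stricto", frozenset({HAWAII}))
FLAVICAUDA = Concept("Centropyge flavicauda", frozenset({PACIFIC}))
FISHERI_SL = Concept("Centropyge fisheri sensu lato", frozenset({HAWAII, PACIFIC}))


def candidates(concepts, region):
    """Labels of the concepts whose circumscription includes the region."""
    return [c.label for c in concepts if region in c.regions]


def relation(a, b):
    """Set relationship between two circumscriptions."""
    if a.regions == b.regions:
        return "congruent"
    if a.regions < b.regions:
        return "is included in"
    if a.regions > b.regions:
        return "includes"
    if a.regions & b.regions:
        return "overlaps"
    return "excludes"


# A store holding all three concepts versus one holding only sensu lato.
full_store = [FISHERI_SS, FLAVICAUDA, FISHERI_SL]
lumped_store = [FISHERI_SL]
```

For an observation placed in the non-Hawaii Pacific (as the splitter in the example would place Johnston Atoll), `full_store` offers both C. flavicauda and C. fisheri sensu lato, while `lumped_store` offers only the sensu lato concept, which is the gap the question is pointing at.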
Apologies if I have completely misunderstood this conversation...but at the very least, perhaps a concrete example (with pictures!) might help to disambiguate some of this thread.
Aloha, Rich
First of all, I apologize to Pete for misinterpreting one of his main statements, but I guess I was misled by everything else he stated above it.
Secondly, I wanted to ask the same question that Rich just did. The model Pete proposes seems to work well in a very straightforward context like the bee example. I am anxious to see how this model scales to more complex instances, such as when concepts overlap in part or nothing is so obvious. The representation of concepts may not be so seamlessly recovered by this logic model. Unless I am missing something, in which case at least I am not alone :-)
Alas, I don't have time to dive-in to this conversation in full (I still owe too many things to too many people)
How is that GNUB coming along? ;-)
Nico
Alas, I don't have time to dive-in to this conversation in full (I still owe too many things to too many people)
How is that GNUB coming along? ;-)
EXACTLY! :-)
Rich
Hi Rich,
These were the very issues that we had talked about modeling last fall, and I thought we were planning to work on them after the holidays.
Check your old email; I have your prototype fish list.
Perhaps skos:narrower?
http://lod.taxonconcept.org/Pomacanthidae.html
Respectfully,
- Pete
On Wed, May 4, 2011 at 4:46 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Alas, I don't have time to dive-in to this conversation in full (I still owe too many things to too many people), though I have been very tempted!
Hi Pete,
Yes, I know - and you are among the MANY people to whom I owe some follow-up work (my inbox of unanswered email has crept up past the 8,000 mark). But I entered this fray in part to re-kindle that conversation we had started months ago. That notwithstanding, I don't remember an answer to the question I posed just now, concerning taxonconcept.org as a concept store. I had been under the impression that it would establish C. fisheri s.s. and C. fisheri s.l. as two distinct Concepts, with two distinct GUIDs, and a mapping of relationships between them and with other entities (e.g., cross-referencing specimens, nomenclature, TaxonNameUsage instances, etc.) The thing that caught my attention in your recent email was the notion that one would need to leave taxonconcept.org to see alternative concept definitions.
Aloha,
Rich
From: Peter DeVries [mailto:pete.devries@gmail.com] Sent: Wednesday, May 04, 2011 12:25 PM To: Richard Pyle Cc: Nico Cellinese; tdwg-content@lists.tdwg.org Subject: Re: [tdwg-content] Fwd: [Fwd: Re: If you need something for referring to a population, then it is probably best to do it as a related class]
Hi Rich,
These were the very issues that we had talked about modeling last fall, and I thought we were planning to work on them after the holidays.
Check your old email; I have your prototype fish list.
Perhaps skos:narrower?
http://lod.taxonconcept.org/Pomacanthidae.html
Respectfully,
- Pete
On Wed, May 4, 2011 at 4:46 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Alas, I don't have time to dive-in to this conversation in full (I still owe too many things to too many people), though I have been very tempted!
Hi Rich,
Some of this depends on whether the bulk of the data associated with C. flavicauda are under that name or recorded as C. fisheri.
You can always merge two concepts using owl:sameAs, so if someone considered them one species they could download them and run a sameAs.
The problem with keeping them separate is that you might have data sets that are really about C. flavicauda, but were recorded as C. fisheri.
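As a rough sketch of what "run a sameAs" could amount to on the consumer's side: group records under one representative per cluster of concepts linked by owl:sameAs. The function below is a plain union-find over made-up `ex:` identifiers, not a real OWL reasoner, and the record names are hypothetical.

```python
def merge_same_as(records, same_as):
    """Group records under one representative per owl:sameAs cluster.

    records: iterable of (concept, record) pairs
    same_as: iterable of (concept_a, concept_b) pairs asserted equivalent
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Use the lexicographically smaller identifier as representative
            # so the result does not depend on the order of the assertions.
            parent[max(ra, rb)] = min(ra, rb)

    merged = {}
    for concept, record in records:
        merged.setdefault(find(concept), []).append(record)
    return merged


# Hypothetical records filed under the two concepts.
RECORDS = [
    ("ex:fisheri", "rec1"),
    ("ex:flavicauda", "rec2"),
    ("ex:fisheri", "rec3"),
]

split_view = merge_same_as(RECORDS, [])
lumped_view = merge_same_as(RECORDS, [("ex:fisheri", "ex:flavicauda")])
```

Note the asymmetry: lumping two separately stored concepts is a cheap operation like this one, but if records were stored only under the lumped concept, no operation could recover which ones were really C. flavicauda.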
There was a similar example involving birds that David Remsen mentioned but I looked around and all the major records were with the entities as separate species, so I kept them separate.
In the case of subspecies, and in the absence of a more elegant solution, I would record them as instances of a species concept *in the form of Genus epithet subspecific epithet.* That way you capture the full intent without causing additional problems.
The example I use is this. Two Cougar cubs are born on a mountain.
One of the cubs is seen on the North side of the mountain and recorded by scientist A as *Puma concolor couguar*; the other is seen on the South side of the mountain and recorded by scientist B as *Puma concolor*.
Since every instance of a true subspecies is also an instance of the species, it is safe to say that both are occurrences of the species concept for the Cougar.
However, when treated as separate strings, as they are in many systems, these are later interpreted as instances of two different things, e.g., two ITIS numbers.
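The cougar example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `species_key` helper is hypothetical, and it assumes names arrive as bare "Genus epithet [subspecific epithet]" strings with no authorship, subgenus, or rank markers.

```python
def species_key(scientific_name):
    """Return the 'Genus epithet' part of a binomial or trinomial name."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError(f"expected at least genus and epithet: {scientific_name!r}")
    return " ".join(parts[:2])


# The two cubs, as recorded by the two scientists.
records = [
    ("scientist A", "Puma concolor couguar"),
    ("scientist B", "Puma concolor"),
]

# Group by species: the full recorded name is kept, so no intent is lost.
by_species = {}
for observer, name in records:
    by_species.setdefault(species_key(name), []).append((observer, name))
```

Grouping on the raw strings would give two buckets; grouping on the species key gives one, while each record still carries the name its observer actually used.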
Also, I thought I might restate what I mean by "the individual is one kind of thing."
Think of it this way.
An omniscient being who could see an individual's DNA, lineage, and extent of reproductive isolation would in most cases be able to determine what kind of thing it is and which other individuals are also instances of that kind. She would also be able to recognize which individuals are not instances of that kind of thing.
We as humans can only hypothesize what the things are and also make assertions about what individuals are instances of that thing.
So Nico and I are not really that far apart. I just think that for some types of queries and questions it would be best to consider the observation of a specimen or individual to be one type of thing at a time.
That does not mean that you can't model it in a way that states we are treating this as this type of thing, but others believe it is a different kind of thing.
I also don't seem to understand why, if someone can find some missing utility in existing vocabularies and mints one starting with txn, it is seen by some as an act of heresy, while minting a new vocabulary starting with dsw is not.
Heretical enough to be written out of the sacred scrolls?
Respectfully,
- Pete
On Wed, May 4, 2011 at 6:03 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Hi Pete,
Yes, I know – and you are among the MANY people to whom I owe some follow-up work (my inbox of unanswered email has crept up past the 8,000 mark). But I entered this fray in part to re-kindle that conversation we had started months ago. That notwithstanding, I don’t remember an answer to the question I posed just now, concerning taxonconcept.org as a concept store. I had been under the impression that it would establish C. fisheri s.s. and C. fisheri s.l. as two distinct Concepts, with two distinct GUIDs, and a mapping of relationships between them and with other entities (e.g., cross-referencing specimens, nomenclature, TaxonNameUsage instances, etc.) The thing that caught my attention in your recent email was the notion that one would need to leave taxonconcept.org to see alternative concept definitions.
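To make the "two Concepts, two GUIDs, plus a mapping of relationships" picture concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the `urn:example:` identifiers are placeholders rather than real GUIDs, the population labels are shorthand, and deriving the relationship from sets of populations is just one possible way to express such a mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    guid: str               # placeholder identifier, not a real GUID
    label: str
    populations: frozenset  # shorthand for the circumscription


def relation(a, b):
    """Describe how concept a relates to concept b by circumscription."""
    if a.populations == b.populations:
        return "congruent"
    if a.populations < b.populations:
        return "included in"
    if a.populations > b.populations:
        return "includes"
    if a.populations & b.populations:
        return "overlaps"
    return "excludes"


HAWAII = "Hawaii"
REST = "tropical Pacific outside Hawaii"

fisheri_ss = Concept("urn:example:1", "Centropyge fisheri, sensu stricto",
                     frozenset({HAWAII}))
fisheri_sl = Concept("urn:example:2", "Centropyge fisheri, sensu lato",
                     frozenset({HAWAII, REST}))
flavicauda = Concept("urn:example:3", "Centropyge flavicauda",
                     frozenset({REST}))

# A splitter can pick the concept that excludes the Hawaii population.
splitter_choices = [c for c in (fisheri_ss, fisheri_sl, flavicauda)
                    if HAWAII not in c.populations]
```

With all three concepts held side by side, a lumper and a splitter can each point an identification at the concept they mean, and the relationship between any two concepts can be looked up rather than guessed from the name string.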
Aloha,
Rich
*From:* Peter DeVries [mailto:pete.devries@gmail.com] *Sent:* Wednesday, May 04, 2011 12:25 PM *To:* Richard Pyle *Cc:* Nico Cellinese; tdwg-content@lists.tdwg.org *Subject:* Re: [tdwg-content] Fwd: [Fwd: Re: If you need something for referring to a population, then it is probably best to do it as a related class]
Hi Rich,
These were the very issues that we had talked about modeling last fall and that I thought we were planning to work on after the holidays.
Check your old email; I have your prototype fish list.
Perhaps SKOS:narrower?
http://lod.taxonconcept.org/Pomacanthidae.html
Respectfully,
- Pete
On Wed, May 4, 2011 at 4:46 PM, Richard Pyle deepreef@bishopmuseum.org wrote:
Alas, I don't have time to dive-in to this conversation in full (I still owe too many things to too many people), though I have been very tempted!
Very quickly:
The model supports links to alternative concepts. The uniprot and bio2rdf, and DBpedia URI's can be considered closely related concepts.
The way this works ideally is that the identifier of this insect (from TDWG) makes the assertion that this observation http://ocs.taxonconcept.org/ocs/0da685c9-9cdc-4dff-baf3-38d1bdbc6552.html represents an instance of this concept http://lod.taxonconcept.org/ses/z9oqP#Species
But if I understand you correctly, alternate concepts don't exist within taxonconcept.org; but only as links to other repositories of concepts, that may or may not be congruent with those represented in taxonconcept.org. If that's the case, then what happens when the person who identifies the observation [http://ocs.taxonconcept.org/ocs/0da685c9-9cdc-4dff-baf3-38d1bdbc6552.html ] doesn't agree with the concept represented in [http://lod.taxonconcept.org/ses/z9oqP#Species] -- or any other concept represented in taxonconcept.org? Do they have to hunt around through the other repositories to find the right one?
Let me give an example. The type specimen of Centropyge fisheri was collected in Hawaii (e.g., http://pbs.bishopmuseum.org/images/JER/detail.asp?ID=-1377454029 ). The type specimen of C. flavicauda was collected in the South China Sea, and is known throughout the rest of the tropical Pacific (e.g., http://pbs.bishopmuseum.org/images/JER/detail.asp?ID=-1339602635).
Many taxonomists have treated these two species as distinct and valid; and hence two separate taxon concepts representing populations in Hawaii, and in the broader Pacific, respectively. Other taxonomists have considered them to be conspecific, and thus only one species throughout the tropical Pacific, including Hawaii. The name "fisheri" has priority, so the concept labeled as "Centropyge fisheri, sensu stricto" refers to the species concept consisting of individuals from Hawaii, and the concept labeled as "Centropyge fisheri, sensu lato" refers to the species concept consisting of individuals throughout the tropical Pacific (including Hawaii).
If I understand you correctly, there would be only one of these two concepts represented in taxonconcept.org. For the sake of argument, let's say it was the sensu lato concept (which is the more modern interpretation, lumping the two historically distinct species). What if someone made an observation in Johnston Atoll, and they are a splitter (i.e. recognizing Hawaii C. fisheri as a distinct species from Pacific C. flavicauda), and wanted to identify their specimen to the concept that *excludes* the Hawaii population (i.e., C. flavicauda)? Would they be able to do so? Or would they have to look through uniprot and bio2rdf, DBpedia, etc. to find a species-level concept that matches the one they want to represent the observation as?
Apologies if I have completely misunderstood this conversation...but at the very least, perhaps a concrete example (with pictures!) might help to disambiguate some of this thread.
Aloha, Rich
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 Email: pdevries@wisc.edu TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases A Semantic Web, Linked Open Data http://linkeddata.org/ Project
Peter DeVries wrote:
I also don't seem to understand why, if someone can find some missing utility in existing vocabularies and mints one starting with txn, it is seen by some as an act of heresy, while minting a new vocabulary starting with dsw is not.
Heretical enough to be written out of the sacred scrolls?
Nobody else has come right out and said this, but I'm going to go ahead and say it because I really don't think the paranoia contributes to this discussion. It isn't exactly clear to me who you think the TDWG Illuminati is. You made the statement "TDWG Illuminati determined that *indeed* the current DarwinCore was not good for the semantic web and formed a group to create one" and I asked you what group you were talking about. You did not answer that question. Given the statement below I assume you think it includes me. I have already told you that nobody in TDWG or anywhere else asked or suggested to Cam Webb and me that we develop Darwin-SW. Cam (whom I've never actually met in person) suggested to me that we give it a try and we did. Thus far I have not yet heard anyone, including me, suggest that it was heresy for you to create the txn ontology. Likewise, I have not heard anyone officially associated with TDWG give any kind of "blessing" to dsw. Actually, the fact that no one has come out on the list and said that some aspect of dsw was heresy doesn't actually mean that people aren't thinking that it is. I was kind of expecting that somebody might.
It really borders on humorous that you suggest that I'm somehow a part of some TDWG conspiracy. I have been to precisely one TDWG meeting and with one exception, that is the only time I've ever personally met anybody who regularly contributes to this list. That one exception is Nico Cellinese, whom I've met on one other occasion. In fact, the person whom I talked to the most at the meeting (other than Alexey Zinovjev who came with me to the meeting and was also a TDWG newcomer) was actually YOU. I'm also pretty sure that the only person other than Nico who regularly contributes to this list that I've ever interacted with in any sort of collaborative way is Bob Morris on the Live Plant Image Group, and he has been largely silent in this discussion. Actually, he did make one comment about dsw and I would characterize it as cautionary. That hardly qualifies as a conspiracy to promote DSW.
If you would care to notice, DSW is not my first attempt at writing RDF. My first attempt was the examples in my Biodiversity Informatics paper (https://journals.ku.edu/index.php/jbi/article/view/3664) and quite frankly, at this point I think those examples were not very good. There were several actual mistakes that I made and I think that the overall approach that I was taking in modeling Occurrences was flawed. If it turns out that people in the TDWG community find themselves agreeing with the DSW model (which I do not consider a certainty), it would not be because of a conspiracy. It would be because I've probably spent dozens of hours (maybe even hundreds of hours) reading and trying to understand the points of view expressed by people in this community on the tdwg-content list and in papers and web posts that they've created. With the exception of the IndividualOrganism class (which I'll take some credit for promoting), pretty much everything that I contributed to DSW came from ideas that I've absorbed from the TDWG community, which were then molded by Cam's contributions to the collaboration. If you will recall, last November Rich Pyle and I had what I suppose could be considered a somewhat bruising exchange on the list about the scope of the Individual class. Although I did not agree with him at the time, I learned a lot from that exchange and in retrospect, I can see that his opinion was not wrong; it was just framed by the desire to meet different objectives with the class. Cam and I actually attempted (in a somewhat feeble way) to incorporate Rich's perspective in the "alternative version" of DSW.
So my point is that if you want to promote the taxonconcept.org ontology as an ontology for general use by the community (which is certainly your right), then you need to be willing to subject it to critical analysis by the people you want to use it. When you get criticism, you need to see that as an opportunity to improve your work, not as a conspiracy to destroy it. Cam and I have requested a critical analysis of DSW from the community and I don't really think we've gotten enough of it yet to suit me. If DSW has flaws (as it most certainly does), we will try to address those flaws and learn from the experience. All you are going to accomplish by promoting a conspiracy theory is to cause people to not take you seriously. That would be a shame because you have a lot of great ideas and have some of the most experience in the TDWG community at actually implementing LOD "in the wild". You should take the fact that I took the time to wade through the taxonconcept.org RDF to try to understand it and subject it to critical analysis as a compliment, not a threat. I have already acknowledged that a lot of what I know about RDF are things that I learned from looking at your examples.
Steve
Hi Steve,
I am not annoyed at you. I don't have a problem with you creating your ontology or our lively debates.
What is annoying is the pattern of the same idea being dismissed and reappearing under a different name and being accepted.
It is clear that decisions are made, but it is not clear how or by whom.
I jokingly refer to this as the TDWG Illuminati because the decision process seems so mysterious.
Why don't you look through what you wrote in the thread this week and then look over the GBIF KOS report.
Initially my work was characterized as
"in the GeoSpecies project [104] based on a small purpose-built ontology [105] of mosquito-borne human pathogens."
Now it is not mentioned at all.
Do the authors of that report read the same TDWG emails that you do?
Are the directions of this group based on the merits of each individual's arguments and open debate or is something else going on?
Respectfully,
- Pete
On Wed, May 4, 2011 at 10:13 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
Pete- A bit of clarification-
- The KOS report was a GBIF effort, not a TDWG effort. It was commissioned to advise and (its authors hope) influence GBIF, not TDWG.
- The authorship contains several people who have little or nothing to do with TDWG. One of them, Natasha Noy, is one of the principal architects of Protege and of NCBO BioPortal, and to the best of my knowledge never even heard of TDWG (and barely of GBIF) before I asked her to serve on the GBIF KOS working group. A few others have drifted in and out of TDWG when they have been on projects that find TDWG relevant.
- The final KOS report in fact references your work as I believe does an earlier draft. The final document at http://www2.gbif.org/gbif_kos_whitepaper_v1.pdf carries exactly the text you cite. Regrettably, it is possible to parse that clause as suggesting that the entirety of GeoSpecies is based on that small ontology, whereas the intent in that clause was to refer to the example, not the project per se. Since I was the principal drafter, the fault is mine and I apologize.
- At a glance it appears to me that five of the eight GBIF KOS authors subscribe to tdwg-content, but among them, I have never seen remarks on Linked Data postings and the related threads except by Hilmar and me. Again, the KOS document is not a TDWG document (unless TDWG chooses to endorse it), so I am not surprised.
- The TDWG standards process is not secret. It is laid out in http://wiki.tdwg.org/twiki/pub/Process/WebHome/TDWG_Process_2006-08-16.pdf Cumbersome, maybe; mysterious, not in my opinion.
Bob Morris
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Pete,
There are no “TDWG Illuminati”; there is only a TDWG Executive Committee, which is elected every year as specified in the TDWG Constitution ( http://www.tdwg.org/about-tdwg/constitution/). There is also a relatively simple TDWG process for establishing standards. It begins with interest and task groups, and carries on through proposing and reviewing standards ( http://www.tdwg.org/about-tdwg/process/). That process is relatively similar to other standards development processes in bottom-up, volunteer-driven, IT-oriented communities. The only serious deviations from our process that I’m aware of recently are that the Executive Committee has not demanded that Interest and Task Group conveners update their charters annually, and has been slow in giving feedback to the conveners. Those lapses are both due to a lack of effort (available time), NOT subjecting in-group (illuminati) to one set of rules and out-group people to another. To paraphrase a familiar maxim: never ascribe to malice, collusion, or conspiracy what can be explained by lack of organization, lack of effort, and miscommunication.
Also note that neither DsW: nor txn: have any standing in TDWG until they go through the process. They are both “ignored” equally. How could it be otherwise?
As you can see, getting people to agree about things that are both complicated and subtly different (like all the semantic web technologies) is VERY hard.
There are a couple of ways forward from the current state of affairs, though: establish a project (number of participants > 1) to test alternative approaches; establish an interest/task group to draft a specification; even summarize the points made in this most recent round of posts. I welcome ALL of this discussion, but email discussion on its own doesn’t create best practice specifications or standards.
-Stan
On 5/4/11 9:39 PM, "Peter DeVries" pete.devries@gmail.com wrote:
Hi Steve,
I am not annoyed at you. I don't have a problem with you creating your ontology or our lively debates.
What is annoying is the pattern of the same idea being dismissed and reappearing under a different name and being accepted.
It is clear that decisions are made, but it is not clear how or by whom.
I jokingly refer to this as the TDWG Illuminati because the decision process seems so mysterious.
Why don't you look through what you wrote in the thread this week and then look over the GBIF KOS report.
Initially my work was characterized as
"in the GeoSpecies project104 based on a small purpose-built ontology105 of mosquito-borne human pathogens."
Now it is not mentioned at all.
Do the authors of that report read the same TDWG emails that you do?
Are the directions of this group based on the merits of each individual's arguments and open debate or is something else going on?
Respectfully,
- Pete
On Wed, May 4, 2011 at 10:13 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Peter DeVries wrote:
I also don't seem to understand why if someone can find some missing utility in existing vocabularies, and mints one starting with txn, it is seen by some as an act of heresy, while the minting a new vocabulary starting with dsw is not.
Heretical enough to be written out of the sacred scrolls?
Nobody else has come right out and said this, but I'm going to go ahead and say it because I really don't think the paranoia contributes to this discussion. It isn't exactly clear to me who you think is the TDWG Illuminati is. You made the statement "TDWG Illuminati determined that indeed the current DarwinCore was not good for the semantic web and formed a group to create one" and I asked you what group you were talking about. You did not answer that question. Given the statement below I assume you think it includes me. I have already told you that nobody in TDWG or anywhere else asked or suggested to Cam Webb and I that we develop Darwin-SW. Cam (whom I've never actually met in person) suggested to me that we give it a try and we did. Thus far I have not yet heard anyone, including me, suggest that it was heresy for you to create the txn ontology. Likewise, I have not heard anyone officially associated with TDWG give any kind of "blessing" to dsw. Actually, the fact that no one has come out on the list and said that some aspect of dsw was heresy doesn't actually mean that people aren't thinking that it is. I was kind of expecting that somebody might.
It really borders on humorous that you suggest that I'm somehow a part of some TDWG conspiracy. I have been to precisely one TDWG meeting and with one exception, that is the only time I've ever personally met anybody who regularly contributes to this list. That one exception is Nico Cellinese, whom I've met on one other occasion. In fact, the person whom I talked to the most at the meeting (other than Alexey Zinovjev who came with me to the meeting and was also a TDWG newcomer) was actually YOU. I'm also pretty sure that the only person other than Nico who regularly contributes to this list that I've ever interacted with in any sort of collaborative way is Bob Morris on the Live Plant Image Group, and he as been largely silent in this discussion. Actually, he did make one comment about dsw and I would characterize it as cautionary. That hardly qualifies as a conspiracy to promote DSW.
If you would care to notice, DSW is not my first attempt at writing RDF. My first attempt was the examples in my Biodiversity Informatics paper (https://journals.ku.edu/index.php/jbi/article/view/3664) and quite frankly, at this point I think those examples were not very good. There were several actual mistakes that I made and I think that the overall approach that I was taking in modeling Occurrences was flawed. If it turns out that people in the TDWG community find themselves agreeing with the DSW model (which I do not consider a certainty), it would not be because of a conspiracy. It would be because I've probably spent dozens of hours (maybe even hundreds of hours) reading and trying to understand the points of view expressed by people in this community on the tdwg-content list and in papers and web posts that they've created. With the exception of the IndividualOrganism class (which I'll take some credit for promoting) pretty much everything that I contributed to DSW were ideas that I've absorbed from the TDWG community, which were then molded by Cam's contributions to the collaboration. If you will recall, last November Rich Pyle and I had what I suppose could be considered a somewhat bruising exchange on the list about the scope of the Individual class. Although I did not agree with him at the time, I learned a lot from that exchange and in retrospect, I can see that his opinion was not wrong, it was just framed by the desire to meet different objectives with the class. Cam and I actually attempted (in a somewhat feeble way) to incorporate Rich's perspective in the "alternative version" of DSW.
So my point is that if you want to promote the taxonconcept.org (http://taxonconcept.org) ontology as an ontology for general use by the community (which is certainly your right), then you need to be willing to subject it to critical analysis by the people you want to use it. When you get criticism, you need to see that as an opportunity to improve your work, not as a conspiracy to destroy it. Cam and I have requested a critical analysis of DSW from the community and I don't really think we've gotten enough of it yet to suit me. If DSW has flaws (as it most certainly does), we will try to address those flaws and learn from the experience. All you are going to accomplish by promoting a conspiracy theory is to cause people to not take you seriously. That would be a shame because you have a lot of great ideas and have some of the most experience in the TDWG community at actually implementing LOD "in the wild". You should take the fact that I took the time to wade through the taxonconcept.org RDF to try to understand it and subject it to critical analysis as a compliment, not a threat. I have already acknowledged that a lot of what I know about RDF are things that I learned from looking at your examples.
Steve
Birth certificates. I want to see birth certificates! :)
jim (member of the TDWG Obscurati, TDWG Obfuscati, TDWG Obliterati, etc.)
On Thu, May 5, 2011 at 3:53 PM, Blum, Stan SBlum@calacademy.org wrote:
There are no “TDWG Illuminati”
Thanks Jim for injecting some humor. :-)
OK, I am sorry. I overreacted.
I spend a lot of time explaining how different things work, making examples, etc. on this list, and I sometimes wonder whether this is really in my best interest.
Would I be better off spending time on groups that are clearer about what they like and don't like?
If I take the time to change something to address some issue, will it even matter in the end?
Does it make sense to take the time to build some consensus?
It seems that it would be best to do this within the TDWG umbrella, but at what cost compared to finishing my degree, etc.?
Respectfully,
- Pete
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
On 05/05/2011, at 1:13 PM, Steve Baskauf wrote:
Peter DeVries wrote:
I also don't seem to understand why if someone can find some missing utility in existing vocabularies, and mints one starting with txn, it is seen by some as an act of heresy, while the minting a new vocabulary starting with dsw is not.
I might jump in and make a technical point about OWL/RDF here.
One of the strengths of OWL is that you can define equivalencies between your own vocabulary and someone else's. You can declare that your own "Taxon" notion is narrower or broader than some other notion of taxon. The power of this is that if someone attempts to reason over some well-known vocabulary and you define semantic relationships, their rules will also reason over your dataset - as far as is possible.
However, what you can do, and what I think would be a very sensible thing to do, is to keep your vocabulary terms and your semantic equivalencies in separate ontology documents.
For instance:
Let's say you have a predicate "isVoucherFor", and it's quite obvious that it is pretty much the same thing as http://rs.tdwg.org/ontology/voc/Specimen#isVoucherFor, but slightly more specific. For instance, your "isVoucherFor" might apply only to specimens that are RegisteredSpecimens in your vocabulary. It has a range declaration on it. In this case, it seems obvious that your isVoucherFor is a subproperty of the tdwg isVoucherFor.
I'm suggesting that the declaration of your 'isVoucherFor' probably should include your range specifier (it's part of the way your vocabulary works), but the declaration of the subproperty relationship should be put in some other file/ontology, which would import both your vocabulary and the TDWG vocabulary (or DwC vocabulary, in this case). Any RDF files containing data should probably import your vocabulary but not the equivalency semantics.
This allows you to "plug in" sets of equivalencies. A sparql query can drag in some data using your vocabulary, and can separately drag in your rules. Or not. That way:
* if the other vocabulary changes in ways that break things, you can define new equivalencies without disturbing an existing ruleset that people might be using
* people who want to use your terms but have their own ideas as to what the equivalencies are can do their own reasoning
* if someone's data uses your terms but using the equivalencies results in inconsistencies, a querent can plug in a stripped-down set of semantics and reason over that
* the act of simply reading and processing your data does not cause the other vocabulary to be dragged into the engine
The ongoing problem is management: what rules are applicable? What rule sets are there? The TDWG site itself (or any webserver anywhere) can serve as an aggregator for bioinformatic RDF/OWL semantics simply by hosting empty ontologies that import others. Thus "Old-tdwg-and-DwC.owl", "DwC-and-taxonconcept(lax).owl", "rulesets-that-pertain-to-occurence.owl", "absolutely-everything-ever.owl", and so on.
The work, of course, is in thrashing out the equivalencies - an ongoing job.
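As a rough illustration of the "plug in" idea, here is a minimal Python sketch that keeps the vocabulary, the mapping and the data in separate sets of triples and applies the subproperty rule only when the mapping set is loaded. It is not an OWL reasoner; the prefixes my: and tdwg:, the individuals, and the function name are assumptions made up for the example.

```python
# Triples are plain (subject, predicate, object) tuples.
# All names below (my:, tdwg:, specimen42, record7) are hypothetical.
SUBPROP = "rdfs:subPropertyOf"

# "Vocabulary" document: declares the term itself. Its range
# declaration would also live here, since it is part of the vocabulary.
my_vocabulary = {
    ("my:isVoucherFor", "rdf:type", "owl:ObjectProperty"),
}

# Separate "mapping" document: the only place the TDWG term appears.
my_to_tdwg_mapping = {
    ("my:isVoucherFor", SUBPROP, "tdwg:isVoucherFor"),
}

# A data file that uses only my vocabulary.
data = {
    ("my:specimen42", "my:isVoucherFor", "my:record7"),
}


def apply_subproperties(triples):
    """Return the triples plus everything implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears
        changed = False
        supers = {(s, o) for (s, p, o) in inferred if p == SUBPROP}
        for (s, p, o) in list(inferred):
            for (sub, sup) in supers:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


tdwg_view = ("my:specimen42", "tdwg:isVoucherFor", "my:record7")
without_mapping = apply_subproperties(data | my_vocabulary)
with_mapping = apply_subproperties(data | my_vocabulary | my_to_tdwg_mapping)

print(tdwg_view in without_mapping)  # mapping not loaded
print(tdwg_view in with_mapping)     # mapping loaded
```

Loading the data alone never mentions the TDWG term; a user who wants the TDWG view adds the mapping set, and a user with different ideas about the equivalencies can substitute their own.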
If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments.
Please consider the environment before printing this email.
Thanks for this helpful suggestion, Paul. I have a question related to what you said. As a general practice, DSW "imported" class terms from DwC to use as the basic classes in the ontology. This seemed like a relatively "safe" thing to do, since the RDF definitions of the class terms in DwC don't really say anything restrictive about how they are used. We then basically tried to describe (using properties) how those classes were connected and put in some restrictions intended to prevent people from calling apples oranges (range/domain restrictions coupled with disjoint declarations). However, we went beyond that in the Taxon class. We declared that dwc:Taxon was equivalent to tc:Taxon (Taxon in the TDWG ontology, which is declared within the TDWG ontology to be the same thing as tc:TaxonConcept). Our motivation was to try to add some clarity to what exactly the dwc:Taxon class was (that wasn't exactly clear to us) and also to avoid having to try to describe the properties of the Taxon class since they were already described (to some extent) within the TDWG ontology. However, from what you have said below, it sounds like it might have been a better idea to have declared the equivalency of dwc:Taxon and tc:Taxon outside of the main ontology document to allow further development of the Taxon class within DSW without causing a conflict with the description of Taxon within the TDWG ontology.
I guess the wisdom of doing this partially depends on the future of the TDWG ontology, which is not within our (Cam's and my) hands. We didn't really want to get into describing Taxon (not our area of specialty) and the tc:Taxon class and terms were in use by some people (you, for example). But if the TDWG ontology is permanently "frozen" (a.k.a. abandoned), then maybe tying Taxon to it was not wise. I would be interested in your opinion on this.
Steve
On 06/05/2011, at 8:59 PM, Steve Baskauf wrote:
However, we went beyond that in the Taxon class. We declared that dwc:Taxon was equivalent to tc:Taxon (Taxon in the TDWG ontology, which is declared within the TDWG ontology to be the same thing as tc:TaxonConcept). Our motivation was to try to add some clarity to what exactly the dwc:Taxon class was (that wasn't exactly clear to us) and also to avoid having to try to describe the properties of the Taxon class since they were already described (to some extent) within the TDWG ontology.
I believe it is the case that if you declare two classes to be equivalent, then any restrictions migrate across both ways.
Hang on ... I'll write myself a little test case in Protege. I declare tdwg and dwc Name and Taxon classes, declare that the dwc Name and Taxon classes are disjoint from each other (that is, in DwC a name is not a taxon and a taxon is not a name), and declare each of them equivalent to the respective tdwg class. (NB: I don't mean to imply that this actually is the case in DwC.) When we then do something that is ok in the tdwg space - declare that some individual is both a name and a taxon - the HermiT reasoner straight away complains that the ontology is inconsistent.
If you say that dwc_Taxon "is equivalent to" tdwg_Taxon, then you are saying that anything you say about the one also applies to the other. The consequence of this is that you might not be able to pull in certain sets of data from heterogeneous sources. Bill's dataset may be a bit woolly about the Name/Taxon distinction, Bob's may use DwC and be quite strict. If the DwC vocabulary includes an equivalence, then if anything anywhere declares this disjoint relationship, the whole lot is inconsistent. If Bill's data contains information about genetics and Bob's contains data about distribution and you want to combine them .... you are out of luck.
One way to manage this situation is as I suggested: by rigging things up so that a user can select sets of inference rules that work together without having them automatically dragged in by the imports - by making just the declarations without any other rules available separately. This is more useful than it might appear, as those declarations will include any annotations that you want to put on your items: descriptions, labels and whatnot.
The alternative is to declare that a dwc_Taxon is a subclass of a tdwg_Taxon. Formally: if something is a dwc_Taxon then that means it is also a tdwg_Taxon. This makes the ontology consistent.
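The difference between the two declarations can be sketched with a toy consistency check in Python. This is only an illustration of the logic, not what HermiT does internally; the axiom encoding, the function names and the disjointness of the dwc classes are assumptions carried over from the test case described above.

```python
def inferred_types(asserted, axioms):
    """Close a set of class names under subclass and equivalence axioms."""
    types = set(asserted)
    changed = True
    while changed:
        changed = False
        for kind, a, b in axioms:
            if kind == "disjoint":
                continue
            # An equivalence behaves like a subclass axiom in both directions.
            pairs = [(a, b)] if kind == "subclass" else [(a, b), (b, a)]
            for sub, sup in pairs:
                if sub in types and sup not in types:
                    types.add(sup)
                    changed = True
    return types


def is_consistent(asserted, axioms):
    """False if the individual ends up in two classes declared disjoint."""
    types = inferred_types(asserted, axioms)
    return not any(
        kind == "disjoint" and a in types and b in types
        for kind, a, b in axioms
    )


# Hypothetical strictness on the dwc side (not a claim about real DwC).
strict = [("disjoint", "dwc:Name", "dwc:Taxon")]

with_equivalence = strict + [
    ("equivalent", "dwc:Name", "tdwg:Name"),
    ("equivalent", "dwc:Taxon", "tdwg:Taxon"),
]
with_subclass = strict + [
    ("subclass", "dwc:Name", "tdwg:Name"),
    ("subclass", "dwc:Taxon", "tdwg:Taxon"),
]

# An individual from a "woolly" dataset: both a tdwg name and a tdwg taxon.
woolly_individual = {"tdwg:Name", "tdwg:Taxon"}

print(is_consistent(woolly_individual, with_equivalence))
print(is_consistent(woolly_individual, with_subclass))
```

With equivalence, the tdwg types flow back into the dwc classes and hit the disjointness; with subclassing, membership only flows from dwc to tdwg, so the woolly individual is left alone.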
The price of this, however, is that everywhere you use Taxon, you have to decide which you mean, most particularly on range and domain declarations. If you define a DwC predicate that has a domain of dwc_Taxon (ie: it only applies to taxa that are DwC taxa), then if it is applied to anything then we infer that the thing that it is applied to is a DwC taxon. But heck ... if there's a DwC term that applies to "a taxon" (hasDwCDistributionitems), wouldn't it happily apply to things that are tdwg taxon objects, too? Sure it would! Mostly.
So, I'd suggest the safe way to go would be
a) yes, import the tdwg classes by all means
b) declare your classes to be subclasses of the tdwg ones
c) declare that your predicates have a domain/range of the TDWG classes
(An *even safer* way to go is to create a completely generic "TaxonThingy" superclass and to declare (in your vocabulary) that tdwg_Taxon is a subclass of that ... but this is so safe that it doesn't accomplish anything useful. You may as well not have range and domain declarations at all.)
I'm generally sympathetic to this approach except for c) and not only because of my familiar whine that mindlessly putting domains on predicates constrains extensibility to no salutary effect other than to constrain extensibility. That being a stated goal of Steve and Cam, a more important issue might be that the tdwg ontologies in their present form have an ambiguous future. To my knowledge, the TDWG adoption process has never even been begun for them, and indeed http://code.google.com/p/tdwg-ontology/ seems to have been dormant since December 2009, and the earlier (2007!) http://rs.tdwg.org/ontology/voc/TaxonConcept seems to carry a warning that the Google site is more authoritative. IMO, one may consider the vocabulary at least socially, if not actually, deprecated. Steve raised this issue in the earlier post in this subject.
So the question remains: what advantage of subclassing tc:TaxonConcept accrues to users of the specification? If there is actually data coded to tc:TaxonConcept, then applications that wish to integrate with such data, e.g. LOD apps, can make the subclass declaration as part of the application, and integration with what, if anything, replaces tc:TaxonConcept will not be impeded.
Bob Morris
On 09/05/2011, at 1:08 PM, Bob Morris wrote:
So the question remains: what advantage of subclassing tc:TaxonConcept accrues to users of the specification? If there is actually data coded to tc:TaxonConcept, then applications that wish to integrate with such data, e.g. LOD apps, can make the subclass declaration as part of the application, and integration with what, if anything, replaces tc:TaxonConcept will not be impeded.
Well, the advantage is that applications that wish to integrate with such data need not make the declaration as part of the application. This particularly applies to things that don't understand taxonomy, things that browse the semantic web in general. It allows things like that to look at a whole world of URIs that identify tc taxa, a whole world of URIs that identify DwC taxa, and go "hang on, these have something in common".
Without these declarations, predicate names are little more than database column names in a data dump somewhere: you need a human to understand the equivalences, to supply the fact that the two taxon concepts are the same sorts of things.
As for data, we use the tc: vocabulary for a couple of million or so data records at biodiversity.org.au .
http://biodiversity.org.au/taxon/Fabaceae.rdf http://biodiversity.org.au/taxon/Fabaceae.html
I'm not really looking forward to recoding my data extraction so as to use the new DwC, when it's ready - although that will have to be done at some point.
Another purpose of these kinds of declaration is to answer questions like: "Ok, what *kinds of thing* might be meaningfully said about the scientific name Fabaceae?", or "What existing vocabulary terms describe teeth?"
This might be particularly applicable when people start minting their own vocabulary terms.
If the herpetologists get together and produce a vocabulary for describing reptile scales, by linking their vocabulary to existing well-known generic ones (eg, the "old" tdwg terms) it becomes possible to build editors that are "smart", that understand that if you are defining a DwC specimen record, then "scale shape" is a predicate that you might want to apply to it.
The herpetology vocabulary can contain specific things that would not appear in TDWG: for instance, the class "scaled thing". The moment you declare that a thing has a herp:scaleShape, then we infer that it is a herp:ScaledThing, and that therefore it might/must also have values for certain other properties. In addition, an application can group the properties relating to scaledThings together, rather than having all possible properties as an alphabetically sorted list. That is: the vocabulary does something useful beyond being simply a list of field names.
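The herpetology example can be sketched in a few lines of Python: a domain declaration lets a program infer the class of anything the property is applied to, and lets an editor group the properties that belong to that class. Only herp:scaleShape and herp:ScaledThing come from the text above; herp:scaleCount, the specimen identifier and the function names are hypothetical, and this is a sketch of the rdfs:domain rule only, not a full reasoner.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"

# Vocabulary: each property declares the class of things it applies to.
# herp:scaleCount is a made-up second property for the illustration.
vocabulary = {
    ("herp:scaleShape", DOMAIN, "herp:ScaledThing"),
    ("herp:scaleCount", DOMAIN, "herp:ScaledThing"),
}

# Data: someone records a scale shape on a specimen, nothing else.
data = {
    ("ex:specimen1", "herp:scaleShape", "keeled"),
}


def infer_types(data, vocabulary):
    """Apply the rdfs:domain rule: using a property types its subject."""
    domains = {s: o for (s, p, o) in vocabulary if p == DOMAIN}
    return {(s, RDF_TYPE, domains[p]) for (s, p, o) in data if p in domains}


def properties_for(cls, vocabulary):
    """List the properties an editor could offer for members of cls."""
    return sorted(s for (s, p, o) in vocabulary if p == DOMAIN and o == cls)


print(infer_types(data, vocabulary))
print(properties_for("herp:ScaledThing", vocabulary))
```

The second function is the "smart editor" part: once the specimen is known to be a herp:ScaledThing, the related properties can be offered together instead of as one long alphabetical list.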
Perhaps it might be put like this: it's these declarations that put the semantics in the semantic web. Without them, you simply have linked data.
Paul
I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y). And this is built into standard semantic web reasoners - which is a bonus. But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar). Do you think this is reasonable or are we just losing too many semantic web benefits by not specifying the domain constraint?
Kevin
Please consider the environment before printing this email Warning: This electronic message together with any attachments is confidential. If you receive it in error: (i) you must not read, use, disclose, copy or retain it; (ii) please contact the sender immediately by reply email and then delete the emails. The views expressed in this email may not be those of Landcare Research New Zealand Limited. http://www.landcareresearch.co.nz
On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
Paul
I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y). And this is built into standard semantic web reasoners - which is a bonus. But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar). Do you think this is reasonable or are we just losing too much semantic web benefits by not specifying the domain constraint?
A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
(Another reason is that it allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent, the comment nevertheless applies only to the particular thing you put it on.)
This all means that dwc:isInCategory, if you want to apply it to dwc:Taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using OWL to define a vocabulary, you are effectively attempting to extend OWL itself.
But ... maybe that's ok. Maybe what is attempting to be done here only ever needs to be understood by humans.
Now ... if what you are trying to do is to define "Fish" as an owl class as well as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties. At least ... I think it is. I should write a test case.
The various elements like #Observation, #Image, #Identification within the TaxonConcept species concepts are similar to puns.
This is designed to follow OWL2
Note that the Honey Bee http://lod.taxonconcept.org/ses/z9oqP#Species has
<dcterms:hasPart rdf:resource="http://lod.taxonconcept.org/ses/z9oqP#Image"/>
<dcterms:hasPart rdf:resource="http://lod.taxonconcept.org/ses/z9oqP#Occurrence"/>
<dcterms:hasPart rdf:resource="http://lod.taxonconcept.org/ses/z9oqP#Individual"/>
<dcterms:hasPart rdf:resource="http://lod.taxonconcept.org/ses/z9oqP#Identification"/>
<dcterms:hasPart rdf:resource="http://lod.taxonconcept.org/ses/z9oqP#Taxonomy"/>
<dcterms:hasPart rdf:resource="http://lod.taxonconcept.org/ses/z9oqP#NCBI_Taxonomy"/>
<dcterms:hasPart rdf:resource="http://lod.taxonconcept.org/ses/z9oqP#OriginalDescription"/>
In part this was designed so that the species concept could be an owl class while the occurrence of a Honey Bee is an instance of the class txn:Occurrence. It is tied to the species concept because a Honey Bee occurrence record is also an instance of http://lod.taxonconcept.org/ses/z9oqP#Occurrence
See
HTML http://ocs.taxonconcept.org/ocs/0da685c9-9cdc-4dff-baf3-38d1bdbc6552.html
RDF http://ocs.taxonconcept.org/ocs/0da685c9-9cdc-4dff-baf3-38d1bdbc6552.rdf
Similarly, an identification of a Honey Bee is now both an instance of txn:Identification and an instance of http://lod.taxonconcept.org/ses/z9oqP#Identification
This SPARQL query describes all the identification events for the Honey Bee.
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/z9oqP#Identification> }
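To make the double typing concrete, here is a small Python sketch with triples as tuples. The two identification records and the second species URI are invented for the illustration; only txn:Identification and the Honey Bee URI come from the message.

```python
RDF_TYPE = "rdf:type"
HONEY_BEE = "http://lod.taxonconcept.org/ses/z9oqP"
# Hypothetical second species concept, used only for contrast.
OTHER_SPECIES = "http://example.org/ses/other"

triples = {
    # Every identification is an instance of the generic class ...
    ("ex:identification1", RDF_TYPE, "txn:Identification"),
    ("ex:identification2", RDF_TYPE, "txn:Identification"),
    # ... and also of the class that hangs off its species concept.
    ("ex:identification1", RDF_TYPE, HONEY_BEE + "#Identification"),
    ("ex:identification2", RDF_TYPE, OTHER_SPECIES + "#Identification"),
}


def instances_of(cls, triples):
    """Return the subjects explicitly typed with the given class."""
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == cls)


print(instances_of("txn:Identification", triples))
print(instances_of(HONEY_BEE + "#Identification", triples))
```

Selecting on the generic class returns every identification, while selecting on the species-specific class returns only the Honey Bee ones, with no join against a separate taxon field.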
< http://lsd.taxonconcept.org/isparql/view/?query=describe%20%3Fs%20where%20%7...
or via bit.ly http://bit.ly/iDRIaY
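For anyone who wants to build such a link by hand, the following Python sketch percent-encodes the DESCRIBE query and appends it to the query parameter visible in the truncated link above. It is an assumption that the endpoint needs no further parameters, since the original link is cut off; the sketch only reproduces the visible prefix.

```python
from urllib.parse import quote, unquote

# Endpoint prefix as far as it is visible in the truncated link;
# any parameters that followed the query are unknown.
ENDPOINT = "http://lsd.taxonconcept.org/isparql/view/?query="

query = (
    "describe ?s where "
    "{ ?s a <http://lod.taxonconcept.org/ses/z9oqP#Identification> }"
)

# safe="" makes quote() escape every reserved character, including
# the slashes and colon inside the class URI.
encoded = quote(query, safe="")
url = ENDPOINT + encoded

print(url)
```

Decoding the encoded part with unquote() gives back the original query text, which is a quick way to check a link before posting it to the list.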
- Pete
On Mon, May 9, 2011 at 1:09 AM, Paul Murray pmurray@anbg.gov.au wrote:
On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
Paul
I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y).
And this is built into standard semantic web reasoners - which is a bonus.
But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar).
Do you think this is reasonable or are we just losing too much semantic web benefits by not specifying the domain constraint?
A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
(Another reason is that it allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent, the comment nevertheless only applies to the particular thing you put it on.)
This all means that dwc:isInCategory, if you want to apply it to dwc:Taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using OWL to define a vocabulary, you are effectively attempting to extend OWL itself.
But ... maybe that's ok. Maybe what is being attempted here only ever needs to be understood by humans.
Now ... if what you are trying to do is to define "Fish" as an OWL class as well as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties. At least ... I think it is. I should write a test case.

If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments.
Please consider the environment before printing this email.
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
Being either fearless or a fool (is there actually a difference?), I shall tread into this subject area in which I am a mere novice. So be kind...
I think that there may be several "solutions" to this problem. The one that is "correct" probably depends on what one is trying to accomplish. So I will try to describe in the most succinct way what Cam and I were trying to accomplish with DSW, and how that fits in with this thread. Cam and I basically wanted to do two things:
1. Make it possible to use GUIDs RIGHT NOW (not five years from now).
2. Create an extremely stripped down ontology that would be non-controversial enough that people might actually use it, but which wouldn't do anything so bad that it would inhibit future development in the Semantic Web context (i.e. it could be extended in the future by clever people to do clever Semantic Web stuff).
Amazingly, the GUID Applicability Statement has achieved the status of Standard-hood! (http://www.tdwg.org/standards/150/) Hooray! I sort of missed the announcement, but ran across the fact the other day when I was surfing through the TDWG website. Since the GUID A.S. is now a TDWG Standard, I would say it would now officially be a best-practice to follow it. In particular, Recommendation 11 states "Objects in the biodiversity domain that are identified by a GUID should be typed using the TDWG ontology or other well-known vocabularies in accordance with the TDWG common architecture." This is somewhat problematic, given that the TDWG ontology (with the possible exception of the Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. What is the alternative "other well-known vocabulary"? There is none, at least none having any kind of official status with TDWG.
I recently discovered (or maybe re-discovered) the Technical Architecture Group (TAG) Technical Roadmaps from 2006-2008: http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf I might have seen them before, but if so it was at the point where I was really not knowledgeable enough to comprehend them. I found it very instructive to read about what the TAG had in mind when it set out to create the TDWG Ontology. In particular (from the 2007 Roadmap): "From the point of view of exchanging data - such as in the federation of a number of natural history collections - there is no need for a standards architecture. The federation is a closed system where a single exchange format can be agreed on. ... This model has worked well in the past but it does not meet the primary use case that is emerging. Biodiversity research is typically carried out by combining data of different kinds from multiple sources. The providers of data do not know who will use their data or how it will be combined with data from other sources. The consumer needs some level of commonality across all the data received so that it can be combined for analysis without the need to write computer software for every new combination." [This brings to mind the very "different kinds" of resources Cam is documenting in Borneo and the "multiple sources" that will be handling the metadata once those resources are sent off to herbaria, labs, and arboreta.]
and from the 2008 Roadmap: "If GUIDs are used to uniquely identify 'pieces' of data we need to have a shared understanding of what we mean by a 'piece of data' i.e. what kind of thing is it that a particular id applies to, a specimen, a person, an observation, a complete data set. We also need to have a shared understanding of at least some of the properties we use to describe these things."
Having been barely aware of TDWG's existence in 2008, I am blissfully ignorant of whatever disagreements may have occurred regarding LSIDs, reification, or whatever, and really don't want to know about them. All I can say as an outside observer is that it appears that the failure of the initial efforts to get GUIDs and the TDWG Ontology off the ground was because the system envisioned was too complicated to maintain, too complicated to gain a consensus, and too complicated to actually explain to anybody. Now that GUIDs seem to have a new lease on life, it seems like the greatest chance of successfully implementing them is to start by keeping things absolutely as simple as possible.
To Cam and me, Darwin Core seemed to be the only candidate for something relatively simple and relatively universally accepted on which one could base an ontology that could be used to type GUIDs and to express "a shared understanding of at least some of the properties used to describe" biodiversity resources. Although I was somewhat skeptical that there was a "community consensus" about what the DwC classes meant and how they were related to each other, the exhaustive discussion on this list in Oct/Nov convinced me that maybe there WAS a consensus, or at least enough of a consensus to move forward.
Although some people may at the present time be interested in figuring out how to do things like "define 'Fish' as an owl class as well as a Taxon object", I would assert that is outside the core mission of TDWG as stated: to "develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms evidenced by the historical record". It is fun to talk about, but to me not the primary consideration in designing a community data exchange model. This outlook explains to some extent why I asked questions about the complexity of taxonconcept.org and its orientation toward facilitating semantic queries. There is nothing wrong with that, but it doesn't seem to be the direction that TDWG has said it wants to go. Perhaps when we have "gotten there" (i.e. have a functioning system using GUIDs for clearly typed resources), we might want to embark further down the road to the Semantic Web.
Aside from just importing the DwC classes into the DSW ontology and connecting them with object properties, Cam and I did a little nasty thing with them. It has been said that declaring ranges and domains for terms doesn't prevent people from using the terms to express relationships among the "wrong" types of things. Rather, it simply asserts that those things are instances of the classes used in the range and domain declarations for the term. That is sort of true, but by declaring many of the core DwC classes to be disjoint, we actually ARE preventing people from using the wrong object properties with instances of the wrong classes. If Joe Curator rdf:type's a determination as a dwc:Identification, but then uses dsw:atEvent (which has the domain dwc:Occurrence) as a property of the determination, then a reasoner will infer that the determination is a type dwc:Occurrence as well as the explicitly declared type dwc:Identification. Because dwc:Identification and dwc:Occurrence are disjoint classes, the reasoner will have a fit. Cam and I are being Naughty (sensu Bob Morris) because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the statement from the 2007 Roadmap: "The consumer needs some level of commonality across all the data received...". Joe Curator is not being consistent with the "shared understanding of at least some of the properties" to the extent that DSW reflects the "shared understanding" of the TDWG community. We are basically trying to enforce a sort of orthodoxy on the use of DwC classes as rdf:types and on the connections between the dwc:classes so that people can have some reasonable expectation that they are talking the same language as their partners whose data are also being aggregated in the same federated database.
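To make the Joe Curator case concrete, here is a toy Python sketch of the two steps a reasoner would go through: adding the type implied by the domain of dsw:atEvent, then checking the disjointness declaration. It is not a real OWL reasoner and uses no RDF library; terms are plain prefixed strings, the identifiers ex:det1 and ex:event1 are hypothetical, and only the one domain and one disjointness statement mentioned above are encoded.

```python
RDF_TYPE = "rdf:type"

# Schema facts described above: dsw:atEvent has the domain dwc:Occurrence,
# and dwc:Identification is declared disjoint with dwc:Occurrence.
DOMAINS = {"dsw:atEvent": "dwc:Occurrence"}
DISJOINT = {frozenset({"dwc:Identification", "dwc:Occurrence"})}


def infer_types(triples):
    """Return {subject: set of classes}, including types implied by a domain."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            # Explicitly declared type.
            types.setdefault(s, set()).add(o)
        elif p in DOMAINS:
            # Using the property implies the subject is in its domain class.
            types.setdefault(s, set()).add(DOMAINS[p])
    return types


def find_clashes(types):
    """Return (subject, class_a, class_b) for each disjointness violation."""
    clashes = []
    for s, classes in sorted(types.items()):
        for pair in DISJOINT:
            if pair <= classes:
                a, b = sorted(pair)
                clashes.append((s, a, b))
    return clashes


# Joe Curator types a determination as an Identification, then uses
# dsw:atEvent on it. Both identifiers are made up for this example.
joe = [
    ("ex:det1", RDF_TYPE, "dwc:Identification"),
    ("ex:det1", "dsw:atEvent", "ex:event1"),
]

clashes = find_clashes(infer_types(joe))
print(clashes)
```

Without the disjointness declaration the same data would simply end up with two types for ex:det1 and nothing would complain, which is the point of declaring the classes disjoint.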
It seems to me that this "enforcement of orthodoxy" may be very much at odds with the free-wheeling spirit of the Semantic Web community where Anybody Can Say Anything About Anything. But when I look over those old TAG roadmaps, I see little having to do with clever Semantic inferencing. I see a lot about providers and consumers understanding what each other is talking about. To some extent, Darwin Core can provide most of the necessary commonality between providers and consumers. There were (in our opinion) three areas where it could not. One was the lack of a class to link repeated sampling events and determinations (dwc:IndividualOrganism or TaxonomicallyHomogeneousEntity if you prefer) and another was the lack of a class that allowed for the separation of evidence from the Occurrence documented by it (called by us the dsw:Token class). The third area was the dwc:Taxon class, which did not seem clear enough in its definition nor to possess enough complexity to express the kinds of relationships commonly discussed on this list. dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type declarations).
So I guess having read the various responses to my query and thinking about the history of the TDWG Ontology, I would say that it may not really be important how dwc:Taxon could be tied to tc:Taxon because the two classes probably don't need to be tied together anyway. As it currently stands, dwc:Taxon (outside of DSW) has no semantic meaning other than what people want to believe that it means because it's not tied to any other classes by object properties of its instances. The mish-mash of terms describing names and taxa listed under dwc:Taxon adds to the confusion - since the DwC vocabulary purposefully does not declare domains for the terms listed under a class, they really could be used as properties for an instance of any class anywhere. In contrast, tc:Taxon does have properties that are described clearly in the TDWG Ontology. The only reason that we declared the two classes to be equivalent was to signal that we felt that some of the DwC terms listed under dwc:Taxon in the DwC vocabulary could be used as data properties for the things in the tc:Taxon class that people like Paul were describing with properties from the TDWG Ontology. Tying them together doesn't (at the moment) mess up anything that anybody is doing with dwc:Taxon because (outside of DSW) there isn't anything to actually DO with dwc:Taxon in RDF. However, the point is well taken that if someone in the future did decide to define properties specifically intended for use with dwc:Taxon, those properties would be hopelessly tangled with tc:Taxon properties.
It seems to me like the real road forward (if one believes as I do that DwC is the only practical alternative to use for typing GUIDs) would be to:
1. decide that the TDWG Ontology in its dead form adequately describes taxa, names, and their properties (use it as-is). OR
2. decide that although the TDWG Ontology doesn't do everything that people want it to do at the present time, it could be resurrected and modified to do what people want (use it and hope for the future). OR
3. decide to just create the additional classes, e.g. dwc:Name (or dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and object properties for dwc:Taxon and dwc:Name that are needed to get the job done (i.e. just dump the TDWG Ontology as unfixable and make up new stuff).
In any of these three alternatives, there isn't actually any reason to tie the two classes together that I can see. Of these three, I think the third option would probably be preferable, although it might put Paul (and any others currently using the TDWG Ontology to describe Taxon instances) in the unpleasant position of having to redo their RDF.
Steve
I wonder if it would be a good idea to have a session (hackathon?) at this year's TDWG meeting to look at / prove / experiment with the various ways of working with semantic web data and ontologies we discuss here?
This would soon show any benefits/disadvantages etc of the various approaches.
Is anyone lined up / keen to promote such a session?
Kevin
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Steve Baskauf Sent: Tuesday, 10 May 2011 8:54 a.m. To: Paul Murray Cc: tdwg-content@lists.tdwg.org List Subject: Re: [tdwg-content] Heretics and illuminati, oh my! [SEC=UNCLASSIFIED]
Being either fearless or a fool (is there actually a difference?), I shall tread into this subject area at which I am a mere novice. So be kind...
I think that there may be several "solutions" to this problem. The one that is "correct" probably depends on what one is trying to accomplish. So I will try to describe in the most succinct way what Cam and I were trying to accomplish with DSW, and how that fits in with this thread. Cam and I basically wanted to do two things: 1. Make it possible to use GUIDs RIGHT NOW (not five years from now). 2. Create an extremely stripped down ontology that would be non-controversial enough that people might actually use it, but which wouldn't do anything so bad that it would inhibit future development in the Semantic Web context (i.e. it could be extended in the future by clever people to do clever Semantic Web stuff).
Amazingly, the GUID Applicability Statement has achieved the status of Standard-hood! (http://www.tdwg.org/standards/150/) Hooray! I sort of missed the announcement, but ran across the fact the other day when I was surfing through the TDWG website. Since the GUID A.S. is now a TDWG Standard, I would say it would now officially be a best-practice to follow it. In particular, Recommendation 11 states "Objects in the biodiversity domain that are identified by a GUID should be typed using the TDWG ontology or other well-known vocabularies in accordance with the TDWG common architecture." This is somewhat problematic, given that the TDWG ontology (with the possible exception of the Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. What is the alternative "other well-known vocabulary"? There is none, at least none having any kind of official status with TDWG.
I recently discovered (or maybe re-discovered) the Technical Architecture Group (TAG) Technical Roadmaps from 2006-2008: http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf I might have seen them before, but if so it was at the point where I was really not knowledgeable enough to comprehend them. I found it very instructive to read about what the TAG had in mind when it set out to create the TDWG Ontology. In particular (from the 2007 Roadmap): "From the point of view of exchanging data - such as in the federation of a number of natural history collections - there is no need for a standards architecture. The federation is a closed system where a single exchange format can be agreed on. ... This model has worked well in the past but it does not meet the primary use case that is emerging. Biodiversity research is typically carried out by combining data of different kinds from multiple sources. The providers of data do not know who will use their data or how it will be combined with data from other sources. The consumer needs some level of commonality across all the data received so that it can be combined for analysis without the need to write computer software for every new combination." [This brings to mind the very "different kinds" of resources Cam is documenting in Borneo and the "multiple sources" that will be handling the metadata once those resources are sent off to herbaria, labs, and arboreta.]
and from the 2008 Roadmap: "If GUIDs are used to uniquely identify 'pieces' of data we need to have a shared understanding of what we mean by a 'piece of data' i.e. what kind of thing is it that a particular id applies to, a specimen, a person, an observation, a complete data set. We also need to have a shared understanding of at least some of the properties we use to describe these things."
Having been barely aware of TDWG's existence in 2008, I am blissfully ignorant of whatever disagreements may have occurred regarding LSIDs, reification, or whatever, and really don't want to know about them. All I can say as an outside observer is that it appears that the failure of the initial efforts to get GUIDs and the TDWG Ontology off the ground was because the system envisioned was too complicated to maintain, too complicated to gain a consensus, and to complicated to actually explain to anybody. Now that GUIDs seem to have a new lease on life, it seems like the greatest chance of successfully implementing them is to start by keeping things absolutely as simple as possible. To Cam and me, Darwin Core seemed to be the only candidate for something relatively simple and relatively universally accepted on which one could base an ontology that could be used to type GUIDs and to use to express "a shared understanding of at least some of the properties used to describe" biodiversity resources. Although I was somewhat skeptical that there was a "community consensus" about what the DwC classes meant and how they were related to each other, the exhaustive discussion on this list in Oct/Nov convinced me that maybe there WAS a consensus, or at least enough of a consensus to move forward. Although some people may at the present time be interested in figuring out how to do things like "define 'Fish' as an owl class as well as as a Taxon object", I would assert that is outside the core mission of TDWG as stated: to "develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms evidenced by the historical record". It is fun to talk about, but to me not the primary consideration in designing a community data exchange model. This outlook explains to some extent why I asked questions about the complexity of taxonconcept.org and its orientation toward facilitating semantic queries. 
There is nothing wrong with that, but it doesn't seem to be the direction that TDWG has said it wants to go. Perhaps when we have "gotten there" (i.e. have a functioning system using GUIDs for clearly typed resources), we might want to embark further down the road to the Semantic Web.
Aside from just importing the DwC classes into the DSW ontology and connecting them with object properties, Cam and I did a little nasty thing with them. It has been said that declaring ranges and domains for terms doesn't prevent people from using the terms to express relationships among the "wrong" types of things. Rather, it simply asserts that those things are instances of the classes used in the range and domain declarations for the term. That is sort of true, but by declaring many of the core DwC classes to be disjoint, we actually ARE preventing people from using the wrong object properties with instances of the wrong classes. If Joe Curator rdf:type's a determination as a dwc:Identification, but then uses dsw:atEvent (which has the domain dwc:Occurrence) as a property of the determination, then a reasoner will infer that the determination is a type dwc:Occurrence as well as the explicitly declared type dwc:Identification. Because dwc:Identification and dwc:Occurrence are disjoint classes, the reasoner will have a fit. Cam and I are being Naughty (sensu Bob Morris) because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the statement from the 2007 Roadmap: "The consumer needs some level of commonality across all the data received...". Joe Curator is not being consistent with the "shared understanding of at least some of the properties" to the extent that DSW reflects the "shared understanding" of the TDWG community. We are basically trying to enforce a sort of orthodoxy on the use of DwC classes as rdf:types and on the connections between the dwc:classes so that people can have some reasonable expectation that they are talking the same language as their partners whose data are also being aggregated in the same federated database.
It seems to me that this "enforcement of orthodoxy" may be very much at odds with the free-wheeling spirit of the Semantic Web community where Anybody Can Say Anything About Anything. But when I look over those old TAG roadmaps, I see little having to do with clever Semantic inferencing. I see a lot about providers and consumers understanding what each other are talking about. To some extent, Darwin Core can provide most of the necessary commonality between providers and consumers. There were (in our opinion) three areas where it could not. One was the lack of a class to link repeated sampling events and determinations (dwc:IndividualOrganism or TaxonomicallyHomogeneousEntity if you prefer) and another was a class that allowed for the separation of evidence from the Occurrence documented by it (called by us the dsw:Token class). The other area was the dwc:Taxon class which did not seem clear enough in its definition nor to possess enough complexity to express the kinds of relationships commonly discussed on this list. dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type declarations)
So I guess having read the various responses to my query and thinking about the history of the TDWG Ontology, I would say that it may not really be important how dwc:Taxon could be tied to tc:Taxon because the two classes probably don't need to be tied together anyway. As it currently stands, dwc:Taxon (outside of DSW) has no semantic meaning other than what people want to believe that it means because it's not tied to any other classes by object properties of its instances. The mish-mash of terms describing names and taxa listed under dwc:Taxon add to the confusion - since the DwC vocabulary purposefully does not declare domains for the terms listed under a class they really could be used as properties for an instance of any class anywhere. In contract, tc:Taxon does have properties that are described clearly in the TDWG Ontology. The only reason that we declared the two classes to be equivalent was to signal that we felt that some of the DwC terms listed under dwc:Taxon in the DwC vocabulary could be used as data properties for the things in the tc:Taxon class that people like Paul were describing with properties from the TDWG Ontology. Tying them together doesn't (at the moment) mess up anything that anybody is doing with dwc:Taxon because (outside of DSW) there isn't anything to actually DO with dwc:Taxon in RDF. However, the point is well taken that if someone in the future did decide to define properties specifically intended for use with dwc:Taxon, those properties would be hopelessly tangled with tc:Taxon properties.
It seems to me like the real road forward (if one believes as I do that DwC is the only practical alternative to use for typing GUIDs) would be to: 1. decide that the TDWG Ontology in its dead form adequately describes taxa, names, and their properties (use it as-is). OR 2. decide that although the TDWG Ontology doesn't do everything that people want it to do at the present time, it could resurrected and modified to do what people want (use it and hope for the future). OR 3. decide to just create the additional classes, e.g. dwc:Name (or dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and object properties for dwc:Taxon and dwc:Name that are needed to get the job done (i.e. just dump the TDWG Ontology as unfixable and make up new stuff).
In any of these three alternatives, there isn't actually any reason to tie the two classes together that I can see. Of these three, I think the third option would probably be preferable, although it might put Paul (and any others currently using the TDWG Ontology to describe Taxon instances) in the unpleasant position of having to redo their RDF.
Steve
Paul Murray wrote:
On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
Paul
I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y).
And this is built into standard semantic web reasoners - which is a bonus.
But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar).
Do you think this is reasonable or are we just losing too much semantic web benefits by not specifying the domain constraint?
A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
(another reason is that is allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent nevertheless the comment only applies to the particular thing you put it on)
This all means that dwc:isInCategory, if you want to apply it to dwc:taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using owl to define a vocabulary, you are effectively attempting to extend OWL itself.
But ... maybe that's ok. Maybe what is attempting to be done here only ever needs to be understood by humans.
Now ... if what you are trying to do is to define "Fish" as an owl class as well as as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties. At least ... I think it is. I should write a test case.
_______________________________________________
tdwg-content mailing list
tdwg-content@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
Kevin,
I'm not too keen on the hack-a-thon idea. It will in all probability just consume important contact time with activity that is best integrated into real projects and reported in a forum such as this or at the TDWG meeting. Much in the way that Steve and Cam and Pete and Paul are doing now.
We (you and I) have drafted a proposal to put to the TDWG executive for a 2-3 day workshop prior to this year's meeting to establish context for these issues within the framework of a TDWG Technical Architecture. Do we need to evolve a TDWG-level understanding of the requirement for semantic interoperability within our standards space? Would it be useful to spend time and effort to formally model the TDWG domain? Is there a role for TAG? Can we improve the process?
Hopefully, we can find an opportunity to get a small group together between now and then to do a little planning around agenda, background requirements, preparation workload and specialist inputs.
greg
On Tue, 2011-05-10 at 07:26, Kevin Richards wrote:
I wonder if it would be a good idea to have a session (hackathon?) at this year's TDWG meeting to look at / prove / experiment with the various ways of working with semantic web data and ontologies we discuss here?
This would soon show any benefits/disadvantages etc of the various approaches.
Is anyone lined up / keen to promote such a session?
Kevin
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Steve Baskauf
Sent: Tuesday, 10 May 2011 8:54 a.m.
To: Paul Murray
Cc: tdwg-content@lists.tdwg.org List
Subject: Re: [tdwg-content] Heretics and illuminati, oh my! [SEC=UNCLASSIFIED]
Being either fearless or a fool (is there actually a difference?), I shall tread into this subject area at which I am a mere novice. So be kind...
I think that there may be several "solutions" to this problem. The one that is "correct" probably depends on what one is trying to accomplish. So I will try to describe in the most succinct way what Cam and I were trying to accomplish with DSW, and how that fits in with this thread. Cam and I basically wanted to do two things:
- Make it possible to use GUIDs RIGHT NOW (not five years from now).
- Create an extremely stripped-down ontology that would be non-controversial enough that people might actually use it, but which wouldn't do anything so bad that it would inhibit future development in the Semantic Web context (i.e. it could be extended in the future by clever people to do clever Semantic Web stuff).
Amazingly, the GUID Applicability Statement has achieved the status of Standard-hood! (http://www.tdwg.org/standards/150/) Hooray! I sort of missed the announcement, but ran across the fact the other day when I was surfing through the TDWG website. Since the GUID A.S. is now a TDWG Standard, I would say it would now officially be a best-practice to follow it. In particular, Recommendation 11 states "Objects in the biodiversity domain that are identified by a GUID should be typed using the TDWG ontology or other well-known vocabularies in accordance with the TDWG common architecture." This is somewhat problematic, given that the TDWG ontology (with the possible exception of the Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. What is the alternative "other well-known vocabulary"? There is none, at least none having any kind of official status with TDWG.
I recently discovered (or maybe re-discovered) the Technical Architecture Group (TAG) Technical Roadmaps from 2006-2008: http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf I might have seen them before, but if so it was at the point where I was really not knowledgeable enough to comprehend them. I found it very instructive to read about what the TAG had in mind when it set out to create the TDWG Ontology. In particular (from the 2007 Roadmap): "From the point of view of exchanging data - such as in the federation of a number of natural history collections - there is no need for a standards architecture. The federation is a closed system where a single exchange format can be agreed on. ... This model has worked well in the past but it does not meet the primary use case that is emerging. Biodiversity research is typically carried out by combining data of different kinds from multiple sources. The providers of data do not know who will use their data or how it will be combined with data from other sources. The consumer needs some level of commonality across all the data received so that it can be combined for analysis without the need to write computer software for every new combination." [This brings to mind the very "different kinds" of resources Cam is documenting in Borneo and the "multiple sources" that will be handling the metadata once those resources are sent off to herbaria, labs, and arboreta.]
and from the 2008 Roadmap: "If GUIDs are used to uniquely identify 'pieces' of data we need to have a shared understanding of what we mean by a 'piece of data' i.e. what kind of thing is it that a particular id applies to, a specimen, a person, an observation, a complete data set. We also need to have a shared understanding of at least some of the properties we use to describe these things."
Having been barely aware of TDWG's existence in 2008, I am blissfully ignorant of whatever disagreements may have occurred regarding LSIDs, reification, or whatever, and really don't want to know about them. All I can say as an outside observer is that it appears that the failure of the initial efforts to get GUIDs and the TDWG Ontology off the ground was because the system envisioned was too complicated to maintain, too complicated to gain a consensus, and too complicated to actually explain to anybody. Now that GUIDs seem to have a new lease on life, it seems like the greatest chance of successfully implementing them is to start by keeping things absolutely as simple as possible. To Cam and me, Darwin Core seemed to be the only candidate for something relatively simple and relatively universally accepted on which one could base an ontology that could be used to type GUIDs and to use to express "a shared understanding of at least some of the properties used to describe" biodiversity resources. Although I was somewhat skeptical that there was a "community consensus" about what the DwC classes meant and how they were related to each other, the exhaustive discussion on this list in Oct/Nov convinced me that maybe there WAS a consensus, or at least enough of a consensus to move forward. Although some people may at the present time be interested in figuring out how to do things like "define 'Fish' as an owl class as well as a Taxon object", I would assert that is outside the core mission of TDWG as stated: to "develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms evidenced by the historical record". It is fun to talk about, but to me not the primary consideration in designing a community data exchange model. This outlook explains to some extent why I asked questions about the complexity of taxonconcept.org and its orientation toward facilitating semantic queries.
There is nothing wrong with that, but it doesn't seem to be the direction that TDWG has said it wants to go. Perhaps when we have "gotten there" (i.e. have a functioning system using GUIDs for clearly typed resources), we might want to embark further down the road to the Semantic Web.
Aside from just importing the DwC classes into the DSW ontology and connecting them with object properties, Cam and I did a little nasty thing with them. It has been said that declaring ranges and domains for terms doesn't prevent people from using the terms to express relationships among the "wrong" types of things. Rather, it simply asserts that those things are instances of the classes used in the range and domain declarations for the term. That is sort of true, but by declaring many of the core DwC classes to be disjoint, we actually ARE preventing people from using the wrong object properties with instances of the wrong classes. If Joe Curator rdf:type's a determination as a dwc:Identification, but then uses dsw:atEvent (which has the domain dwc:Occurrence) as a property of the determination, then a reasoner will infer that the determination is of type dwc:Occurrence as well as the explicitly declared type dwc:Identification. Because dwc:Identification and dwc:Occurrence are disjoint classes, the reasoner will have a fit. Cam and I are being Naughty (sensu Bob Morris) because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the statement from the 2007 Roadmap: "The consumer needs some level of commonality across all the data received...". Joe Curator is not being consistent with the "shared understanding of at least some of the properties" to the extent that DSW reflects the "shared understanding" of the TDWG community. We are basically trying to enforce a sort of orthodoxy on the use of DwC classes as rdf:types and on the connections between the DwC classes so that people can have some reasonable expectation that they are talking the same language as their partners whose data are also being aggregated in the same federated database.
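The Joe Curator scenario can be walked through with a very small sketch in Python. It is not a real reasoner and not how DSW is implemented: the function names are hypothetical, ex:det1 and ex:event1 are made-up identifiers, and the only rules modelled are the two mentioned in the text (the domain of dsw:atEvent is dwc:Occurrence, and dwc:Identification is disjoint with dwc:Occurrence).

```python
# Domain declaration and disjointness axiom taken from the paragraph above.
DOMAINS = {"dsw:atEvent": "dwc:Occurrence"}
DISJOINT = {frozenset({"dwc:Identification", "dwc:Occurrence"})}


def infer_types(triples):
    """Return {subject: classes}, adding the domain class of every property used."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)
        elif predicate in DOMAINS:
            # Using the property implies membership in its domain class.
            types.setdefault(subject, set()).add(DOMAINS[predicate])
    return types


def find_clashes(types):
    """List (subject, class_a, class_b) for subjects typed with two disjoint classes."""
    clashes = []
    for subject, classes in types.items():
        for pair in DISJOINT:
            if pair <= classes:
                class_a, class_b = sorted(pair)
                clashes.append((subject, class_a, class_b))
    return clashes


# Joe Curator's data (identifiers are hypothetical).
joe_triples = [
    ("ex:det1", "rdf:type", "dwc:Identification"),
    ("ex:det1", "dsw:atEvent", "ex:event1"),
]

joe_types = infer_types(joe_triples)
joe_clashes = find_clashes(joe_types)
print(joe_types)
print(joe_clashes)
```

Without the disjointness axiom the same data would simply yield a determination that belongs to both classes; it is the axiom that turns the misuse into a detectable inconsistency.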
It seems to me that this "enforcement of orthodoxy" may be very much at odds with the free-wheeling spirit of the Semantic Web community where Anybody Can Say Anything About Anything. But when I look over those old TAG roadmaps, I see little having to do with clever Semantic inferencing. I see a lot about providers and consumers understanding what each other are talking about. To some extent, Darwin Core can provide most of the necessary commonality between providers and consumers. There were (in our opinion) three areas where it could not. One was the lack of a class to link repeated sampling events and determinations (dwc:IndividualOrganism or TaxonomicallyHomogeneousEntity if you prefer) and another was a class that allowed for the separation of evidence from the Occurrence documented by it (called by us the dsw:Token class). The other area was the dwc:Taxon class which did not seem clear enough in its definition nor to possess enough complexity to express the kinds of relationships commonly discussed on this list. dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type declarations).
So I guess having read the various responses to my query and thinking about the history of the TDWG Ontology, I would say that it may not really be important how dwc:Taxon could be tied to tc:Taxon because the two classes probably don't need to be tied together anyway. As it currently stands, dwc:Taxon (outside of DSW) has no semantic meaning other than what people want to believe that it means because it's not tied to any other classes by object properties of its instances. The mish-mash of terms describing names and taxa listed under dwc:Taxon adds to the confusion - since the DwC vocabulary purposefully does not declare domains for the terms listed under a class, they really could be used as properties for an instance of any class anywhere. In contrast, tc:Taxon does have properties that are described clearly in the TDWG Ontology. The only reason that we declared the two classes to be equivalent was to signal that we felt that some of the DwC terms listed under dwc:Taxon in the DwC vocabulary could be used as data properties for the things in the tc:Taxon class that people like Paul were describing with properties from the TDWG Ontology. Tying them together doesn't (at the moment) mess up anything that anybody is doing with dwc:Taxon because (outside of DSW) there isn't anything to actually DO with dwc:Taxon in RDF. However, the point is well taken that if someone in the future did decide to define properties specifically intended for use with dwc:Taxon, those properties would be hopelessly tangled with tc:Taxon properties.
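The "tangling" worry can be illustrated with another small Python sketch, again under loud assumptions: this is not a reasoner, the property ex:futureTaxonProperty is invented to stand in for some property that might one day be given the domain dwc:Taxon, and ex:x is a made-up instance. The sketch only shows that once the two classes are declared equivalent, anything inferred to be a dwc:Taxon is also a tc:Taxon.

```python
# Equivalence as declared in DSW, per the text; the domain entry is hypothetical.
EQUIVALENT_CLASSES = [("dwc:Taxon", "tc:Taxon")]
FUTURE_DOMAINS = {"ex:futureTaxonProperty": "dwc:Taxon"}


def equivalents(cls):
    """Return cls together with every class reachable through equivalence pairs."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for left, right in EQUIVALENT_CLASSES:
            if left in found and right not in found:
                found.add(right)
                changed = True
            elif right in found and left not in found:
                found.add(left)
                changed = True
    return found


def inferred_types(triples):
    """Return {subject: classes} from explicit types, domains and equivalences."""
    types = {}
    for subject, predicate, obj in triples:
        cls = obj if predicate == "rdf:type" else FUTURE_DOMAINS.get(predicate)
        if cls is not None:
            types.setdefault(subject, set()).update(equivalents(cls))
    return types


# Someone uses the hypothetical dwc:Taxon-only property on ex:x.
tangled = inferred_types([("ex:x", "ex:futureTaxonProperty", "some value")])
print(tangled)
```

In this toy setting the property cannot be kept specific to dwc:Taxon: every subject that uses it is also inferred to be a tc:Taxon, which is the entanglement described above.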
It seems to me like the real road forward (if one believes as I do that DwC is the only practical alternative to use for typing GUIDs) would be to:
1. decide that the TDWG Ontology in its dead form adequately describes taxa, names, and their properties (use it as-is). OR
2. decide that although the TDWG Ontology doesn't do everything that people want it to do at the present time, it could be resurrected and modified to do what people want (use it and hope for the future). OR
3. decide to just create the additional classes, e.g. dwc:Name (or dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and object properties for dwc:Taxon and dwc:Name that are needed to get the job done (i.e. just dump the TDWG Ontology as unfixable and make up new stuff).
In any of these three alternatives, there isn't actually any reason to tie the two classes together that I can see. Of these three, I think the third option would probably be preferable, although it might put Paul (and any others currently using the TDWG Ontology to describe Taxon instances) in the unpleasant position of having to redo their RDF.
Steve
Paul Murray wrote:
On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
Paul, I had the same thought (i.e. x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y). And this is built into standard semantic web reasoners - which is a bonus. But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property, therefore giving you a similar advantage (admittedly not as good, but similar). Do you think this is reasonable or are we just losing too many semantic web benefits by not specifying the domain constraint?
A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
(another reason is that it allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent, the comment nevertheless only applies to the particular thing you put it on)
This all means that dwc:isInCategory, if you want to apply it to dwc:Taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using OWL to define a vocabulary, you are effectively attempting to extend OWL itself.
But ... maybe that's ok. Maybe what is being attempted here only ever needs to be understood by humans.
Now ... if what you are trying to do is to define "Fish" as an owl class as well as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties. At least ... I think it is. I should write a test case.
On Tue, 10 May 2011, Greg Whitbread wrote:
Kevin,
I'm not too keen on the hack-a-thon idea. It will in all probability just consume important contact time with activity that is best integrated into real projects and reported in a forum such as this or at the TDWG meeting. Much in the way that Steve and Cam and Pete and Paul are doing now.
Hi Greg,
The conversation that's going on now is useful. But something is missing, namely reference to the use cases and competency questions that we sometimes talk about generating. Hackathons focus the mind (I've seen it happen), and a DwC-RDF VoCamp/hackathon would be an opportunity to move forward on stuff that keeps getting deferred. Ideally, we'd define the competency questions in advance, and spend the hackathon addressing them. But, if we all turn out to be lazy/indifferent/"busy", and come to New Orleans without use cases, then spending cloister time working them out would, IMHO, be cloister time well spent.
There are any number of possible approaches we could take. We could have an integration competition; we could rely on digitization efforts to supply data; we could tie the hackathon into observations from a field event; we could explore how the work of the observation and annotation task groups interact with DwC in a semantic web context; we can do anything we want!
I encourage anyone with a vision of how such an event might unfold to share it, on or off list, so that the program committee can discuss it. (And it would be great, as Kevin suggested, if someone steps forward as a spearhead.)
Joel.
We (you and I ) have drafted a proposal to put to TDWG executive for a 2-3 day workshop prior to this years meeting to establish context for these issues within the framework of a TDWG Technical Architecture. Do we need to evolve a TDWG level understanding of the requirement for semantic interoperability within our standards space? Would it be useful to spend time and effort to formally model the TDWG domain? Is there a role for TAG? Can we improve the process?
Hopefully, we can find an opportunity to get a small group together between now and then do a little planning around agenda, background requirements, preparation workload and specialist inputs.
greg
On Tue, 2011-05-10 at 07:26, Kevin Richards wrote:
I wonder if it would be a good idea to have a session (hackathon?) at this years TDWG meeting to look at / prove / experiment with, the various ways of working with semantic web data and ontologies we discuss here?
This would soon show any benefits/disadvantages etc of the various approaches.
Is anyone lined up / keen to promote such a session?
Kevin
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Steve Baskauf Sent: Tuesday, 10 May 2011 8:54 a.m. To: Paul Murray Cc: tdwg-content@lists.tdwg.org List Subject: Re: [tdwg-content] Heretics and illuminati, oh my! [SEC=UNCLASSIFIED]
Being either fearless or a fool (is there actually a difference?), I shall tread into this subject area at which I am a mere novice. So be kind...
I think that there may be several "solutions" to this problem. The one that is "correct" probably depends on what one is trying to accomplish. So I will try to describe in the most succinct way what Cam and I were trying to accomplish with DSW, and how that fits in with this thread. Cam and I basically wanted to do two things:
- Make it possible to use GUIDs RIGHT NOW (not five years from now).
- Create an extremely stripped down ontology that would be
non-controversial enough that people might actually use it, but which wouldn't do anything so bad that it would inhibit future development in the Semantic Web context (i.e. it could be extended in the future by clever people to do clever Semantic Web stuff).
Amazingly, the GUID Applicability Statement has achieved the status of Standard-hood! (http://www.tdwg.org/standards/150/) Hooray! I sort of missed the announcement, but ran across the fact the other day when I was surfing through the TDWG website. Since the GUID A.S. is now a TDWG Standard, I would say it would now officially be a best-practice to follow it. In particular, Recommendation 11 states "Objects in the biodiversity domain that are identified by a GUID should be typed using the TDWG ontology or other well-known vocabularies in accordance with the TDWG common architecture." This is somewhat problematic, given that the TDWG ontology (with the possible exception of the Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. What is the alternative "other well-known vocabulary"? There is none, at least none having any kind of official status with TDWG.
I recently discovered (or maybe re-discovered) the Technical Architecture Group (TAG) Technical Roadmaps from 2006-2008: http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf I might have seen them before, but if so it was at the point where I was really not knowledgeable enough to comprehend them. I found it very instructive to read about what the TAG had in mind when it set out to create the TDWG Ontology. In particular (from the 2007 Roadmap): "From the point of view of exchanging data - such as in the federation of a number of natural history collections - there is no need for a standards architecture. The federation is a closed system where a single exchange format can be agreed on. ... This model has worked well in the past but it does not meet the primary use case that is emerging. Biodiversity research is typically carried out by combining data of different kinds from multiple sources. The providers of data do not know who will use their data or how it will be combined with data from other sources. The consumer needs some level of commonality across all the data received so that it can be combined for analysis without the need to write computer software for every new combination." [This brings to mind the very "different kinds" of resources Cam is documenting in Borneo and the "multiple sources" that will be handling the metadata once those resources are sent off to herbaria, labs, and arboreta.]
and from the 2008 Roadmap: "If GUIDs are used to uniquely identify 'pieces' of data we need to have a shared understanding of what we mean by a 'piece of data' i.e. what kind of thing is it that a particular id applies to, a specimen, a person, an observation, a complete data set. We also need to have a shared understanding of at least some of the properties we use to describe these things."
Having been barely aware of TDWG's existence in 2008, I am blissfully ignorant of whatever disagreements may have occurred regarding LSIDs, reification, or whatever, and really don't want to know about them. All I can say as an outside observer is that it appears that the failure of the initial efforts to get GUIDs and the TDWG Ontology off the ground was because the system envisioned was too complicated to maintain, too complicated to gain a consensus, and to complicated to actually explain to anybody. Now that GUIDs seem to have a new lease on life, it seems like the greatest chance of successfully implementing them is to start by keeping things absolutely as simple as possible. To Cam and me, Darwin Core seemed to be the only candidate for something relatively simple and relatively universally accepted on which one could base an ontology that could be used to type GUIDs and to use to express "a shared understanding of at least some of the properties used to describe" biodiversity resources. Although I was somewhat skeptical that there was a "community consensus" about what the DwC classes meant and how they were related to each other, the exhaustive discussion on this list in Oct/Nov convinced me that maybe there WAS a consensus, or at least enough of a consensus to move forward. Although some people may at the present time be interested in figuring out how to do things like "define 'Fish' as an owl class as well as as a Taxon object", I would assert that is outside the core mission of TDWG as stated: to "develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms evidenced by the historical record". It is fun to talk about, but to me not the primary consideration in designing a community data exchange model. This outlook explains to some extent why I asked questions about the complexity of taxonconcept.org and its orientation toward facilitating semantic queries. 
There is nothing wrong with that, but it doesn't seem to be the direction that TDWG has said it wants to go. Perhaps when we have "gotten there" (i.e. have a functioning system using GUIDs for clearly typed resources), we might want to embark further down the road to the Semantic Web.
Aside from just importing the DwC classes into the DSW ontology and connecting them with object properties, Cam and I did a little nasty thing with them. It has been said that declaring ranges and domains for terms doesn't prevent people from using the terms to express relationships among the "wrong" types of things. Rather, it simply asserts that those things are instances of the classes used in the range and domain declarations for the term. That is sort of true, but by declaring many of the core DwC classes to be disjoint, we actually ARE preventing people from using the wrong object properties with instances of the wrong classes. If Joe Curator rdf:type's a determination as a dwc:Identification, but then uses dsw:atEvent (which has the domain dwc:Occurrence) as a property of the determination, then a reasoner will infer that the determination is a type dwc:Occurrence as well as the explicitly declared type dwc:Identification. Because dwc:Identification and dwc:Occurrence are disjoint classes, the reasoner will have a fit. Cam and I are being Naughty (sensu Bob Morris) because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the statement from the 2007 Roadmap: "The consumer needs some level of commonality across all the data received...". Joe Curator is not being consistent with the "shared understanding of at least some of the properties" to the extent that DSW reflects the "shared understanding" of the TDWG community. We are basically trying to enforce a sort of orthodoxy on the use of DwC classes as rdf:types and on the connections between the dwc:classes so that people can have some reasonable expectation that they are talking the same language as their partners whose data are also being aggregated in the same federated database.
It seems to me that this "enforcement of orthodoxy" may be very much at odds with the free-wheeling spirit of the Semantic Web community where Anybody Can Say Anything About Anything. But when I look over those old TAG roadmaps, I see little having to do with clever Semantic inferencing. I see a lot about providers and consumers understanding what each other are talking about. To some extent, Darwin Core can provide most of the necessary commonality between providers and consumers. There were (in our opinion) three areas where it could not. One was the lack of a class to link repeated sampling events and determinations (dwc:IndividualOrganism or TaxonomicallyHomogeneousEntity if you prefer) and another was a class that allowed for the separation of evidence from the Occurrence documented by it (called by us the dsw:Token class). The other area was the dwc:Taxon class which did not seem clear enough in its definition nor to possess enough complexity to express the kinds of relationships commonly discussed on this list. dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type declarations)
So I guess having read the various responses to my query and thinking about the history of the TDWG Ontology, I would say that it may not really be important how dwc:Taxon could be tied to tc:Taxon because the two classes probably don't need to be tied together anyway. As it currently stands, dwc:Taxon (outside of DSW) has no semantic meaning other than what people want to believe that it means because it's not tied to any other classes by object properties of its instances. The mish-mash of terms describing names and taxa listed under dwc:Taxon add to the confusion - since the DwC vocabulary purposefully does not declare domains for the terms listed under a class they really could be used as properties for an instance of any class anywhere. In contract, tc:Taxon does have properties that are described clearly in the TDWG Ontology. The only reason that we declared the two classes to be equivalent was to signal that we felt that some of the DwC terms listed under dwc:Taxon in the DwC vocabulary could be used as data properties for the things in the tc:Taxon class that people like Paul were describing with properties from the TDWG Ontology. Tying them together doesn't (at the moment) mess up anything that anybody is doing with dwc:Taxon because (outside of DSW) there isn't anything to actually DO with dwc:Taxon in RDF. However, the point is well taken that if someone in the future did decide to define properties specifically intended for use with dwc:Taxon, those properties would be hopelessly tangled with tc:Taxon properties.
It seems to me like the real road forward (if one believes as I do that DwC is the only practical alternative to use for typing GUIDs) would be to:
- decide that the TDWG Ontology in its dead form adequately describes
taxa, names, and their properties (use it as-is). OR 2. decide that although the TDWG Ontology doesn't do everything that people want it to do at the present time, it could resurrected and modified to do what people want (use it and hope for the future). OR 3. decide to just create the additional classes, e.g. dwc:Name (or dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and object properties for dwc:Taxon and dwc:Name that are needed to get the job done (i.e. just dump the TDWG Ontology as unfixable and make up new stuff).
In any of these three alternatives, there isn't actually any reason to tie the two classes together that I can see. Of these three, I think the third option would probably be preferable, although it might put Paul (and any others currently using the TDWG Ontology to describe Taxon instances) in the unpleasant position of having to redo their RDF.
Steve
Paul Murray wrote:
On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
Paul I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y). And this is built into standard semantic web reasoners - which is a bonus. But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar). Do you think this is reasonable or are we just losing too much semantic web benefits by not specifying the domain constraint?
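The comparison Kevin describes can be sketched in a few lines of plain Python. This is only an illustration of how a class-equivalence declaration lets a reasoner treat x and y as members of the same class; it is not how any particular reasoner is implemented, and the individuals ex:x and ex:y and the helper expand_types are made-up names for the example.

```python
# Illustrative sketch only: a toy propagation of owl:equivalentClass.
# EQUIVALENT stands in for the declared equivalence between the two classes;
# ex:x and ex:y are hypothetical individuals.
EQUIVALENT = [("dwc:Taxon", "tc:Taxon")]


def expand_types(asserted):
    """Given {individual: set of classes}, add every class declared equivalent."""
    expanded = {name: set(classes) for name, classes in asserted.items()}
    changed = True
    while changed:  # repeat until no new type can be added
        changed = False
        for a, b in EQUIVALENT:
            for classes in expanded.values():
                if a in classes and b not in classes:
                    classes.add(b)
                    changed = True
                elif b in classes and a not in classes:
                    classes.add(a)
                    changed = True
    return expanded


asserted = {"ex:x": {"dwc:Taxon"}, "ex:y": {"tc:Taxon"}}
expanded = expand_types(asserted)

# After expansion both individuals carry both types, so they can be compared.
shared = expanded["ex:x"] & expanded["ex:y"]
print(sorted(shared))
```

Without the equivalence declaration the two sets of types would not overlap, which is the situation a "dwc:isInCategory" style property would have to work around.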
A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
(another reason is that is allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent nevertheless the comment only applies to the particular thing you put it on)
This all means that dwc:isInCategory, if you want to apply it to dwc:taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using owl to define a vocabulary, you are effectively attempting to extend OWL itself.
But ... maybe that's ok. Maybe what is attempting to be done here only ever needs to be understood by humans.
Now ... if what you are trying to do is to define "Fish" as an owl class as well as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties. At least ... I think it is. I should write a test case.
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
-- Greg Whitbread, Integrated Botanical Information System, Australian National Botanic Gardens, Australian Centre for Plant Biodiversity Research
voice: +61 2 62509 482, fax: +61 2 62509 599, ghw@anbg.gov.au
GPO Box 1777 Canberra 2601
Hi Joel,
Maybe we can do the hackathon virtually so that there is some progress by the time of the meeting.
A lot of this could be done via email and perhaps a videoconference or two.
- Pete
On Tue, May 10, 2011 at 8:06 AM, joel sachs jsachs@csee.umbc.edu wrote:
On Tue, 10 May 2011, Greg Whitbread wrote:
Kevin,
I'm not too keen on the hack-a-thon idea. It will in all probability just consume important contact time with activity that is best integrated into real projects and reported in a forum such as this or at the TDWG meeting. Much in the way that Steve and Cam and Pete and Paul are doing now.
Hi Greg,
The conversation that's going on now is useful. But something is missing, namely reference to the use cases and competency questions that we sometimes talk about generating. Hackathons focus the mind (I've seen it happen), and a DwC-RDF VoCamp/hackathon would be an opportunity to move forward on stuff that keeps getting deferred. Ideally, we'd define the competency questions in advance, and spend the hackathon addressing them. But, if we all turn out to be lazy/indifferent/"busy", and come to New Orleans without use cases, then spending cloister time working them out would, IMHO, be cloister time well spent.
There are any number of possible approaches we could take. We could have an integration competition; we could rely on digitization efforts to supply data; we could tie the hackathon into observations from a field event; we could explore how the work of the observation and annotation task groups interact with DwC in a semantic web context; we can do anything we want!
I encourage anyone with a vision of how such an event might unfold to share it, on or off list, so that the program committee can discuss it. (And it would be great, as Kevin suggested, if someone steps forward as a spearhead.)
Joel.
We (you and I) have drafted a proposal to put to the TDWG executive for a 2-3 day workshop prior to this year's meeting to establish context for these issues within the framework of a TDWG Technical Architecture. Do we need to evolve a TDWG level understanding of the requirement for semantic interoperability within our standards space? Would it be useful to spend time and effort to formally model the TDWG domain? Is there a role for TAG? Can we improve the process?
Hopefully, we can find an opportunity to get a small group together between now and then to do a little planning around agenda, background requirements, preparation workload and specialist inputs.
greg
On Tue, 2011-05-10 at 07:26, Kevin Richards wrote:
I wonder if it would be a good idea to have a session (hackathon?) at this year's TDWG meeting to look at, prove, and experiment with the various ways of working with semantic web data and ontologies we discuss here?
This would soon show any benefits/disadvantages etc of the various approaches.
Is anyone lined up / keen to promote such a session?
Kevin
From: tdwg-content-bounces@lists.tdwg.org [mailto:tdwg-content-bounces@lists.tdwg.org] On Behalf Of Steve Baskauf
Sent: Tuesday, 10 May 2011 8:54 a.m.
To: Paul Murray
Cc: tdwg-content@lists.tdwg.org List
Subject: Re: [tdwg-content] Heretics and illuminati, oh my! [SEC=UNCLASSIFIED]
Being either fearless or a fool (is there actually a difference?), I shall tread into this subject area at which I am a mere novice. So be kind...
I think that there may be several "solutions" to this problem. The one that is "correct" probably depends on what one is trying to accomplish. So I will try to describe in the most succinct way what Cam and I were trying to accomplish with DSW, and how that fits in with this thread. Cam and I basically wanted to do two things:
- Make it possible to use GUIDs RIGHT NOW (not five years from now).
- Create an extremely stripped down ontology that would be non-controversial enough that people might actually use it, but which wouldn't do anything so bad that it would inhibit future development in the Semantic Web context (i.e. it could be extended in the future by clever people to do clever Semantic Web stuff).
Amazingly, the GUID Applicability Statement has achieved the status of Standard-hood! (http://www.tdwg.org/standards/150/) Hooray! I sort of missed the announcement, but ran across the fact the other day when I was surfing through the TDWG website. Since the GUID A.S. is now a TDWG Standard, I would say it would now officially be a best-practice to follow it. In particular, Recommendation 11 states "Objects in the biodiversity domain that are identified by a GUID should be typed using the TDWG ontology or other well-known vocabularies in accordance with the TDWG common architecture." This is somewhat problematic, given that the TDWG ontology (with the possible exception of the Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. What is the alternative "other well-known vocabulary"? There is none, at least none having any kind of official status with TDWG.
I recently discovered (or maybe re-discovered) the Technical Architecture Group (TAG) Technical Roadmaps from 2006-2008: http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf I might have seen them before, but if so it was at the point where I was really not knowledgeable enough to comprehend them. I found it very instructive to read about what the TAG had in mind when it set out to create the TDWG Ontology. In particular (from the 2007 Roadmap): "From the point of view of exchanging data - such as in the federation of a number of natural history collections - there is no need for a standards architecture. The federation is a closed system where a single exchange format can be agreed on. ... This model has worked well in the past but it does not meet the primary use case that is emerging. Biodiversity research is typically carried out by combining data of different kinds from multiple sources. The providers of data do not know who will use their data or how it will be combined with data from other sources. The consumer needs some level of commonality across all the data received so that it can be combined for analysis without the need to write computer software for every new combination." [This brings to mind the very "different kinds" of resources Cam is documenting in Borneo and the "multiple sources" that will be handling the metadata once those resources are sent off to herbaria, labs, and arboreta.]
and from the 2008 Roadmap: "If GUIDs are used to uniquely identify 'pieces' of data we need to have a shared understanding of what we mean by a 'piece of data' i.e. what kind of thing is it that a particular id applies to, a specimen, a person, an observation, a complete data set. We also need to have a shared understanding of at least some of the properties we use to describe these things."
Having been barely aware of TDWG's existence in 2008, I am blissfully ignorant of whatever disagreements may have occurred regarding LSIDs, reification, or whatever, and really don't want to know about them. All I can say as an outside observer is that it appears that the failure of the initial efforts to get GUIDs and the TDWG Ontology off the ground was because the system envisioned was too complicated to maintain, too complicated to gain a consensus, and too complicated to actually explain to anybody. Now that GUIDs seem to have a new lease on life, it seems like the greatest chance of successfully implementing them is to start by keeping things absolutely as simple as possible. To Cam and me, Darwin Core seemed to be the only candidate for something relatively simple and relatively universally accepted on which one could base an ontology that could be used to type GUIDs and to express "a shared understanding of at least some of the properties used to describe" biodiversity resources. Although I was somewhat skeptical that there was a "community consensus" about what the DwC classes meant and how they were related to each other, the exhaustive discussion on this list in Oct/Nov convinced me that maybe there WAS a consensus, or at least enough of a consensus to move forward. Although some people may at the present time be interested in figuring out how to do things like "define 'Fish' as an owl class as well as a Taxon object", I would assert that is outside the core mission of TDWG as stated: to "develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms evidenced by the historical record". It is fun to talk about, but to me not the primary consideration in designing a community data exchange model. This outlook explains to some extent why I asked questions about the complexity of taxonconcept.org and its orientation toward facilitating semantic queries. There is nothing wrong with that, but it doesn't seem to be the direction that TDWG has said it wants to go. Perhaps when we have "gotten there" (i.e. have a functioning system using GUIDs for clearly typed resources), we might want to embark further down the road to the Semantic Web.
Aside from just importing the DwC classes into the DSW ontology and connecting them with object properties, Cam and I did a little nasty thing with them. It has been said that declaring ranges and domains for terms doesn't prevent people from using the terms to express relationships among the "wrong" types of things. Rather, it simply asserts that those things are instances of the classes used in the range and domain declarations for the term. That is sort of true, but by declaring many of the core DwC classes to be disjoint, we actually ARE preventing people from using the wrong object properties with instances of the wrong classes. If Joe Curator rdf:type's a determination as a dwc:Identification, but then uses dsw:atEvent (which has the domain dwc:Occurrence) as a property of the determination, then a reasoner will infer that the determination is a type dwc:Occurrence as well as the explicitly declared type dwc:Identification. Because dwc:Identification and dwc:Occurrence are disjoint classes, the reasoner will have a fit. Cam and I are being Naughty (sensu Bob Morris) because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the statement from the 2007 Roadmap: "The consumer needs some level of commonality across all the data received...". Joe Curator is not being consistent with the "shared understanding of at least some of the properties" to the extent that DSW reflects the "shared understanding" of the TDWG community. We are basically trying to enforce a sort of orthodoxy on the use of DwC classes as rdf:types and on the connections between the dwc:classes so that people can have some reasonable expectation that they are talking the same language as their partners whose data are also being aggregated in the same federated database.
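The Joe Curator case above can be walked through with a toy sketch in plain Python. It assumes only what the paragraph states (dsw:atEvent has domain dwc:Occurrence, and dwc:Identification and dwc:Occurrence are disjoint); the identifiers ex:det1 and ex:event7 and the helper functions are hypothetical, and a real OWL reasoner does far more than this.

```python
# Toy sketch of rdfs:domain inference followed by an owl:disjointWith check.
# DOMAINS and DISJOINT are hand-written stand-ins for the ontology declarations
# described in the text; they are not read from the DSW ontology file.
DOMAINS = {"dsw:atEvent": "dwc:Occurrence"}
DISJOINT = {frozenset({"dwc:Identification", "dwc:Occurrence"})}


def infer_types(triples):
    """Return {subject: set of classes}, adding the domain class of each property used."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)
        elif predicate in DOMAINS:
            # Using the property asserts that the subject is in the property's domain.
            types.setdefault(subject, set()).add(DOMAINS[predicate])
    return types


def find_clashes(types):
    """Return sorted (subject, class_a, class_b) for subjects in two disjoint classes."""
    clashes = []
    for subject, classes in types.items():
        for pair in DISJOINT:
            if pair <= classes:
                class_a, class_b = sorted(pair)
                clashes.append((subject, class_a, class_b))
    return sorted(clashes)


# Joe Curator types a determination as an Identification, then uses dsw:atEvent on it.
joe_curator = [
    ("ex:det1", "rdf:type", "dwc:Identification"),
    ("ex:det1", "dsw:atEvent", "ex:event7"),
]
clashes = find_clashes(infer_types(joe_curator))
print(clashes)
```

The domain declaration alone never rejects anything; it only adds the inferred type. It is the disjointness declaration that turns the extra inferred type into a detectable inconsistency.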
It seems to me that this "enforcement of orthodoxy" may be very much at odds with the free-wheeling spirit of the Semantic Web community where Anybody Can Say Anything About Anything. But when I look over those old TAG roadmaps, I see little having to do with clever Semantic inferencing. I see a lot about providers and consumers understanding what each other are talking about. To some extent, Darwin Core can provide most of the necessary commonality between providers and consumers. There were (in our opinion) three areas where it could not. One was the lack of a class to link repeated sampling events and determinations (dwc:IndividualOrganism or TaxonomicallyHomogeneousEntity if you prefer) and another was a class that allowed for the separation of evidence from the Occurrence documented by it (called by us the dsw:Token class). The other area was the dwc:Taxon class which did not seem clear enough in its definition nor to possess enough complexity to express the kinds of relationships commonly discussed on this list. dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type declarations)
Hi Steve,
This may surprise you but on the Linked Open Data Cloud TaxonConcept/GeoSpecies is a well known vocabulary.
It has been looked at very closely by many in that community and has been revised over time.
Other data sets, including EUNIS from the European Environment Agency, are in part based on it.
Among the LOD data sets TaxonConcept is one of the few biological datasets that correctly follows the standards.
What I don't understand is why your new vocabulary is seen as an official TDWG effort and mine is not.
Also it is not clear to me how you define a taxon. Are *Felis concolor* and *Puma concolor* the same thing or different things?
It is also not clear to me how it is determined what is "accepted" and "consensus" based on the few people that take the time to write to the list.
What do the lurkers think?
I think for the vast majority of data providers the current DarwinCore works for data submission.
What could happen in the future is that GBIF takes these records, cleans them and exposes some portion of their data in a more semantic markup to the LOD cloud.
The goal of my species concepts is to create URIs for a species that have an RDF (machine interpretable) and HTML representation. (The HTML representation is likely to change in the future to something much better.)
That URI can then be used as a GUID that is relatively stable despite changes in nomenclature and classification hierarchies.
This allows searches for occurrence records and other data that are tied to the thing most call *Puma concolor* but is also known by many other names.
The alternative is for someone to search under all the potential name variants (assuming that they know them all).
The example queries are to show that the system seems to work in the way people expect.
To do this correctly and allow the functionality that people seem to want, it will need to be a little complex.
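The difference between searching by name string and searching by a stable concept URI can be sketched roughly as follows, in plain Python. Everything here is made up for illustration: the example.org URIs, the occurrence records, and the helper functions are hypothetical and are not the TaxonConcept service or its data.

```python
# Rough sketch: name-string search versus search through a stable concept URI.
# All URIs and records below are hypothetical placeholders.
PUMA = "http://example.org/concept/puma-concolor"
LYNX = "http://example.org/concept/lynx-rufus"

# Several names resolve to the same concept URI.
NAME_TO_CONCEPT = {
    "Puma concolor": PUMA,
    "Felis concolor": PUMA,
    "Lynx rufus": LYNX,
}

OCCURRENCES = [
    {"id": "occ-1", "scientificName": "Felis concolor", "concept": PUMA},
    {"id": "occ-2", "scientificName": "Puma concolor", "concept": PUMA},
    {"id": "occ-3", "scientificName": "Lynx rufus", "concept": LYNX},
]


def search_by_name(name):
    """Return ids of records whose name string matches exactly."""
    return [r["id"] for r in OCCURRENCES if r["scientificName"] == name]


def search_by_concept(name):
    """Resolve the name to a concept URI, then return ids of all records tied to it."""
    concept = NAME_TO_CONCEPT.get(name)
    if concept is None:
        return []
    return [r["id"] for r in OCCURRENCES if r["concept"] == concept]


by_name = search_by_name("Puma concolor")
by_concept = search_by_concept("Puma concolor")
print(by_name, by_concept)
```

The name-string search misses the record filed under the older name, while the concept lookup finds both without the searcher having to know every variant.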
Respectfully,
- Pete
On Mon, May 9, 2011 at 3:53 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Being either fearless or a fool (is there actually a difference?), I shall tread into this subject area at which I am a mere novice. So be kind...
I think that there may be several "solutions" to this problem. The one that is "correct" probably depends on what one is trying to accomplish. So I will try to describe in the most succinct way what Cam and I were trying to accomplish with DSW, and how that fits in with this thread. Cam and I basically wanted to do two things:
- Make it possible to use GUIDs RIGHT NOW (not five years from now).
- Create an extremely stripped down ontology that would be non-controversial enough that people might actually use it, but which wouldn't do anything so bad that it would inhibit future development in the Semantic Web context (i.e. it could be extended in the future by clever people to do clever Semantic Web stuff).
Amazingly, the GUID Applicability Statement has achieved the status of Standard-hood! (http://www.tdwg.org/standards/150/) Hooray! I sort of missed the announcement, but ran across the fact the other day when I was surfing through the TDWG website. Since the GUID A.S. is now a TDWG Standard, I would say it would now officially be a best-practice to follow it. In particular, Recommendation 11 states "Objects in the biodiversity domain that are identified by a GUID should be typed using the TDWG ontology or other well-known vocabularies in accordance with the TDWG common architecture." This is somewhat problematic, given that the TDWG ontology (with the possible exception of the Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. What is the alternative "other well-known vocabulary"? There is none, at least none having any kind of official status with TDWG.
I recently discovered (or maybe re-discovered) the Technical Architecture Group (TAG) Technical Roadmaps from 2006-2008: http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf I might have seen them before, but if so it was at the point where I was really not knowledgeable enough to comprehend them. I found it very instructive to read about what the TAG had in mind when it set out to create the TDWG Ontology. In particular (from the 2007 Roadmap): "From the point of view of exchanging data - such as in the federation of a number of natural history collections - there is no need for a standards architecture. The federation is a closed system where a single exchange format can be agreed on. ... This model has worked well in the past but it does not meet the primary use case that is emerging. Biodiversity research is typically carried out by combining data of different kinds from multiple sources. The providers of data do not know who will use their data or how it will be combined with data from other sources. The consumer needs some level of commonality across all the data received so that it can be combined for analysis without the need to write computer software for every new combination." [This brings to mind the very "different kinds" of resources Cam is documenting in Borneo and the "multiple sources" that will be handling the metadata once those resources are sent off to herbaria, labs, and arboreta.]
and from the 2008 Roadmap: "If GUIDs are used to uniquely identify 'pieces' of data we need to have a shared understanding of what we mean by a 'piece of data' i.e. what kind of thing is it that a particular id applies to, a specimen, a person, an observation, a complete data set. We also need to have a shared understanding of at least some of the properties we use to describe these things."
Having been barely aware of TDWG's existence in 2008, I am blissfully ignorant of whatever disagreements may have occurred regarding LSIDs, reification, or whatever, and really don't want to know about them. All I can say as an outside observer is that it appears that the failure of the initial efforts to get GUIDs and the TDWG Ontology off the ground was because the system envisioned was too complicated to maintain, too complicated to gain a consensus, and too complicated to actually explain to anybody. Now that GUIDs seem to have a new lease on life, it seems like the greatest chance of successfully implementing them is to start by keeping things absolutely as simple as possible. To Cam and me, Darwin Core seemed to be the only candidate for something relatively simple and relatively universally accepted on which one could base an ontology that could be used to type GUIDs and to express "a shared understanding of at least some of the properties used to describe" biodiversity resources. Although I was somewhat skeptical that there was a "community consensus" about what the DwC classes meant and how they were related to each other, the exhaustive discussion on this list in Oct/Nov convinced me that maybe there WAS a consensus, or at least enough of a consensus to move forward. Although some people may at the present time be interested in figuring out how to do things like "define 'Fish' as an owl class as well as a Taxon object", I would assert that is outside the core mission of TDWG as stated: to "develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms evidenced by the historical record". It is fun to talk about, but to me not the primary consideration in designing a community data exchange model. This outlook explains to some extent why I asked questions about the complexity of taxonconcept.org and its orientation toward facilitating semantic queries. There is nothing wrong with that, but it doesn't seem to be the direction that TDWG has said it wants to go. Perhaps when we have "gotten there" (i.e. have a functioning system using GUIDs for clearly typed resources), we might want to embark further down the road to the Semantic Web.
Aside from just importing the DwC classes into the DSW ontology and connecting them with object properties, Cam and I did a little nasty thing with them. It has been said that declaring ranges and domains for terms doesn't prevent people from using the terms to express relationships among the "wrong" types of things. Rather, it simply asserts that those things are instances of the classes used in the range and domain declarations for the term. That is sort of true, but by declaring many of the core DwC classes to be disjoint, we actually ARE preventing people from using the wrong object properties with instances of the wrong classes. If Joe Curator rdf:type's a determination as a dwc:Identification, but then uses dsw:atEvent (which has the domain dwc:Occurrence) as a property of the determination, then a reasoner will infer that the determination is a type dwc:Occurrence as well as the explicitly declared type dwc:Identification. Because dwc:Identification and dwc:Occurrence are disjoint classes, the reasoner will have a fit. Cam and I are being Naughty (sensu Bob Morris) because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the statement from the 2007 Roadmap: "The consumer needs some level of commonality across all the data received...". Joe Curator is not being consistent with the "shared understanding of at least some of the properties" to the extent that DSW reflects the "shared understanding" of the TDWG community. We are basically trying to enforce a sort of orthodoxy on the use of DwC classes as rdf:types and on the connections between the dwc:classes so that people can have some reasonable expectation that they are talking the same language as their partners whose data are also being aggregated in the same federated database.
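To make the Joe Curator scenario concrete, here is a minimal sketch of what a reasoner does with a domain declaration and a disjointness axiom. It is not how any real OWL reasoner is implemented, and it is not taken from the DSW files. The ex: identifiers are hypothetical, and the only axioms encoded are the two described above (dsw:atEvent has the domain dwc:Occurrence, and dwc:Identification is disjoint with dwc:Occurrence).

```python
# Toy model of rdfs:domain inference followed by an owl:disjointWith check.
# Only the two axioms described in the message are encoded; a real reasoner
# handles far more than this.
DOMAINS = {"dsw:atEvent": "dwc:Occurrence"}
DISJOINT = [frozenset({"dwc:Identification", "dwc:Occurrence"})]


def infer_types(triples):
    """Collect declared rdf:types, then add the types implied by domains."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)
    for subject, predicate, _ in triples:
        if predicate in DOMAINS:
            # Using the property asserts membership in its domain class.
            types.setdefault(subject, set()).add(DOMAINS[predicate])
    return types


def find_clashes(types):
    """Return (subject, classes) for every subject typed with disjoint classes."""
    clashes = []
    for subject, classes in types.items():
        for pair in DISJOINT:
            if pair <= classes:
                clashes.append((subject, tuple(sorted(pair))))
    return clashes


# Hypothetical data: Joe Curator types a determination as an Identification
# and then hangs dsw:atEvent on it.
triples = [
    ("ex:det1", "rdf:type", "dwc:Identification"),
    ("ex:det1", "dsw:atEvent", "ex:event1"),
]

types = infer_types(triples)
clashes = find_clashes(types)
print(clashes)
```

Without the DISJOINT entry, the same data would simply leave ex:det1 typed as both classes and nothing would complain, which is the "sort of true" part of the statement about ranges and domains.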
It seems to me that this "enforcement of orthodoxy" may be very much at odds with the free-wheeling spirit of the Semantic Web community where Anybody Can Say Anything About Anything. But when I look over those old TAG roadmaps, I see little having to do with clever Semantic inferencing. I see a lot about providers and consumers understanding what each other are talking about. To some extent, Darwin Core can provide most of the necessary commonality between providers and consumers. There were (in our opinion) three areas where it could not. One was the lack of a class to link repeated sampling events and determinations (dwc:IndividualOrganism or TaxonomicallyHomogeneousEntity if you prefer) and another was a class that allowed for the separation of evidence from the Occurrence documented by it (called by us the dsw:Token class). The third area was the dwc:Taxon class which did not seem clear enough in its definition nor to possess enough complexity to express the kinds of relationships commonly discussed on this list. dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type declarations).
So I guess having read the various responses to my query and thinking about the history of the TDWG Ontology, I would say that it may not really be important how dwc:Taxon could be tied to tc:Taxon because the two classes probably don't need to be tied together anyway. As it currently stands, dwc:Taxon (outside of DSW) has no semantic meaning other than what people want to believe that it means because it's not tied to any other classes by object properties of its instances. The mish-mash of terms describing names and taxa listed under dwc:Taxon adds to the confusion - since the DwC vocabulary purposefully does not declare domains for the terms listed under a class, they really could be used as properties for an instance of any class anywhere. In contrast, tc:Taxon does have properties that are described clearly in the TDWG Ontology. The only reason that we declared the two classes to be equivalent was to signal that we felt that some of the DwC terms listed under dwc:Taxon in the DwC vocabulary could be used as data properties for the things in the tc:Taxon class that people like Paul were describing with properties from the TDWG Ontology. Tying them together doesn't (at the moment) mess up anything that anybody is doing with dwc:Taxon because (outside of DSW) there isn't anything to actually DO with dwc:Taxon in RDF. However, the point is well taken that if someone in the future did decide to define properties specifically intended for use with dwc:Taxon, those properties would be hopelessly tangled with tc:Taxon properties.
It seems to me like the real road forward (if one believes as I do that DwC is the only practical alternative to use for typing GUIDs) would be to:
1. decide that the TDWG Ontology in its dead form adequately describes taxa, names, and their properties (use it as-is). OR
2. decide that although the TDWG Ontology doesn't do everything that people want it to do at the present time, it could be resurrected and modified to do what people want (use it and hope for the future). OR
3. decide to just create the additional classes, e.g. dwc:Name (or dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and object properties for dwc:Taxon and dwc:Name that are needed to get the job done (i.e. just dump the TDWG Ontology as unfixable and make up new stuff).
In any of these three alternatives, there isn't actually any reason to tie the two classes together that I can see. Of these three, I think the third option would probably be preferable, although it might put Paul (and any others currently using the TDWG Ontology to describe Taxon instances) in the unpleasant position of having to redo their RDF.
Steve
Paul Murray wrote:
On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
Paul
I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y). And this is built into standard semantic web reasoners - which is a bonus. But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar). Do you think this is reasonable or are we just losing too much semantic web benefits by not specifying the domain constraint?
A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
(another reason is that it allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent nevertheless the comment only applies to the particular thing you put it on)
This all means that dwc:isInCategory, if you want to apply it to dwc:taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using owl to define a vocabulary, you are effectively attempting to extend OWL itself.
But ... maybe that's ok. Maybe what is attempting to be done here only ever needs to be understood by humans.
Now ... if what you are trying to do is to define "Fish" as an owl class as well as as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties. At least ... I think it is. I should write a test case.
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Pete, Comments inline.
Peter DeVries wrote:
This may surprise you but on the Linked Open Data Cloud TaxonConcept/GeoSpecies is a well known vocabulary.
It has been looked at very closely by many in that community and has been revised over time.
Other data sets, including EUNIS - the European Union Environmental Agency - are in part based on it.
I am not knowledgeable enough to dispute this.
Among the LOD data sets TaxonConcept is one of the few biological datasets that correctly follows the standards.
To which standards are you referring? W3C? Generic rules for using RDF and OWL? LOD best-practices? TDWG? If TDWG standards, which ones?
What I don't understand is why your new vocabulary is seen as an official TDWG effort and mine is not.
OK, I think that I said this before, but I'll say it again for clarity: DSW is NOT an official TDWG effort! It is a Cam Webb/Steve Baskauf effort. It is a functioning demonstration of a possible approach to describing biodiversity resources in RDF, just as taxonconcept.org is. However, it IS based on using terms from Darwin Core, which IS a ratified TDWG standard. It also (for the moment) incorporates the sections of the TDWG Ontology which create terms in RDF to describe TaxonConcept instances in accordance with another ratified TDWG standard: the TCS schema. So although DSW is NOT an official TDWG effort, it is composed of as many pieces of official TDWG efforts as we could find to build it from.
Also it is not clear to me how you define a taxon. Are /Felis concolor/ and /Puma concolor/ the same thing or different things?
I have already confessed long ago that Cam and I dodged this question by not defining the Taxon class ourselves. We used the relevant sections of the TDWG Ontology because it was based on a TDWG standard (TCS) and was consistent with other published references discussed on the page http://code.google.com/p/darwin-sw/wiki/ClassTaxon. Ask your question to Roger Hyam (who I think wrote most of the TDWG Ontology) and Jessie Kennedy (who along with Roger wrote the TCS standard).
It is also not clear to me how it is determined what is "accepted" and "consensus" based on the few people that take the time to write to the list.
Well, with regards to Darwin Core, we tried to pay particular attention to the comments of John Wieczorek and Markus Döring whose names appear on the DwC documentation, and to Rich Pyle who was apparently consulted about at least parts of DwC. I confess that they probably qualify as Illuminati, but they did write it. I would also note that in the documentation of DSW we tried to show how the structure of DSW class interrelationships was related to earlier models such as the ASC model (see http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels). Given that the ASC model has been around since 1992 and formed the basis for several later models that were in general use, I think that it has some status as "accepted". In my analysis of the many posts to the list over the past couple years (http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary), there was (in my opinion) general agreement about many of the more straightforward classes (such as Event and Identification). In the cases where there was not uniform agreement, we tried to note this in the DSW documentation and provide a rationale for why we chose between the alternative viewpoints that were expressed. I will grant that the people who post to the list are probably a minority of the people who are receiving the posts. But that's all we have.
What do the lurkers think?
I think for the vast majority of data providers the current DarwinCore works for data submission.
What could happen in the future is that GBIF takes these records, cleans them and exposes some portion of their data in a more semantic markup to the LOD cloud.
I tried to express in my last post why I felt that it was consistent with the past stated goals of TDWG to prioritize the needs of data submission over the needs of semantic interpretation when those two goals conflicted. I took a considerable amount of time to try to understand the taxonconcept.org ontology and in a previous message, to articulate the questions that I had about the structure of it. Those questions were framed around the basic problem of the tradeoff between constructing an ontology that prioritized semantic querying (the subclassing cats by color approach) and constructing an ontology that prioritized simplicity in class structure (the single cat type with color properties approach). I don't think that you ever really responded to those specific questions/criticisms other than to say that your ontology was great and everybody should use it, and to demonstrate SPARQL queries with it.
Really, I can't speak for Cam, but from my perspective, I would guess that 5% or less of the time I spent working on DSW was involved in actually writing the RDF (that's excluding the time it took to learn how to use Protege and Subversion). At least 95% of the time or more was spent puzzling over emails and papers, writing annoying questions to the list to try to get people to talk about how they understood things, and then writing up the documentation with references. If you want people to accept the taxonconcept.org ontology as a means to markup metadata and type resources, then I would encourage you to write up some detailed documentation explaining how your approach is similar or different from earlier models (and if different, why). Also, some detailed explanations of what you intend the classes to represent would be helpful. For example, I had initially assumed that what you meant by "Occurrence" and "SpeciesIndividual" meant the same thing as we had described for "Occurrence" and "IndividualOrganism" in the DSW documentation. However, in your email responses it was clear to me that you meant something different (although I'm not sure what that was).
You suggested (if I'm understanding you correctly) in a previous email that you created taxonconcept.org so that its terms and classes could be used as a part of the TDWG infrastructure. If that is your intention, then you need to describe in a detailed document how the mapping from an existing data markup standard (i.e. DwC) to taxonconcept.org should be accomplished. As I pointed out in an earlier email, the structure of taxonconcept.org is rather complex ("reticulated") and potentially utilizes up to millions of classes, some of which have (to me) unclear connections to the DwC classes. The correspondence between DSW classes and DwC classes is generally not an issue because DSW classes ARE DwC classes (mostly).
The goal of my species concepts is to create URIs for species that have an RDF (machine interpretable) and an HTML representation. (the HTML representation is likely to change in the future to something much better)
That URI can then be used as a GUID that is relatively stable despite changes in nomenclature and classification hierarchies.
This allows searches for occurrence records and other data that are tied to the thing most call /Puma concolor/ but is also known by many other names.
The alternative is for someone to search under all the potential name variants (assuming that they know them all).
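A small sketch of the difference between the two kinds of search, under stated assumptions: the concept URIs and occurrence records below are invented placeholders, not real taxonconcept.org identifiers or data, and the record layout is made up for the illustration. The two names used are the ones raised elsewhere in this thread.

```python
# Hypothetical concept URIs; real ones would be minted by the provider.
PUMA = "http://example.org/concept/puma-concolor"
LYNX = "http://example.org/concept/lynx-rufus"

# Hypothetical occurrence records, each carrying the name string that was
# recorded and the concept URI that the record was tied to.
records = [
    {"id": "occ1", "name": "Puma concolor", "concept": PUMA},
    {"id": "occ2", "name": "Felis concolor", "concept": PUMA},
    {"id": "occ3", "name": "Lynx rufus", "concept": LYNX},
]


def find_by_concept(records, concept_uri):
    """Return ids of records tied to the concept, whatever name they used."""
    return [r["id"] for r in records if r["concept"] == concept_uri]


def find_by_names(records, names):
    """Return ids of records whose name string is among the names supplied."""
    return [r["id"] for r in records if r["name"] in names]


by_uri = find_by_concept(records, PUMA)
by_one_name = find_by_names(records, {"Puma concolor"})
by_all_names = find_by_names(records, {"Puma concolor", "Felis concolor"})
print(by_uri, by_one_name, by_all_names)
```

The name-based search only matches the URI-based one when the searcher happens to supply every variant, which is the point being made above.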
As I pointed out before, taxonconcept.org is built around taxa-related classes and as such it is good at doing the kinds of taxonomy-related queries that you describe above. However, the TDWG community also includes people who are less interested in taxonomy and more interested in things like tracking individual whales over space, connecting different types of evidence that are located in different institutions, collecting data on one organism over time, etc. That is why I questioned constructing a taxon-oriented ontology rather than an ontology containing a few general classes of things.
Steve
The example queries are to show that the system seems to work in the way people expect.
To do this correctly and allow the functionality that people seem to want it will need to be a little complex.
Respectfully,
- Pete
On Mon, May 9, 2011 at 3:53 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu mailto:steve.baskauf@vanderbilt.edu> wrote:
Being either fearless or a fool (is there actually a difference?), I shall tread into this subject area at which I am a mere novice. So be kind... I think that there may be several "solutions" to this problem. The one that is "correct" probably depends on what one is trying to accomplish. So I will try to describe in the most succinct way what Cam and I were trying to accomplish with DSW, and how that fits in with this thread. Cam and I basically wanted to do two things: 1. Make it possible to use GUIDs RIGHT NOW (not five years from now). 2. Create an extremely stripped down ontology that would be non-controversial enough that people might actually use it, but which wouldn't do anything so bad that it would inhibit future development in the Semantic Web context (i.e. it could be extended in the future by clever people to do clever Semantic Web stuff). Amazingly, the GUID Applicability Statement has achieved the status of Standard-hood! (http://www.tdwg.org/standards/150/) Hooray! I sort of missed the announcement, but ran across the fact the other day when I was surfing through the TDWG website. Since the GUID A.S. is now a TDWG Standard, I would say it would now officially be a best-practice to follow it. In particular, Recommendation 11 states "Objects in the biodiversity domain that are identified by a GUID should be typed using the TDWG ontology or other well-known vocabularies in accordance with the TDWG common architecture." This is somewhat problematic, given that the TDWG ontology (with the possible exception of the Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. What is the alternative "other well-known vocabulary"? There is none, at least none having any kind of official status with TDWG. 
I recently discovered (or maybe re-discovered) the Technical Architecture Group (TAG) Technical Roadmaps from 2006-2008: http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf I might have seen them before, but if so it was at the point where I was really not knowledgeable enough to comprehend them. I found it very instructive to read about what the TAG had in mind when it set out to create the TDWG Ontology. In particular (from the 2007 Roadmap): "From the point of view of exchanging data - such as in the federation of a number of natural history collections - there is no need for a standards architecture. The federation is a closed system where a single exchange format can be agreed on. ... This model has worked well in the past but it does not meet the primary use case that is emerging. Biodiversity research is typically carried out by combining data of different kinds from multiple sources. The providers of data do not know who will use their data or how it will be combined with data from other sources. The consumer needs some level of commonality across all the data received so that it can be combined for analysis without the need to write computer software for every new combination." [This brings to mind the very "different kinds" of resources Cam is documenting in Borneo and the "multiple sources" that will be handling the metadata once those resources are sent off to herbaria, labs, and arboreta.] and from the 2008 Roadmap: "If GUIDs are used to uniquely identify 'pieces' of data we need to have a shared understanding of what we mean by a 'piece of data' i.e. what kind of thing is it that a particular id applies to, a specimen, a person, an observation, a complete data set. We also need to have a shared understanding of at least some of the properties we use to describe these things." 
Having been barely aware of TDWG's existence in 2008, I am blissfully ignorant of whatever disagreements may have occurred regarding LSIDs, reification, or whatever, and really don't want to know about them. All I can say as an outside observer is that it appears that the failure of the initial efforts to get GUIDs and the TDWG Ontology off the ground was because the system envisioned was too complicated to maintain, too complicated to gain a consensus, and to complicated to actually explain to anybody. Now that GUIDs seem to have a new lease on life, it seems like the greatest chance of successfully implementing them is to start by keeping things absolutely as simple as possible. To Cam and me, Darwin Core seemed to be the only candidate for something relatively simple and relatively universally accepted on which one could base an ontology that could be used to type GUIDs and to use to express "a shared understanding of at least some of the properties used to describe" biodiversity resources. Although I was somewhat skeptical that there was a "community consensus" about what the DwC classes meant and how they were related to each other, the exhaustive discussion on this list in Oct/Nov convinced me that maybe there WAS a consensus, or at least enough of a consensus to move forward. Although some people may at the present time be interested in figuring out how to do things like "define 'Fish' as an owl class as well as as a Taxon object", I would assert that is outside the core mission of TDWG as stated: to "develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms evidenced by the historical record". It is fun to talk about, but to me not the primary consideration in designing a community data exchange model. This outlook explains to some extent why I asked questions about the complexity of taxonconcept.org <http://taxonconcept.org> and its orientation toward facilitating semantic queries. 
There is nothing wrong with that, but it doesn't seem to be the direction that TDWG has said it wants to go. Perhaps when we have "gotten there" (i.e. have a functioning system using GUIDs for clearly typed resources), we might want to embark further down the road to the Semantic Web. Aside from just importing the DwC classes into the DSW ontology and connecting them with object properties, Cam and I did a little nasty thing with them. It has been said that declaring ranges and domains for terms doesn't prevent people from using the terms to express relationships among the "wrong" types of things. Rather, it simply asserts that those things are instances of the classes used in the range and domain declarations for the term. That is sort of true, but by declaring many of the core DwC classes to be disjoint, we actually ARE preventing people from using the wrong object properties with instances of the wrong classes. If Joe Curator rdf:type's a determination as a dwc:Identification, but then uses dsw:atEvent (which has the domain dwc:Occurrence) as a property of the determination, then a reasoner will infer that the determination is a type dwc:Occurrence as well as the explicitly declared type dwc:Identification. Because dwc:Identification and dwc:Occurrence are disjoint classes, the reasoner will have a fit. Cam and I are being Naughty (sensu Bob Morris) because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the statement from the 2007 Roadmap: "The consumer needs some level of commonality across all the data received...". Joe Curator is not being consistent with the "shared understanding of at least some of the properties" to the extent that DSW reflects the "shared understanding" of the TDWG community. 
We are basically trying to enforce a sort of orthodoxy on the use of DwC classes as rdf:types and on the connections between the dwc:classes so that people can have some reasonable expectation that they are talking the same language as their partners whose data are also being aggregated in the same federated database. It seems to me that this "enforcement of orthodoxy" may be very much at odds with the free-wheeling spirit of the Semantic Web community where Anybody Can Say Anything About Anything. But when I look over those old TAG roadmaps, I see little having to do with clever Semantic inferencing. I see a lot about providers and consumers understanding what each other are talking about. To some extent, Darwin Core can provide most of the necessary commonality between providers and consumers. There were (in our opinion) three areas where it could not. One was the lack of a class to link repeated sampling events and determinations (dwc:IndividualOrganism or TaxonomicallyHomogeneousEntity if you prefer) and another was a class that allowed for the separation of evidence from the Occurrence documented by it (called by us the dsw:Token class). The other area was the dwc:Taxon class which did not seem clear enough in its definition nor to possess enough complexity to express the kinds of relationships commonly discussed on this list. dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type declarations) So I guess having read the various responses to my query and thinking about the history of the TDWG Ontology, I would say that it may not really be important how dwc:Taxon could be tied to tc:Taxon because the two classes probably don't need to be tied together anyway. As it currently stands, dwc:Taxon (outside of DSW) has no semantic meaning other than what people want to believe that it means because it's not tied to any other classes by object properties of its instances. 
The mish-mash of terms describing names and taxa listed under dwc:Taxon add to the confusion - since the DwC vocabulary purposefully does not declare domains for the terms listed under a class they really could be used as properties for an instance of any class anywhere. In contract, tc:Taxon does have properties that are described clearly in the TDWG Ontology. The only reason that we declared the two classes to be equivalent was to signal that we felt that some of the DwC terms listed under dwc:Taxon in the DwC vocabulary could be used as data properties for the things in the tc:Taxon class that people like Paul were describing with properties from the TDWG Ontology. Tying them together doesn't (at the moment) mess up anything that anybody is doing with dwc:Taxon because (outside of DSW) there isn't anything to actually DO with dwc:Taxon in RDF. However, the point is well taken that if someone in the future did decide to define properties specifically intended for use with dwc:Taxon, those properties would be hopelessly tangled with tc:Taxon properties. It seems to me like the real road forward (if one believes as I do that DwC is the only practical alternative to use for typing GUIDs) would be to: 1. decide that the TDWG Ontology in its dead form adequately describes taxa, names, and their properties (use it as-is). OR 2. decide that although the TDWG Ontology doesn't do everything that people want it to do at the present time, it could resurrected and modified to do what people want (use it and hope for the future). OR 3. decide to just create the additional classes, e.g. dwc:Name (or dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and object properties for dwc:Taxon and dwc:Name that are needed to get the job done (i.e. just dump the TDWG Ontology as unfixable and make up new stuff). In any of these three alternatives, there isn't actually any reason to tie the two classes together that I can see. 
Of these three, I think the third option would probably be preferable, although it might put Paul (and any others currently using the TDWG Ontology to describe Taxon instances) in the unpleasant position of having to redo their RDF. Steve Paul Murray wrote:
On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
Paul I had the same thought (ie the x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y). And this is built into standard semantic web reasoners - which is a bonus. But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar). Do you think this is reasonable or are we just losing too much semantic web benefits by not specifying the domain constraint?
A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true"). (another reason is that is allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent nevertheless the comment only applies to the particular thing you put it on) This all means that dwc:isInCategory, if you want to apply it to dwc:taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using owl to define a vocabulary, you are effectively attempting to extend OWL itself. But ... maybe that's ok. Maybe what is attempting to be done here only ever needs to be understood by humans. Now ... if what you are trying to do is to define "Fish" as an owl class as well as as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties. At least ... I think it is. I should write a test case. If you have received this transmission in error please notify us immediately by return e-mail and delete all copies. 
If this e-mail or any attachments have been sent to you in error, that error does not constitute waiver of any confidentiality, privilege or copyright in respect of information in the e-mail or attachments. Please consider the environment before printing this email. _______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org <mailto:tdwg-content@lists.tdwg.org> http://lists.tdwg.org/mailman/listinfo/tdwg-content .
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 Email: pdevries@wisc.edu TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases A Semantic Web, Linked Open Data http://linkeddata.org/ Project
Steve,
I have been explaining this since at least 2006.
I have made several proposals including the recent dwc_area.owl, which incorporates both geo and some estimate of extent/error.
In order to do this correctly you need to use URI's that correctly resolve following semantic web standards.
It is still not clear to me and probably others where the authoritative DWC vocabulary is.
I have made several trips to visit other groups and work some issues out and get these connected to related efforts like the GNI.
It was planned that Rich and I were going to try connecting these to his TNU's once he has gotten his head above water. (figuratively)
You did not ask about these specific issues until you had already completed your vocabulary.
In other words, you started this without really understanding what exists or existing semantic web practices.
As was mentioned earlier it is relatively easy to link via equivalent property and equivalent class.
Based on the examples from Rich and an earlier bird example from David Remsen I would say that David's three birds and Rich's two fish should get different concept URI's.
These concept URI's should be supported by photo's and specimens etc that give users some guidance as to what kinds of things are instances of that concept and what things are not instances of that concept.
If Rich chose the supporting individuals and references then he would be the editor of that concept, linked in a machine interpretable way to that concept.
Centropyge fisheri se:q72fd http://lod.taxonconcept.org/ses/q72fd#Species
The HTML version for the concept could be this: http://www.eol.org/pages/221698
Centropyge flavicauda se:Mj6j4 http://lod.taxonconcept.org/ses/Mj6j4#Species
The HTML version for the concept could be this: http://www.eol.org/pages/224676
What is missing from these examples is a curated set of specimens and photographs which are linked via their own URI's
Ideally you want to try to keep these concepts from overlapping so when someone tags their specimens or other data it should be clear which is the most appropriate concept.
Now someone can review the candidates and make the assertion that the fish in their hand is an instance of this concept using a relatively stable URI that is simple to include in publications, is trackable and links to the editor Rich.
Merging these later would be relatively simple, splitting further is possible but more complicated.
For the fish example you might want to link to a different concept on the web that indicates (and makes findable) related concepts where the set of specimens are different.
This was the area of modeling that I was planning on working out with Rich.
What I propose also allows tracking of multiple individuals, parts of individuals and individuals over time.
I just have not marked up examples of these.
Note that the Plant List seems to do something like this but without URI's, representative photos or linked specimens. In addition it is not open or machine interpretable.
Respectfully,
- Pete
On Mon, May 9, 2011 at 8:10 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
Pete, Comments inline.
Peter DeVries wrote:
This may surprise you but on the Linked Open Data Cloud TaxonConcept/GeoSpecies is a well known vocabulary.
It has been looked at very closely by many in that community and has been revised over time.
Other data sets, including EUNIS from the European Union Environmental Agency, are in part based on it.
I am not knowledgeable enough to dispute this.
Among the LOD data sets TaxonConcept is one of the few biological datasets that correctly follows the standards.
To which standards are you referring? W3C? Generic rules for using RDF and OWL? LOD best-practices? TDWG? If TDWG standards, which ones?
What I don't understand is why your new vocabulary is seen as an official TDWG effort and mine is not.
OK, I think that I said this before, but I'll say it again for clarity: DSW is NOT an official TDWG effort! It is a Cam Webb/Steve Baskauf effort. It is a functioning demonstration of a possible approach to describing biodiversity resources in RDF, just as taxonconcept.org is. However, it IS based on using terms from Darwin Core, which IS a ratified TDWG standard. It also (for the moment) incorporates the sections of the TDWG Ontology which create terms in RDF to describe TaxonConcept instances in accordance with another ratified TDWG standard: the TCS schema. So although DSW is NOT an official TDWG effort, it is composed of as many pieces of official TDWG efforts as we could find to build it from.
Also it is not clear to me how you define a taxon. Are *Felis concolor* and *Puma concolor* the same thing or different things?
I have already confessed long ago that Cam and I dodged this question by not defining the Taxon class ourselves. We used the relevant sections of the TDWG Ontology because it was based on a TDWG standard (TCS) and was consistent with other published references discussed on the page http://code.google.com/p/darwin-sw/wiki/ClassTaxon. Ask your question to Roger Hyam (who I think wrote most of the TDWG Ontology) and Jessie Kennedy (who along with Roger wrote the TCS standard).
It is also not clear to me how it is determined what is "accepted" and "consensus" based on the few people that take the time to write to the list.
Well, with regards to Darwin Core, we tried to pay particular attention to the comments of John Wieczorek and Markus Döring whose names appear on the DwC documentation, and to Rich Pyle who was apparently consulted about at least parts of DwC. I confess that they probably qualify as Illuminati, but they did write it. I would also note that in the documentation of DSW we tried to show how the structure of DSW class interrelationships was related to earlier models such as the ASC model (see http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels). Given that the ASC model has been around since 1992 and formed the basis for several later models that were in general use, I think that it has some status as "accepted". In my analysis of the many posts to the list over the past couple years ( http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary), there was (in my opinion) general agreement about many of the more straightforward classes (such as Event and Identification). In the cases where there was not uniform agreement, we tried to note this in the DSW documentation and provide a rationale for why we chose between the alternative viewpoints that were expressed. I will grant that the people who post to the list are probably a minority of the people who are receiving the posts. But that's all we have.
What do the lurkers think?
I think for the vast majority of data providers the current DarwinCore works for data submission.
What could happen in the future is that GBIF takes these records, cleans them and exposes some portion of their data in a more semantic markup to the LOD cloud.
I tried to express in my last post why I felt that it was consistent with the past stated goals of TDWG to prioritize the needs of data submission over the needs of semantic interpretation when those two goals conflicted. I took a considerable amount of time to try to understand the taxonconcept.org ontology and in a previous message, to articulate the questions that I had about the structure of it. Those questions were framed around the basic problem of the tradeoff between constructing an ontology that prioritized semantic querying (the subclassing cats by color approach) and constructing an ontology that prioritized simplicity in class structure (the single cat type with color properties approach). I don't think that you ever really responded to those specific questions/criticisms other than to say that your ontology was great and everybody should use it, and to demonstrate SPARQL queries with it.
Really, I can't speak for Cam, but from my perspective, I would guess that 5% or less of the time I spent working on DSW was involved in actually writing the RDF (that's excluding the time it took to learn how to use Protege and Subversion). At least 95% of the time or more was spent puzzling over emails and papers, writing annoying questions to the list to try to get people to talk about how they understood things, and then writing up the documentation with references. If you want people to accept the taxonconcept.org ontology as a means to markup metadata and type resources, then I would encourage you to write up some detailed documentation explaining how your approach is similar or different from earlier models (and if different, why). Also, some detailed explanations of what you intend the classes to represent would be helpful. For example, I had initially assumed that what you meant by "Occurrence" and "SpeciesIndividual" meant the same thing as we had described for "Occurrence" and "IndividualOrganism" in the DSW documentation. However, in your email responses it was clear to me that you meant something different (although I'm not sure what that was).
You suggested (if I'm understanding you correctly) in a previous email that you created taxonconcept.org so that its terms and classes could be used as a part of the TDWG infrastructure. If that is your intention, then you need to describe in a detailed document how the mapping from an existing data markup standard (i.e. DwC) to taxonconcept.org should be accomplished. As I pointed out in an earlier email, the structure of taxonconcept.org is rather complex ("reticulated") and potentially utilizes up to millions of classes, some of which have (to me) unclear connections to the DwC classes. The correspondence between DSW classes and DwC classes is generally not an issue because DSW classes ARE DwC classes (mostly).
The goal of my species concepts is to create URI's for a species that have an RDF (machine interpretable) and an HTML representation. (the HTML representation is likely to change in the future to something much better)
That URI can then be used as a GUID that is relatively stable despite changes in nomenclature and classification hierarchies.
This allows searches for occurrence records and other data that are tied to the thing most call *Puma concolor* but is also known by many other names.
The alternative is for someone to search under all the potential name variants (assuming that they know them all)
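As a rough illustration of the point above (one stable concept URI standing in for many name variants), here is a minimal Python sketch. The URIs, record identifiers, and the name-to-concept mapping are all hypothetical and invented for the example; they are not taken from taxonconcept.org, and treating the two names as one concept is an assumption made only to show the lookup.

```python
# Hypothetical concept URIs; the thread does not give a URI for Puma concolor.
PUMA_CONCEPT = "http://example.org/concept/puma-concolor"
LYNX_CONCEPT = "http://example.org/concept/lynx-rufus"

# Illustrative assumption: both name variants point at the same concept URI.
NAME_TO_CONCEPT = {
    "Puma concolor": PUMA_CONCEPT,
    "Felis concolor": PUMA_CONCEPT,
    "Lynx rufus": LYNX_CONCEPT,
}

# Made-up occurrence records tagged with a concept URI instead of a bare name.
OCCURRENCES = [
    {"id": "occ-1", "recorded_name": "Felis concolor", "concept": PUMA_CONCEPT},
    {"id": "occ-2", "recorded_name": "Puma concolor", "concept": PUMA_CONCEPT},
    {"id": "occ-3", "recorded_name": "Lynx rufus", "concept": LYNX_CONCEPT},
]


def occurrences_for_name(name):
    """Return ids of all records sharing the concept that `name` resolves to."""
    concept = NAME_TO_CONCEPT.get(name)
    if concept is None:
        return []
    return [rec["id"] for rec in OCCURRENCES if rec["concept"] == concept]
```

Searching under either name returns the same records, because the match is made on the concept URI and not on the name string.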
As I pointed out before, taxonconcept.org is built around taxa-related classes and as such it is good at doing the kinds of taxonomy-related queries that you describe above. However, the TDWG community also includes people who are less interested in taxonomy and more interested in things like tracking individual whales over space, connecting different types of evidence that are located in different institutions, collecting data on one organism over time, etc. That is why I questioned constructing a taxon-oriented ontology rather than an ontology containing a few general classes of things.
Steve
The example queries are to show that the system seems to work in the way people expect.
To do this correctly and allow the functionality that people seem to want it will need to be a little complex.
Respectfully,
- Pete
On Mon, May 9, 2011 at 3:53 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
Being either fearless or a fool (is there actually a difference?), I shall tread into this subject area at which I am a mere novice. So be kind...
I think that there may be several "solutions" to this problem. The one that is "correct" probably depends on what one is trying to accomplish. So I will try to describe in the most succinct way what Cam and I were trying to accomplish with DSW, and how that fits in with this thread. Cam and I basically wanted to do two things:
- Make it possible to use GUIDs RIGHT NOW (not five years from now).
- Create an extremely stripped down ontology that would be non-controversial enough that people might actually use it, but which wouldn't do anything so bad that it would inhibit future development in the Semantic Web context (i.e. it could be extended in the future by clever people to do clever Semantic Web stuff).
Amazingly, the GUID Applicability Statement has achieved the status of Standard-hood! (http://www.tdwg.org/standards/150/) Hooray! I sort of missed the announcement, but ran across the fact the other day when I was surfing through the TDWG website. Since the GUID A.S. is now a TDWG Standard, I would say it would now officially be a best-practice to follow it. In particular, Recommendation 11 states "Objects in the biodiversity domain that are identified by a GUID should be typed using the TDWG ontology or other well-known vocabularies in accordance with the TDWG common architecture." This is somewhat problematic, given that the TDWG ontology (with the possible exception of the Taxon/TaxonConcept part) is effectively ("socially"?) deprecated. What is the alternative "other well-known vocabulary"? There is none, at least none having any kind of official status with TDWG.
I recently discovered (or maybe re-discovered) the Technical Architecture Group (TAG) Technical Roadmaps from 2006-2008: http://www.tdwg.org/uploads/media/TAG_Roadmap_01.doc http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2007_final.pdf http://www.tdwg.org/fileadmin/subgroups/tag/TAG_Roadmap_2008.pdf I might have seen them before, but if so it was at the point where I was really not knowledgeable enough to comprehend them. I found it very instructive to read about what the TAG had in mind when it set out to create the TDWG Ontology. In particular (from the 2007 Roadmap): "From the point of view of exchanging data - such as in the federation of a number of natural history collections - there is no need for a standards architecture. The federation is a closed system where a single exchange format can be agreed on. ... This model has worked well in the past but it does not meet the primary use case that is emerging. Biodiversity research is typically carried out by combining data of different kinds from multiple sources. The providers of data do not know who will use their data or how it will be combined with data from other sources. The consumer needs some level of commonality across all the data received so that it can be combined for analysis without the need to write computer software for every new combination." [This brings to mind the very "different kinds" of resources Cam is documenting in Borneo and the "multiple sources" that will be handling the metadata once those resources are sent off to herbaria, labs, and arboreta.]
and from the 2008 Roadmap: "If GUIDs are used to uniquely identify 'pieces' of data we need to have a shared understanding of what we mean by a 'piece of data' i.e. what kind of thing is it that a particular id applies to, a specimen, a person, an observation, a complete data set. We also need to have a shared understanding of at least some of the properties we use to describe these things."
Having been barely aware of TDWG's existence in 2008, I am blissfully ignorant of whatever disagreements may have occurred regarding LSIDs, reification, or whatever, and really don't want to know about them. All I can say as an outside observer is that it appears that the failure of the initial efforts to get GUIDs and the TDWG Ontology off the ground was because the system envisioned was too complicated to maintain, too complicated to gain a consensus, and too complicated to actually explain to anybody. Now that GUIDs seem to have a new lease on life, it seems like the greatest chance of successfully implementing them is to start by keeping things absolutely as simple as possible. To Cam and me, Darwin Core seemed to be the only candidate for something relatively simple and relatively universally accepted on which one could base an ontology that could be used to type GUIDs and to use to express "a shared understanding of at least some of the properties used to describe" biodiversity resources. Although I was somewhat skeptical that there was a "community consensus" about what the DwC classes meant and how they were related to each other, the exhaustive discussion on this list in Oct/Nov convinced me that maybe there WAS a consensus, or at least enough of a consensus to move forward. Although some people may at the present time be interested in figuring out how to do things like "define 'Fish' as an owl class as well as as a Taxon object", I would assert that is outside the core mission of TDWG as stated: to "develop, adopt and promote standards and guidelines for the recording and exchange of data about organisms evidenced by the historical record". It is fun to talk about, but to me not the primary consideration in designing a community data exchange model. This outlook explains to some extent why I asked questions about the complexity of taxonconcept.org and its orientation toward facilitating semantic queries. There is nothing wrong with that, but it doesn't seem to be the direction that TDWG has said it wants to go. Perhaps when we have "gotten there" (i.e. have a functioning system using GUIDs for clearly typed resources), we might want to embark further down the road to the Semantic Web.
Aside from just importing the DwC classes into the DSW ontology and connecting them with object properties, Cam and I did a little nasty thing with them. It has been said that declaring ranges and domains for terms doesn't prevent people from using the terms to express relationships among the "wrong" types of things. Rather, it simply asserts that those things are instances of the classes used in the range and domain declarations for the term. That is sort of true, but by declaring many of the core DwC classes to be disjoint, we actually ARE preventing people from using the wrong object properties with instances of the wrong classes. If Joe Curator rdf:type's a determination as a dwc:Identification, but then uses dsw:atEvent (which has the domain dwc:Occurrence) as a property of the determination, then a reasoner will infer that the determination is a type dwc:Occurrence as well as the explicitly declared type dwc:Identification. Because dwc:Identification and dwc:Occurrence are disjoint classes, the reasoner will have a fit. Cam and I are being Naughty (sensu Bob Morris) because we are inhibiting the extensibility of dsw:atEvent, but Joe Curator is being Naughty (sensu Baskauf and Webb) because Cam and I believe in the statement from the 2007 Roadmap: "The consumer needs some level of commonality across all the data received...". Joe Curator is not being consistent with the "shared understanding of at least some of the properties" to the extent that DSW reflects the "shared understanding" of the TDWG community. We are basically trying to enforce a sort of orthodoxy on the use of DwC classes as rdf:types and on the connections between the dwc:classes so that people can have some reasonable expectation that they are talking the same language as their partners whose data are also being aggregated in the same federated database.
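The Joe Curator scenario above can be sketched in a few lines of Python. This is a toy stand-in for a reasoner, not how any real OWL reasoner is implemented: it only applies the one domain declaration and the one disjointness declaration named in the paragraph, and the `ex:` identifiers are hypothetical.

```python
# Domain declaration from the paragraph: dsw:atEvent has domain dwc:Occurrence.
DOMAINS = {"dsw:atEvent": "dwc:Occurrence"}

# Disjointness declaration from the paragraph.
DISJOINT = [frozenset({"dwc:Identification", "dwc:Occurrence"})]


def infer_types(triples):
    """Collect explicit rdf:type statements plus types implied by domains."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)
        elif predicate in DOMAINS:
            # Using the property implies the subject is in the property's domain.
            types.setdefault(subject, set()).add(DOMAINS[predicate])
    return types


def find_clashes(types):
    """Report subjects that end up in two classes declared disjoint."""
    clashes = []
    for subject, classes in sorted(types.items()):
        for pair in DISJOINT:
            if pair <= classes:
                clashes.append((subject, tuple(sorted(pair))))
    return clashes


# Joe Curator's data: a determination typed as an Identification, then given
# a property whose domain is Occurrence. ex:det1 and ex:event1 are made up.
triples = [
    ("ex:det1", "rdf:type", "dwc:Identification"),
    ("ex:det1", "dsw:atEvent", "ex:event1"),
]
```

Running `find_clashes(infer_types(triples))` flags `ex:det1`. Without the disjointness declaration the same data would only produce an extra inferred type and no clash, which is the difference the paragraph is describing.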
It seems to me that this "enforcement of orthodoxy" may be very much at odds with the free-wheeling spirit of the Semantic Web community where Anybody Can Say Anything About Anything. But when I look over those old TAG roadmaps, I see little having to do with clever Semantic inferencing. I see a lot about providers and consumers understanding what each other are talking about. To some extent, Darwin Core can provide most of the necessary commonality between providers and consumers. There were (in our opinion) three areas where it could not. One was the lack of a class to link repeated sampling events and determinations (dwc:IndividualOrganism or TaxonomicallyHomogeneousEntity if you prefer) and another was a class that allowed for the separation of evidence from the Occurrence documented by it (called by us the dsw:Token class). The other area was the dwc:Taxon class which did not seem clear enough in its definition nor to possess enough complexity to express the kinds of relationships commonly discussed on this list. dwc:Taxon needs to be "fixed" before it is Ready For Prime Time (i.e. usable in rdf:type declarations)
So I guess having read the various responses to my query and thinking about the history of the TDWG Ontology, I would say that it may not really be important how dwc:Taxon could be tied to tc:Taxon because the two classes probably don't need to be tied together anyway. As it currently stands, dwc:Taxon (outside of DSW) has no semantic meaning other than what people want to believe that it means because it's not tied to any other classes by object properties of its instances. The mish-mash of terms describing names and taxa listed under dwc:Taxon adds to the confusion - since the DwC vocabulary purposefully does not declare domains for the terms listed under a class they really could be used as properties for an instance of any class anywhere. In contrast, tc:Taxon does have properties that are described clearly in the TDWG Ontology. The only reason that we declared the two classes to be equivalent was to signal that we felt that some of the DwC terms listed under dwc:Taxon in the DwC vocabulary could be used as data properties for the things in the tc:Taxon class that people like Paul were describing with properties from the TDWG Ontology. Tying them together doesn't (at the moment) mess up anything that anybody is doing with dwc:Taxon because (outside of DSW) there isn't anything to actually DO with dwc:Taxon in RDF. However, the point is well taken that if someone in the future did decide to define properties specifically intended for use with dwc:Taxon, those properties would be hopelessly tangled with tc:Taxon properties.
It seems to me like the real road forward (if one believes as I do that DwC is the only practical alternative to use for typing GUIDs) would be to:
1. decide that the TDWG Ontology in its dead form adequately describes taxa, names, and their properties (use it as-is). OR
2. decide that although the TDWG Ontology doesn't do everything that people want it to do at the present time, it could be resurrected and modified to do what people want (use it and hope for the future). OR
3. decide to just create the additional classes, e.g. dwc:Name (or dsw:Name if you prefer not to adulterate the "pure" Darwin Core), and object properties for dwc:Taxon and dwc:Name that are needed to get the job done (i.e. just dump the TDWG Ontology as unfixable and make up new stuff).
In any of these three alternatives, there isn't actually any reason to tie the two classes together that I can see. Of these three, I think the third option would probably be preferable, although it might put Paul (and any others currently using the TDWG Ontology to describe Taxon instances) in the unpleasant position of having to redo their RDF.
Steve
Paul Murray wrote:
On 09/05/2011, at 2:07 PM, Kevin Richards wrote:
Paul
I had the same thought (i.e. x is of type dwc:Taxon, y is of type tc:Taxon, we know dwc:Taxon and tc:Taxon are equivalent, so we can reasonably compare x and y). And this is built into standard semantic web reasoners - which is a bonus. But this was debated (taking into account Bob Morris' issue) with respect to DwC and it was decided the benefits weren't significantly better than having a "dwc:isInCategory" sort of property that could then be "equivalent to" another class property and therefore giving you a similar advantage (admittedly not as good, but similar). Do you think this is reasonable or are we just losing too many semantic web benefits by not specifying the domain constraint?
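To make the "x and y become comparable" step concrete, here is a small Python sketch of what a class-equivalence declaration buys you. It is only a toy under stated assumptions: the equivalence between dwc:Taxon and tc:Taxon is the one discussed in this thread, while the `ex:` instances are hypothetical, and a real reasoner does far more than this.

```python
# Equivalence discussed in the thread, treated as symmetric and transitive.
EQUIVALENT = [("dwc:Taxon", "tc:Taxon")]


def equivalents(cls):
    """Return cls together with every class reachable through equivalence."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for left, right in EQUIVALENT:
            for source, target in ((left, right), (right, left)):
                if source == current and target not in seen:
                    seen.add(target)
                    frontier.append(target)
    return seen


def instances_of(cls, typed):
    """List subjects typed with cls or with any class equivalent to it."""
    targets = equivalents(cls)
    return sorted(subject for subject, klass in typed if klass in targets)


# Hypothetical instances: x typed with the DwC class, y with the TDWG Ontology class.
typed = [
    ("ex:x", "dwc:Taxon"),
    ("ex:y", "tc:Taxon"),
    ("ex:z", "dwc:Occurrence"),
]
```

Asking for instances of either class returns both `ex:x` and `ex:y`, which is the sense in which the two become comparable once the classes are declared equivalent.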
A thing to watch out for is that in OWL DL, you cannot apply ordinary data and object properties to vocabulary objects (classes, predicates) - you can only apply annotation properties. If you apply an ordinary data property to a class, OWL DL treats this as what it calls "punning": it decides that there is a class named X and also a named individual named X, and that these have nothing to do with one another. The individual has properties, the class has members, and the annotation properties, well: whatever. Reasoners do not reason over annotation properties: indeed - that's the entire point. Attempting to put properties on properties and having classes being instances of classes results in things that are mathematically undecidable ("this statement cannot be proven to be true").
(another reason is that it allows you to put dc:comments and labels on classes, and even if you declare those classes to be equivalent nevertheless the comment only applies to the particular thing you put it on)
This all means that dwc:isInCategory, if you want to apply it to dwc:taxon or other classes, will never have any meaning that semweb "engines" can get at. The underlying thing is that dwc:isInCategory is actually a meta-syntactic construct: rather than using owl to define a vocabulary, you are effectively attempting to extend OWL itself.
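The punning behaviour described above can be pictured with a short Python sketch. This is not an OWL implementation; it is a toy partition of statements into a class view, an individual view, and annotations, assuming the behaviour exactly as stated in the paragraphs above. The `ex:` names are hypothetical, and rdfs:label and rdfs:comment are used as stand-ins for annotation properties.

```python
# Properties treated as annotations in this sketch (ignored for inference).
ANNOTATION_PROPERTIES = {"rdfs:label", "rdfs:comment"}


def split_views(triples, classes):
    """Partition statements into class membership, individual properties,
    and annotations, mirroring the punning description above."""
    class_members = {}      # class name -> set of member individuals
    individual_props = {}   # individual name -> list of (property, value)
    annotations = []        # kept for humans, never used for inference
    for subject, predicate, obj in triples:
        if predicate in ANNOTATION_PROPERTIES:
            annotations.append((subject, predicate, obj))
        elif predicate == "rdf:type" and obj in classes:
            class_members.setdefault(obj, set()).add(subject)
        else:
            # An ordinary property on a class name lands on the *individual*
            # that happens to share that name, not on the class.
            individual_props.setdefault(subject, []).append((predicate, obj))
    return class_members, individual_props, annotations


triples = [
    ("ex:taxon1", "rdf:type", "dwc:Taxon"),
    ("dwc:Taxon", "dwc:isInCategory", "ex:SomeCategory"),
    ("dwc:Taxon", "rdfs:label", "Taxon"),
]

members, props, notes = split_views(triples, classes={"dwc:Taxon"})
```

In the result, `ex:taxon1` is a member of the class dwc:Taxon, while the dwc:isInCategory statement is attached only to the individual named dwc:Taxon. Nothing connects the two views, so the category says nothing about `ex:taxon1`.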
But ... maybe that's ok. Maybe what is attempting to be done here only ever needs to be understood by humans.
Now ... if what you are trying to do is to define "Fish" as an owl class as well as as a Taxon object - that is do-able, even to the point of being able to get inheritance working, using reflexive properties. At least ... I think it is. I should write a test case.
participants (12)
- Blum, Stan
- Bob Morris
- Greg Whitbread
- Jim Croft
- joel sachs
- Kevin Richards
- Nico Cellinese
- Nico Franz
- Paul Murray
- Peter DeVries
- Richard Pyle
- Steve Baskauf