Jonathan Rees, an employee of the Science Commons and TDWG member (and who knows way more about semantic web than I do) recently sent me this. I copy it here with his permission. Each of the paragraphs seems to me to be germane in different ways to the discussions about what should be an Individual. For those not deep into RDF, for the word "axiom", you could loosely understand "rule", although that term also has a technical meaning that is sometimes a little different. Jonathan raises an important use case in the second paragraph, which is data quality control. That's a topic of interest to many, but especially those following the new Annotation Interest Group. Originally, this was part of a discussion we had about my favorite hobby horse, rdfs:domain. He is not on my side. When people who know more than I do about something are skeptical of my arguments about it, I usually suspend disbelief and temporarily adopt their position.
Jonathan's first point is pretty much what Paul Murray observed yesterday in response to a question from Kevin Richards.
"(a) subclassing is the way in RDFS or OWL you would connect the more specific to the less specific, so that you can apply general theorems to a more specific entity. That is, a well-documented data set would be rendered using classes and properties that were very specific so as to not lose information, and then could be merged with a badly-documented data set by relaxing to more general classes and properties using subclass and subproperty knowledge.
(b) axioms (i.e. specificity) are valuable not only for expressing operational and inferential semantics, but also for "sanity checking" e.g. consistency, satisfiability, Clark/Parsia integrity checks ( http://clarkparsia.com/pellet/icv/ ), and similar. Being able to detect ill-formed inputs is incredibly valuable.
People talk past one another because there are many distinct use cases for RDF and assumptions are rarely surfaced. For L(O)D, you're interested in making lots of links with little effort. Semantics is the enemy because it drives up costs. For semantic web, on the other hand, you're interested in semantics, i.e. understanding and documenting the import of what's asserted and making a best effort to only assert things that are true, even in the presence of open world assumption and data set extensibility. Semantics is expensive because it requires real thought and often a lot of reverse engineering. People coming from these two places will never be able to get along."
---Jonathan Rees in email to Bob Morris
================
Bob Morris
-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: morris.bob@gmail.com web: http://bdei.cs.umb.edu/ web: http://etaxonomy.org/mw/FilteredPush http://www.cs.umb.edu/~ram phone (+1) 857 222 7992 (mobile)
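Point (a) in the message above, relaxing specific classes and properties to more general ones so that two data sets can be merged, might be sketched roughly as follows. This is only an illustration in plain Python: the `ex:` class and property names, the sample triples, and the helper functions `generalize` and `relax` are all hypothetical and are not taken from Darwin Core or any TDWG ontology. It also assumes the subclass and subproperty hierarchies are acyclic and single-parent, which RDFS itself does not require.

```python
# Hypothetical subclass / subproperty knowledge, written child -> parent.
SUBCLASS_OF = {
    "ex:HerbariumSheet": "ex:PreservedSpecimen",
    "ex:PreservedSpecimen": "ex:Specimen",
}
SUBPROPERTY_OF = {
    "ex:collectedBy": "ex:recordedBy",
}


def generalize(term, parent_of, allowed):
    """Walk up the hierarchy until a term in `allowed` is found (or give up)."""
    while term is not None and term not in allowed:
        term = parent_of.get(term)
    return term


def relax(triples, allowed_classes, allowed_properties):
    """Rewrite specific triples in terms of the more general vocabulary."""
    relaxed = set()
    for s, p, o in triples:
        if p == "rdf:type":
            o = generalize(o, SUBCLASS_OF, allowed_classes)
        else:
            p = generalize(p, SUBPROPERTY_OF, allowed_properties)
        # Triples with no general counterpart are dropped rather than guessed.
        if p is not None and o is not None:
            relaxed.add((s, p, o))
    return relaxed


# A "well-documented" data set using very specific terms (made-up data).
detailed = {
    ("ex:sheet1", "rdf:type", "ex:HerbariumSheet"),
    ("ex:sheet1", "ex:collectedBy", "ex:person7"),
}
# A "badly-documented" data set that only uses the general terms (made-up data).
coarse = {
    ("ex:rec9", "rdf:type", "ex:Specimen"),
    ("ex:rec9", "ex:recordedBy", "ex:person7"),
}

merged = coarse | relax(detailed, {"ex:Specimen"}, {"ex:recordedBy"})
```

The detailed data set loses nothing in its own form; only the merged view is expressed at the coarser level, which seems to be the trade-off the message describes.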
It is interesting that Jonathan Rees sees the semantic web and the LOD cloud differently than Tim Berners-Lee does. The issue with LOD semantics is being worked out on the public-lod list.
With the exception of some of the LOD services that do inferencing on cloud data, all inferencing is currently done on one machine with all the relevant data loaded. If you don't like SKOS or some other problematic ontology entailment you can simply:
1) Use a modified version of SKOS for your own inferencing.
Also it would be interesting to see some real-world inferencing using a data set marked up in the current DarwinCore that demonstrates:
1) That it works
2) That it works in a useful way
So in addition to failing to work within the standards of the larger informatics community, TDWG* is failing to demonstrate that it has a working, useful standard.
Pointing out potential problems with SKOS etc. does not demonstrate that you have anything better.
If the opinions of the real experts in the semantic web community matter then you might want to consider what they think of my work.
Respectfully,
- Pete
* It is welcome news to me that TDWG is now going to follow the advice of the semantic web community
On Mon, Nov 15, 2010 at 11:29 AM, Bob Morris <morris.bob@gmail.com> wrote:
[snip]
_______________________________________________ tdwg-tag mailing list tdwg-tag@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tag
-- --------------------------------------------------------------- Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies Knowledge Base <http://lod.geospecies.org/> About the GeoSpecies Knowledge Base <http://about.geospecies.org/> ------------------------------------------------------------
On 11/15/10 11:19 AM, "Peter DeVries" <pete.devries@gmail.com> wrote:
[ ... ]
So in addition to failing to work within the standards of the larger informatics community, TDWG* is failing to demonstrate that it has a working, useful standard.
[...]
DarwinCore is not the only thing TDWG has done, but the DarwinCore is explicitly based on Dublin Core, both in content and in the DCMI approach to maintenance. Moreover, every TDWG effort in the last decade has been based on some kind of widely used internet standard (XML schema, for example). Are those not part of the larger informatics community?
The second part of the statement is a very narrow opinion. GBIF provides access to more than 200 million organism occurrence records gathered from hundreds of providers, all using TDWG standards. Not working, not useful? Only if your definition of useful includes the qualifier “with semantic web technologies.” Yes, within the semantic web domain, we don’t have anything useful or working.
I think the point of this discussion is to determine what demonstrations and supporting specifications would be appropriate and feasible in the near term. Then we need motivation for a herd of cats.
-Stan
I would have to agree, Stan. For all its faults TDWG has produced standards that were/are useful. Australia's Virtual Herbarium would not have been possible without the efforts and products of TDWG and the herbarium community still runs on this. The Online Catalog of Australian Museums and the Atlas of Living Australia extended this nationally and GBIF and EoL are doing the same thing globally. The standards may be incomplete, maybe flaky, they may not even work, but they provide the foundation of communications and we are much better off with slightly borked standards than with none at all.
But the criticism of TDWG sitting a bit awkwardly and self-contained in the wider standards framework is a valid one and something we are working to address. I have always assumed this is one of the guiding principles of TDWG.
I prefer the analogy of motivating jelly to stay nailed to the wall. :)
jim
On Tue, Nov 16, 2010 at 9:24 AM, Blum, Stan <SBlum@calacademy.org> wrote:
[snip]
-- _________________ Jim Croft ~ jim.croft@gmail.com ~ +61-2-62509499 ~ http://www.google.com/profiles/jim.croft 'A civilized society is one which tolerates eccentricity to the point of doubtful sanity.' - Robert Frost, poet (1874-1963) Please send URIs, not attachments: http://www.gnu.org/philosophy/no-word-attachments.html
We have done a fair amount of due diligence in recent years to stay abreast of the wider informatics community. We still have an MOA with OGC. We were even members of OASIS for a year. The whole LSID "thing" was an attempt to follow the lead of the bio(=molecular)informatics community. So I don't accept that we are self-contained, sitting awkwardly (narcissistic), within the wider informatics community.
With appropriate cautions duly noted (re Bob Morris, Jonathan Rees, Matt Jones [a few years ago]), we ARE moving in this direction (semantic web, RDF, etc.), and have been since about 2006. ... Just not very fast :-|
-Stan
On 11/15/10 2:33 PM, "Jim Croft" <jim.croft@gmail.com> wrote:
But the criticism of TDWG sitting a bit awkwardly and self-contained in the wider standards framework is a valid one and something we are working to address. I have always assumed this is one of the guiding principles of TDWG.
I prefer the analogy of motivating jelly to stay nailed to the wall. :)
jim
With appropriate cautions duly noted (re Bob Morris, Jonathan Rees, Matt Jones [a few years ago]), we ARE moving in this direction (semantic web, RDF, etc.), and have been since about 2006. ... Just not very fast :-|
In addition to microformats and RDF, is the HTML5 microdata feature part of the conversation among stakeholders here?
http://dev.w3.org/html5/md/#introduction
http://diveintohtml5.org/extensibility.html
Hi Aaron,
The data that is generated using microdata looks to be very similar to RDFa, except that the properties are not qualified as URIs, and they seem to be limited to the properties available in the same location as the type definition due to the lack of qualification.
It would be nice to standardise on something that is extensible, like RDF with RDFa as the HTML annotation method, as microdata just seems to be a cut-down version of RDF that doesn't have a theoretical basis or a model outside of HTML.
Cheers,
Peter
On 16 November 2010 10:29, Aaron Steele <eightysteele@gmail.com> wrote:
[snip]
Hi Peter,
The data that is generated using microdata looks to be very similar to RDFa, except that the properties are not qualified as URIs, and they seem to be limited to the properties available in the same location as the type definition due to the lack of qualification.
Interesting. So are there things that RDFa can do that microdata cannot do? Can you give an example?
On 16 November 2010 13:45, Aaron Steele <eightysteele@gmail.com> wrote:
[snip]
I think I missed something when I first read through the Microdata draft, or it was updated in the last few hours, as it has today's date on it. On closer reading, there can be arbitrary URLs for properties, so there is no hassle there actually. The examples I was reading through before I went to the spec only used simple property names from the vocabulary matching the type of the item.
There is also a good specification of how to generate RDF from Microdata which includes some common Dublin Core terms as well-known predicates for different parts [1]. However, you could write an RDFa parser that added the extra triples without having to create a new spec called Microdata.
I still don't think I would use Microdata, as it seems to duplicate the RDFa spec that has already been standardised by the W3C (since 2008!), even though technically it is for XHTML and not technically HTML (yet). The examples in [2] and [3] seem so similar that it seems like a waste of energy to recommend two slightly different ways to do the same thing.
[1] http://dev.w3.org/html5/md/#rdf
[2] http://www.w3.org/TR/xhtml-rdfa-primer/#id84801
[3] http://dev.w3.org/html5/md/#names:-the-itemprop-attribute
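To make the point about URL-valued property names a little more concrete, here is a rough sketch of pulling property/value pairs out of Microdata markup with Python's standard `html.parser`. It is not an implementation of the Microdata-to-RDF algorithm referenced in [1]: it handles only a single flat item, the class name `MicrodataSketch` is made up for this illustration, and the `example.org` URLs and sample values are invented. The `itemscope`, `itemtype` and `itemprop` attributes themselves are the ones the draft defines.

```python
from html.parser import HTMLParser

# Made-up sample markup: one item whose property names are full URLs.
SAMPLE = """
<div itemscope itemtype="http://example.org/Specimen">
  <span itemprop="http://purl.org/dc/terms/title">Quercus alba sheet</span>
  <a itemprop="http://example.org/collector" href="http://example.org/person/7">J. Doe</a>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collect (property, value) pairs from a single, non-nested item."""

    def __init__(self):
        super().__init__()
        self.item_type = None   # value of itemtype on the itemscope element
        self.pending = None     # itemprop whose text content is still expected
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links, the property value is the URL, not the link text.
            self.pairs.append((prop, attrs["href"]))
        else:
            self.pending = prop

    def handle_data(self, data):
        if self.pending and data.strip():
            self.pairs.append((self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None


parser = MicrodataSketch()
parser.feed(SAMPLE)
```

Because each property name is already a URL, the collected pairs map fairly directly onto predicate/object pairs for the item, which is roughly why the two syntaxes end up looking so similar.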
The universe that TDWG sets out to support is getting wider and wider. In a way, the needs of the community have become a moving target and may continue to move, making the target very difficult to hit. I think the temptation is great to be disappointed with not keeping up, but realistically TDWG is doing a pretty good job of going after all the many directions. There have been successes and they didn't vanish because some new needs have come along.
It's clear to me that any "one size fits all" approach will never work for biodiversity informatics standards. There are several "kinds" of audience, multiple overlapping and even orthogonal use cases, and multiple levels of capability and interest within this very broad community. Recognizing these segments of the world to which standards need to apply will be an important part of the TDWG future. And that recognition I suppose means some tolerance is called for from all sides to acknowledge the existence of other sides. That makes solutions even tougher though. But, we need to take it on nevertheless.
Chuck
-----Original Message-----
From: tdwg-tag-bounces@lists.tdwg.org [mailto:tdwg-tag-bounces@lists.tdwg.org] On Behalf Of Jim Croft
Sent: Monday, November 15, 2010 4:34 PM
To: Blum, Stan
Cc: tdwg-tag@tdwg.org
Subject: Re: [tdwg-tag] class design, generalization, L(O)D
[snip]
Hi Stan,
This is more about the decision process than the group itself. It is also in reference to using the DarwinCore for the semantic web.
Roger Hyam and others have stated that we need some set of use cases and test data that people can try. I think the merits and potential problems of various approaches will be much clearer when there is a set of use cases and a test data set. In other words, people should demonstrate and support their arguments with real examples. The alternative is an unending series of debates.
As I said before, the DarwinCore is fine for what it is being used for. What I am not seeing are real examples of it being used successfully on the semantic web. Good examples would include demonstrations of useful SPARQL queries on a DarwinCore data set.
The proof will be in the pudding: either it will work fine and do everything people expect or it will not.
The motivation could be that those arguments that are not supported by examples lack credibility.
Respectfully,
- Pete
On Mon, Nov 15, 2010 at 4:24 PM, Blum, Stan <SBlum@calacademy.org> wrote:
[snip]
-- --------------------------------------------------------------- Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 TaxonConcept Knowledge Base <http://www.taxonconcept.org/> / GeoSpecies Knowledge Base <http://lod.geospecies.org/> About the GeoSpecies Knowledge Base <http://about.geospecies.org/> ------------------------------------------------------------
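As a small illustration of the kind of query demonstration asked for in the message above, the sketch below mimics a SPARQL basic graph pattern over Darwin Core-style triples in plain Python. It is not a SPARQL engine and says nothing about how a real DarwinCore RDF data set would behave: the occurrence identifiers, the sample records and the `match` helper are all invented for this example. Only the two term names, `dwc:scientificName` and `dwc:country`, come from Darwin Core.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"

# Made-up occurrence records expressed as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:occ1", DWC + "scientificName", "Aedes triseriatus"),
    ("ex:occ1", DWC + "country", "United States"),
    ("ex:occ2", DWC + "scientificName", "Aedes triseriatus"),
    ("ex:occ2", DWC + "country", "Canada"),
    ("ex:occ3", DWC + "scientificName", "Culex pipiens"),
    ("ex:occ3", DWC + "country", "Canada"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every pattern; '?x' marks a variable."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Roughly equivalent SPARQL (illustrative):
#   SELECT DISTINCT ?country WHERE {
#     ?occ dwc:scientificName "Aedes triseriatus" ;
#          dwc:country ?country .
#   }
patterns = [
    ("?occ", DWC + "scientificName", "Aedes triseriatus"),
    ("?occ", DWC + "country", "?country"),
]
countries = sorted({b["?country"] for b in match(patterns, TRIPLES)})
```

A shared test data set plus a handful of queries like this one, run against a real triple store, would be one way to ground the debate in examples.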
Because I am an RDF/OWL novice, I will not say much about how I think Jonathan/Bob's comments apply to the discussion about the scope of Individual. But I think that the suggestion that there may be stricter and broader uses of an Individual class probably describes the root of the disagreement pretty well. I have a particular, narrow use-case in mind (facilitating resampling and inferring duplicates using some kind of taxonomically homogeneous entity). Rich wants a broader interpretation based on ideas of what an individual means in various contexts. I think this difference in outlook is reflected in this statement from Rich's last response: 'We can argue about the properties and tokens later; first we need to nail down the "essence" of an Individual.' I actually disagree strongly with this statement.
I have tried to stay out of the current thread about the future course of the TDWG ontology because it isn't something that I know much about. But I think I am leaning to the side of those who suggest that we create use cases first and then see how the ontology can be developed to facilitate those use cases. I am actively using the class Individual in RDF the way I have defined it in my proposal. I know of at least one other person who plans to do so as well. It is not clear to me what the use case is for doing what Rich wants: combining what I've called the "token" aspect of individuals with the "resampling" aspect. It may be that there is such a use, but I'd like to see how it will work - in particular how it will work in RDF without "breaking" the use that I need for the Individual class.
If it is possible to combine the two aspects of individuals, perhaps that might be done in a "lax" definition (using Paul's term) that reflects the "essence" of an individual. Unfortunately, figuring out what the "essence" is of an individual is more difficult than showing how one plans to use the term.
Steve
Bob Morris wrote:
[snip]
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A. delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235 office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
I (hope that I) have moved this conversation to tdwg-content
On Mon, Nov 15, 2010 at 12:29 PM, Bob Morris <morris.bob@gmail.com> wrote:
[snip]
-- Robert A. Morris Emeritus Professor of Computer Science UMASS-Boston 100 Morrissey Blvd Boston, MA 02125-3390 Associate, Harvard University Herbaria email: morris.bob@gmail.com web: http://bdei.cs.umb.edu/ web: http://etaxonomy.org/mw/FilteredPush http://www.cs.umb.edu/~ram phone (+1) 857 222 7992 (mobile)
participants (8)
- Aaron Steele
- Blum, Stan
- Bob Morris
- Chuck Miller
- Jim Croft
- Peter Ansell
- Peter DeVries
- Steve Baskauf