If you need something for referring to a population, then it is probably best to do it as a related class.
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities.
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
See HTML (http://lod.taxonconcept.org/ses/v6n7p.html), RDF (http://lod.taxonconcept.org/ses/v6n7p.rdf), or Base View (http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%..., short link bit.ly/gMFqR1).
The model mints URIs for the following related entities (see the RDF or KB View):
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag is used for an individual organism that is an instance of the species concept. This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesPopulationTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>
This is the RDF for a population, which has an individual organism as one of its parts. It is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7</skos:prefLabel>
  <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individ..."/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>
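The tag pattern in the RDF above can be sketched in plain Python, with triples held as tuples rather than in a real RDF store. This is only an illustration under stated assumptions: the individual's URI is a hypothetical placeholder (the one in the message is truncated), and typing the individual with the #Individual tag is assumed by analogy with how the population is typed with the #Population tag.

```python
# Minimal sketch: triples are (subject, predicate, object) tuples of strings.
RDF_TYPE = "rdf:type"
IS_PART_OF = "dcterms:isPartOf"
HAS_PART = "dcterms:hasPart"

SPECIES = "http://lod.taxonconcept.org/ses/v6n7p#Species"
POPULATION_TAG = "http://lod.taxonconcept.org/ses/v6n7p#Population"
INDIVIDUAL_TAG = "http://lod.taxonconcept.org/ses/v6n7p#Individual"
POPULATION = "http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation"
# Hypothetical placeholder; the real individual URI is truncated in the message.
INDIVIDUAL = "http://example.org/individuals/cougar-1"

TRIPLES = {
    (POPULATION_TAG, IS_PART_OF, SPECIES),
    (INDIVIDUAL_TAG, IS_PART_OF, SPECIES),
    (POPULATION, RDF_TYPE, POPULATION_TAG),
    (POPULATION, HAS_PART, INDIVIDUAL),
    (INDIVIDUAL, RDF_TYPE, INDIVIDUAL_TAG),  # assumed by analogy
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object of triples matching the subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def species_concepts_for(resource, triples=TRIPLES):
    """Follow rdf:type to a tag, then dcterms:isPartOf to the species concept."""
    found = set()
    for tag in objects(resource, RDF_TYPE, triples):
        found |= objects(tag, IS_PART_OF, triples)
    return found
```

Both the population and the individual reach the same species concept through their respective tags, while remaining distinguishable from each other and from the concept itself.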
Respectfully,
- Pete
-------------------------------------------------------------------------------------
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://lod.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
---------------------------------------------------------------------------------------
Hi Pete, I want to respond to your message in two parts. It may take me some time to write a response to the second part (i.e. questions about your suggestion) so it may not come right away. But I also wanted to comment about the first part, that is:
Peter DeVries wrote:
> I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.
I'm not exactly sure if this is directed at Cam and me (in the context of darwin-sw), or to others. If it is directed at me, then you can read my response. If not, then you can ignore it.
First of all, I'd like to say that I greatly respect the work that you've done on trying to promote the use of LOD in the TDWG community. I have read every one of your posts and have tried to understand all of them to the extent that I'm able. I think we referenced your posts over 20 times at http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary, cited your taxon concept examples at http://code.google.com/p/darwin-sw/wiki/ClassTaxon and included your model in the analysis of previous models at http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels. Your suggestion of using the geo: scheme was included in the discussion of the Location class at http://code.google.com/p/darwin-sw/wiki/ClassLocation . I would say that at least half of what I know about RDF comes from looking at your examples and trying to understand what you have done. I am therefore very grateful for the work that you have done and your enthusiasm for bringing creative ideas into the community.
So why did Cam and I create Darwin-SW instead of just using your ontologies at taxonconcept.org ? There are several reasons, some of which are alluded to at http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels . But to be succinct (OK, maybe not that succinct), I'll state them here:
1. Darwin Core is a ratified TDWG standard. It therefore qualifies as a "well-known" vocabulary. If I refer to dwc:recordedBy as a property, people in the biodiversity informatics community will know what it means. If I refer to dwc:Identification, it will also be known in our community. For this reason, Cam and I wanted as much as possible to build Darwin-SW on Darwin Core rather than using terms that we or any other individual minted.
2. Cam and I wanted the Darwin-SW ontology to (as much as possible) reflect the community consensus on what classes meant and how they were related to each other. Of course, the problem is knowing what that consensus was. After the hundreds of emails that were posted on the tdwg-content list from September 2010 to the present, I feel like I have a much better understanding of what the consensus is than I did before (where there IS a consensus, of course). I have spent more hours than I care to remember trying to read, re-read, and understand the various emails that were sent and then asking annoying questions until somebody was patient enough to explain things to me. Most of those explanations are referenced on the class wiki pages. So I don't consider the ideas embodied in Darwin-SW to be "our" ideas - they are the ideas we absorbed from the community, including you. (If you want to see "my" actual ideas, look at the examples in my Biodiversity Informatics article. I don't think that they are really that good any more.) The outlook of DSW also recognizes historical precedents such as the ACS model. As cool and clever as taxonconcept.org is, it fundamentally represents Pete DeVries' ideas. That means that it will readily be accepted by you, but the community may be less apt to buy into it if it doesn't embody community concepts. It may turn out that Darwin-SW does NOT actually represent the community consensus (as we hope it does), or is stupid, or doesn't work. In those cases, it will get shot down and somebody else will pick up the task of trying to figure out what the community consensus is about how things should be represented in RDF. I should note that I don't think the discussion last Oct/Nov was limited to a clique of TDWG architects. I was an active participant and I certainly don't qualify as a TDWG insider, having only been to one TDWG meeting for less than 24 hours and knowing almost no other TDWG contributors personally.
3. There are a couple of structural things about taxonconcept.org terms and classes that I have questions about and I'll raise them in my second email to come after this one. But I think that one of the most problematic things about taxonconcept.org for me is the way that you describe taxon concepts. I hate to even bring up the subject because it's taken me months just to try to understand what people mean when they are talking about a taxon concept and I don't want to unleash another hundred emails about the minutiae of taxon concepts, which people on this list love to talk about. So suffice it to say that the sense that I've gotten from the many posts on the subject is that most people see a Taxon Concept (= = Taxon and similar to a "taxon name use") as the combination of a taxon name and a "sensu" or "secundum" (accordingTo) reference. That's how it is modeled in the TCS model, which is another ratified TDWG standard. That's also how it's modeled in the unfinished TDWG ontology, which despite its unfinished state is nonetheless actually being used by some people to describe taxon concepts (see http://code.google.com/p/darwin-sw/wiki/ClassTaxon for links to some examples). When I look at how you model taxon concepts such as in http://lod.taxonconcept.org/ses/v6n7p.rdf which describes the species concept http://lod.taxonconcept.org/ses/v6n7p#Species , there is a lot of metadata about the scientific name, related name strings, URIs that represent similar resources, connections to the original description, etc. But I don't see any sensu/secundum reference or a property that links to one. So although http://lod.taxonconcept.org/ses/v6n7p#Species is a cool thing that links to a lot of useful information about Puma concolor, it doesn't seem to be the same thing as what everybody else is calling a taxon concept. If I were to link to your "species concepts", you and I might know what that meant, but nobody else would. That is in contrast to a tc:Taxon (= = tc:TaxonConcept) instance which is defined by reference to the TCS model and I would therefore consider "well known" (and is what we DO reference in DSW).
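The "name plus accordingTo reference" view of a taxon concept described above can be sketched as a small Python data structure. This is only an illustration, not the TCS model itself: the class name, field names, and the two references are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonConcept:
    """A taxon name paired with a sensu/secundum (accordingTo) reference."""
    name: str
    according_to: Optional[str] = None  # None means no sensu reference was given

    def has_sensu(self) -> bool:
        return self.according_to is not None


# Hypothetical references, used only to show that the same name can
# denote different concepts depending on the accordingTo reference.
concept_a = TaxonConcept("Puma concolor", "Hypothetical Author A, 1990")
concept_b = TaxonConcept("Puma concolor", "Hypothetical Author B, 2005")
name_only = TaxonConcept("Puma concolor")
```

Under this reading, `concept_a` and `concept_b` share a name but compare as different concepts, and `name_only` corresponds to a record that carries a name without any sensu reference.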
So I think that in some sense, my reluctance to adopt individually "minted" classes and properties comes from the reason why I'm interested in RDF in the first place. I'm actually NOT very interested in using RDF to do reasoning in the "Semantic Web" sense - I guess I'm still a bit of a skeptic about how likely it is that anybody will be able to find out anything useful by doing reasoning on RDF that they suck in from the cloud, particularly if lots of people are using their own minted properties and if different people intend for the classes they use to rdf:type things to mean different things and have different properties. What I AM interested in is figuring out a way to make it possible for people to have a consistent understanding of the meaning of metadata that they discover when they resolve GUIDs. I think that will be increasingly important when projects like BiSciCol get rolling. The only way that I see this as possible is to base properties primarily on vocabularies that a lot of institutions already "understand" and are using, like Darwin Core.
That doesn't mean that people will ONLY use Darwin Core. I have already heard plenty of talk on this list about using other vocabularies such as geo:, skos:, and foaf: (with some cautions from Bob). John Wieczorek, the architect of the Darwin Core standard, proposed adding geo: terms to DwC (see http://code.google.com/p/darwincore/issues/detail?id=82). There are also at least four people on this list that I know are interested in trying to make sure that DwC can interface with the OBOE ontology. So I don't think it is fair to say that "TDWG" is opposed to adopting things outside its clique. I just think that people are cautious about supporting things that they are not familiar with (or perhaps don't understand) and in a lot of cases just don't have the time (or aren't willing to take the time) to figure out something new.
So I hope that you aren't discouraged that people are slow to jump on the LOD bandwagon. I think that more people will be interested in supporting it when they start seeing tangible applications within our community and that's already starting to happen. Unfortunately, since you are so far ahead of the rest of us in your understanding of how the LOD world works, I think that you are probably doomed to be one of the ones pulling the wagon! :-) I look forward (as I have in the past) to hearing more of your innovative ideas.
Steve
Hi Steve,
I was expressing my disappointment about how this process operates, not disappointment in you or your efforts.
A while back I posted several messages with examples showing that the current DarwinCore, while good for its current use, is not well suited for the Semantic Web.
I proposed that we work on something that might be better suited for that use while data providers continued to use the DarwinCore.
There seemed to be no consensus that this was the thing to do, or that anyone else agreed with my proposal.
It is not clear how TDWG decisions are made. There seem to be discussions and debate, not voting or clear consensus on the email list.
What does seem to happen is some mysterious entity, *that I like to call the TDWG Illuminati*, seems to decide what the "consensus" is.
So apparently there was consensus that we should investigate a more semantic version of the DarwinCore and the minting of a separate set of URIs was blessed by the Illuminati.
What I don't understand is this: Is there any real difference between examples that start with "txn" and those that start with "dsw"?
Haven't I said all along that my stuff could be moved into the DarwinCore?
The reason I created txn was that my efforts at advocating for a markup that followed Linked Open Data best practices were going nowhere.
I needed URIs that resolved correctly.
The issues relating to my TaxonConcepts are complicated; the specifics will need to come at a later date.
However, I take issue with the idea that Taxon = name use when you consider the actual practice of the vast majority of biologists and the actual data sets that are available.
If NCBI, eBird, BOLD, or any of the data from the countless field and lab observations were tagged with a "sensu" or "secundum" then you might have an argument.
So the consensus in the larger biological community is that this is not how information about species is tagged.
The goal of my TaxonConcepts is to provide machine / human interpretable species concepts so that different biologists can more reliably and repeatedly determine which concept is the most appropriate "tag" for the individual organism they are looking at through a microscope or through binoculars.
That is, a tag that can remain stable through changes in nomenclature like those seen with Aedes / Ochlerotatus.
A first step in this process is to connect that species to the many name variants with which it is associated.
A second step is to link the concept to closely related concepts, identifiers and other existing documentation on the web.
Some of these concepts include links to the original description, etype pages, BOLD barcodes etc. They have the ability to be linked to the type and other specimens.
They have been looked at very carefully by leaders in the Linked Open Data community, and the GeoSpecies / TaxonConcept data sets are seen as among the better designed LOD data sets.
In general, SPARQL queries on a local or LOD endpoint work as expected.
Are they perfect? No.
Respectfully,
- Pete
On Fri, Apr 29, 2011 at 9:46 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
Comments inline
Peter DeVries wrote:
> I was expressing my disappointment about how this process operates, not disappointment in you or your efforts.
OK, cool. I am excited about the possibility that there can be productive discussion about how to move forward on something that will work for LOD.
> A while back I posted several messages with examples showing that the current DarwinCore, while good for its current use, is not well suited for the Semantic Web.
> I proposed that we work on something that might be better suited for that use while data providers continued to use the DarwinCore.
> There seemed to be no consensus that this was the thing to do, or that anyone else agreed with my proposal.
Well, I think you are right that there may not have been a consensus on exactly what to do, but I think there might have been a consensus that SOMETHING should be done. See "Do we need an RDF version/guide/representation of Darwin Core?" at the bottom of the "http://code.google.com/p/darwin-sw/wiki/Rationale" wiki page. I suppose that I may disagree with you somewhat that the current DwC is not well-suited for the semantic web. I think that most of the terms that are there currently could be used for data properties. The problem is that there are a lot of missing object properties and the dwc:xxxxxxID terms are too ambiguous to use for that. So I would assert that DwC really needs an extension rather than a replacement. Of course I could be wrong...
> It is not clear how TDWG decisions are made. There seem to be discussions and debate, not voting or clear consensus on the email list.
I think that you won't necessarily get disagreement on this. The fact that proposals for DwC additions have been there for as long as a year without an up or down vote is an indication that the process isn't very efficient. I think you were at a session at TDWG where this problem (i.e. the functioning of the TDWG Technical Architecture Group) was discussed.
It was suggested in http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001591.html and http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001653.html that a Task Group be chartered to work on this. Cam and I didn't follow that route for several reasons. Cam's in the jungle of Borneo and I have no funding, and work and family constraints that make it difficult for me to attend meetings. Also, we are both pretty much novices at this and are "learning by doing". I, in particular, wouldn't feel comfortable chairing a Task Group if I didn't think I could follow through with the commitment. So maybe it was bad to come up with DSW outside of a Task Group. But it was something we needed for our own work in a relatively short period of time and development by committee is usually a very slow process. So rightly or wrongly we just did it. Maybe it is time for a Task Group. But I think somebody else may have to shoulder that responsibility.
> What does seem to happen is some mysterious entity, /that I like to call the TDWG Illuminati/, seems to decide what the "consensus" is.
> So apparently there was consensus that we should investigate a more semantic version of the DarwinCore and the minting of a separate set of URIs was blessed by the Illuminati.
I don't understand what you mean. Are you talking about DSW? If so, it wasn't "blessed" by anybody. Last fall Cam suggested we try having a go at it and at first I didn't think it was realistic, but then he convinced me that it would be fun and that we could do it over the Internet. So off we went. It was entirely our project and idea and we didn't ask for or get a blessing from anybody in TDWG or anywhere else. So we will take full responsibility (and blame if necessary) for it. At this point, we are interested in knowing how many bullets it can take and still keep standing. So load up and fire away. Our feelings are not easily hurt.
I really don't like conspiracy theories because there is never any way to convince the people who are proposing them that they are not true. What I see in the situation (from list posts) is a number of people "playing around" with RDF, LOD, and semantic web stuff to try to see what they can do with it. I think this is really a necessary step before trying to do anything more concrete. I don't think you will disagree with that, you are doing the same thing yourself. I don't think there is any kind of consensus at all about what should happen, just maybe that "something" should happen. I'm only getting that from the list posts, given that I don't really have direct connections with other TDWG people (except for one BiSciCol meeting and that's not an official TDWG thing).
> What I don't understand is this: Is there any real difference between examples that start with "txn" and those that start with "dsw"?
No. You are completely right about this, to the extent that you intend for your txn classes to mean the same thing as what Cam and I mean with the dsw classes. Actually there are only two dsw classes, IndividualOrganism and Token. The rest are either dwc or dcterms classes that we imported. If your classes are the same as the dwc/dcterms/dsw classes, then it makes absolutely no difference whether the string "txn:identificationOfIndividual" or "dsw:identifies" is used to describe the relationships. Our choices were mostly based on liking names that were short and which expressed the relationship clearly. But they were arbitrary and we could have just as easily used your names (if we were sure that you intended for your classes to be equivalent to dwc/dcterms classes).
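The point that the term name is arbitrary once the meanings agree can be sketched in Python. This is a rough illustration only: the equivalence table is an assumption (neither vocabulary is known to declare it), and the subject and object identifiers are hypothetical.

```python
# Assumed, purely illustrative equivalence between the two property names.
EQUIVALENT_PROPERTIES = {
    "txn:identificationOfIndividual": "dsw:identifies",
}


def canonical(triple, equivalents=EQUIVALENT_PROPERTIES):
    """Rewrite the predicate of a (subject, predicate, object) triple to its canonical name."""
    subject, predicate, obj = triple
    return (subject, equivalents.get(predicate, predicate), obj)


def same_statement(a, b, equivalents=EQUIVALENT_PROPERTIES):
    """Two triples say the same thing if they match after predicate normalization."""
    return canonical(a, equivalents) == canonical(b, equivalents)


# Hypothetical identifiers for an identification and an individual organism.
txn_triple = ("ex:identification1", "txn:identificationOfIndividual", "ex:organism1")
dsw_triple = ("ex:identification1", "dsw:identifies", "ex:organism1")
```

With the equivalence in place, `same_statement(txn_triple, dsw_triple)` holds; with an empty table it does not, which is the situation a consumer faces when the intended equivalence is never stated.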
> Haven't I said all along that my stuff could be moved into the DarwinCore?
I guess maybe you said that at some point, but I think I either forgot it or missed it. I have thought of your examples as an alternative to Darwin Core - I think primarily because you often choose term names that aren't the same as those used in DwC.
> The reason I created txn was that my efforts at advocating for a markup that followed Linked Open Data best practices were going nowhere.
I wouldn't say nowhere. I think that because of people's lack of familiarity with RDF, involvement in other projects, and general busy-ness movement towards involvement in LOD is very slow but it is there. There have been a number of people talking about it. As I said in the other message that I just sent, I think that the large scale and complexity of txn is an impediment to people understanding it. I'm not saying that as a criticism, I just think that it is true. I also think that it is not a safe assumption that because people don't post to the list that it means that people are ignoring what you are doing. Some people may feel insecure about posting and others just may be "lurking" and don't have anything to say. I for one have read and considered every post that you have ever written and even though you never got much in the way of responses on your geo: suggestions, I know that there is at least one group who has seriously looked at it as a viable way to express location information.
I needed URI's that resolved correctly.
The issues relating to my TaxonConcepts are complicated; the specifics will need to come at a later date.
However, I take issue with the idea that Taxon = name use when you consider the actual practice of the vast majority of biologists and the actual data sets that are available.
If NCBI, eBird, BOLD, or any data from the countless field and lab observations were tagged with a "sensu" or "secundum" then you might have an argument.
I think there is widespread agreement that people don't do this, but I think there is considerable sentiment that they should. And so any system we build should be designed to make it easy for people to do this.
So the consensus in the larger biological community is that this is not how information about species is tagged.
The goal of my TaxonConcepts is to provide machine / human interpretable species concepts so that different biologists can more reliably and repeatedly determine which concept is the most appropriate "tag" for the individual organism they are looking at in a microscope or through binoculars.
A tag that can remain stable to changes in nomenclature like that seen with Aedes / Ochlerotatus.
A first step in this process is to connect that species to the many name variants with which it is associated.
I don't fundamentally disagree with most of what you say here. What you are trying to do is useful and important. What I am saying is that the way that the term "taxon concept" has been used in the literature, in previous TDWG standards (e.g. the TCS schema), and the nascent TDWG ontology does not seem to be the same thing that you are calling "taxon concept". Perhaps there needs to be a different term for what you are doing.
The meaning of "Taxon", especially as it applies to the dwc:Taxon class is much more obscure. If one ferrets out information from the hundreds of tdwg-content postings, you pretty much come up with http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001703.html and http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001814.html which seems to indicate that the people who created DwC intended for instances of the dwc:Taxon class to be taxon name usages (TNUs) which I take to be a more broadly defined entity that includes the more narrowly defined tc:TaxonConcept. It would be nice if it were stated explicitly on the DwC wiki that this is the case (if it actually is). In the tdwg ontology, tc:Taxon is defined to be an equivalent class to tc:TaxonConcept. Whether that means that tc:Taxon is exactly the same as dwc:Taxon is not clear, but from the standpoint of DSW, we have defined them to be the same thing. I think we can do that because Darwin Core terms don't have very strict definitions. (By the way, this is all summarized at http://code.google.com/p/darwin-sw/wiki/ClassTaxon)
A second step is to link the concept to closely related concepts, identifiers and other existing documentation on the web.
Some of these concepts include links to the original description, etype pages, BOLD barcodes etc. They have the ability to be linked to the type and other specimens.
Again, an awesome and cool thing which I support wholeheartedly. I just think it doesn't fit the standard definition of "taxon concept". That isn't a conspiracy of any Illuminati. It means that people have published work saying that's what it means. It's kind of like the legal system, where people follow precedents even if they aren't what those people would have done in the absence of previous cases.
They have been looked at very carefully by leaders in the Linked Open Data community and the GeoSpecies / TaxonConcept are seen as one of the better designed LOD data sets.
I do not have the experience to judge this, but I'll take your word for it. The challenge here is to come up with something that will make both people inside and outside the biodiversity informatics community happy (or as happy as is possible). If you only make one group or the other happy, then you don't have something that will bridge both groups. So despite whatever problems there might be in the TDWG community, that's the community that we have. People who don't have experience with a technology need to be led by the hand and shown why the technology that is proposed will do something that they actually need. That probably won't happen fast.
That's my "two cents worth" (translate into your local currency :-). Steve
In general, SPARQL queries on a local or LOD endpoint work as expected.
Are they perfect? No.
Respectfully,
- Pete
On Fri, Apr 29, 2011 at 9:46 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
Hi Pete,
I want to respond to your message in two parts. It may take me some time to write a response to the second part (i.e. questions about your suggestion) so it may not come right away. But I also wanted to comment about the first part, that is:
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique?
I'm not exactly sure if this is directed at Cam and me (in the context of darwin-sw), or to others. If it is directed at me, then you can read my response. If not, then you can ignore it.
First of all, I'd like to say that I greatly respect the work that you've done on trying to promote the use of LOD in the TDWG community. I have read every one of your posts and have tried to understand all of them to the extent that I'm able. I think we referenced your posts over 20 times at http://code.google.com/p/darwin-sw/wiki/TdwgContentEmailSummary, cited your taxon concept examples at http://code.google.com/p/darwin-sw/wiki/ClassTaxon and included your model in the analysis of previous models at http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels. Your suggestion of using the geo: scheme was included in the discussion of the Location class at http://code.google.com/p/darwin-sw/wiki/ClassLocation . I would say that at least half of what I know about RDF comes from looking at your examples and trying to understand what you have done. I am therefore very grateful for the work that you have done and your enthusiasm for bringing creative ideas into the community.
So why did Cam and I create Darwin-SW instead of just using your ontologies at taxonconcept.org? There are several reasons, some of which are alluded to at http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels . But to be succinct (OK, maybe not that succinct), I'll state them here:
1. Darwin Core is a ratified TDWG standard. It therefore qualifies as a "well-known" vocabulary.
If I refer to dwc:recordedBy as a property, people in the biodiversity informatics community will know what it means. If I refer to dwc:Identification, it will also be known in our community. For this reason, Cam and I wanted as much as possible to build Darwin-SW on Darwin Core rather than using terms that we or any other individual minted.
2. Cam and I wanted the Darwin-SW ontology to (as much as possible) reflect the community consensus on what classes meant and how they were related to each other. Of course, the problem is knowing what that consensus was. After the hundreds of emails that were posted on the tdwg-content list from September 2010 to the present, I feel like I have a much better understanding of what the consensus is than I did before (where there IS a consensus, of course). I have spent more hours than I care to remember trying to read, re-read, and understand the various emails that were sent and then asking annoying questions until somebody was patient enough to explain things to me. Most of those explanations are referenced on the class wiki pages. So I don't consider the ideas embodied in Darwin-SW to be "our" ideas - they are the ideas we absorbed from the community, including you. (If you want to see "my" actual ideas, look at the examples in my Biodiversity Informatics article. I don't really think that they are really that good any more.) The outlook of DSW also recognizes historical precedents such as the ACS model. As cool and clever as taxonconcept.org is, it fundamentally represents Pete DeVries' ideas. That means that it will readily be accepted by you, but the community may be less apt to buy into it if it doesn't embody community concepts. It may turn out that Darwin-SW does NOT actually represent the community consensus (as we hope it does), or is stupid, or doesn't work.
In those cases, it will get shot down and somebody else will pick up the task of trying to figure out what the community consensus is about how things should be represented in RDF. I should note that I don't think the discussion last Oct/Nov was limited to a clique of TDWG architects. I was an active participant and I certainly don't qualify as a TDWG insider, having only been to one TDWG meeting for less than 24 hours and knowing almost no other TDWG contributors personally.
3. There are a couple of structural things about taxonconcept.org terms and classes that I have questions about and I'll raise them in my second email to come after this one. But I think that one of the most problematic things about taxonconcept.org for me is the way that you describe taxon concepts. I hate to even bring up the subject because it's taken me months just to try to understand what people mean when they are talking about a taxon concept and I don't want to unleash another hundred emails about the minutiae of taxon concepts, which people on this list love to talk about. So suffice it to say that the sense that I've gotten from the many posts on the subject is that most people see a Taxon Concept (= = Taxon and similar to a "taxon name use") as the combination of a taxon name and a "sensu" or "secundum" (accordingTo) reference. That's how it is modeled in the TCS model, which is another ratified TDWG standard. That's also how it's modeled in the unfinished TDWG ontology, which despite its unfinished state is nonetheless actually being used by some people to describe taxon concepts (see http://code.google.com/p/darwin-sw/wiki/ClassTaxon for links to some examples).
When I look at how you model taxon concepts such as in http://lod.taxonconcept.org/ses/v6n7p.rdf which describes the species concept http://lod.taxonconcept.org/ses/v6n7p#Species , there are a lot of metadata about the scientific name, related name strings, URIs that represent similar resources, connections to the original description, etc. But I don't see any sensu/secundum reference or a property that links to one. So although http://lod.taxonconcept.org/ses/v6n7p#Species is a cool thing that links to a lot of useful information about Puma concolor, it doesn't seem to be the same thing as what everybody else is calling a taxon concept. If I were to link to your "species concepts", you and I might know what that meant, but nobody else would. That is in contrast to a tc:Taxon (= = tc:TaxonConcept) instance which is defined by reference to the TCS model and I would therefore consider "well known" (and is what we DO reference in DSW). So I think that in some sense, my reluctance to adopt individually "minted" classes and properties comes from the reason why I'm interested in RDF in the first place. I'm actually NOT very interested in using RDF to do reasoning in the "Semantic Web" sense - I guess I'm still a bit of a skeptic about how likely it is that anybody will be able to find out anything useful by doing reasoning on RDF that they suck in from the cloud, particularly if lots of people are using their own minted properties and if different people intend for the classes they use to rdf:type things to mean different things and have different properties. What I AM interested in is figuring out a way to make it possible for people to have a consistent understanding of the meaning of metadata that they discover when they resolve GUIDs. I think that will be increasingly important when projects like BiSciCol get rolling. 
The only way that I see this as possible is to base properties primarily on vocabularies that a lot of institutions already "understand" and are using, like Darwin Core. That doesn't mean that people will ONLY use Darwin Core. I have already heard plenty of talk on this list about using other vocabularies such as geo:, skos:, and foaf: (with some cautions from Bob). John Wieczorek, the architect of the Darwin Core standard, proposed adding geo: terms to DwC (see http://code.google.com/p/darwincore/issues/detail?id=82). There are also at least four people on this list that I know are interested in trying to make sure that DwC can interface with the OBOE ontology.
So I don't think it is fair to say that "TDWG" is opposed to adopting things outside its clique. I just think that people are cautious about supporting things that they are not familiar with (or perhaps don't understand) and in a lot of cases just don't have the time (or aren't willing to take the time) to figure out something new. So I hope that you aren't discouraged that people are slow to jump on the LOD bandwagon. I think that more people will be interested in supporting it when they start seeing tangible applications within our community and that's already starting to happen. Unfortunately, since you are so far ahead of the rest of us in your understanding of how the LOD world works, I think that you are probably doomed to be one of the ones pulling the wagon! :-) I look forward (as I have in the past) to hearing more of your innovative ideas.
Steve
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
--
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
OK, Pete, I'm going to try to write the other half of the email that I promised. I'm going to start by saying that some of what I'm talking about here has already been posted on the Darwin-SW (DSW) wiki page called RelationshipToExistingModels (http://code.google.com/p/darwin-sw/wiki/RelationshipToExistingModels). I've actually wanted to bring this up with you for about six months, but have never taken the time to put it into an email. Since this is going out to the list, I'll include some comments about background that you already know (assuming a broader audience). I'd welcome your comments and feedback on what I've said and whether you think it is accurate or not.
One of the things that I think makes it difficult for people to follow what you are proposing on taxonconcept.org is that the structure of your RDF is complex. I'm not saying that is a bad thing, I'm just saying that if you combine that with people's general unfamiliarity with RDF and the difficulty that some people have with visualizing RDF in XML format, it just isn't accessible to most people. Even that difficulty in itself isn't necessarily a bad thing because RDF isn't really intended to be understood primarily by people - it's designed to be understood by computers, so many people on this list don't really need to care about it. Nevertheless, in order to have a discussion about a proposal, one must be able to visualize it. I am a very right-brained person and must have maps, diagrams, and graphs to conceptualize things. So the first thing I did was to go to http://www.w3.org/RDF/Validator/, put in the URI of your example, and tell the parser to give me the graph only. In the RelationshipToExistingModels wiki page, I looked at http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf which provided information about an occurrence. One of the most obvious features of the resulting graph is that it is complex. I used the word "reticulated" because there are many cross-connections between the nodes. It takes a bit of time and a large-screen monitor to sort it all out, but if one ignores the literals and concentrates only on the nodes that are labeled with URIs, the structure is actually very similar to the structure we used in DSW. So that tells me that we (Cam and I) are seeing the biodiversity informatics world in a very similar manner to the way you have been seeing it. One obvious difference is in the class names used to type resources - we mostly used DwC classes and you used ones that you defined in your ontology, but that is a cosmetic difference if one assumes that the classes in DSW and at taxonconcept.org represent the same thing.
If one considers the basic RDF graph for DSW shown on the DSW home page (http://code.google.com/p/darwin-sw/), with the exception of dsw:Token which can be evidence for several things (and foaf:Person which was kind of thrown in at the end), the basic structure of DSW is linear. There is a connection between each class wherever there is a potential need for a one-to-many join between class instances (see triangles [="crow's feet"] on the Fully Normalized Model on the RelationshipToExistingModels wiki page; DSW is like this diagram except there is no Time class, and TaxonNameUsage in the diagram is the Taxon class in DSW). The connections are made by object properties that we defined in the DSW ontology. A major difference between the DSW structure and the structure that can be seen in the RDF graph of the taxonconcept.org occurrence example is that there is not just one connection that can be used to traverse the classes. For example, in DSW, to obtain information about Occurrences that are associated with an Identification, one would have to "surf" from a dwc:Identification instance to dsw:Individual instance using the dsw:identifies property, then from the dsw:Individual instance to the dwc:Occurrence instance using the dsw:hasOccurrence property. Similarly, in the taxonconcept.org example, one could go from the txn:Identification instance to the txn:SpeciesIndividual instance using the txn:identificationOfIndividual property, then from the txn:SpeciesIndividual instance to the txn:Occurrence instance using the txn:individualHasOccurrence property. However, taxonconcept.org also allows one to make the connection from the txn:Identification instance directly to the txn:Occurrence instance using the txn:identificationHasOccurrence property, skipping the txn:SpeciesIndividual altogether. Similar "shortcuts" connect other classes in taxonconcept.org whose analogues in DSW are separated by intervening classes, e.g. from txn:Occurrence to txn:SpeciesConcept (roughly analogous to dwc:Taxon) by txn:occurrenceHasSpeciesConcept, from dwc_area:Area (roughly analogous to dcterms:Location) to txn:SpeciesConcept by txn:areaHasObservedSpeciesConcept, etc. I don't think the taxonconcept.org ontology has every possible connection between every class, but in theory one could do that if one wanted. There would be even more connections and shortcuts if the taxonconcept.org ontology included a class that is analogous to dwc:Event (it's "flattened out" of the taxonconcept.org model, see http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001710.html and the posts that precede and follow it for more on the topic of "flattening" databases). It is the presence of these "shortcut" properties in the taxonconcept.org ontology that makes its RDF graph so complex and "reticulated" and the absence of them that makes a DSW RDF graph much simpler.
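To make the contrast between the two-step route and the shortcut concrete, here is a minimal sketch in Python. It is only an illustration, not part of either ontology: triples are stored as plain tuples, the property names are the ones quoted in the paragraph above, and the instance identifiers (ex:id1, ex:ind1, ex:occ1, ex:occ2) are invented for the example.

```python
# Toy triples as (subject, predicate, object). The instance names
# (ex:id1, ex:ind1, ex:occ1, ex:occ2) are invented for illustration only.
TRIPLES = [
    # DSW style: one linear chain of object properties
    ("ex:id1", "dsw:identifies", "ex:ind1"),
    ("ex:ind1", "dsw:hasOccurrence", "ex:occ1"),
    ("ex:ind1", "dsw:hasOccurrence", "ex:occ2"),
    # taxonconcept.org style: the same chain ...
    ("ex:id1", "txn:identificationOfIndividual", "ex:ind1"),
    ("ex:ind1", "txn:individualHasOccurrence", "ex:occ1"),
    ("ex:ind1", "txn:individualHasOccurrence", "ex:occ2"),
    # ... plus a shortcut that skips the individual
    ("ex:id1", "txn:identificationHasOccurrence", "ex:occ1"),
    ("ex:id1", "txn:identificationHasOccurrence", "ex:occ2"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects of (subject, predicate, ?) in insertion order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def occurrences_via_chain(identification):
    """Two hops: Identification -> Individual -> Occurrence."""
    found = []
    for individual in objects(identification, "dsw:identifies"):
        found.extend(objects(individual, "dsw:hasOccurrence"))
    return found


def occurrences_via_shortcut(identification):
    """One hop, using the shortcut property."""
    return objects(identification, "txn:identificationHasOccurrence")


print(occurrences_via_chain("ex:id1"))
print(occurrences_via_shortcut("ex:id1"))
```

Both functions return the same occurrences here, but only because whoever wrote the data also wrote the two shortcut triples by hand.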
Which approach is correct? As the old adage says "anybody can say anything about anything". There isn't really anything intrinsically "wrong" with either including or excluding "shortcut" properties. I am guessing that the reason why your ontology has them and DSW doesn't may be a reflection of the reasons why the ontologies were created. From what you've said in the past, I gather that you would like to facilitate assembling masses of metadata in triple stores and run SPARQL queries on them to discover interesting things. Cam and I want to make it possible to apply GUIDs to very diverse kinds of things and be able to track what happens to them if they end up in different places. These are not necessarily mutually exclusive desires, but they do represent a difference in outlook. I know virtually nothing about SPARQL, but at the risk of exposing myself as an ignoramus, I'm going to mention SPARQL queries in this post anyway. I assume from your examples that it is relatively easy to run a query to discover resources that are one object property-step away from a subject resource. I would assume that it would be much more difficult to run such queries on things that are five object property-steps apart. For example, if one wanted to know all of the instances of txn:SpeciesConcept's that occurred at a dwc_area:Area all one would have to do is to search for all of the objects of the txn:areaHasObservedSpeciesConcept properties for instances of that particular dwc_area:Area in a triple store. In DSW, one would need to look for all of the dwc:Events that happened at that dcterms:Location, then find all of the dwc:Occurrences that happened at those dwc:Events, then find out which dsw:Individuals were represented in those dwc:Occurrences, then look up all of the dwc:Identifications for those dsw:Individuals, and finally make a non-redundant list of dwc:Taxon instances that were represented in those dwc:Identifications. 
I don't know if there is a simple SPARQL query for that, but I doubt it. So from the standpoint of querying, the "shortcut" property method that taxonconcept.org uses is much better.
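As a rough sketch of what the five-step lookup involves, the Python below walks a chain of object properties from a location to a non-redundant set of taxa. Everything in it is an assumption made for illustration: the ex: property names are hypothetical placeholders for whatever properties link Location, Event, Occurrence, Individual, Identification and Taxon, the data are invented, and a real triple store would do this with a query rather than in-memory sets.

```python
from collections import defaultdict

# Hypothetical hop names (not DSW or txn terms). They stand in for the
# properties linking Location -> Event -> Occurrence -> Individual
# -> Identification -> Taxon.
PATH = [
    "ex:hasEvent",
    "ex:hasOccurrence",
    "ex:occurrenceOf",
    "ex:hasIdentification",
    "ex:toTaxon",
]

# Invented example data.
TRIPLES = [
    ("ex:loc1", "ex:hasEvent", "ex:ev1"),
    ("ex:loc1", "ex:hasEvent", "ex:ev2"),
    ("ex:ev1", "ex:hasOccurrence", "ex:occ1"),
    ("ex:ev2", "ex:hasOccurrence", "ex:occ2"),
    ("ex:occ1", "ex:occurrenceOf", "ex:ind1"),
    ("ex:occ2", "ex:occurrenceOf", "ex:ind2"),
    ("ex:ind1", "ex:hasIdentification", "ex:id1"),
    ("ex:ind2", "ex:hasIdentification", "ex:id2"),
    ("ex:ind2", "ex:hasIdentification", "ex:id3"),
    ("ex:id1", "ex:toTaxon", "ex:taxonA"),
    ("ex:id2", "ex:toTaxon", "ex:taxonA"),
    ("ex:id3", "ex:toTaxon", "ex:taxonB"),
]


def build_index(triples):
    """Map (subject, predicate) to the set of objects."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def follow_path(start, path, index):
    """Follow each predicate in turn; sets remove duplicates at every hop."""
    frontier = {start}
    for predicate in path:
        frontier = {
            o for node in frontier for o in index.get((node, predicate), ())
        }
    return frontier


INDEX = build_index(TRIPLES)
taxa = sorted(follow_path("ex:loc1", PATH, INDEX))
print(taxa)
```

The traversal itself is short once the path is known; the harder part is that all five kinds of links have to be present in the same store at query time.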
However, there is an important problem with the "shortcut" strategy. In order to be able to make a simple query that makes use of single-step properties, one must know what kind of query a user will want to make and then make sure that there is a shortcut property that connects the classes of interest. This requires either a crystal ball to be able to predict what people are interested in asking, or just making up properties for every possible shortcut. If I'm doing the math right, with the 6 classes that are included in the existing DwC plus IndividualOrganism (or SpeciesIndividual if you prefer), there would be 15 connections among the classes which would make 30 object properties required to connect them if one wanted every connection to have a pair of inverse properties to enable going in either direction. If one included the Token class, that would make 21 pairs. The burden would then fall on the metadata provider to provide values for all of those properties. And although 21 connections doesn't sound that bad, there could actually be a lot more actual property assignments than that because there isn't any restriction that says that there will only be one value for a property. If an organism has two Identifications, then every xxxHasIdentification kind of property is going to have two values. If there are many Identifications (e.g. http://bioimages.vanderbilt.edu/rdf/examples/lsu000/0428.rdf) there would be many values. Essentially, the metadata provider is left with the job of pre-running every kind of query that a user could possibly want to do.
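The arithmetic above can be checked in a couple of lines of Python: the number of connections among n classes is "n choose 2", and each connection needs two properties if inverses are wanted. The function name is made up for this sketch.

```python
from math import comb


def shortcut_counts(n_classes):
    """Return (connections, object properties with inverse pairs)."""
    pairs = comb(n_classes, 2)  # every unordered pair of classes
    return pairs, 2 * pairs


print(shortcut_counts(6))  # six classes
print(shortcut_counts(7))  # six classes plus Token
```

With six classes this gives 15 connections and 30 properties; adding Token as a seventh class gives 21 connections and 42 properties.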
An alternative to this would be to simplify the model by "flattening out" certain classes (making the model less "normalized"). You did that with Event. In my Biodiversity Informatics article (https://journals.ku.edu/index.php/jbi/article/view/3664) I did it for Event and Location. Historically museum people have "flattened out" IndividualOrganism and Token. People normalize out Identification all of the time. As Rich Pyle pointed out in the post I cited above, people "flatten" more complex models into simpler models all the time because it is convenient and it makes their databases simpler and easier to manage. But if our desire is to come up with a general model that will work for museum people and their old specimen labels, bird and whale observation people, DNA barcoding people, people who document live organisms with images and sound, bioblitzers, etc. it has to include every class that participants can reasonably need to have to facilitate needed "one-to-many database joins" or whatever you want to call that. In October, I posted a message to the tdwg-content list where I warned against setting precedents for using "wrong" RDF (http://lists.tdwg.org/pipermail/tdwg-content/2010-October/001663.html). In that post, my point was that people should not apply properties to instances of the wrong class. That is exactly what happens when people simplify models by eliminating classes that have only one-to-one relationships with other classes in their database. So if I were to restate my complaint again, I'd frame it this way: a "wrong" RDF model is one that leaves out classes that potential users may need to express the complexities of their data. This principle was an underlying assumption when we constructed DSW, and to know what classes people needed, we looked at the discussion that took place on the tdwg-content list in Oct/Nov.
As I point out on the RelationshipToExistingModels wiki page, we could have included a Time class, but as a practical matter, nobody has expressed a need for it (at least yet). So given this principle, reducing the number of shortcut properties by getting rid of classes is simply not an option for any model that hopes to include all of the kinds of metadata that one would like to describe within a community.
So the bottom line, in my opinion, is that in a model as complex as what we need in the biodiversity informatics community (i.e. a "fully normalized" model) we simply cannot hope to create and assign object properties that connect every class. Hence in DSW we only created the minimum number of object properties needed to express what we considered the fundamental relationships among the classes. What this means is that it simply may not be possible to make simple SPARQL queries on the data to find out what people want to know. Rather, the burden will fall on software developers to create software that can traverse the network of connections among the classes and extract the information that they need to answer the questions they want people to be able to pose through use of their software. Nailing down what the community consensus is on the classes and their connections is the first step to being able to create that kind of software.
This email is already too long, but I think that I need to make one more point about the impossibility of expecting a metadata provider to pre-populate all of the necessary "shortcut" properties that one would want to use in simple SPARQL queries. If there is only one person at one institution creating all of the metadata, then it is easy to make sure that all of the subject resources are assigned values for the appropriate shortcut object properties. (I think this is the case in the example SPARQL queries that you have put out on the list, i.e. all of the metadata was provided by you - I didn't go back and look at the examples again, so I could be wrong about that.) However, in the situation that Cam and I are interested in, the various connected resources may be at different institutions with metadata submitted "to the cloud" by different people. For example, the tree http://bioimages.vanderbilt.edu/uncg/84 is in the University of North Carolina at Greensboro arboretum. An image of that tree, http://bioimages.vanderbilt.edu/kirchoff/em1968 , is in the Bioimages image collection. A specimen from that tree, http://bioimages.vanderbilt.edu/specimen/ncu592805 , is in the University of North Carolina herbarium in Chapel Hill. Although at the moment these URIs are all under the http://bioimages.vanderbilt.edu subdomain, I would hope that at some point in the future, there would be permanent GUIDs for all of them (except the image) under someone else's management other than me. Hopefully there will be a GUID for the Taxon assigned to an Identification of the tree which would eventually be managed at some community-maintained place like the Global Name Use Bank (GNUB). So let's say I used the shortcut model and assigned a "dsw:occurrenceHasTaxon" property (which doesn't actually exist in DSW at the present) to the Occurrence documented by the image in my collection (URI=http://bioimages.vanderbilt.edu/kirchoff/em1968#occ).
Now let's say that a Quercus expert looks at the UNC specimen and decides that it is some different species (i.e. creates a different dwc:Identification). There now should be an additional value of the "dsw:occurrenceHasTaxon" property of the Occurrence metadata that I'm managing, but I'm not going to know that because the Identification has been made by somebody else, not me. [I should note that the BiSciCol project is hoping to make it possible for people to find out this kind of thing, see http://biscicol.blogspot.com/ .] Is it my responsibility to continually trawl the cloud and always be updating all of the many shortcut properties that would be possible to assign to the resources whose metadata I'm managing? If I don't do that, then SPARQL queries that people would run on "dsw:occurrenceHasTaxon" properties would miss information that had been added to the cloud by others - it would only find out things that I already knew when I created my metadata record for the resource I control. It seems to me that a major point of Linked Open Data is that individuals add to the cloud by contributing their little bit to it and that Wonderful Things happen when people find out stuff by connecting those bits with other bits contributed by other people at another place in the cloud. If we create a system that only works when people are expected to know in advance what those Wonderful Things are, then the whole exercise becomes pointless.
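The stale-shortcut problem in this scenario can be sketched in Python as follows. This is a toy illustration under stated assumptions: all property names are hypothetical ("ex:occurrenceHasTaxon" plays the role of the "dsw:occurrenceHasTaxon" shortcut that the message says does not exist in DSW), and the two lists stand in for metadata records published by two different providers.

```python
# Hypothetical property names throughout. "ex:occurrenceHasTaxon" plays the
# role of the shortcut the message calls "dsw:occurrenceHasTaxon".
MY_RECORD = [
    ("ex:occ", "ex:occurrenceOf", "ex:tree"),
    ("ex:id_mine", "ex:identifies", "ex:tree"),
    ("ex:id_mine", "ex:toTaxon", "ex:taxonA"),
    # Shortcut value, written once when the record was created.
    ("ex:occ", "ex:occurrenceHasTaxon", "ex:taxonA"),
]

# Added later by an expert at another institution.
EXPERT_RECORD = [
    ("ex:id_expert", "ex:identifies", "ex:tree"),
    ("ex:id_expert", "ex:toTaxon", "ex:taxonB"),
]


def taxa_via_shortcut(occurrence, triples):
    """Read the pre-populated shortcut property only."""
    return {
        o for s, p, o in triples
        if s == occurrence and p == "ex:occurrenceHasTaxon"
    }


def taxa_via_chain(occurrence, triples):
    """Occurrence -> Individual -> Identification -> Taxon at query time."""
    individuals = {
        o for s, p, o in triples
        if s == occurrence and p == "ex:occurrenceOf"
    }
    identifications = {
        s for s, p, o in triples
        if p == "ex:identifies" and o in individuals
    }
    return {
        o for s, p, o in triples
        if s in identifications and p == "ex:toTaxon"
    }


cloud = MY_RECORD + EXPERT_RECORD
print(sorted(taxa_via_shortcut("ex:occ", cloud)))
print(sorted(taxa_via_chain("ex:occ", cloud)))
```

The shortcut lookup still reports only the first taxon, while the traversal picks up the expert's later identification without anyone having to update the original record.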
Anyway, I hope that this explains to some extent one of the reasons why Cam and I created DSW rather than just jumping in and using the taxonconcept.org ontology. We wanted something considerably simpler.
I was going to comment/ask about at least one more thing about the ontology at taxonconcept.org, but this email is already way too long, so I'll take that up in a subsequent email.
Steve
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique?
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
See http://lod.taxonconcept.org/ses/v6n7p.html HTML http://lod.taxonconcept.org/ses/v6n7p.rdf RDF
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... Knowledge Base View (http://bit.ly bit.ly/gMFqR1 http://bit.ly%20bit.ly/gMFqR1
The model mints URI's for the following related entities. See RDF. or KB View
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag is used an individual organism that that is an instance of the species concept pecies concept RDF. This allows you to refer to a individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual%22%3E dcterms:titleA Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title> skos:prefLabelA Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
dcterms:identifierhttp://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier> dcterms:descriptionA lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description> <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species%22/%3E <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf%22/%3E </txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate for an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population%22%3E dcterms:titleA Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title> skos:prefLabelA Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
dcterms:identifierhttp://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier> dcterms:descriptionA lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesIndividualTag</dcterms:description> <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species%22/%3E <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf%22/%3E </txn:SpeciesPopulationTag>
This is the RDF for a population, it has as one of it's parts an individual organism. It is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation%22%3E <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population%22/%3E skos:prefLabelThe population of North American Cougars Puma concolor se:v6n7 </skos:prefLabel> <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individ... <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf%22/%3E </owl:Class>
Respectfully,
- Pete
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu mailto:pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://lod.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. This email relates to the actual suggestion that you made in the email, that is to use the URIs of the form like: "http://lod.taxonconcept.org/ses/mCcSp#Occurrence". In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. From the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag . So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (/Boloria selene/). But I'm not sure what the advantage of that is. The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species . I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is /Boloria selene/ . But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf) which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept). If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"?
I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species /Boloria selene/. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept) there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and, in your suggestion, populations. There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce.
I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for /Boloria selene/ (http://lod.taxonconcept.org/ses/ICmLC#Occurrence) the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring. When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurre... is described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOccurrenceTag as would be the case (I think) if the lightweight tag URI were the object of a rdf:type property. Anyway, I'm confused about this.
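The distinction between linking to the tag and being typed as the tag can be sketched in a few lines of Python. This is only an illustration with no inference applied: the occurrence URI is a hypothetical stand-in (the real one is truncated above), while the tag URI and the txn namespace are the ones quoted in this thread.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
TAG_CLASS = TXN + "SpeciesOccurrenceTag"
HAS_TAG = TXN + "occurrenceHasSpeciesOccurrenceTag"
# Hypothetical occurrence URI, used only for this sketch.
OCC = "http://example.org/ocs/1#Occurrence"


def instances_of(triples, cls):
    """Subjects explicitly typed as cls (no reasoning applied)."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


# The occurrence points at the tag through a property.
linked = [
    (TAG, RDF_TYPE, TAG_CLASS),
    (OCC, HAS_TAG, TAG),
]

# The alternative: the tag URI is the object of rdf:type.
typed = [
    (TAG, RDF_TYPE, TAG_CLASS),
    (OCC, RDF_TYPE, TAG),
]

print(instances_of(linked, TAG))
print(instances_of(typed, TAG))
print(instances_of(typed, TAG_CLASS))
```

With the property link, nothing is an instance of the tag. With rdf:type, the occurrence is an instance of the tag itself, and the tag (not the occurrence) is still the only explicit instance of txn:SpeciesOccurrenceTag.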
The other issue that I would raise with this approach is that it brings up the same issue that I raised in the other email that I wrote. It essentially puts a burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to be somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked rather than expecting a centralized authority to do everything.
Anyway, maybe you can explain what is going on so that I can understand it better and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties.
Steve
Hi Steve,
I tried to take some time to think about your notes; sorry for the delay.
There are many different contexts that can be used when thinking about species and related data.
It is often useful to separate these contexts into different kinds of related entities.
Here are some contexts that I think are useful to separate:
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
* Note that in this model a species can have several Taxonomies or classifications. This reflects the reality that the same species has one hierarchy in NCBI and a different one in CoL.
You can find all the tagged images of the Cougar by finding all those that are of the type http://lod.taxonconcept.org/ses/mCcSp#Image
Here is one example of an image that is tagged in this way. (From http://lod.taxonconcept.org/ses/v6n7p.html )
<foaf:Image rdf:about="http://assets.taxonconcept.org/seuuids/603bebac-cc44-4168-bbf7-b11b976f9d79/...">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Image"/>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <dcterms:source rdf:resource="http://commons.wikimedia.org/wiki/File:Mountain_lion.jpg"/>
  <dcterms:contributor>United States Department of Agriculture</dcterms:contributor>
  <cc:license rdf:resource="http://creativecommons.org/publicdomain/"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</foaf:Image>
You are correct in noting that an occurrence of a species could simply be typed in a similar way, and maybe that would be better than the somewhat awkward txn:occurrenceHasSpeciesOccurrenceTag.
I originally went with this name because I wanted it to be clear what the subject and object should be.
If we use this data set as an example http://ocs.taxonconcept.org/ocs/index.html (Mainly TDWG BioBlitz 2010)
We can demonstrate how this is useful for SPARQL Queries.
We can run a SPARQL describe query for all the observations of the Honey Bee with this query.
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
describe ?s where { ?s txn:occurrenceHasSpeciesOccurrenceTag <http://lod.taxonconcept.org/ses/z9oqP#Occurrence> }
* It might be simpler to mark these observations up as having a type of http://lod.taxonconcept.org/ses/z9oqP#Occurrence.
In this case the query would look like this. (You can use "a" as a shortcut for http://www.w3.org/1999/02/22-rdf-syntax-ns#type.)
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/z9oqP#Occurrence> }
* I would need to redo the occurrence record RDF for this new query to work
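As a rough sketch of how the two query shapes differ, the following Python helpers build both query strings for a given species identifier. The helper names are invented for this illustration. The txn namespace and the /ses/ URI pattern are the ones used in this email, and "z9oqP" is the Honey Bee identifier from the queries above.

```python
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
SES = "http://lod.taxonconcept.org/ses/"


def occurrence_tag_uri(species_id):
    """URI of the per-species occurrence tag, e.g. .../ses/z9oqP#Occurrence."""
    return f"{SES}{species_id}#Occurrence"


def query_by_tag_property(species_id):
    """Current markup: the tag is the object of a txn property."""
    tag = occurrence_tag_uri(species_id)
    return (
        f"PREFIX txn: <{TXN}>\n"
        f"DESCRIBE ?s WHERE {{ ?s txn:occurrenceHasSpeciesOccurrenceTag <{tag}> }}"
    )


def query_by_type(species_id):
    """Alternative markup: the occurrence is typed with the tag URI."""
    tag = occurrence_tag_uri(species_id)
    return f"DESCRIBE ?s WHERE {{ ?s a <{tag}> }}"


print(query_by_tag_property("z9oqP"))
print(query_by_type("z9oqP"))
```

Note that the URIs are wrapped in angle brackets, which SPARQL requires for full IRIs; the second form needs no PREFIX declaration at all.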
We can take that original query above and paste it into the LOD SPARQL Endpoint http://uriburner.com/isparql/ (Advanced Tab)
Run the query
This link will run the query, but it will probably not get through all email systems intact. See the bit.ly link below. http://uriburner.com/isparql/view/?query=PREFIX%20txn%3A%20%20%20%20%20%3Cht...
Bit.ly version http://bit.ly/lM6vWB
and get a result (not very pretty, or interpretable by humans).
We can select "Make Pivot" from the top left corner of the window.
This will run the query and feed the data to MS Pivot which parses and displays the result.
In theory, and I hope in the future, there will be an open source solution that does this as easily and does not require MS Silverlight.
The result is a Browsable Pivot View which you can select to view the result by Observer, Location etc.
This bit.ly will take you to a view by observer (the person who made the observation): http://bit.ly/lacRb1
This bit.ly will take you to a view by dwcArea: http://t.co/eu55BaG
I have bundled all these examples, including screenshots, into one bit.ly bundle so you won't need Silverlight to get an idea of how this works.
http://bit.ly/iXg2y8 <- Link to Bit.ly bundle with screen shots etc.
I have included closeups of the Pivot settings in the top right corner so you can see how to change the attribute that Pivot uses to create the view.
Note also that if you go to the Knowledge Base View of the Honey Bee you can browse to the observations of that species.
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... Bit.ly Link http://bit.ly/g1zzJC
Since I have updated to the latest version of Virtuoso the strange URI links have been replaced with Human readable text from the label view for that entity.
This includes the links to occurrences, gni names strings, and links to GeoNames.
Part of the reasoning behind this structure is to make explicit to computers what context we are talking about.
The human brain makes these context switches automatically but computers do not.
That said, there are areas where it could be improved or simplified.
Also I think that you will need a class for each species concept, but they are all instances of txn:SpeciesConcept - something allowed in OWL2.
My ontology has probably changed slightly since you last saw it.
OWL http://lod.taxonconcept.org/ontology/txn.owl
OWL Doc http://lod.taxonconcept.org/ontology/doc/index.html
Respectfully,
- Pete
On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
Thanks for the information, Pete. I was very interested to try doing a SPARQL query using the uriburner interface (as I have already confessed to a lack of experience with that). One thing I was curious about was how OpenLink/uriburner knew what metadata to run the query on. I was going to redo the process to see if there was an option to point to some particular triple store, but the site seems to be down at the moment. Or do they run the query on data that they have "discovered" through links on the cloud or on data that people have asked them to scrape/crawl?
As cool as the SPARQL querying thing is, I still think that I have a general issue with the approach that you are suggesting, i.e. that each "species concept" has a set of classes defined as "partOf" the general species concept class for that species. For the sake of argument, let's say that you manage to describe a species concept for each of the approximately 1.7 million described species. That means that you will have 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Image classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Occurrence classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Individual classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Taxonomy classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#NCBI_Taxonomy classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#OriginalDescription classes, and 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Population classes in addition to the 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Species classes that describe the species concept itself. That is a total of 13.6 million separate classes in your model that are needed to describe biodiversity records of life on earth. In contrast, we defined or imported a total of seven classes to do the same thing in Darwin-SW (not counting foaf:Person which is somewhat tangential to the ontology) and those seven classes should be capable of describing biodiversity records of life on earth. My point here is that the structure of the taxonconcept.org ontology seems to be designed around making queries easy (by creating a class for anything that somebody may want to ask about), but not around describing classes that reflect the structure of databases that people in the TDWG community are likely to use.
In contrast, simple queries would (it seems to me) be difficult to construct based on Darwin-SW, but it would be relatively easy to adapt the class structure to the primary types of things that people keep track of in databases (even "flattened" databases that only explicitly recognize fewer than the seven classes in Darwin-SW). So it's a trade-off, but it seems like it would be more productive to put the burden on the few software developers (i.e. people who would be creating clients that could search RDF databases/triple stores) than on the many data providers.
I also still do not see how you get around the problem that I mentioned in my May 1 email (http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002385.html). In a nutshell, let's say that a tree in an arboretum has its HTTP URI GUID on a label nailed to its trunk. If I take a picture of that tree (recording evidence of an Occurrence) and assign that tree to a Taxon through an Identification, and somebody else collects a specimen from that tree and assigns that same tree to a different Taxon through their own Identification, how could a query on a txn: species occurrence tag ever show me both the occurrence record associated with the image and the one associated with the specimen? I am going to query for
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/[myTaxon]#Occurrence> }
which will pick up the occurrence documented by my image, but it would not pick up the occurrence documented by the specimen, which would require the search
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/[theOtherPersonsTaxon]#Occurrence> }
In other words, the approach that you are suggesting requires me to know in advance what other Identifications somebody else may apply to the tree and either:
- type my occurrence record with those other taxa tags, or
- know to run a separate query for each of those taxa.
Either of these involves mind-reading on my part. This is different than the way one would find this out using Darwin-SW. In Darwin-SW, one would first query for Identifications that specified [myTaxon] and then find the dsw:Individuals associated with those Identifications. Then one would look for all of the dwc:Occurrences that were associated with the dsw:Individuals. The fact that somebody else assigned the tree to a different taxon is irrelevant to me finding the occurrences of the tree. This is messy and I don't see how you could do it with SPARQL, but I don't think it would require complex programming to write software that could do it. Since the taxonconcept.org ontology also has properties to relate occurrences to individuals and individuals to identifications and taxa, one could do the same kind of complex search. But that leaves me wondering what purpose the "lightweight tags" have if they can't be used reliably to search for all of the metadata that others have put out on the cloud. They allow me to find out about things that I already know but restrict my ability to discover unknown things.
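A minimal sketch of the three-step lookup described above (Identifications for a taxon, then the individuals they identify, then every occurrence of those individuals), written in Python over a toy in-memory set of triples. The predicate names (identifies, toTaxon, occurrenceOf) and all identifiers are placeholders made up for illustration; they are not confirmed against Darwin-SW or the txn ontology.

```python
# Toy triple store: (subject, predicate, object). Every identifier is hypothetical.
TRIPLES = {
    ("id:1", "identifies", "tree:42"),        # my Identification of the tree
    ("id:1", "toTaxon", "taxon:mine"),
    ("id:2", "identifies", "tree:42"),        # someone else's Identification
    ("id:2", "toTaxon", "taxon:theirs"),
    ("occ:image", "occurrenceOf", "tree:42"),     # documented by my image
    ("occ:specimen", "occurrenceOf", "tree:42"),  # documented by their specimen
    ("occ:other", "occurrenceOf", "tree:99"),     # unrelated individual
}


def subjects(predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(subject, predicate):
    """All objects for the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def occurrences_for_taxon(taxon):
    """Identifications of the taxon -> identified individuals -> their occurrences."""
    found = set()
    for identification in subjects("toTaxon", taxon):
        for individual in objects(identification, "identifies"):
            found |= subjects("occurrenceOf", individual)
    return found


print(sorted(occurrences_for_taxon("taxon:mine")))
# prints ['occ:image', 'occ:specimen']
```

Because the lookup goes through the individual, both occurrences of the tree are found whichever of the two taxa is used as the starting point.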
Steve
Peter DeVries wrote:
Hi Steve,
I try to take some time to think about your notes, sorry for the delay.
There are many different contexts that can be used when thinking about species and related data.
It is often useful to separate these contexts into different kinds of related entities.
Here are some contexts that I think are useful to separate
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
- Note that in this model a species can have several Taxonomies or
classifications. This reflects the reality that the same species has one hierarchy in NCBI and a different one in CoL.
You can find all the tagged images of the Cougar by finding all those that are of the type http://lod.taxonconcept.org/ses/mCcSp#Image
Here is one example of an image that is tagged in this way. (From http://lod.taxonconcept.org/ses/v6n7p.html )
<foaf:Image rdf:about="http://assets.taxonconcept.org/seuuids/603bebac-cc44-4168-bbf7-b11b976f9d79/...">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Image"/>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <dcterms:source rdf:resource="http://commons.wikimedia.org/wiki/File:Mountain_lion.jpg"/>
  <dcterms:contributor>United States Department of Agriculture</dcterms:contributor>
  <cc:license rdf:resource="http://creativecommons.org/publicdomain/"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</foaf:Image>
You are correct in noting that an occurrence of a species could simply be typed in a similar way, and maybe that would be better than the somewhat awkward txn:occurrenceHasSpeciesOccurrenceTag.
I originally went with this name because I wanted it to be clear what the subject and object should be.
If we use this data set as an example http://ocs.taxonconcept.org/ocs/index.html (Mainly TDWG BioBlitz 2010)
We can demonstrate how this is useful for SPARQL Queries.
We can run a SPARQL describe query for all the observations of the Honey Bee with this query.
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
describe ?s where { ?s txn:occurrenceHasSpeciesOccurrenceTag <http://lod.taxonconcept.org/ses/z9oqP#Occurrence> }
* It might be simpler to mark these observations up as having a type of <http://lod.taxonconcept.org/ses/z9oqP#Occurrence>.
In this case the query would look like this. (You can use "a" as a shortcut for <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.)
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/z9oqP#Occurrence> }
- I would need to redo the occurrence record RDF for this new query to work
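To illustrate why the records would need to be redone, here is a small Python sketch over a toy set of triples. The two URIs for the property and the Honey Bee tag are the ones used in the queries above; the occurrence identifier ex:occurrence1 and the helper function are made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TAG_PROPERTY = (
    "http://lod.taxonconcept.org/ontology/txn.owl#occurrenceHasSpeciesOccurrenceTag"
)
HONEY_BEE_TAG = "http://lod.taxonconcept.org/ses/z9oqP#Occurrence"

# One hypothetical occurrence record, marked up the current way:
# the tag is the object of the txn property, not of rdf:type.
triples = {("ex:occurrence1", TAG_PROPERTY, HONEY_BEE_TAG)}


def subjects_matching(store, predicate, obj):
    """Subjects with the given predicate and object, like '?s <p> <o>' in SPARQL."""
    return sorted(s for s, p, o in store if p == predicate and o == obj)


# The property-based query finds the record.
by_property = subjects_matching(triples, TAG_PROPERTY, HONEY_BEE_TAG)

# The type-based query ("?s a <tag>") finds nothing yet.
by_type_before = subjects_matching(triples, RDF_TYPE, HONEY_BEE_TAG)

# Redoing the record: add an rdf:type triple that points at the tag.
retyped = triples | {("ex:occurrence1", RDF_TYPE, HONEY_BEE_TAG)}
by_type_after = subjects_matching(retyped, RDF_TYPE, HONEY_BEE_TAG)

print(by_property, by_type_before, by_type_after)
# prints ['ex:occurrence1'] [] ['ex:occurrence1']
```

The type-based query only starts returning the record once the extra rdf:type statement has been added to it.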
We can take that original query above and paste it into the LOD SPARQL Endpoint http://uriburner.com/isparql/ (Advanced Tab)
Run the query
This link will run the query (it will probably not go through every email system intact; see the bit.ly link below).
<http://uriburner.com/isparql/view/?query=PREFIX%20txn%3A%20%20%20%20%20%3Chttp%3A%2F%2Flod.taxonconcept.org%2Fontology%2Ftxn.owl%23%3E%0A%0Adescribe%20%3Fs%20where%20%7B%20%3Fs%20txn%3AoccurrenceHasSpeciesOccurrenceTag%20%3Chttp%3A%2F%2Flod.taxonconcept.org%2Fses%2Fz9oqP%23Occurrence%3E%20%7D%0A%20&endpoint=/sparql&resultview=navigator&maxrows=50&view=1>
Bit.ly version http://bit.ly/lM6vWB
and get a result (Not very pretty, or interpretable by humans)
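For anyone who wants to build a link of the same shape themselves, here is a hedged Python sketch that percent-encodes the query text. The base URL and the endpoint, resultview, maxrows and view parameters are copied from the link above; whether the service accepts them in this form is an assumption, and the function name isparql_link is made up.

```python
from urllib.parse import quote, unquote

# The Honey Bee query from above, with the IRIs in angle brackets.
QUERY = (
    "PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>\n"
    "\n"
    "describe ?s where { ?s txn:occurrenceHasSpeciesOccurrenceTag "
    "<http://lod.taxonconcept.org/ses/z9oqP#Occurrence> }"
)


def isparql_link(query, base="http://uriburner.com/isparql/view/"):
    """Build a run-this-query link; parameter names are taken from the link above."""
    # safe="" makes quote() encode ':' and '/' as well, as in the original link.
    encoded = quote(query, safe="")
    return (
        f"{base}?query={encoded}"
        "&endpoint=/sparql&resultview=navigator&maxrows=50&view=1"
    )


link = isparql_link(QUERY)
print(link)

# Decoding the query parameter gives back the original query text.
encoded_part = link.split("?query=")[1].split("&endpoint")[0]
print(unquote(encoded_part) == QUERY)
```

Because the angle brackets, colons and slashes are all percent-encoded, a link built this way is less likely to be mangled by mail clients than the raw query text.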
We can select "Make Pivot" from the top left corner of the window.
This will run the query and feed the data to MS Pivot which parses and displays the result.
In theory, and I hope in the future, there will be an open source solution that does this as easily and does not require MS Silverlight.
The result is a Browsable Pivot View which you can select to view the result by Observer, Location etc.
This bit.ly link will take you to a view by observer (the person who made the observation): http://bit.ly/lacRb1
This link will take you to a view by dwcArea: http://t.co/eu55BaG
I have bundled all these examples, including screenshots, into one bit.ly bundle so you won't need Silverlight to get an idea of how this works.
http://bit.ly/iXg2y8 <- Link to Bit.ly bundle with screen shots etc.
I have included closeups of the Pivot settings in the top right corner so you can see how to change the attribute that Pivot uses to create the view.
Note also that if you go to the Knowledge Base View of the Honey Bee you can browse to the observations of that species.
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... Bit.ly Link http://bit.ly/g1zzJC
Since I updated to the latest version of Virtuoso, the strange URI links have been replaced with human-readable text from the label for that entity.
This includes the links to occurrences, GNI name strings, and links to GeoNames.
Part of the reasoning behind this structure is to make explicit to computers what context we are talking about.
The human brain makes these context switches automatically but computers do not.
That said there are areas where they could be improved or simplified.
Also I think that you will need a class for each species concept, but they are all instances of txn:SpeciesConcept - something allowed in OWL2.
My ontology has probably changed slightly since you last saw it.
OWL http://lod.taxonconcept.org/ontology/txn.owl
OWL Doc http://lod.taxonconcept.org/ontology/doc/index.html
Respectfully,
- Pete
On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. This email relates to the actual suggestion that you made in the email, that is, to use URIs of the form "http://lod.taxonconcept.org/ses/mCcSp#Occurrence". In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. From the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag. So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (/Boloria selene/). But I'm not sure what the advantage of that is. The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species . I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is /Boloria selene/. But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it, and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf), which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept).
If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"? I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species /Boloria selene/. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept) there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and (in your suggestion below) populations. There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce. I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for /Boloria selene/ (http://lod.taxonconcept.org/ses/ICmLC#Occurrence) the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring.
When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence is described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOccurrenceTag as would be the case (I think) if the lightweight tag URI were the object of an rdf:type property. Anyway, I'm confused about this. The other issue that I would raise with this approach is the same one that I raised in the other email that I wrote. It essentially puts a burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to be somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked, rather than expecting a centralized authority to do everything. Anyway, maybe you can explain what is going on so that I can understand it better, and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties.
Steve
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique?
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities.
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
See http://lod.taxonconcept.org/ses/v6n7p.html HTML
http://lod.taxonconcept.org/ses/v6n7p.rdf RDF
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%2Fses%2Fv6n7p%23Species Knowledge Base View (http://bit.ly/gMFqR1)
The model mints URIs for the following related entities. See RDF or KB View.
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag is used for an individual organism that is an instance of the species concept. This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>
This is the RDF for a population; it has as one of its parts an individual organism. It is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7 </skos:prefLabel>
  <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individual"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>
Respectfully,
- Pete
--
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept <http://www.taxonconcept.org/> & GeoSpecies <http://lod.geospecies.org/> Knowledge Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/> Project
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences
postal mail address:
VU Station B 351634
Nashville, TN 37235-1634, U.S.A.
delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
--
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
On Tue, May 3, 2011 at 2:15 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
Thanks for the information, Pete. I was very interested to try doing a SPARQL query using the uriburner interface (as I have already confessed, I lack experience with that). One thing I was curious about was how OpenLink/uriburner knew what metadata to run the query on. I was going to redo the process to see if there was an option to point to some particular triple store, but the site seems to be down at the moment. Or do they run the query on data that they have "discovered" through links on the cloud or on data that people have asked them to scrape/crawl?
I produce a semantic sitemap file, http://lod.taxonconcept.org/rdf/txn_ses.ttl.gz (see http://sw.deri.org/2007/07/sitemapextension/ ).
This is documented in both my VoID file ( http://lod.taxonconcept.org/ontology/void.rdf ) and my CKAN documentation ( http://ckan.net/package/taxonconcept ).
The Cloud Machine periodically checks the sitemap file to see if anything has changed. If so it downloads the RDF Dump files described here.
http://www.taxonconcept.org/rdf_and_sitemap/
You can also use pingthesemanticweb to inform the cloud about a particular file: http://pingthesemanticweb.com/
And/or tell Sindice directly about your RDF or sitemap file: http://sindice.com/main/submit
I don't see a difference between 1.7 million RDF files with instances vs 1.7 million RDF files with classes?
It would also be possible to split the hosting of the concepts into different taxonomic groups or institutions.
This query will get you a list of the identifications of the Humpback Whale <http://lod.taxonconcept.org/ses/CsmOq#Species>
I think this is the query you were wanting to know how to do?
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
describe ?s where {
  ?s rdf:type txn:Identification .
  ?s txn:identificationHasSpeciesConcept <http://lod.taxonconcept.org/ses/CsmOq#Species> .
}
You can also run this query on my endpoint (or any other endpoint that has the data set), but you don't get the nice Pivot interface.
http://lsd.taxonconcept.org/isparql/
Does this clear things up?
Respectfully,
- Pete
As cool as the SPARQL querying thing is, I still think that I have a general issue with the approach that you are suggesting, i.e. that each "species concept" has a set of classes defined as "partOf" the general species concept class for that species. For the sake of argument, let's say that you manage to describe a species concept for each of the approximately 1.7 million described species. That means that you will have 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Image classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Occurrence classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Indivdual classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Taxonomy classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#NCBI_Taxonomy classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#OriginalDescription classes, and 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Population classes in addition to the 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Speciesclasses that describe the species concept itself. That is a total of 13.6 million separate classes in your model that are needed to describe biodiversity records of life on earth. In contrast, we defined or imported a total of seven classes to do the same thing in Darwin-SW (not counting foaf:Person which is somewhat tangential to the ontology) and those seven classes should be capable of describing biodiversity records of life on earth. My point here is that the structure of the taxonconcept.orgontology seems to be designed around making queries easy (by creating a class for anything that somebody may want to ask about), but not around describing classes that reflect the structure of databases that people in the TDWG community are likely to use. 
In contrast, simple queries would (it seems to me) be difficult to construct based on Darwin-SW, but it would be relatively easy to adopt the class structure to the primary types of things that people keep track of in databases (even "flattened" databases that only explicitly recognize fewer than the seven classes in Darwin-SW). So it's a trade-off, but it seems like it would be more productive to put the burden on the few software developers (i.e. people who would be creating clients that could search RDF databases/triple stores) than on the many data providers.
I also still do not see how you get around the problem that I mentioned in my May 1 email ( http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002385.html). In a nutshell, let's say that a tree in an arboretum has its HTTP URI GUID on a label nailed to its trunk. If I take a picture of that tree (recording evidence of an Occurrence) and assign that tree to a Taxon through an Identification, and somebody else collects a specimen from that tree and assigns that same tree to a different Taxon through their own Identification, how could a query on a txn: species occurrence tag ever show me both the occurrence record associated with the image and the one associated with the specimen? I am going to query for
describe ?s where { ?s a http://lod.taxonconcept.org/ses/[myTaxon]#Occurrencehttp://lod.taxonconcept.org/ses/%5BmyTaxon%5D#Occurrence}
which will pick up the occurrence documented by my image, but it would not pick up the occurrence documented by the specimen, which would require the search
describe ?s where { ?s a http://lod.taxonconcept.org/ses/[theOtherPersonsTaxon]#Occurrencehttp://lod.taxonconcept.org/ses/%5BtheOtherPersonsTaxon%5D#Occurrence}
In other words, the approach that you are suggesting requires me to know in advance what other Identifications somebody else may apply to the tree and either: type my occurrence record with those other taxa tags or know to run a separate query for each of those taxa
Either of these involves mind-reading on my part. This is different than the way one would find this out using Darwin-SW. In Darwin-SW, one would first query for Identifications that specified [myTaxon] and then find the dsw:Individuals associated with those Identifications. Then one would look for all of the dwc:Occurrences that were associated with the dsw:Individuals. The fact that somebody else assigned the tree to a different taxon is irrelevant to me finding the occurrences of the tree. This is messy and I don't see how you could do it with SPARQL, but I don't think it would require complex programming to write software that could do it. Since the taxonconcept.org ontology also has properties to relate occurrences to individuals and individuals to identifications and taxa, one could do the same kind of complex search. But that leaves me wondering what purpose the "lightweight tags" have if they can't be used reliably to search for all of the metadata that others have put out on the cloud. They allow me to find out about things that I already know but restrict my ability to discover unknown things.
Steve
Peter DeVries wrote:
Hi Steve,
I try to take some time to think about your notes, sorry for the delay.
There are many different contexts that can be used when thinking about species and related data.
It is often useful to separate these contexts into different kinds of related entities.
Here are some contexts that I think are useful to separate
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
- Note that in this model a species can have several Taxonomies or
classifications. This reflects the reality that the same species has one hierarchy in NCBI and a different one in CoL.
You can find all the tagged images of the Cougar by finding all those that are of the type http://lod.taxonconcept.org/ses/mCcSp#Image
Here is one example of an image that is tagged in this way. (From http://lod.taxonconcept.org/ses/v6n7p.html )
<foaf:Image rdf:about=" http://assets.taxonconcept.org/seuuids/603bebac-cc44-4168-bbf7-b11b976f9d79/... "> <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Image%22/%3E <dcterms:isPartOf rdf:resource=" http://lod.taxonconcept.org/ses/v6n7p#Species%22/%3E <dcterms:source rdf:resource=" http://commons.wikimedia.org/wiki/File:Mountain_lion.jpg%22/%3E dcterms:contributorUnited States Department of Agriculture</dcterms:contributor> <cc:license rdf:resource="http://creativecommons.org/publicdomain/%22/%3E <wdrs:describedby rdf:resource=" http://lod.taxonconcept.org/ses/v6n7p.rdf%22/%3E </foaf:Image>
You are correct in noting that an occurrence of a species could simply be typed in a similar way, and maybe that would be better than the somewhat awkward.
txn:occurrenceHasSpeciesOccurrenceTag
I originally went with this name because I wanted it to be clear that the subject and objects should be.
If we use this data set as and example http://ocs.taxonconcept.org/ocs/index.html (Mainly TDWG BioBlitz 2010)
We can demonstrate how this is useful for SPARQL Queries.
We can run a SPARQL describe query for all the observations of the Honey Bee with this query.
PREFIX txn: http://lod.taxonconcept.org/ontology/txn.owl#
describe ?s where { ?s txn:occurrenceHasSpeciesOccurrenceTag < http://lod.taxonconcept.org/ses/z9oqP#Occurrence%3E }
* It might be simpler to mark these observations up as having a type
of http://lod.taxonconcept.org/ses/z9oqP#Occurrence.
In this case the query would look like this. (You can use "a" as a
short cut meaning (http://www.w3.org/1999/02/22-rdf-syntax-ns#type)
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
describe ?s where { ?s a < http://lod.taxonconcept.org/ses/z9oqP#Occurrence%3E }
- I would need to redo the occurrence record RDF for this new query to
work
We can take that original query above and paste into the LOD SPARQL Endpoint http://uriburner.com/isparql/ (Advanced Tab)
Run the query
This link will run the query - will probably not go through all email system intact. See bit.ly link below. < http://uriburner.com/isparql/view/?query=PREFIX%20txn%3A%20%20%20%20%20%3Cht...http://uriburner.com/isparql/view/?query=PREFIX%20txn%3A%20%20%20%20%20%3Chttp%3A%2F%2Flod.taxonconcept.org%2Fontology%2Ftxn.owl%23%3E%0A%0Adescribe%20%3Fs%20where%20%7B%20%3Fs%20txn%3AoccurrenceHasSpeciesOccurrenceTag%20%3Chttp%3A%2F%2Flod.taxonconcept.org%2Fses%2Fz9oqP%23Occurrence%3E%20%7D%0A%20&endpoint=/sparql&resultview=navigator&maxrows=50&view=1
Bit.ly version http://bit.ly/lM6vWB
and get a esult (Not very pretty, or interpretable by humans)
We can select make "Make Pivot" from the top left corner of the Window.
This will run the query and feed the data to MS Pivot which parses and displays the result.
In theory, and I hope in the future, there will be an open source solution that does this as easily and does not require MS Silverlight.
The result is a Browsable Pivot View which you can select to view the result by Observer, Location etc.
This bit.ly will take you to a view by observer (the person who made the observation) http://bit.ly/lacRb1 This biit.ly will take you to a view by dwcArea http://t.co/eu55BaG
I have bundled all these examples including screenshots into one bit.lybundle so you won't need Sliverlight to get an idea on how this works.
http://bit.ly/iXg2y8 <- Link to Bit.ly bundle with screen shots etc.
I have included closeups of the Pivot settings in the top right corner so you can see how to change the attribute that Pivot uses to create the view.
Note also that if you go to the Knowledge Base View of the Honey Bee you can browse to the observations of that species.
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... Bit.ly Link http://bit.ly/g1zzJC
Since I updated to the latest version of Virtuoso, the strange URI links have been replaced with human-readable text taken from the label for that entity.
This includes the links to occurrences, GNI name strings, and GeoNames.
Part of the reasoning behind this structure is to make explicit to computers what context we are talking about.
The human brain makes these context switches automatically but computers do not.
That said, there are areas where the structure could be improved or simplified.
Also, I think that you will need a class for each species concept, but they are all instances of txn:SpeciesConcept, which is something allowed in OWL2.
My ontology has probably changed slightly since you last saw it.
OWL http://lod.taxonconcept.org/ontology/txn.owl
OWL Doc http://lod.taxonconcept.org/ontology/doc/index.html
Respectfully,
- Pete
On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. This email relates to the actual suggestion that you made in the email, that is, to use URIs of the form "http://lod.taxonconcept.org/ses/mCcSp#Occurrence". In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. From the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag. So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (*Boloria selene*). But I'm not sure what the advantage of that is. The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species. I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is *Boloria selene*. But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it, and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf), which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept).
If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"?
I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species *Boloria selene*. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept), there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and (in your suggestion below) populations. There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce.
I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for *Boloria selene* (http://lod.taxonconcept.org/ses/ICmLC#Occurrence), the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring. When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurre... is described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOccurrenceTag, as would be the case (I think) if the lightweight tag URI were the object of an rdf:type property. Anyway, I'm confused about this.
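The difference between "tagged with" and "typed as" can be illustrated with a few triples held in a plain Python list. This is only a sketch under stated assumptions: the occurrence URI `urn:example:occurrence-1` is made up, the helper `subjects` is hypothetical, and the second data set shows what the record might look like if it were re-typed as discussed, not what is currently published.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TXN = "http://lod.taxonconcept.org/ontology/txn.owl#"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
OCC = "urn:example:occurrence-1"  # made-up occurrence URI for illustration

# The tag is attached through a txn property; the tag itself is typed
# as a txn:SpeciesOccurrenceTag.
triples_tagged = [
    (OCC, TXN + "occurrenceHasSpeciesOccurrenceTag", TAG),
    (TAG, RDF_TYPE, TXN + "SpeciesOccurrenceTag"),
]

# Assumed alternative: the occurrence is additionally typed with the tag URI.
triples_typed = triples_tagged + [(OCC, RDF_TYPE, TAG)]


def subjects(triples, predicate, obj):
    """Return the subjects of all triples matching (?, predicate, obj)."""
    return sorted({s for s, p, o in triples if p == predicate and o == obj})


# "?s a <tag>" finds nothing while the tag is only a property value ...
print(subjects(triples_tagged, RDF_TYPE, TAG))
# ... although the property-based pattern does find the occurrence.
print(subjects(triples_tagged, TXN + "occurrenceHasSpeciesOccurrenceTag", TAG))
# Once the rdf:type triple exists, the "a" form works as well.
print(subjects(triples_typed, RDF_TYPE, TAG))
```

Without an explicit rule or reasoner linking the two predicates, a store treats them as unrelated, which is why the occurrence records would need to be regenerated for the shorter query to work.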
The other issue that I would raise with this approach is the same one that I raised in the other email that I wrote. It essentially puts a burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to be somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked, rather than expecting a centralized authority to do everything.
Anyway, maybe you can explain what is going on so that I can understand it better and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties.
Steve
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique?
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities.
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
See http://lod.taxonconcept.org/ses/v6n7p.html HTML http://lod.taxonconcept.org/ses/v6n7p.rdf RDF
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... Knowledge Base View (http://bit.ly/gMFqR1)
The model mints URIs for the following related entities. See the RDF or KB View.
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag is used for an individual organism that is an instance of the species concept. It allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>
This is the RDF for a population. It has an individual organism as one of its parts, and it is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7 </skos:prefLabel>
  <dcterms:hasPart rdf:resource=" http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individ... "/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>
Respectfully,
- Pete
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://lod.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
Comments inline
Peter DeVries wrote:
I produce a semantic site map file http://lod.taxonconcept.org/rdf/txn_ses.ttl.gz ( http://sw.deri.org/2007/07/sitemapextension/ )
[info omitted]
And/or tell Sindice directly about your RDF or sitemap file. http://sindice.com/main/submit
OK, cool, I get it. I will have to read up on this and figure out which of the methods would be practical for me to play around with.
I don't see a difference between 1.7 million RDF files with instances vs 1.7 million RDF files with classes?
Well, I hate to say this because there are people on this list who know 100 times as much as I do about modeling. But I was under the impression that one models things by describing classes and the properties that connect them. Classes are (to me) a very different thing than instances of classes. A model containing more than 13.6 million classes is at least 1.9 million times as complicated as a model with 7 classes. I would hate to have to draw an RDF graph of that model. In a model we don't expect people to know in advance how many instances there will ever be of the classes in the model or to predict what those instances will be when we make the model. But it seems entirely reasonable to me to expect there to be a set number of classes in a model that are known before the model is used.
It would also be possible to split the hosting of the concepts into different taxonomic groups or institutions.
This query will get you a list of the identifications of the Humpback Whale <http://lod.taxonconcept.org/ses/CsmOq#Species>
I think this is the query you were wanting to know how to do?
PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
describe ?s where { ?s rdf:type txn:Identification . ?s txn:identificationHasSpeciesConcept <http://lod.taxonconcept.org/ses/CsmOq#Species> . }
Well, no. This query seems to find Identifications of a Species Concept. I want to know Occurrences of Individuals that have Identifications to a Species Concept. I don't see how you can do that with a simple query if you have a model with nodes that are named by URIs, to which others can independently link their resources (the whole point of Linked Data to my way of thinking). Let's say that we have a bird (an IndividualOrganism/SpeciesIndividual) that has an assigned URI. The bander records an observation that documents an Occurrence at the time the bird is banded. I catch it a year later and collect a DNA sample from it (another Occurrence documented by the DNA). The query that you gave above isn't going to find those Occurrences. It will only find the Identifications. The only way that I can see queries of the sort you are describing to be able to come up with the two Occurrences is if there is a property that directly connects the Occurrence to the Taxon, and I've objected to that because in the spirit of Linked Data, other people ought to be able to link their own Identifications and Occurrences to the Individual. As I tried to describe before, if you shortcut directly from Occurrence to Taxon, you miss the connections that others have made to the intermediate nodes in the graph. Maybe I need to make a diagram of what I'm talking about...
Does this clear things up?
Unfortunately not. But thanks for the explanation. I need to read through your other messages carefully before writing more. Steve
Respectfully,
Pete
As cool as the SPARQL querying thing is, I still think that I have a general issue with the approach that you are suggesting, i.e. that each "species concept" has a set of classes defined as "partOf" the general species concept class for that species. For the sake of argument, let's say that you manage to describe a species concept for each of the approximately 1.7 million described species. That means that you will have 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Image classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Occurrence classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Individual classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Taxonomy classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#NCBI_Taxonomy classes, 1.7 million http://lod.taxonconcept.org/ses/XXXXX#OriginalDescription classes, and 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Population classes, in addition to the 1.7 million http://lod.taxonconcept.org/ses/XXXXX#Species classes that describe the species concept itself. That is a total of 13.6 million separate classes in your model that are needed to describe biodiversity records of life on earth. In contrast, we defined or imported a total of seven classes to do the same thing in Darwin-SW (not counting foaf:Person, which is somewhat tangential to the ontology), and those seven classes should be capable of describing biodiversity records of life on earth. My point here is that the structure of the taxonconcept.org ontology seems to be designed around making queries easy (by creating a class for anything that somebody may want to ask about), but not around describing classes that reflect the structure of databases that people in the TDWG community are likely to use.
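The class counts quoted in this message can be checked with a few lines of Python. The figure of 1.7 million species, the eight per-species tags and the seven Darwin-SW classes are taken from the text; nothing else is assumed.

```python
# Approximate number of described species, as used in the message.
SPECIES = 1_700_000

# One class per species for each of these fragment identifiers.
PER_SPECIES_CLASSES = [
    "Species", "Image", "Occurrence", "Individual",
    "Taxonomy", "NCBI_Taxonomy", "OriginalDescription", "Population",
]

# Classes defined or imported by Darwin-SW (foaf:Person not counted).
DARWIN_SW_CLASSES = 7

total_classes = SPECIES * len(PER_SPECIES_CLASSES)
ratio = total_classes / DARWIN_SW_CLASSES

print(f"{total_classes:,} classes in total")
print(f"about {ratio:,.0f} times as many classes as Darwin-SW")
```

This matches the 13.6 million total and the "at least 1.9 million times" comparison made earlier in the thread.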
In contrast, simple queries would (it seems to me) be difficult to construct based on Darwin-SW, but it would be relatively easy to adapt the class structure to the primary types of things that people keep track of in databases (even "flattened" databases that explicitly recognize fewer than the seven classes in Darwin-SW). So it's a trade-off, but it seems like it would be more productive to put the burden on the few software developers (i.e. people who would be creating clients that could search RDF databases/triple stores) than on the many data providers.
I also still do not see how you get around the problem that I mentioned in my May 1 email (http://lists.tdwg.org/pipermail/tdwg-content/2011-May/002385.html). In a nutshell, let's say that a tree in an arboretum has its HTTP URI GUID on a label nailed to its trunk. If I take a picture of that tree (recording evidence of an Occurrence) and assign that tree to a Taxon through an Identification, and somebody else collects a specimen from that tree and assigns that same tree to a different Taxon through their own Identification, how could a query on a txn: species occurrence tag ever show me both the occurrence record associated with the image and the one associated with the specimen? I am going to query for
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/[myTaxon]#Occurrence> }
which will pick up the occurrence documented by my image, but it would not pick up the occurrence documented by the specimen, which would require the search
describe ?s where { ?s a <http://lod.taxonconcept.org/ses/[theOtherPersonsTaxon]#Occurrence> }
In other words, the approach that you are suggesting requires me to know in advance what other Identifications somebody else may apply to the tree and either:
- type my occurrence record with those other taxa tags, or
- know to run a separate query for each of those taxa.
Either of these involves mind-reading on my part. This is different from the way one would find this out using Darwin-SW. In Darwin-SW, one would first query for Identifications that specified [myTaxon] and then find the dsw:Individuals associated with those Identifications. Then one would look for all of the dwc:Occurrences that were associated with the dsw:Individuals. The fact that somebody else assigned the tree to a different taxon is irrelevant to me finding the occurrences of the tree. This is messy and I don't see how you could do it with SPARQL, but I don't think it would require complex programming to write software that could do it. Since the taxonconcept.org ontology also has properties to relate occurrences to individuals and individuals to identifications and taxa, one could do the same kind of complex search. But that leaves me wondering what purpose the "lightweight tags" have if they can't be used reliably to search for all of the metadata that others have put out on the cloud. They allow me to find out about things that I already know but restrict my ability to discover unknown things.
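The arboretum example can be played through on a toy data set to show the difference between the two lookups. This is a sketch only: the predicate names (`identifies`, `toTaxon`, `occurrenceOf`, `hasTag`) and all URIs are placeholders invented for the illustration, not actual Darwin-SW or txn terms.

```python
# Toy triples for the tree in the arboretum; every name is a placeholder.
TRIPLES = [
    # Two independent Identifications of the same tree.
    ("id:mine", "identifies", "ind:tree"),
    ("id:mine", "toTaxon", "taxon:mine"),
    ("id:theirs", "identifies", "ind:tree"),
    ("id:theirs", "toTaxon", "taxon:theirs"),
    # Two Occurrences of that tree: my image and their specimen.
    ("occ:image", "occurrenceOf", "ind:tree"),
    ("occ:specimen", "occurrenceOf", "ind:tree"),
    # Each provider tags only with the taxon they identified.
    ("occ:image", "hasTag", "taxon:mine#Occurrence"),
    ("occ:specimen", "hasTag", "taxon:theirs#Occurrence"),
]


def subjects(predicate, obj):
    """Subjects of all triples matching (?, predicate, obj)."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def objects(subject, predicate):
    """Objects of all triples matching (subject, predicate, ?)."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def occurrences_by_tag(taxon):
    """Lookup through the per-taxon occurrence tag only."""
    return sorted(subjects("hasTag", taxon + "#Occurrence"))


def occurrences_via_individuals(taxon):
    """Identification -> Individual -> Occurrence traversal."""
    found = set()
    for identification in subjects("toTaxon", taxon):
        for individual in objects(identification, "identifies"):
            found |= subjects("occurrenceOf", individual)
    return sorted(found)


print(occurrences_by_tag("taxon:mine"))           # ['occ:image']
print(occurrences_via_individuals("taxon:mine"))  # ['occ:image', 'occ:specimen']
```

The tag lookup returns only the occurrence whose provider happened to use the same taxon, while the traversal through the shared individual returns both, regardless of how the other person identified the tree.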
Steve
Peter DeVries wrote:
Hi Steve, I try to take some time to think about your notes, sorry for the delay. There are many different contexts that can be used when thinking about species and related data. It is often useful to separate these contexts into different kinds of related entities. Here are some contexts that I think are useful to separate http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI. * Note that in this model a species can have several Taxonomies or classifications. This reflects the reality that the same species has one hierarchy in NCBI and a different one in CoL. You can find all the tagged images of the Cougar by finding all those that are of the type <http://lod.taxonconcept.org/ses/mCcSp#Image> Here is one example of an image that is tagged in this way. 
(From http://lod.taxonconcept.org/ses/v6n7p.html ) <foaf:Image rdf:about="http://assets.taxonconcept.org/seuuids/603bebac-cc44-4168-bbf7-b11b976f9d79/Puma_concolor_480x320.jpg"> <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Image"/> <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/> <dcterms:source rdf:resource="http://commons.wikimedia.org/wiki/File:Mountain_lion.jpg"/> <dcterms:contributor>United States Department of Agriculture</dcterms:contributor> <cc:license rdf:resource="http://creativecommons.org/publicdomain/"/> <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/> </foaf:Image> You are correct in noting that an occurrence of a species could simply be typed in a similar way, and maybe that would be better than the somewhat awkward. txn:occurrenceHasSpeciesOccurrenceTag I originally went with this name because I wanted it to be clear that the subject and objects should be. If we use this data set as and example http://ocs.taxonconcept.org/ocs/index.html (Mainly TDWG BioBlitz 2010) We can demonstrate how this is useful for SPARQL Queries. We can run a SPARQL describe query for all the observations of the Honey Bee with this query. PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#> describe ?s where { ?s txn:occurrenceHasSpeciesOccurrenceTag <http://lod.taxonconcept.org/ses/z9oqP#Occurrence> } * It might be simpler to mark these observations up as having a type of <http://lod.taxonconcept.org/ses/z9oqP#Occurrence>. In this case the query would look like this. 
(You can use "a" as a short cut meaning (http://www.w3.org/1999/02/22-rdf-syntax-ns#type) PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#> describe ?s where { ?s a <http://lod.taxonconcept.org/ses/z9oqP#Occurrence> } * I would need to redo the occurrence record RDF for this new query to work We can take that original query above and paste into the LOD SPARQL Endpoint http://uriburner.com/isparql/ (Advanced Tab) Run the query This link will run the query - will probably not go through all email system intact. See bit.ly <http://bit.ly> link below. < http://uriburner.com/isparql/view/?query=PREFIX%20txn%3A%20%20%20%20%20%3Chttp%3A%2F%2Flod.taxonconcept.org%2Fontology%2Ftxn.owl%23%3E%0A%0Adescribe%20%3Fs%20where%20{%20%3Fs%20txn%3AoccurrenceHasSpeciesOccurrenceTag%20%3Chttp%3A%2F%2Flod.taxonconcept.org%2Fses%2Fz9oqP%23Occurrence%3E%20}%0A%20&endpoint=/sparql&resultview=navigator&maxrows=50&view=1 <http://uriburner.com/isparql/view/?query=PREFIX%20txn%3A%20%20%20%20%20%3Chttp%3A%2F%2Flod.taxonconcept.org%2Fontology%2Ftxn.owl%23%3E%0A%0Adescribe%20%3Fs%20where%20%7B%20%3Fs%20txn%3AoccurrenceHasSpeciesOccurrenceTag%20%3Chttp%3A%2F%2Flod.taxonconcept.org%2Fses%2Fz9oqP%23Occurrence%3E%20%7D%0A%20&endpoint=/sparql&resultview=navigator&maxrows=50&view=1>> Bit.ly version http://bit.ly/lM6vWB and get a esult (Not very pretty, or interpretable by humans) We can select make "Make Pivot" from the top left corner of the Window. This will run the query and feed the data to MS Pivot which parses and displays the result. In theory, and I hope in the future, there will be an open source solution that does this as easily and does not require MS Silverlight. The result is a Browsable Pivot View which you can select to view the result by Observer, Location etc. 
This bit.ly <http://bit.ly> will take you to a view by observer (the person who made the observation) http://bit.ly/lacRb1 This biit.ly <http://biit.ly> will take you to a view by dwcArea http://t.co/eu55BaG I have bundled all these examples including screenshots into one bit.ly <http://bit.ly> bundle so you won't need Sliverlight to get an idea on how this works. http://bit.ly/iXg2y8 <- Link to Bit.ly bundle with screen shots etc. I have included closeups of the Pivot settings in the top right corner so you can see how to change the attribute that Pivot uses to create the view. Note also that if you go to the Knowledge Base View of the Honey Bee you can browse to the observations of that species. http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%2Fses%2Fz9oqP%23Species Bit.ly Link http://bit.ly/g1zzJC Since I have updated to the latest version of Virtuoso the strange URI links have been replaced with Human readable text from the label view for that entity. This includes the links to occurrences, gni names strings, and links to GeoNames. Part of the reasoning behind this structure is to make explicit to computers what context we are talking about. The human brain makes these context switches automatically but computers do not. That said there are areas where they could be improved or simplified. Also I think that you will need a class for each species concept, but they are all instances of txn:SpeciesConcept - something allowed in OWL2. My ontology has probably changed slightly since you last saw it. OWL http://lod.taxonconcept.org/ontology/txn.owl OWL Doc http://lod.taxonconcept.org/ontology/doc/index.html Respectfully, - Pete On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu <mailto:steve.baskauf@vanderbilt.edu>> wrote: OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. 
This email relates to the actual suggestion that you made in the email, that is to use the URIs of the form like: "http://lod.taxonconcept.org/ses/mCcSp#Occurrence" <http://lod.taxonconcept.org/ses/mCcSp#Occurrence>. In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. From the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag . So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (/Boloria selene/). But I'm not sure what the advantage of that is. The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species . I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is /Boloria selene/ . But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf) which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept). If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"? 
I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species /Bororia selene/. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept) there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and (in your suggestion below, populations). There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce. I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for /Bororia selene/ (http://lod.taxonconcept.org/ses/ICmLC#Occurrence) the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring. When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence is described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). 
The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOcurrenceTag as would be the case (I think) if the lightweight tag URI were the object of a rdf:type property. Anyway, I'm confused about this. The other issue that I would raise with this approach is that it brings up the same issue that I raised in the other email that I wrote. It essentially puts a burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked rather than expecting a centralized authority to do everything. Anyway, maybe you can explain what is going on so that I can understand it better and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties. Steve Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.

I was thinking that it would be best to create a separate class that can be used for populations of a species. This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities.

http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar

See
http://lod.taxonconcept.org/ses/v6n7p.html HTML
http://lod.taxonconcept.org/ses/v6n7p.rdf RDF
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%2Fses%2Fv6n7p%23Species Knowledge Base View (http://bit.ly/gMFqR1)

The model mints URIs for the following related entities. See the RDF or KB View.

http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.

Here is how a subset of these would relate to the new #Population Tag and related semantic entities.

This tag is used for an individual organism that is an instance of the species concept. This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>

Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.

<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesPopulationTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>

This is the RDF for a population; it has as one of its parts an individual organism. It is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7</skos:prefLabel>
  <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individual"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>

Respectfully,

- Pete

--
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept <http://www.taxonconcept.org/> & GeoSpecies <http://lod.geospecies.org/> Knowledge Bases
A Semantic Web, Linked Open Data <http://linkeddata.org/> Project
--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address: VU Station B 351634, Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center, 1161 21st Ave., S., Nashville, TN 37235
office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 343-6707
http://bioimages.vanderbilt.edu
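The typing question raised in Steve's message can be made concrete with a small sketch. It uses plain Python tuples as triples rather than a real RDF library, and it applies no RDFS or OWL inference. The two URIs are the ones quoted in the thread; the CURIE-style strings (`rdf:type`, `txn:SpeciesOccurrenceTag`, `txn:occurrenceHasSpeciesOccurrenceTag`) stand in for the full property and class URIs, and the two triples are an assumed, simplified reading of the published RDF, not a copy of it.

```python
# URIs as quoted in the thread.
OCCURRENCE = "http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"

# Assumed, simplified triples: the tag is typed as a SpeciesOccurrenceTag,
# and the occurrence points at the tag through an ordinary property.
triples = {
    (TAG, "rdf:type", "txn:SpeciesOccurrenceTag"),
    (OCCURRENCE, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
}


def instances_of(cls, graph):
    """Return every subject explicitly typed as `cls` (no inference applied)."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


# The tag itself is an instance of txn:SpeciesOccurrenceTag ...
tag_instances = instances_of("txn:SpeciesOccurrenceTag", triples)

# ... but nothing is an instance of the tag, because the occurrence is linked
# to it by txn:occurrenceHasSpeciesOccurrenceTag rather than by rdf:type.
typed_as_tag = instances_of(TAG, triples)

# Only an explicit rdf:type triple would make the occurrence an instance of the tag.
with_type = triples | {(OCCURRENCE, "rdf:type", TAG)}
typed_as_tag_after = instances_of(TAG, with_type)

print(tag_instances, typed_as_tag, typed_as_tag_after, sep="\n")
```

A reasoner with additional rules could of course derive more than this; the sketch only shows what the explicit triples state on their own.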
On May 3, 2011, at 9:00 PM, Steve Baskauf wrote:
But I was under the impression that one models things by describing classes and the properties that connect them.
In OWL, properties connect instances, not classes. RDF allows metaclasses (things that are classes and instances), but doing this will throw most (all?) reasoners off the track.
Classes are (to me) a very different thing than instances of classes. A model containing more than 13.6 million classes is at least 1.9 million times as complicated as a model with 7 classes.
Yes and no. I can model a taxonomy as a subclass hierarchy of classes, or as a property-based (memberOf or some such) hierarchy of individuals that all instantiate a single "Taxon" class. The former isn't 1 million times more complex than the latter. However, they are not identical either, and which approach one chooses has significant consequences for how easy it is to express things about those taxa, and for inferring new things from those with a DL reasoner.
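A minimal sketch of the two modeling styles described here, again with plain Python tuples instead of an RDF library. Every `ex:` name, including `ex:memberOf`, `ex:identifiedAs`, and `ex:Taxon`, is hypothetical and chosen only for illustration; the thread itself names only "memberOf or some such" and a single "Taxon" class.

```python
# Style A: the taxonomy is a subclass hierarchy; the organism is typed directly.
graph_a = {
    ("ex:Puma_concolor", "rdfs:subClassOf", "ex:Felidae"),
    ("ex:Felidae", "rdfs:subClassOf", "ex:Carnivora"),
    ("ex:cougar1", "rdf:type", "ex:Puma_concolor"),
}

# Style B: every taxon is an individual of one class, linked by a property.
graph_b = {
    ("ex:puma_concolor", "rdf:type", "ex:Taxon"),
    ("ex:felidae", "rdf:type", "ex:Taxon"),
    ("ex:carnivora", "rdf:type", "ex:Taxon"),
    ("ex:puma_concolor", "ex:memberOf", "ex:felidae"),
    ("ex:felidae", "ex:memberOf", "ex:carnivora"),
    ("ex:cougar1", "ex:identifiedAs", "ex:puma_concolor"),
}


def reachable(start, predicate, graph):
    """Return `start` plus everything reached by following `predicate` edges."""
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == predicate and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def groups_of(individual, link, parent, graph):
    """Collect the groups an individual belongs to, directly or through parents."""
    result = set()
    for s, p, o in graph:
        if s == individual and p == link:
            result |= reachable(o, parent, graph)
    return result


# The same question ("which groups does cougar1 fall under?") in both styles.
classes_of_cougar = groups_of("ex:cougar1", "rdf:type", "rdfs:subClassOf", graph_a)
taxa_of_cougar = groups_of("ex:cougar1", "ex:identifiedAs", "ex:memberOf", graph_b)

# Number of classes each style needs for this tiny taxonomy.
classes_in_a = {x for (s, p, o) in graph_a if p == "rdfs:subClassOf" for x in (s, o)}
classes_in_b = {o for (s, p, o) in graph_b if p == "rdf:type"}

print(sorted(classes_of_cougar), sorted(taxa_of_cougar))
print(len(classes_in_a), len(classes_in_b))
```

Both styles answer the question with about the same amount of work, which is in line with the remark that one is not a million times more complex than the other. The difference is where the knowledge lives: in style A a DL reasoner can use the standard rdfs:subClassOf semantics, while in style B the traversal rule for the hypothetical `ex:memberOf` has to be supplied separately.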
I would hate to have to draw an RDF graph of that model
I would as much hate to have to draw an RDF graph of 1.7 million instances. The point being, in order to draw a graph of how someone models a domain you don't draw a graph of the entire RDF triple store.
-hilmar
Comments inline:
Hilmar Lapp wrote:
On May 3, 2011, at 9:00 PM, Steve Baskauf wrote:
But I was under the impression that one models things by describing classes and the properties that connect them.
In OWL, properties connect instances, not classes. RDF allows metaclasses (things that are classes and instances), but doing this will throw most (all?) reasoners off the track.
I knew I would get in trouble talking about this among experts. :-) Thanks for the correction. I should have said "properties that connect instances of those classes". I think that is what I meant. My point was that in creating a model, one doesn't have to enumerate every particular instance, particularly if there are many of them. One can describe the class in general and let the users create the instances that are appropriate for that class.
Classes are (to me) a very different thing than instances of classes. A model containing more than 13.6 million classes is at least 1.9 million times as complicated as a model with 7 classes.
Yes and no. I can model a taxonomy as a subclass hierarchy of classes, or as a property-based (memberOf or some such) hierarchy of individuals that all instantiate a single "Taxon" class. The former isn't 1 million times more complex than the latter. However, they are not identical either, and which approach one chooses has significant consequences for how easy it is to express things about those taxa, and for inferring new things from those with a DL reasoner.
Well, I guess I was influenced in my thinking about this by http://wiki.tdwg.org/twiki/bin/view/TAG/SubclassOrNot , particularly the part about the cats, which I can actually understand pretty well. To me, the part of that wiki page that is most relevant to this discussion is:
"Whether the subclassing option is preferable to the tagging approach depends on the use of the ontology. The TDWG ontology's principal role is not modeling the entire domain to permit inference but allowing the mark up of data so that it will flow between applications as freely as possible. It has to be something that is easy to map into multiple technologies and something that people can agree on rapidly.
This strongly suggests that the tagging approach should be taken wherever possible. First agree on the basic semantic units and model the rest of the semantics with tagging. Only subclass when absolutely necessary."
The really great thing about this is that I can dodge further responsibility by just blaming my way of thinking on the people who posted that page (Roger Hyam modified by Bob Morris, I think). :-) But seriously, I think that the statement above pretty well summarizes what may be the difference between what Pete and I are saying. My primary concern is to allow "the mark up of data so that it will flow between applications as freely as possible". Pete's point may be to permit inferencing.

OK, so let's imagine that we mark up several million records of specimens, tissue samples, and images as RDF. (We don't have to imagine very hard; I think the BiSciCol group is planning to actually do this within the next several months.) I would really like to hear from some of the people who actually use "DL reasoners" (a group which certainly does not include me) to know what it is that we could actually find out that would be useful about that big data blob using reasoners. I have already confessed that my primary concern is enabling data discovery, transfer, and aggregation using GUIDs and RDF. I'm still somewhat of a "semantic web" skeptic as far as the whole inferencing thing is concerned. Aside from inferring "duplicates", I really want to know what else useful could be reasoned outside of the Taxon/TaxonConcept class. (I can imagine useful reasoning being done about things in that class, like the relationships among names, concepts, parent taxa, etc., e.g. Rod Page's Biodiversity Informatics 3:1-15 article https://journals.ku.edu/index.php/jbi/article/view/25)

I think this (data markup priority vs. inferencing priority) is an important discussion to have before the TDWG community can settle on some kind of consensus way of turning database records into RDF, particularly if it is going to have a big influence on the way the RDF model is set up. To me, there is a clear and immediate need to be able to mark data up in a straightforward way. If we can get the semantic part, too, that would be great, but not at the expense of data markup. I just was at a meeting of a bunch of herbarium curators. They desperately need a way to implement GUIDs and aggregate data and they need it now. I really don't think they care one whit about inferencing. If we coalesce on a model that is great for doing cool things with 10 records but which can't handle hundreds of thousands of records easily and simply, then we are wasting our time. I don't think we need to dither about this for another five years.
I would hate to have to draw an RDF graph of that model
I would as much hate to have to draw an RDF graph of 1.7 million instances. The point being, in order to draw a graph of how someone models a domain you don't draw a graph of the entire RDF triple store.
That was the point I was trying to make (I think).
Thanks for the clarification, Hilmar. Steve
-hilmar
--
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
See, for example,
Mungall et al., "Integrating phenotype ontologies across multiple species", Genome Biology 2010, 11:R2, doi:10.1186/gb-2010-11-1-r2
Ward Blondé et al. "Reasoning with bio-ontologies: using relational closure rules to enable practical querying", Bioinformatics (2011) doi: 10.1093/bioinformatics/btr164
Calder, et al. "Machine Reasoning about Anomalous Sensor Data" http://dx.doi.org/10.1016/j.ecoinf.2009.08.007 or in manuscript form at http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf
...
OK, so maybe these knowledge domains are all hypothesis-driven sciences (i.e., sciences), and <whatever dsw is modelling> is not. But that would be sad.
Bob

p.s. I had almost finished something else on this thread when Hilmar beat me to the punch. But here's a slightly different expression of his point:

It turns out that the difference between instances and classes is mainly important in contexts in which you have disclaimed interest, namely reasoning. In the RDF/RDFS/OWL stack, enforcing a distinction between classes and instances only occurs pretty high up in the stack, when one desires an OWL variant that will offer guarantees that reasoners will finish any inference they are asked to verify, preferably in less than exponential time. I guess, but am not certain, that even in an LOD context, if data are described with an OWL ontology that is known to be intractable, e.g. not in OWL DL, it is possible to design SPARQL queries that will never complete. In fact, I believe that even with tractable ontologies, there are SPARQL queries that are fundamentally exponential in the number of variables.

p.p.s. Irrelevant, but equivalent, aside about mathematics. At the turn of the 20th century, Whitehead and Russell tried (and failed) to show that everything about numbers could be logically derived from an axiomatic description of the natural numbers (i.e. non-negative integers). It was later shown to be the case that you must include in your logical foundations something deeper, namely the ability to have sets that are elements of other sets (roughly, classes that are individuals in other classes). Without this, and starting only with the natural numbers, you can logically derive all rational numbers (fractions) and their arithmetic properties, and even all the irrational numbers that are the solutions of polynomial equations with integer coefficients ("algebraic numbers") such as sqrt(2), and even solutions of the polynomials that have coefficients that are algebraic numbers. But without introducing the notion of the set of subsets of a set, you cannot logically derive all the interesting transcendental numbers (i.e. those which are not the roots of polynomials), such as e and pi. So if you love calculus, you had better not insist on distinguishing instances from classes. But if you are content with polynomials, you can probably be ontologically sloppy. Or, if you don't care about logical foundations of your science, you can forget about the whole thing. :-)
On Tue, May 3, 2011 at 11:51 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
[snip] OK, so let's imagine that we mark up several million records of specimens, tissue samples, and images as RDF. (We don't have to imagine very hard; I think the BiSciCol group is planning to actually do this within the next several months.) I would really like to hear from some of the people who actually use "DL reasoners" (a group which certainly does not include me) to know what it is that we could actually find out that would be useful about that big data blob using reasoners. I have already confessed that my primary concern is enabling data discovery, transfer, and aggregation using GUIDs and RDF. I'm still somewhat of a "semantic web" skeptic as far as the whole inferencing thing is concerned. Aside from inferring "duplicates", I really want to know what else useful could be reasoned outside of the Taxon/TaxonConcept class. (I can imagine useful reasoning being done about things in that class, like the relationships among names, concepts, parent taxa, etc., e.g. Rod Page's Biodiversity Informatics 3:1-15 article https://journals.ku.edu/index.php/jbi/article/view/25)

I think this (data markup priority vs. inferencing priority) is an important discussion to have before the TDWG community can settle on some kind of consensus way of turning database records into RDF, particularly if it is going to have a big influence on the way the RDF model is set up. To me, there is a clear and immediate need to be able to mark data up in a straightforward way. If we can get the semantic part, too, that would be great, but not at the expense of data markup. I just was at a meeting of a bunch of herbarium curators. They desperately need a way to implement GUIDs and aggregate data and they need it now. I really don't think they care one whit about inferencing. If we coalesce on a model that is great for doing cool things with 10 records but which can't handle hundreds of thousands of records easily and simply, then we are wasting our time. I don't think we need to dither about this for another five years.
I would hate to have to draw an RDF graph of that model
I would as much hate to have to draw an RDF graph of 1.7 million instances. The point being, in order to draw a graph of how someone models a domain you don't draw a graph of the entire RDF triple store.
That was the point I was trying to make (I think).
Thanks for the clarification, Hilmar. Steve
-hilmar
=========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : ===========================================================
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Thanks, Bob, for the examples. I will try to dig my way through them.
I don't want to give the impression that Darwin-SW was not intended to facilitate any reasoning. That is, after all why it is called "Darwin-SW" instead of "Darwin-data-markup". I know that Cam is quite interested in the "semantic" end of it, and when he has Internet again I hope he will chime in on this. I'm simply confessing what my primary concern is (data markup). When we started working on the ontology, we decided to make it as simple as possible while still trying to permit every (or almost every) kind of class and relationship that was discussed in the Oct/Nov discussion. The result was to have a single class Occurrence whose instances are described by properties, not 1.7 million classes N#occurrence and so on for the other six classes in the model. The intention was that DSW 1.0 would be constructed in such a way that it could support the addition of more complex components (Cam has actually marked the posted version at version 0.2 which means that it is certainly subject to improvement) and possibly more complex reasoning. But the more complex stuff was not put into the model at the start because we wanted something that (hopefully) most people could agree represents reality reasonably well (at least a TDWG form of reality since it uses the structure of DwC as its basis) and hence it would actually have the possibility of being used by more than two people.
I hope that people realize that I'm not making these comments to give Pete a hard time or anything. I really am trying to understand the relative benefits and problems of modeling one class of cats with many properties vs. creating a class of cats for every property we care about. Clearly Pete's interest is in Taxon Concepts in the sense that he has defined them. OK, just to set up a straw man, let's say that I am interested in geography more than taxonomy. So I define a class and URI for every state and province in the world. I have no idea how many of those there are, but I'll guess 400. Now I want to describe other things in the biodiversity informatics world. So I mint classes http://baskaufgeo.org/lod/ohio#occurrence for occurrences that happen in Ohio, http://baskaufgeo.org/lod/swaziland#occurrence for occurrences that happen in Swaziland, http://baskaufgeo.org/lod/tennessee#occurrence, http://baskaufgeo.org/lod/ohio#taxon, http://baskaufgeo.org/lod/swaziland#taxon, http://baskaufgeo.org/lod/tennessee#taxon, etc. etc. for all 400 states/provinces and all seven basic types of things in the biodiversity domain. I can now do cool queries that involve geography.
OK, maybe I'm somebody else and I love thinking about temporal relationships. So I create http://baskauf-time.org/lod/1959may#occurrence for occurrences that happen in May of 1959, http://baskauf-time.org/lod/2005may#occurrence for occurrences that happened in May of 2005, etc. Given a billion or so years of life on earth, that would give me about 12 billion classes for each of the six other basic kinds of things I want to model. I could do all kinds of cool queries that involve time now.
So which one of these three ontologies are we going to adopt? The taxon-based one? The time-based one? The geography-based one? Now we are not just having to choose whether to model things as a single class of cats whose instances have many color and reproductiveMethod properties vs. many classes of cats each defined on the basis of its color. We must decide whether it's better to have many classes of colors each defined by the kind of animal that has that color, or many kinds of reproductive systems, each with different kinds of animals, etc. Where is it going to end, and how could we agree on which system to use? It seems to me that it would be better to have a class of cats, a class of reproductive systems, etc. and connect their instances with properties.
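The "one class, many properties" alternative can be sketched in a few lines. This is an illustration only: the `Occurrence` dataclass, its three fields, the `select` helper, and the sample records are all hypothetical and are not taken from Darwin-SW or from any dataset in the thread. The class counts at the end use only the figures given in the straw man above (400 states/provinces, seven basic types, a billion years of months).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """A single general class; taxon, place, and time are ordinary properties."""
    taxon: str
    state: str
    month: str  # "YYYY-MM"


# Made-up records for illustration.
records = [
    Occurrence("Puma concolor", "Tennessee", "2005-05"),
    Occurrence("Puma concolor", "Ohio", "1959-05"),
    Occurrence("Lynx rufus", "Ohio", "2005-05"),
]


def select(items, **criteria):
    """Filter on any combination of properties without minting new classes."""
    return [r for r in items if all(getattr(r, k) == v for k, v in criteria.items())]


# Geography, time, and taxon queries all run against the same class.
in_ohio = select(records, state="Ohio")
cougars_may_2005 = select(records, taxon="Puma concolor", month="2005-05")

# Class counts implied by the straw-man figures in the text.
geo_classes = 400 * 7                        # one class per state/province per basic type
time_classes_per_type = 1_000_000_000 * 12   # one class per month over a billion years

print(len(in_ohio), cougars_may_2005)
print(geo_classes, time_classes_per_type)
```

With properties, adding a new axis of interest means adding a field and a query, whereas the class-per-value approach multiplies the number of classes by the number of values on that axis.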
Am I somehow thinking about this incorrectly? Steve
Oh yeah, I forgot to say in the interest of defining acronyms used, TDWG stands for "Biodiversity Information Standards". It supposedly had grown beyond "Taxonomic Databases Working Group" and "focuses on the development of standards for the exchange of biological/biodiversity data." (http://www.tdwg.org/about-tdwg/). ;-) Steve.
Steve Baskauf wrote:
Thanks, Bob, for the examples. I will try to dig my way through them.
I don't want to give the impression that Darwin-SW was not intended to facilitate any reasoning. That is, after all why it is called "Darwin-SW" instead of "Darwin-data-markup". I know that Cam is quite interested in the "semantic" end of it, and when he has Internet again I hope he will chime in on this. I'm simply confessing what my primary concern is (data markup). When we started working on the ontology, we decided to make it as simple as possible while still trying to permit every (or almost every) kind of class and relationship that was discussed in the Oct/Nov discussion. The result was to have a single class Occurrence whose instances are described by properties, not 1.7 million classes N#occurrence and so on for the other six classes in the model. The intention was that DSW 1.0 would be constructed in such a way that it could support the addition of more complex components (Cam has actually marked the posted version at version 0.2 which means that it is certainly subject to improvement) and possibly more complex reasoning. But the more complex stuff was not put into the model at the start because we wanted something that (hopefully) most people could agree represents reality reasonably well (at least a TDWG form of reality since it uses the structure of DwC as its basis) and hence it would actually have the possibility of being used by more than two people.
I hope that people realize that I'm not making these comments to give Pete a hard time or anything. I really am trying to understand the relative benefits and problems of modeling on class of cat with many properties vs. creating a class of cats for every property we care about. Clearly Pete's interest is in Taxon Concepts in the sense that he has defined them. OK, just to set up a straw man, let's say that I am interested in geography more than taxonomy. So I define a class and URI for every state and province in the world. I have no idea how many of those there are, but I'll guess 400. Now I want to describe other things in the biodiversity informatics world. So I mint classes http://baskaufgeo.org/lod/ohio#occurrence for occurrences that happen in Ohio, http://baskaufgeo.org/lod/swaziland#occurrence for occurrences that happen in Swaziland, http://baskaufgeo.org/lod/tennessee#occurrence, http://baskaufgeo.org/lod/ohio#taxon, http://baskaufgeo.org/lod/swaziland#taxon, http://baskaufgeo.org/lod/tennessee#taxon, etc. etc. for all 400 state/provinces and all seven basic types of things in the biodiversity domain. I can now do cool queries that involve geography.
OK, maybe I'm somebody else and I love thinking about temporal relationships. So I create http://baskauf-time.org/lod/1959may#occurrence for occurrences that happen in May of 1959, http://baskauf-time.org/lod/2005may#occurrence for occurrences that happened in May of 2005, etc. Given a billion or so years of life on earth, that would give me about 12 billion classes for each of the six other basic kinds of things I want to model. I could do all kinds of cool queries that involve time now.
So which one of these three ontologies are we going to adopt? The taxon based one? The time based one? The geography based one? Now we are not just having to choose whether to model things as a single class of cats whose instances have many color and reproductiveMethod properties vs. many classes of cats each defined on the basis of its color. We must decide whether it's better to have many classes of colors each defined by the kind of animal that has that color, or many kinds of reproductive systems, each with different kinds of animals, etc. Where is it going to end and how could we agree on which system to use? It seems to me that it would be better to have a class of cats, a class of reproductive systems, etc. and connect their instances with properties.
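The arithmetic behind the straw man can be sketched roughly as follows. The figures are only the guesses given above (400 states/provinces, seven basic kinds of things, twelve months a year over about a billion years); the function name and variable names are illustrative assumptions, not part of any ontology.

```python
# Rough count of classes needed when a class is minted for every
# (partition value, kind of thing) pair, versus one class per kind
# whose instances carry the partition value as a property.
# All figures are the guesses from the message above, not measured data.

BASIC_KINDS = 7  # basic types of things in the biodiversity domain


def classes_needed(partition_values: int, kinds: int = BASIC_KINDS) -> int:
    """Classes required if each partition value gets its own class per kind."""
    return partition_values * kinds


regions = 400                   # guessed number of states/provinces
months = 12 * 1_000_000_000     # months in roughly a billion years

per_region = classes_needed(regions)   # geography-based classes
per_month = classes_needed(months)     # time-based classes
property_based = BASIC_KINDS           # one class per kind; region and month are properties

print(per_region, per_month, property_based)
```

Under these guesses the geography-based scheme needs 2,800 classes and the time-based scheme 84 billion, while the property-based scheme stays at seven classes no matter how many regions or months the data mention.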
Am I somehow thinking about this incorrectly? Steve
Bob Morris wrote:
See, for example,
Mungall et al., "Integrating phenotype ontologies across multiple species", Genome Biology 2010, 11:R2, doi:10.1186/gb-2010-11-1-r2
Ward Blondé et al. "Reasoning with bio-ontologies: using relational closure rules to enable practical querying", Bioinformatics (2011) doi: 10.1093/bioinformatics/btr164
Calder, et al. "Machine Reasoning about Anomalous Sensor Data" http://dx.doi.org/10.1016/j.ecoinf.2009.08.007 or in manuscript form at http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf
...
OK, so maybe these knowledge domains are all hypothesis-driven sciences (i.e., sciences), and <whatever dsw is modelling> is not. But that would be sad.
Bob p.s. I had almost finished something else on this thread when Hilmar beat me to the punch. But here's a slightly different expression of his point:
It turns out that the difference between instances and classes is mainly important in contexts in which you have disclaimed interest, namely reasoning. In the RDF/RDFS/OWL stack, enforcing a distinction between classes and instances only occurs pretty high up in the stack, when one desires an OWL variant that will offer guarantees that reasoners will finish any inference they are asked to verify, preferably in less than exponential time. I guess, but am not certain, that even in an LOD context, if data are described with an OWL ontology that is known to be intractable, e.g. not in OWL DL, it is possible to design SPARQL queries that will never complete. In fact, I believe that even with tractable ontologies, there are SPARQL queries that are fundamentally exponential in the number of variables.
p.p.s. Irrelevant, but equivalent, aside about mathematics. At the turn of the 20th century, Whitehead and Russell tried (and failed) to show that everything about numbers could be logically derived from an axiomatic description of the natural numbers (i.e. non-negative integers). It was later shown to be the case that you must include in your logical foundations something deeper, namely the ability to have sets that are elements of other sets (roughly, classes that are individuals in other classes). Without this, and starting only with the natural numbers, you can logically derive all rational numbers (fractions) and their arithmetic properties, and even all the irrational numbers that are the solutions of polynomial equations with integer coefficients ("algebraic numbers") such as sqrt(2), and even solutions of the polynomials that have coefficients that are algebraic numbers. But without introducing the notion of the set of subsets of a set, you cannot logically derive all the interesting transcendental numbers (i.e. those which are not the roots of polynomials), such as e and pi. So if you love calculus, you better not insist on distinguishing instances from classes. But if you are content with polynomials, you can probably be ontologically sloppy. Or, if you don't care about logical foundations of your science, you can forget about the whole thing. :-)
On Tue, May 3, 2011 at 11:51 PM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
[snip] OK, so let's imagine that we mark up several million records of specimens, tissue samples, and images as RDF. (We don't have to imagine very hard; I think the BiSciCol group is planning to actually do this within the next several months.) I would really like to hear from some of the people who actually use "DL reasoners" (a group which certainly does not include me) to know what it is that we could actually find out that would be useful about that big data blob using reasoners. I have already confessed that my primary concern is enabling data discovery, transfer, and aggregation using GUIDs and RDF. I'm still somewhat of a "semantic web" skeptic as far as the whole inferencing thing is concerned. Aside from inferring "duplicates", I'm really wanting to know what else there is useful that could be reasoned outside of the Taxon/TaxonConcept class. (I can imagine useful reasoning being done about things in that class, like the relationships among names, concepts, parent taxa, etc., e.g. Rod Page's Biodiversity Informatics 3:1-15 article https://journals.ku.edu/index.php/jbi/article/view/25.) I think this (data markup priority vs. inferencing priority) is an important discussion to have before the TDWG community can settle on some kind of consensus way of turning database records into RDF, particularly if it is going to have a big influence on the way the RDF model is set up. To me, there is a clear and immediate need to be able to mark data up in a straightforward way. If we can get the semantic part, too, that would be great but not at the expense of data markup. I just was at a meeting of a bunch of herbarium curators. They desperately need a way to implement GUIDs and aggregate data and they need it now. I really don't think they care one whit about inferencing. If we coalesce on a model that is great for doing cool things with 10 records but which can't handle hundreds of thousands of records easily and simply, then we are wasting our time.
I don't think we need to dither about this for another five years.
I would hate to have to draw an RDF graph of that model
I would as much hate to have to draw an RDF graph of 1.7 million instances. The point being, in order to draw a graph of how someone models a domain you don't draw a graph of the entire RDF triple store.
That was the point I was trying to make (I think).
Thanks for the clarification, Hilmar. Steve
-hilmar
=========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : ===========================================================
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
Hi Steve, Bob and Hilmar,
It might be helpful to think of it this way.
This species concept http://lod.taxonconcept.org/ses/mCcSp#Species
is *both* an instance of txn:SpeciesConcept and an owl:Class
The occurrence record http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence
is *both* an instance of txn:Occurrence and an instance of http://lod.taxonconcept.org/ses/ICmLC#Occurrence
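The dual typing described above can be sketched with plain tuples standing in for RDF triples. This is only an illustration, assuming no RDF library: the helper names `types_of` and `instances_of` are made up here, and the prefixed names are kept as opaque strings rather than expanded URIs.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library is used.
RDF_TYPE = "rdf:type"

species = "http://lod.taxonconcept.org/ses/mCcSp#Species"
occurrence = ("http://ocs.taxonconcept.org/ocs/"
              "f522444a-2dd9-400e-be59-47213ef38cb9#Occurrence")
species_occurrence_class = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"

triples = {
    # The species concept is an instance of txn:SpeciesConcept AND an owl:Class.
    (species, RDF_TYPE, "txn:SpeciesConcept"),
    (species, RDF_TYPE, "owl:Class"),
    # The occurrence is typed with the general class AND the per-species class.
    (occurrence, RDF_TYPE, "txn:Occurrence"),
    (occurrence, RDF_TYPE, species_occurrence_class),
}


def types_of(resource, graph):
    """All classes a resource is asserted to be an instance of."""
    return sorted(o for s, p, o in graph if s == resource and p == RDF_TYPE)


def instances_of(cls, graph):
    """All resources asserted to be instances of a class."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)


print(types_of(species, triples))
print(instances_of("txn:Occurrence", triples))
print(instances_of(species_occurrence_class, triples))
```

Because the occurrence carries both types, it is found whether one asks for every txn:Occurrence or only for occurrences of that one species concept.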
This identification record has links back to the species concept, occurrence and individual
http://lsd.taxonconcept.org/describe/?url=http://ocs.taxonconcept.org/ocs/1d...
(short link: http://bit.ly/jtLgNu)
My reasoning behind the current structure is that you want to be able to easily query for:
*Occurrences at the TDWG BioBlitz*

PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?s (?o AS ?image) ?kingdom ?phylum ?class ?order ?family ?genus ?sciname ?cname
WHERE {
  ?s rdf:type txn:Occurrence .
  ?s dcterms:isPartOf <http://lod.taxonconcept.org/ontology/txn.owl#TDWG2010_BioBlitz> .
  ?s txn:kingdom ?kingdom .
  ?s txn:phylum ?phylum .
  ?s txn:class ?class .
  ?s txn:order ?order .
  ?s txn:family ?family .
  ?s txn:genus ?genus .
  ?s txn:hasScientificName ?sciname .
  OPTIONAL { ?s foaf:depiction ?o . ?s txn:commonName ?cname }
}
Run This Query: bit.ly/kZ8C1Q
*Species expected in Massachusetts*

PREFIX txn: <http://lod.taxonconcept.org/ontology/txn.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX massachusetts: <http://sws.geonames.org/6254926/>

SELECT DISTINCT ?s ?sciname ?cname ?concept
WHERE {
  ?s rdf:type txn:SpeciesConcept .
  ?s txn:isExpectedIn massachusetts: .
  ?s txn:hasScientificName ?sciname .
  ?s dcterms:identifier ?concept .
  OPTIONAL { ?s txn:commonName ?cname }
}
Run This Query http://bit.ly/mt3Tsx
And get results free of all inappropriate identifications.
Do you want the misidentifications showing up in these species lists?
How would a general user correctly determine which of these identifications are correct?
Those who are interested in looking at the identification history of a particular specimen can do so.
It would also be possible to create your own identification RDF file and apply that to the data set.
As to what vocabulary to use, I think it is best to use what exists as long as it works properly, with weighting based on how many other data sets use those same URI's.
I use Geonames for locations and DBpedia for Taxonomic Authors. (I also link to a lot of similar related data sets either through URI's or their ID like ITIS.)
I wish the BHL would expose URI's for publications and GBIF would expose URI's for specimens - especially type specimens.
There are efforts to document the well known LOD vocabularies and work out interoperability issues. Here is a sample.
http://www4.wiwiss.fu-berlin.de/lodcloud/state/ <= Lists best practices and what data sets seem to be following them. GeoSpecies ~ TaxonConcept
http://labs.mondeca.com/dataset/lov/index.html
I am still thinking about how to handle multiple classifications.
The current thinking has been to markup the different hierarchies with things like #Taxonomy and #NCBI_Taxonomy.
If someone then chooses to tie their version of the TaxonConcept Species concepts to a specific hierarchy they can create a sameAs mapping file that makes
ID#Taxonomy owl:sameAs ID#Species
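A sameAs mapping file of this kind could be applied to a data set roughly as sketched below. This is a deliberately simplified illustration: the function name `apply_same_as` is hypothetical, the two statements are made-up sample data, and the rewrite is one-directional and single-hop, whereas a real OWL reasoner treats owl:sameAs as symmetric and transitive.

```python
# A user-supplied mapping file, reduced to a dict: ID#Taxonomy owl:sameAs ID#Species.
BASE = "http://lod.taxonconcept.org/ses/v6n7p"
same_as = {BASE + "#Taxonomy": BASE + "#Species"}

# Illustrative statements; which predicates hang off #Taxonomy is an assumption.
triples = {
    (BASE + "#Taxonomy", "txn:kingdom", "Animalia"),
    (BASE + "#Species", "txn:hasScientificName", "Puma concolor"),
}


def apply_same_as(graph, mapping):
    """Rewrite subjects and objects through the mapping (one hop, one direction)."""
    def canon(term):
        return mapping.get(term, term)
    return {(canon(s), p, canon(o)) for s, p, o in graph}


merged = apply_same_as(triples, same_as)
for triple in sorted(merged):
    print(triple)
```

After the rewrite both statements are attached to the #Species URI, which is the effect of tying the species concept to that particular hierarchy; someone who prefers a different classification would simply not load this mapping.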
I would also like to use URI's for the different clades like I have with GeoSpecies but that will take some work.
Another option would be to do something like this.
txn_kingdom: Animalia or URI_To_Animalia
txn_phylum: Chordata or URI_To_Chordata
ncbi_kingdom: URI_To_UniProt_Animalia
ncbi_phylum: URI_To_UniProt_Chordata
That is, have different predicates to allow one species concept to have several different taxonomic hierarchies.
These operate as tags, not as subclasses.
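The predicate-per-hierarchy idea can be sketched as follows, again with tuples in place of RDF triples. The predicate spellings follow the listing above, the object values are the same placeholders (not real URIs), and the helper `hierarchy` is a made-up name for illustration.

```python
concept = "http://lod.taxonconcept.org/ses/v6n7p#Species"

# One species concept tagged with two classifications via distinct predicates.
triples = [
    (concept, "txn_kingdom", "Animalia"),
    (concept, "txn_phylum", "Chordata"),
    (concept, "ncbi_kingdom", "URI_To_UniProt_Animalia"),  # placeholder value
    (concept, "ncbi_phylum", "URI_To_UniProt_Chordata"),   # placeholder value
]


def hierarchy(subject, prefix, graph):
    """Collect rank -> value for one classification, selected by predicate prefix."""
    return {
        p[len(prefix):]: o
        for s, p, o in graph
        if s == subject and p.startswith(prefix)
    }


print(hierarchy(concept, "txn_", triples))
print(hierarchy(concept, "ncbi_", triples))
```

Each classification is read back independently by its prefix, and nothing in the data asserts that the concept is a subclass of any clade, which is the sense in which the values act as tags.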
One issue with UniProt and Bio2RDF is that the clades are subclassed, so you don't want to use owl:sameAs unless you want to entail that subclassing.
Here is an example of a Uniprot taxon http://www.uniprot.org/taxonomy/6426.rdf
In regards to missing potential identifications or occurrences, I don't know how much a problem this would actually be since they should show up in the Cloud.
However, it might be interesting to create a listener that watches for ?:Occurrences and ?:Identifications and harvests them.
Or make a PingTheSemanticWeb type service for them.
Respectfully,
- Pete
On Wed, May 4, 2011 at 6:04 AM, Steve Baskauf steve.baskauf@vanderbilt.edu wrote:
Thanks, Bob, for the examples. I will try to dig my way through them.
OK, I think that I have already probably said more than people want to hear on this subject. So I will stop with this:

1. It does not appear that there is anything "wrong" with the taxonconcept.org model from a technical standpoint. It does what Pete wants it to do and that is very cool.
2. I believe that there are aspects of the taxonconcept.org model (introduced for convenience in querying) that make it much more complicated than I think is necessary to represent the core conceptual entities in the biodiversity informatics community. I believe (for reasons articulated previously) that some of those complexities may introduce problems in a distributed system where people at different institutions are linking to each other's URIs.
3. I believe that the way that taxonconcept.org conceptualizes some of these core entities is not congruent with the most common opinions that I have heard expressed on this list. Note that I am not saying that the taxonconcept.org conceptualization is "wrong". I am saying that in some ways it differs significantly from what I perceive to be the community consensus.

On the issue of the representation of taxa and names I am going to have to defer to the opinion of others (and there is no shortage of people on the list who are experts on this subject). However, I will say that if one says:
And get results free of all inappropriate identifications.
Do you want the misidentifications showing up in these species lists?
How would a general user correctly determine which of these identifications are correct?
who is going to be the judge of "correct"? I don't want to be around when that cat fight erupts. I do think there ought to be some way that a determiner can indicate that they may have made a mistake on their own Identification. But I think multiple Identifications better be multiple opinions or else there will never be a system that will be supported by diverse participants.
Steve
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A. delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235 office: 2128 Stevenson Center phone: (615) 343-4582 <tel:%28615%29%20343-4582>, fax: (615) 343-6707 <tel:%28615%29%20343-6707> http://bioimages.vanderbilt.edu _______________________________________________ tdwg-content mailing list tdwg-content@lists.tdwg.org <mailto:tdwg-content@lists.tdwg.org> http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 Email: pdevries@wisc.edu mailto:pdevries@wisc.edu TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases A Semantic Web, Linked Open Data http://linkeddata.org/ Project
Steve,
You are making this sound as if the alternative identifications are discarded; they are not.
For those that are interested in the identification history they can look at the information from the level of the individual or below.
If an individual was misidentified then the occurrence record can be updated to link to the corrected species concept.
If someone wants to create their own alternative set of identifications, they can freely do so, and those will be linked to the other data so people can choose.
There are a number of ways that alternative identifications could be handled.
For instance, let's say that TaxonomistA and TaxonomistB never agree on an identification.
These could be separated by the use of different predicates.
txn:occurrenceHasSpeciesConcept => Concept_A_URI
bioimages:occurrenceHasSpeciesConcept => Concept_B_URI
Now these do not conflict.
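As a rough illustration of the point above (not part of the original message), the two assertions can be held side by side as plain triples. The occurrence and concept URIs are placeholders, and the triple store is just a Python set, not any particular RDF library.

```python
# Triples as (subject, predicate, object) tuples. The occurrence and concept
# URIs below are hypothetical placeholders used only for this sketch.
OCCURRENCE = "http://example.org/occurrence/1"
CONCEPT_A = "http://example.org/concept/A"
CONCEPT_B = "http://example.org/concept/B"

triples = {
    (OCCURRENCE, "txn:occurrenceHasSpeciesConcept", CONCEPT_A),
    (OCCURRENCE, "bioimages:occurrenceHasSpeciesConcept", CONCEPT_B),
}


def concepts_for(occurrence, predicate):
    """Return the concepts asserted for an occurrence through one predicate."""
    return {o for s, p, o in triples if s == occurrence and p == predicate}


# Each party's identification is retrieved through its own predicate,
# so neither assertion overwrites or contradicts the other.
txn_view = concepts_for(OCCURRENCE, "txn:occurrenceHasSpeciesConcept")
bioimages_view = concepts_for(OCCURRENCE, "bioimages:occurrenceHasSpeciesConcept")
```

A consumer then chooses which predicate (and therefore whose identifications) to query.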
In your arboretum, are the trees labeled with all the scientific names and concepts, including the incorrect ones, or just one?
If they were, wouldn't visiting children and congressmen ask *so which one is it?*
Respectfully,
- Pete
On Wed, May 4, 2011 at 9:14 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
OK, I think that I have already probably said more than people want to hear on this subject. So I will stop with this:
1. It does not appear that there is anything "wrong" with taxonconcept.org from a technical standpoint. It does what Pete wants it to do and that is very cool.
2. I believe that there are aspects of taxonconcept.org (introduced for convenience in querying) that make it much more complicated than I think is necessary to represent the core conceptual entities in the biodiversity informatics community. I believe (for reasons articulated previously) that some of those complexities may introduce problems in a distributed system where people at different institutions are linking to each other's URIs.
3. I believe that the way that taxonconcept.org conceptualizes some of these core entities is not congruent with the most common opinions that I have heard expressed on this list.

Note that I am not saying that the taxonconcept.org conceptualization is "wrong". I am saying that in some ways it differs significantly from what I perceive to be the community consensus. On the issue of the representation of taxa and names I am going to have to defer to the opinion of others (and there is no shortage of people on the list who are experts on this subject). However, I will say that if one says:
And get results free of all inappropriate identifications.
Do you want the misidentifications showing up in these species lists?
How would a general user correctly determine which of these identifications are correct?
Who is going to be the judge of "correct"? I don't want to be around when that cat fight erupts. I do think there ought to be some way that a determiner can indicate that they may have made a mistake on their own Identification. But I think multiple Identifications had better be multiple opinions, or else there will never be a system that will be supported by diverse participants.
Steve
On Wed, May 4, 2011 at 6:04 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
Thanks, Bob, for the examples. I will try to dig my way through them.
I don't want to give the impression that Darwin-SW was not intended to facilitate any reasoning. That is, after all, why it is called "Darwin-SW" instead of "Darwin-data-markup". I know that Cam is quite interested in the "semantic" end of it, and when he has Internet again I hope he will chime in on this. I'm simply confessing what my primary concern is (data markup). When we started working on the ontology, we decided to make it as simple as possible while still trying to permit every (or almost every) kind of class and relationship that was discussed in the Oct/Nov discussion. The result was to have a single class Occurrence whose instances are described by properties, not 1.7 million classes N#occurrence and so on for the other six classes in the model. The intention was that DSW 1.0 would be constructed in such a way that it could support the addition of more complex components (Cam has actually marked the posted version as version 0.2, which means that it is certainly subject to improvement) and possibly more complex reasoning. But the more complex stuff was not put into the model at the start because we wanted something that (hopefully) most people could agree represents reality reasonably well (at least a TDWG form of reality, since it uses the structure of DwC as its basis) and hence it would actually have the possibility of being used by more than two people.
I hope that people realize that I'm not making these comments to give Pete a hard time or anything. I really am trying to understand the relative benefits and problems of modeling one class of cats with many properties vs. creating a class of cats for every property we care about. Clearly Pete's interest is in Taxon Concepts in the sense that he has defined them. OK, just to set up a straw man, let's say that I am interested in geography more than taxonomy. So I define a class and URI for every state and province in the world. I have no idea how many of those there are, but I'll guess 400. Now I want to describe other things in the biodiversity informatics world. So I mint classes http://baskaufgeo.org/lod/ohio#occurrence for occurrences that happen in Ohio, http://baskaufgeo.org/lod/swaziland#occurrence for occurrences that happen in Swaziland, http://baskaufgeo.org/lod/tennessee#occurrence, http://baskaufgeo.org/lod/ohio#taxon, http://baskaufgeo.org/lod/swaziland#taxon, http://baskaufgeo.org/lod/tennessee#taxon, etc. etc. for all 400 states/provinces and all seven basic types of things in the biodiversity domain. I can now do cool queries that involve geography.
OK, maybe I'm somebody else and I love thinking about temporal relationships. So I create http://baskauf-time.org/lod/1959may#occurrence for occurrences that happen in May of 1959, http://baskauf-time.org/lod/2005may#occurrence for occurrences that happened in May of 2005, etc. Given a billion or so years of life on earth, that would give me about 12 billion classes for each of the six other basic kinds of things I want to model. I could do all kinds of cool queries that involve time now.
So which one of these three ontologies are we going to adopt? The taxon based one? The time based one? The geography based one? Now we are not just having to choose whether to model things as a single class of cats whose instances have many color and reproductiveMethod properties vs. many classes of cats each defined on the basis of its color. We must decide whether it's better to have many classes of colors each defined by the kind of animal that has that color, or many kinds of reproductive systems, each with different kinds of animals, etc. Where is it going to end and how could we agree on which system to use? It seems to me that it would be better to have a class of cats, a class of reproductive systems, etc. and connect their instances with properties.
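The class counts in the straw man above can be checked with a few lines of arithmetic. This is only a sketch using the guesses stated in the message (400 states/provinces, seven basic kinds of things, a billion years of months); the function name is made up for illustration.

```python
# Figures taken from the straw man above; all of them are rough guesses.
BASIC_KINDS = 7                  # occurrence, taxon, ... (seven basic types)
STATES_AND_PROVINCES = 400       # guessed number of states/provinces
MONTHS = 12 * 1_000_000_000      # one class per month over a billion years


def classes_needed(distinct_values, kinds=BASIC_KINDS):
    """Classes minted when every value gets its own class for every kind."""
    return distinct_values * kinds


geography_classes = classes_needed(STATES_AND_PROVINCES)
time_classes_per_kind = MONTHS
time_classes_total = classes_needed(MONTHS)

# The property-based alternative keeps one class per basic kind and
# expresses state, month, or taxon as property values on instances.
property_based_classes = BASIC_KINDS
```

Under these assumptions the geography scheme needs 2,800 classes and the time scheme 12 billion per kind, against seven classes for the property-based model.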
Am I somehow thinking about this incorrectly? Steve
Bob Morris wrote:
See, for example,
Mungall et al., "Integrating phenotype ontologies across multiple species", Genome Biology 2010, 11:R2 doi:10.1186/gb-2010-11-1-r2
Ward Blondé et al., "Reasoning with bio-ontologies: using relational closure rules to enable practical querying", Bioinformatics (2011) doi:10.1093/bioinformatics/btr164
Calder et al., "Machine Reasoning about Anomalous Sensor Data", http://dx.doi.org/10.1016/j.ecoinf.2009.08.007 or in manuscript form at http://efg.cs.umb.edu/pubs/SensorDataReasoning.pdf
...
OK, so maybe these knowledge domains are all hypothesis-driven sciences (i.e., sciences), and <whatever dsw is modelling> is not. But that would be sad.
Bob

p.s. I had almost finished something else on this thread when Hilmar beat me to the punch. But here's a slightly different expression of his point:
It turns out that the difference between instances and classes is mainly important in contexts in which you have disclaimed interest, namely reasoning. In the RDF/RDFS/OWL stack, enforcing a distinction between classes and instances only occurs pretty high up in the stack, when one desires an OWL variant that will offer guarantees that reasoners will finish any inference they are asked to verify, preferably in less than exponential time. I guess, but am not certain, that even in an LOD context, if data are described with an OWL ontology that is known to be intractable, e.g. not in OWL DL, it is possible to design SPARQL queries that will never complete. In fact, I believe that even with tractable ontologies, there are SPARQL queries that are fundamentally exponential in the number of variables.
p.p.s. Irrelevant, but equivalent, aside about mathematics. At the turn of the 20th century, Whitehead and Russell tried (and failed) to show that everything about numbers could be logically derived from an axiomatic description of the natural numbers (i.e. non-negative integers). It was later shown to be the case that you must include in your logical foundations something deeper, namely the ability to have sets that are elements of other sets (roughly, classes that are individuals in other classes). Without this, and starting only with the natural numbers, you can logically derive all rational numbers (fractions) and their arithmetic properties, and even all the irrational numbers that are the solutions of polynomial equations with integer coefficients ("algebraic numbers") such as sqrt(2), and even solutions of the polynomials that have coefficients that are algebraic numbers. But without introducing the notion of the set of subsets of a set, you cannot logically derive all the interesting transcendental numbers (i.e. those which are not the roots of polynomials), such as e and pi. So if you love calculus, you better not insist on distinguishing instances from classes. But if you are content with polynomials, you can probably be ontologically sloppy. Or, if you don't care about logical foundations of your science, you can forget about the whole thing. :-)
On Tue, May 3, 2011 at 11:51 PM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
[snip] OK, so let's imagine that we mark up several million records of specimens, tissue samples, and images as RDF. (We don't have to imagine very hard, I think the BiSciCol group is planning to actually do this within the next several months.) I would really like to hear from some of the people who actually use "DL reasoners" (a group which certainly does not include me) to know what it is that we could actually find out that would be useful about that big data blob using reasoners. I have already confessed that my primary concern is enabling data discovery, transfer, and aggregation using GUIDs and RDF. I'm still somewhat of a "semantic web" skeptic as far as the whole inferencing thing is concerned. Aside from inferring "duplicates", I'm really wanting to know what else there is useful that could be reasoned outside of the Taxon/TaxonConcept class. (I can imagine useful reasoning being done about things in that class like the relationships among names, concepts, parent taxa, etc., e.g. Rod Page's Biodiversity Informatics 3:1-15 article https://journals.ku.edu/index.php/jbi/article/view/25) I think this (data markup priority vs. inferencing priority) is an important discussion to have before the TDWG community can settle on some kind of consensus way of turning database records into RDF, particularly if it is going to have a big influence on the way the RDF model is set up. To me, there is a clear and immediate need to be able to mark data up in a straightforward way. If we can get the semantic part, too, that would be great but not at the expense of data markup. I just was at a meeting of a bunch of herbarium curators. They desperately need a way to implement GUIDs and aggregate data and they need it now. I really don't think they care one whit about inferencing. If we coalesce on a model that is great for doing cool things with 10 records but which can't handle hundreds of thousands of records easily and simply, then we are wasting our time.
I don't think we need to dither about this for another five years.
I would hate to have to draw an RDF graph of that model
I would as much hate to have to draw an RDF graph of 1.7 million instances. The point being, in order to draw a graph of how someone models a domain you don't draw a graph of the entire RDF triple store.
That was the point I was trying to make (I think).
Thanks for the clarification, Hilmar. Steve
-hilmar
=========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : ===========================================================
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
tdwg-content mailing list tdwg-content@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-content
--
Pete DeVries Department of Entomology University of Wisconsin - Madison 445 Russell Laboratories 1630 Linden Drive Madison, WI 53706 Email: pdevries@wisc.edu TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases A Semantic Web, Linked Open Data http://linkeddata.org/ Project
Hi;
a little aside
On Wed., 4 May 2011, at 04:34, Hilmar Lapp wrote:
On May 3, 2011, at 9:00 PM, Steve Baskauf wrote:
But I was under the impression that one models things by describing classes and the properties that connect them.
In OWL, properties connect instances, not classes. RDF allows metaclasses (things that are classes and instances), but doing this will throw most (all?) reasoners off the track.
Just to add to the discussion, punning can be used to model classes as instances. In fact, a common pattern I'm seeing and using is to use punning and have a class hierarchy, each class having an instance with the same URI, so one can refer to each class as a class or as an instance, depending on the context.
http://www.w3.org/2007/OWL/wiki/Punning
cheers
Classes are (to me) a very different thing than instances of classes. A model containing more than 13.6 million classes is at least 1.9 million times as complicated as a model with 7 classes.
Yes and no. I can model a taxonomy as a subclass hierarchy of classes, or as a property-based (memberOf or some such) hierarchy of individuals that all instantiate a single "Taxon" class. The former isn't 1 million times more complex than the latter. However, they are not identical either, and which approach one chooses has significant consequences for how easy it is to express things about those taxa, and for inferring new things from those with a DL reasoner.
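The two modeling options described above can be contrasted in a small sketch. The `ex:` names, the `ex:memberOf` property, and the three-level lineage are illustrative assumptions; triples are plain Python tuples rather than output of an OWL tool.

```python
# A tiny lineage used only for illustration.
LINEAGE = ["Organism", "Animal", "Dog"]


def as_subclass_hierarchy(lineage):
    """Option 1: every taxon is a class, linked by rdfs:subClassOf."""
    return {
        (f"ex:{child}", "rdfs:subClassOf", f"ex:{parent}")
        for parent, child in zip(lineage, lineage[1:])
    }


def as_individual_hierarchy(lineage):
    """Option 2: every taxon is an individual of one ex:Taxon class."""
    triples = {(f"ex:{name}", "rdf:type", "ex:Taxon") for name in lineage}
    for parent, child in zip(lineage, lineage[1:]):
        triples.add((f"ex:{child}", "ex:memberOf", f"ex:{parent}"))
    return triples


def classes_in(triples):
    """Collect every resource that is used as a class in the triples."""
    classes = set()
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            classes.update({s, o})
        elif p == "rdf:type":
            classes.add(o)
    return classes


subclass_model = as_subclass_hierarchy(LINEAGE)
individual_model = as_individual_hierarchy(LINEAGE)
```

The first model has as many classes as taxa, the second has exactly one; which is easier to reason over depends on what needs to be said about the taxa.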
I would hate to have to draw an RDF graph of that model
I would as much hate to have to draw an RDF graph of 1.7 million instances. The point being, in order to draw a graph of how someone models a domain you don't draw a graph of the entire RDF triple store.
-hilmar
--
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
On May 9, 2011, at 4:24 AM, Mikel Egaña Aranguren wrote:
Just to add to the discussion, punning can be used to model classes as instances. In fact, a common pattern I'm seeing and using is to use punning and have a class hierarchy, each class having an instance with the same URI, so one can refer to each class as a class or as an instance, depending on the context.
The punning feature in OWL2 is very nice indeed. But one should not forget that depending on the use the subject (or object) is either the class *or* the instance. I.e., there is a "class view" of the resource identified by the URI, and an instance view. This is different from referring to a class as an instance. So things you say about the URI using object properties you say about the instance identified by that URI, just as you would if you used a generated instance URI, and the rules that apply to that are no different from the rules that apply if you said those things about an instance with a randomly generated URI. For example, if you say conflicting things about it, you still say these about one and the same instance, not "some", possibly different, instances.
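A minimal sketch of the "class view" and "instance view" described above, using plain tuples as triples. The `ex:` resources are invented for illustration; `owl:Class` and `owl:NamedIndividual` are the OWL 2 vocabulary terms, but nothing here invokes a reasoner.

```python
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
OWL_INDIVIDUAL = "owl:NamedIndividual"

# ex:Dog is declared under one URI both as a class and as an individual.
triples = {
    ("ex:Dog", RDF_TYPE, OWL_CLASS),
    ("ex:Dog", RDF_TYPE, OWL_INDIVIDUAL),
    ("ex:Dog", RDF_TYPE, "ex:Taxon"),   # instance view: Dog is a Taxon
    ("ex:Fido", RDF_TYPE, "ex:Dog"),    # class view: Fido is a Dog
}


def is_punned(uri):
    """True when the URI is declared as both a class and an individual."""
    types = {o for s, p, o in triples if s == uri and p == RDF_TYPE}
    return OWL_CLASS in types and OWL_INDIVIDUAL in types


def members(cls):
    """Resources typed as the given class (the class view of that URI)."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}
```

Statements that use `ex:Dog` as a subject of object properties still refer to the single individual with that URI, which is the caveat raised above.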
-hilmar
Information on the IdentificationTag for the Honey Bee
< http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%...
bit.ly http://bit.ly/lGQrbi
One such identification
< http://lsd.taxonconcept.org/describe/?url=http://ocs.taxonconcept.org/ocs/0d...
bit.ly http://bit.ly/jQCHia
The species concept for the Honey Bee as seen via my endpoint
< http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%...
bit.ly http://bit.ly/g1zzJC
Via the Sig.ma service http://sig.ma/search?pid=e15ef704529f423326f09a106862f978
- Pete
Just to add to the discussion, punning can be used to model classes as instances. In fact, a common pattern I'm seeing and using is to use punning and have a class hierarchy, each class having an instance with the same URI, so one can refer to each class as a class or as an instance, depending on the context.
The punning feature in OWL2 is very nice indeed. But one should not forget that depending on the use the subject (or object) is either the class *or* the instance. I.e., there is a "class view" of the resource identified by the URI, and an instance view. This is different from referring to a class as an instance.
Here's an example of Taxon as both a class (Organism, Animal, Dog, Tiger) and as a set of individuals (tOrganism, etc.), done without punning. When I put this into Protégé and run the reasoner, it correctly infers that taxon tAnimal includes Simba and Fido, and that Fido hasTaxon Dog, Animal, and Organism. With punning, you'd use "Animal" rather than "tAnimal" as the Taxon individuals corresponding to the taxon classes.
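The ontology referred to above is not reproduced in this archive. As a stand-in, the inferences it describes can be imitated with a few lines of Python; the subclass links and the assumption that Simba is a Tiger and Fido is a Dog are guesses made for this sketch, and no OWL reasoner is involved.

```python
# Assumed class hierarchy and instance typing (not taken from the original file).
SUBCLASS_OF = {"Dog": "Animal", "Tiger": "Animal", "Animal": "Organism"}
INSTANCE_OF = {"Fido": "Dog", "Simba": "Tiger"}


def ancestors(cls):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def has_taxon(individual):
    """Taxa an individual belongs to, following the subclass chain upward."""
    return ancestors(INSTANCE_OF[individual])


def includes(taxon):
    """Individuals that fall under the given taxon."""
    return {ind for ind in INSTANCE_OF if taxon in has_taxon(ind)}
```

Here `has_taxon("Fido")` walks Dog, Animal, Organism, and `includes("Animal")` collects both Fido and Simba, mirroring the two inferences mentioned.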
Hi Steve,
About occurrences.
It is the specimen (individual) that gets identified and it can have several identifications.
See http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Focs.taxonconcept.org%...
bit.ly http://bit.ly/fCGZdY
You simply need to mark up your own RDF that references that specimen and create your own identification.
There can be several different alternative identifications:
1) Different name but same species - That would be the same concept.
2) Different name and a different species.
3) Same name, different species - The identification should be tied to a resolvable concept, so a different species under the same name would have a different concept URI.
There is an issue with occurrence records that we saw with one of the initial TDWG BioBlitz RDF Versions.
There were multiple identifications but there was no way of indicating which was the preferred.
Assuming that there was only one individual organism identified, there is really only one species (or hybrid).
Different human identifiers make assertions about what that species is; those different identifications are then tied to the individual organism.
In other words, it is the specimen that is identified, not the occurrence.
If your additional identification RDF links to the specimen then it will be linked to the occurrence so someone can browse from the occurrence and look at the identification history of the specimen.
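The browsing path described above, from an occurrence to its specimen and then to every identification of that specimen, can be sketched as follows. All `ex:` resource and property names are hypothetical and chosen only for this illustration.

```python
# Hypothetical triples: one occurrence, one specimen, two identifications.
triples = {
    ("ex:occurrence1", "ex:recordsSpecimen", "ex:specimen1"),
    ("ex:identificationA", "ex:identifies", "ex:specimen1"),
    ("ex:identificationA", "ex:assertsConcept", "ex:conceptA"),
    ("ex:identificationB", "ex:identifies", "ex:specimen1"),
    ("ex:identificationB", "ex:assertsConcept", "ex:conceptB"),
}


def objects(subject, predicate):
    """All objects of (subject, predicate, ?)."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects of (?, predicate, obj)."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def identification_history(occurrence):
    """Map each identification of the occurrence's specimen to its concepts."""
    history = {}
    for specimen in objects(occurrence, "ex:recordsSpecimen"):
        for identification in subjects("ex:identifies", specimen):
            history[identification] = objects(identification, "ex:assertsConcept")
    return history


history = identification_history("ex:occurrence1")
```

Because identifications point at the specimen, a third party can add one without touching the occurrence record.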
Does this answer your question?
- Pete
On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. This email relates to the actual suggestion that you made in the email, that is, to use URIs of the form http://lod.taxonconcept.org/ses/mCcSp#Occurrence. In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. From the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag. So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (*Boloria selene*). But I'm not sure what the advantage of that is. The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species. I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is *Boloria selene*. But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it, and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf), which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept).
If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"?
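The one-step versus two-step comparison above can be made concrete with a small path-following sketch. The predicates and the two `ICmLC` URIs come from the message; the occurrence identifier `ex:occurrence` is a placeholder, and the `follow` helper is invented for illustration.

```python
SPECIES = "http://lod.taxonconcept.org/ses/ICmLC#Species"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
OCCURRENCE = "ex:occurrence"  # placeholder for the occurrence record's URI

triples = {
    (OCCURRENCE, "txn:occurrenceHasSpeciesConcept", SPECIES),
    (OCCURRENCE, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
    (TAG, "dcterms:isPartOf", SPECIES),
}


def follow(start, *path):
    """Follow a sequence of predicates from a start node; return end nodes."""
    nodes = {start}
    for predicate in path:
        nodes = {o for s, p, o in triples if s in nodes and p == predicate}
    return nodes


# One hop through the direct property.
direct = follow(OCCURRENCE, "txn:occurrenceHasSpeciesConcept")

# Two hops through the lightweight tag.
via_tag = follow(
    OCCURRENCE, "txn:occurrenceHasSpeciesOccurrenceTag", "dcterms:isPartOf"
)
```

Both paths end at the same species concept URI, which is the redundancy being questioned.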
I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species *Boloria selene*. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept), there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and (in your suggestion below) populations. There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce.
I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for *Boloria selene* (http://lod.taxonconcept.org/ses/ICmLC#Occurrence) the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring. When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurre... described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOccurrenceTag as would be the case (I think) if the lightweight tag URI were the object of an rdf:type property. Anyway, I'm confused about this.
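The distinction drawn above, between being typed as something and merely pointing at it with another property, can be shown in a few lines. The occurrence identifier is a placeholder and the `types_of` helper is made up for this sketch; only explicit rdf:type triples are consulted, with no inference.

```python
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
OCCURRENCE = "ex:occurrence"  # placeholder for the occurrence record's URI

triples = {
    # The tag itself is typed (implicitly, via the XML element name).
    (TAG, "rdf:type", "txn:SpeciesOccurrenceTag"),
    # The occurrence only points at the tag through an ordinary property.
    (OCCURRENCE, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
}


def types_of(resource):
    """Classes a resource is explicitly typed as via rdf:type."""
    return {o for s, p, o in triples if s == resource and p == "rdf:type"}


occurrence_types = types_of(OCCURRENCE)
tag_types = types_of(TAG)
```

With only these triples the occurrence has no type at all; it would become an instance of the tag only if a triple `(OCCURRENCE, "rdf:type", TAG)` were added.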
The other issue that I would raise with this approach is the same one that I raised in the other email that I wrote. It essentially puts a burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to be somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked, rather than expecting a centralized authority to do everything.
Anyway, maybe you can explain what is going on so that I can understand it better and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties.
Steve
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities.
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
See http://lod.taxonconcept.org/ses/v6n7p.html HTML http://lod.taxonconcept.org/ses/v6n7p.rdf RDF
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... Base View (http://bit.ly/gMFqR1)
The model mints URIs for the following related entities. See the RDF or KB View.
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag, defined in the species concept RDF, is used for an individual organism that is an instance of the species concept. This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>
This is the RDF for a population. It has an individual organism as one of its parts, and it is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7 </skos:prefLabel>
  <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individ..."/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>
Respectfully,
- Pete
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://lod.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
I beg to differ.
Peter DeVries wrote:
About occurrences.
It is the specimen (individual) that gets identified and it can have several identifications.
I do not believe it is a good idea to equate "specimen" with individual. I have reached the conclusion that a specimen CAN be an individual if one chooses to type a particular "thing" as both a specimen and an Individual, but I'm not comfortable with saying that they are owl:sameAs. It is my opinion, and I think one that was supported to some extent in the discussion of last Oct/Nov, that when we "identify" some kind of evidence like a branch cut from a tree we are really making a statement about what taxon we think the tree represents, not the branch per se. This is an important distinction, because if we are identifying the tree, then any branch that is dcterms:partOf the tree shares that identification. This may seem like an esoteric point, but it is one that has a fairly significant bearing on how one chooses to construct a model. I will stop there because I don't want to re-plough the same ground again. The relevant posts are summarized and referenced on the DSW pages explaining the IndividualOrganism and Specimen classes.
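The tree-and-branch point can be sketched as a lookup that resolves a specimen to the individual it is part of before reading identifications. The identifiers and the `taxon:A` label below are placeholders made up for the illustration, not DSW terms or real data.

```python
# Hypothetical identifiers; "taxon:A" is a placeholder, not real data.
part_of = {"branch:1": "tree:1", "branch:2": "tree:1"}  # specimen -> individual
identifications = {"tree:1": ["taxon:A"]}               # attached to the individual

def identifications_for(thing):
    """Return identifications of the individual that `thing` is, or is part of."""
    individual = part_of.get(thing, thing)
    return identifications.get(individual, [])
```

Because the identification hangs off the tree, both branches report it without either branch carrying its own copy.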
There is an issue with occurrence records that we saw with one of the initial TDWG BioBlitz RDF Versions.
There were multiple identifications but there was no way of indicating which was the preferred.
Again, I beg to differ. It was my understanding (again from the extensive conversation about this in Oct/Nov) that the creation of multiple Identifications was "preference agnostic". Each Identification represented an expressed opinion about the concept/TNU that the determiner asserted that the individual represented. They were not flagged as "right" or "wrong". In fact, it is possible that there could be several Identifications by different determiners that asserted the same concept/TNU, indicating that the determiners agreed with each other.
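As a minimal sketch of what "preference agnostic" could look like in practice, the records below carry no preferred flag, and agreement simply falls out of counting. The field names and identifiers are hypothetical.

```python
from collections import Counter

# Hypothetical Identification records; none is marked preferred, right, or wrong.
identifications = [
    {"individual": "ind:1", "determiner": "det:A", "concept": "concept:X"},
    {"individual": "ind:1", "determiner": "det:B", "concept": "concept:X"},
    {"individual": "ind:1", "determiner": "det:C", "concept": "concept:Y"},
]

def tally(individual, records):
    """Count how many determiners asserted each concept for one individual."""
    return Counter(r["concept"] for r in records if r["individual"] == individual)

counts = tally("ind:1", identifications)
```

A consumer who wants a "preferred" identification can apply its own rule to `counts`; nothing in the data has to decide that in advance.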
Assuming that there was only one individual organism identified there is really only one species (or hybrid).
Aaaaaaaaaaaaaah! Please Nico C., don't take this one up!
Different human identifiers make assertions about what that species is, those different identifications are then tied to the individual organism.
Yes I agree entirely.
In other words it is the specimen that is identified not the occurrence.
Yes, the occurrence was not identified. But I would say the individual, not the specimen (as explained above).
If your additional identification RDF links to the specimen then it will be linked to the occurrence so someone can browse from the occurrence and look at the identification history of the specimen.
Does this answer your question?
Not really, but I think the issue here is that the way you are conceptualizing specimens, individuals, and identifications is NOT the same as DSW defines them. So I retract my earlier statement that the taxonconcept.org ontology and DSW are really the same thing with different term names.
Steve
- Pete
--
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://about.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
Hi Steve,
I may have overloaded the term specimen to make the explanation easier to follow.
A specimen could be an individual or it could be part of an individual.
To some extent you need to think about how these models will be used.
If you subscribe to the model that a species is whatever a taxonomist says it is, then it is difficult to make statements like:
X% of the world's species will be extinct by 2050.
If you mean a species as defined by the concept documented at this URI, which is supported by these specimens, images, and DNA, then you are on firmer ground.
Species in the natural world do a pretty good job recognizing those individuals that are appropriate mates. In other words members of their own species.
Are we modeling species or variations in human conceptualizations of species?
Assuming that there was only one individual organism identified there is really only one species (or hybrid).
Aaaaaaaaaaaaaah! Please Nico C., don't take this one up!
I stick with this. Assuming you don't have a hybrid individual, that individual is one species. The fact that humans may disagree on what species it is is a human issue.
Again, are we modeling species or variations in human conceptualizations of species?
Which of these is of primary importance to decision makers and non-taxonomist biologists?
Part of the problem with various publications relating to ontologies and taxonomy is that their species models entail a specific phylogenetic hypothesis.
In the real world taxa are not as clean as some would like to make them out to be.
Each individual is a unique combination of thousands of separate gene lineages which often do not follow clean monophyletic paths.
I would argue that most of those who work with species related data see them as useful typological constructs which in general follow the biological species model.
*Aedes triseriatus* owl:sameAs *Ochlerotatus triseriatus*
Others seem to see them as phylogenetic end nodes which entail a specific phylogenetic history.
*Aedes triseriatus* distinctFrom *Ochlerotatus triseriatus*
If you are primarily interested in understanding issues of ecology, disease, diversity and conservation, the former model is more appropriate than the latter.
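One practical consequence of the two models shows up when counting species, as in the extinction statement above. The sketch below merges names linked by sameAs pairs with a small union-find; the function name and the way the pairs are supplied are assumptions made for the illustration.

```python
def count_species(names, same_as):
    """Count distinct species after merging names linked by sameAs pairs."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in same_as:
        parent[find(a)] = find(b)
    return len({find(n) for n in names})

names = ["Aedes triseriatus", "Ochlerotatus triseriatus"]

# Typological view: the two names denote the same species.
typological = count_species(names, [(names[0], names[1])])

# Phylogenetic end-node view: the two names are kept distinct.
phylogenetic = count_species(names, [])
```

Under the first view the two names count once, under the second they count twice, so any percentage computed over "species" depends on which model was chosen.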
Respectfully,
- Pete
On Mon, May 2, 2011 at 8:16 AM, Steve Baskauf <steve.baskauf@vanderbilt.edu> wrote:
OK, Pete, I'm going to try to write the other email that I mentioned in the previous one. This email relates to the actual suggestion that you made in the email, that is, to use URIs of the form "http://lod.taxonconcept.org/ses/mCcSp#Occurrence". In the RDF that defines what this URI means, the URI is described as "A lightweight tag that can be used to label occurrences of this species". What I'm not sure about is what exactly one is supposed to do with it. From the example that I was talking about in the previous email (http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9.rdf), this "tag" is the object of the predicate txn:occurrenceHasSpeciesOccurrenceTag. So I guess that it is another way that one could query Occurrence records to find out which ones are Occurrences of the species having the identifier "ICmLC" (*Boloria selene*). But I'm not sure what the advantage of that is. The RDF for the Occurrence already tells me that the Occurrence has the txn:occurrenceHasSpeciesConcept property with object URI http://lod.taxonconcept.org/ses/ICmLC#Species. I can resolve that URI and "find out" that the "species concept" (sensu DeVries) is *Boloria selene*. But if I used the "lightweight tag" I'd also have to resolve its URI to find out about it, and the RDF for the tag directs me to the http://lod.taxonconcept.org/ses/ICmLC#Species URI anyway via the dcterms:isPartOf property of the tag. I guess the point is that if one wants to "find out" about the Occurrence, it takes two steps to get to the species concept description if I use the tag (first through txn:occurrenceHasSpeciesOccurrenceTag, then through dcterms:isPartOf), which is no advantage over just getting there in one step (via txn:occurrenceHasSpeciesConcept).
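The one-step versus two-step comparison can be sketched by following predicate chains over a toy graph. Triples are plain tuples, and the short `occ:1` identifier is a hypothetical stand-in for the occurrence URI; the species and tag URIs and the three predicates are the ones named above.

```python
SPECIES = "http://lod.taxonconcept.org/ses/ICmLC#Species"
TAG = "http://lod.taxonconcept.org/ses/ICmLC#Occurrence"
OCC = "occ:1"  # hypothetical stand-in for the occurrence URI

graph = {
    (OCC, "txn:occurrenceHasSpeciesConcept", SPECIES),
    (OCC, "txn:occurrenceHasSpeciesOccurrenceTag", TAG),
    (TAG, "dcterms:isPartOf", SPECIES),
}

def follow(graph, start, *predicates):
    """Follow a chain of predicates from start and return the nodes reached."""
    nodes = {start}
    for pred in predicates:
        nodes = {o for (s, p, o) in graph if s in nodes and p == pred}
    return nodes

# One hop straight to the species concept.
direct = follow(graph, OCC, "txn:occurrenceHasSpeciesConcept")

# Two hops through the lightweight tag.
via_tag = follow(graph, OCC, "txn:occurrenceHasSpeciesOccurrenceTag", "dcterms:isPartOf")
```

Both paths end at the same species concept URI; the tag route just adds a dereference along the way.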
If the only point is to have something to put in as a search term, then why not just make the txn:occurrenceHasSpeciesOccurrenceTag a data property with the literal object the string "ICmLC"?
I suppose that one could say that an advantage of the "lightweight tag" approach would be that one is indicating that the particular Occurrence is an instance of a class that consists of all Occurrences of the species *Boloria selene*. That seems to be what the intention is. But this seems to be a case of creating many subclasses rather than having a general class and assigning it properties that help one to understand the nature of the instance of that class. It requires the creation of a class for every species on the planet. Instead of there being a relatively small number of classes that includes the basic kinds of resources (Occurrence, individual, Identification, taxon concept), there is a class for occurrences of every kind of taxon concept. Actually, there are several classes for every instance of taxon concept, because you are recommending that the "lightweight tag" approach be used for other types of things as well, such as individuals and (in your suggestion below) populations. There isn't anything intrinsically "wrong" with this approach, but with my bias toward preferring "well known" types/classes it just seems like a lot to expect consuming applications to "understand" what amounts to potentially millions of classes that this method would introduce.
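The alternative of a single general class plus a descriptive property might look like the sketch below. The class name `ex:Occurrence` and the `occ:`/`ses:` shorthand are hypothetical; only txn:occurrenceHasSpeciesConcept is taken from the RDF under discussion.

```python
# Two occurrences of different species, both typed with one general class.
graph = {
    ("occ:1", "rdf:type", "ex:Occurrence"),
    ("occ:1", "txn:occurrenceHasSpeciesConcept", "ses:ICmLC#Species"),
    ("occ:2", "rdf:type", "ex:Occurrence"),
    ("occ:2", "txn:occurrenceHasSpeciesConcept", "ses:v6n7p#Species"),
}

def occurrences_of(species, g):
    """Select occurrences of one species by property value, not by class."""
    typed = {s for (s, p, o) in g if p == "rdf:type" and o == "ex:Occurrence"}
    return {
        s for (s, p, o) in g
        if s in typed and p == "txn:occurrenceHasSpeciesConcept" and o == species
    }

# The set of classes a consuming application has to understand.
classes_used = {o for (s, p, o) in graph if p == "rdf:type"}
```

Here the per-species selection still works, and `classes_used` stays at one class no matter how many species are added.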
I also don't quite understand what a txn:SpeciesOccurrenceTag is exactly. In the RDF that defines the txn:SpeciesOccurrenceTag instance for *Boloria selene* (http://lod.taxonconcept.org/ses/ICmLC#Occurrence) the dcterms:description says that it "allow species occurrences to be modeled as instances of SpeciesOccurrenceTag". But that doesn't seem to be what is actually occurring. When the Occurrence instance http://ocs.taxonconcept.org/ocs/f522444a-2dd9-400e-be59-47213ef38cb9#Occurre... is described, it is not typed as the lightweight tag (which IS a txn:SpeciesOccurrenceTag because of the implicit typing caused by the XML container element name). The lightweight tag URI is the object of the txn:occurrenceHasSpeciesOccurrenceTag property, but that doesn't make the Occurrence an instance of SpeciesOccurrenceTag, as would be the case (I think) if the lightweight tag URI were the object of an rdf:type property. Anyway, I'm confused about this.
The other issue that I would raise with this approach is the same one that I raised in the other email that I wrote. It essentially puts the burden of anticipating the results of a query onto the metadata provider. If one follows the model of allowing multiple Identifications for an organism, then it is possible that someone somewhere else might apply their own Identification instance to the individual represented in the Occurrence. As was the case in my earlier example, for txn:occurrenceHasSpeciesOccurrenceTag to be useful as a thing to be queried, the metadata provider would need to somehow know that this additional Identification had been made, and then create another txn:occurrenceHasSpeciesOccurrenceTag property for the Occurrence instance. This seems to be somewhat at odds with the benefit that the Linked Data world has in allowing resources to be created by people all over the cloud and then linked, rather than expecting a centralized authority to do everything.
Anyway, maybe you can explain what is going on so that I can understand it better and maybe explain why this approach is better than just creating a few classes and describing their instances by descriptive properties.
Steve
Peter DeVries wrote:
I am still somewhat puzzled why TDWG seems so opposed to adopting anything that comes from outside a small clique.
I was thinking that it would be best to create a separate class that can be used for populations of a species.
This would require adding an additional tag to the TaxonConcept Species Concept Model, which currently includes several tag-like entities.
http://lod.taxonconcept.org/ses/mCcSp#Species <- The Species Concept for the Cougar
See http://lod.taxonconcept.org/ses/v6n7p.html HTML http://lod.taxonconcept.org/ses/v6n7p.rdf RDF
http://lsd.taxonconcept.org/describe/?url=http%3A%2F%2Flod.taxonconcept.org%... Base View (http://bit.ly/gMFqR1)
The model mints URIs for the following related entities. See the RDF or KB View.
http://lod.taxonconcept.org/ses/mCcSp#Image - An image of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Occurrence - An occurrence of a Cougar
http://lod.taxonconcept.org/ses/mCcSp#Individual - An individual Cougar
http://lod.taxonconcept.org/ses/mCcSp#Taxonomy - A Basic Taxonomy for the Cougar, one alternative among many potential classifications
http://lod.taxonconcept.org/ses/mCcSp#NCBI_Taxonomy - The NCBI Taxonomy for Cougar, or starting at the lowest available clade
http://lod.taxonconcept.org/ses/mCcSp#OriginalDescription - The Original Description of the Cougar, ideally with links to the PDF or BHL URI.
Here is how a subset of these would relate to the new #Population Tag and related semantic entities.
This tag, defined in the species concept RDF, is used for an individual organism that is an instance of the species concept. This allows you to refer to an individual cougar in a way that is separate from the concept of cougar and retains links to other data relating to that species concept.
<txn:SpeciesIndividualTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Individual">
  <dcterms:title>A Tag for individuals of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label individuals of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Individual</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label individuals of this species. These allow individual organisms to be modeled as instances of SpeciesIndividualTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesIndividualTag>
Add a tag for a species population to the species concept RDF. This allows you to refer to a population of cougars in a way that is separate from an individual cougar and retains links to other data relating to that species concept.
<txn:SpeciesPopulationTag rdf:about="http://lod.taxonconcept.org/ses/v6n7p#Population">
  <dcterms:title>A Tag for populations of the species concept Puma concolor se:v6n7p</dcterms:title>
  <skos:prefLabel>A Tag-like resource that is used to label populations of the species concept Puma concolor se:v6n7p</skos:prefLabel>
  <dcterms:identifier>http://lod.taxonconcept.org/ses/v6n7p#Population</dcterms:identifier>
  <dcterms:description>A lightweight tag that can be used to label populations of this species. These allow populations of a species to be modeled as instances of SpeciesPopulationTag</dcterms:description>
  <dcterms:isPartOf rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Species"/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/ses/v6n7p.rdf"/>
</txn:SpeciesPopulationTag>
This is the RDF for a population; it has an individual organism as one of its parts. It is typed to indicate that it refers to a population of Cougars.
<owl:Class rdf:about="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation">
  <rdf:type rdf:resource="http://lod.taxonconcept.org/ses/v6n7p#Population"/>
  <skos:prefLabel>The population of North American Cougars Puma concolor se:v6n7 </skos:prefLabel>
  <dcterms:hasPart rdf:resource="http://ocs.taxonconcept.org/ocs/51cd124d-78c5-40aa-a7ff-2e3f58ca6ade#Individ..."/>
  <wdrs:describedby rdf:resource="http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation.rdf"/>
</owl:Class>
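To make the relationships among these three RDF fragments easier to follow, here is a rough Python sketch that writes them as plain triples and walks from the population to its species concept. It uses no RDF library. The helper names (`objects`, `species_concepts_of`) are hypothetical, the individual's URI is a made-up placeholder because the real one is truncated in this message, and typing the individual with the #Individual tag is my assumption based on the description of SpeciesIndividualTag.

```python
# Illustrative sketch only; helper names and INDIVIDUAL are hypothetical.
RDF_TYPE = "rdf:type"
HAS_PART = "dcterms:hasPart"
IS_PART_OF = "dcterms:isPartOf"

SPECIES = "http://lod.taxonconcept.org/ses/v6n7p#Species"
POP_TAG = "http://lod.taxonconcept.org/ses/v6n7p#Population"
IND_TAG = "http://lod.taxonconcept.org/ses/v6n7p#Individual"
POPULATION = "http://lod.taxonconcept.org/pops/NorthAmericanCougarPopulation"
# Placeholder: the actual individual URI is truncated in the message.
INDIVIDUAL = "http://example.org/individuals/cougar-1"

triples = {
    (POP_TAG, IS_PART_OF, SPECIES),      # population tag belongs to the species concept
    (IND_TAG, IS_PART_OF, SPECIES),      # individual tag belongs to the species concept
    (POPULATION, RDF_TYPE, POP_TAG),     # the population is typed by the population tag
    (POPULATION, HAS_PART, INDIVIDUAL),  # the population has an individual as a part
    (INDIVIDUAL, RDF_TYPE, IND_TAG),     # assumption: individual typed by the individual tag
}


def objects(subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def species_concepts_of(resource):
    """Follow rdf:type to a tag, then dcterms:isPartOf to the species concept."""
    return sorted({
        species
        for tag in objects(resource, RDF_TYPE)
        for species in objects(tag, IS_PART_OF)
    })


print(species_concepts_of(POPULATION))
print(species_concepts_of(INDIVIDUAL))
```

Both lookups arrive at the same #Species URI, which is the point of the tags: a population and an individual stay distinct resources but remain linked to the same species concept.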
Respectfully,
- Pete
Pete DeVries
Department of Entomology
University of Wisconsin - Madison
445 Russell Laboratories
1630 Linden Drive
Madison, WI 53706
Email: pdevries@wisc.edu
TaxonConcept http://www.taxonconcept.org/ & GeoSpecies http://lod.geospecies.org/ Knowledge Bases
A Semantic Web, Linked Open Data http://linkeddata.org/ Project
-- Steven J. Baskauf, Ph.D., Senior Lecturer Vanderbilt University Dept. of Biological Sciences
postal mail address: VU Station B 351634 Nashville, TN 37235-1634, U.S.A.
delivery address: 2125 Stevenson Center 1161 21st Ave., S. Nashville, TN 37235
office: 2128 Stevenson Center phone: (615) 343-4582, fax: (615) 343-6707 http://bioimages.vanderbilt.edu
participants (6)
- Bob Morris
- Hilmar Lapp
- Mikel Egaña Aranguren
- Paul Murray
- Peter DeVries
- Steve Baskauf