[tdwg-tapir] TAPIR searches and entities
Hello,

I had a talk with Gregor and the others this week to find out whether TAPIR is suitable for SDD. Apart from the known recursion issue, we were concerned about one major search problem that I would like to introduce to you.

Imagine a simple concept "ontology" with a specimen linked to identifications, each with a taxon name and an identifier (person). If I want to do a combined search on the identifier AND the taxon name, there are actually two ways of asking the query:

1) Use the same identification object for the AND. In SQL this would be:
SELECT * FROM specimen s JOIN identification i ON xxx WHERE i.taxon_name = 'Abies alba' AND i.identifier = 'Markus'

2) Don't require the identifier to have made the identification with the requested name; treat it as an independent constraint. In SQL you would use two aliases of that table:
SELECT * FROM specimen s JOIN identification i1 ON xxx JOIN identification i2 ON xxx WHERE i1.taxon_name = 'Abies alba' AND i2.identifier = 'Markus'

These give different results, and this is a common kind of query in SDD when searching on characters.

As far as I can see, TAPIR cannot handle this, because a TAPIR request doesn't know about the relations between concepts (entities, classes, whatever you would like to call them). A loose list of independent concepts really doesn't seem to work. We might need to know about the class relations/multiplicity, at least at a high level. In general, this could be taken from an XML Schema as well as from OWL, RDFS or similar.

Alternatively, we could prescribe what a TAPIR service should do in such cases. If a complex AND query is received, we could prescribe that it be resolved in way #1 (I think that is the more common intention). But then no one would be able to ask questions of type #2.

Maybe we could discuss this further in the TAG meeting, Session 4: Integration of Existing Technologies with a Core Ontology? I do not have an answer to this question yet, but I wanted to raise the problem before the TAG meeting.

Looking forward to meeting you all in Edinburgh,
Markus
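A minimal, self-contained sketch of why the two queries differ, using Python's built-in sqlite3 module. The table and column names (specimen.id, identification.specimen_id, taxon_name, identifier) are assumptions standing in for the elided "ON xxx" join condition, and the rows are made up: specimen 2 has an "Abies alba" identification by someone else plus a separate identification by Markus, so only way #2 returns it. SELECT DISTINCT is used because the double self-join can return the same specimen more than once.

```python
import sqlite3

# Hypothetical schema: column names are assumptions replacing "ON xxx".
SCHEMA = """
CREATE TABLE specimen (id INTEGER PRIMARY KEY);
CREATE TABLE identification (
    specimen_id INTEGER REFERENCES specimen(id),
    taxon_name  TEXT,
    identifier  TEXT
);
"""

# Made-up sample rows: (specimen_id, taxon_name, identifier)
ROWS = [
    (1, "Abies alba", "Markus"),   # Markus identified it as Abies alba
    (2, "Abies alba", "Gregor"),   # name and person on different identifications
    (2, "Picea abies", "Markus"),
    (3, "Abies alba", "Gregor"),   # Markus not involved at all
]

# Way #1: both conditions must hold on the SAME identification row.
SAME_IDENTIFICATION = """
SELECT DISTINCT s.id FROM specimen s
JOIN identification i ON i.specimen_id = s.id
WHERE i.taxon_name = 'Abies alba' AND i.identifier = 'Markus'
ORDER BY s.id
"""

# Way #2: each condition may be met by a DIFFERENT identification row.
INDEPENDENT = """
SELECT DISTINCT s.id FROM specimen s
JOIN identification i1 ON i1.specimen_id = s.id
JOIN identification i2 ON i2.specimen_id = s.id
WHERE i1.taxon_name = 'Abies alba' AND i2.identifier = 'Markus'
ORDER BY s.id
"""

def run(query):
    """Load the sample data into an in-memory database and return specimen ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO specimen (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO identification VALUES (?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

print(run(SAME_IDENTIFICATION))  # [1]
print(run(INDEPENDENT))          # [1, 2]
```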
Hi Markus,

This is analogous to the example that Rob and I showed at the Madrid TAPIR meeting, with relationships between records representing people and records representing the department(s) they work in. The issue in both cases is that, with complex data of multiple types, global concepts don't always provide enough information to answer the kinds of queries one might like to ask. To do queries like these, it's our feeling that you have to know about the different types of objects whose relationships you want to examine (in other words, you have to know the domain of each property).

Just as an aside, most triple stores implemented over databases process queries using the second approach you outlined (repeated self-joins on the triple table). The beauty of this approach is that, with minor modifications, you can represent combinations of ANDs and ORs, which allow you to distinguish between mandatory and optional match conditions independently of what you want to return from the query.

I'm also looking forward to talking about this and other issues in Edinburgh; see you next week.

-Steve
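To illustrate the triple-store remark, here is a hedged sketch (Python + sqlite3) in which every triple pattern becomes one more alias of a single triple table. The predicate names and node ids are hypothetical. Modelling an optional match condition as a LEFT JOIN is our assumption about what the "minor modifications" could look like, not a description of any particular triple store.

```python
import sqlite3

# Hypothetical triples; predicate names and node ids are made up.
TRIPLES = [
    ("s1", "hasIdentification", "id1"),
    ("id1", "taxonName", "Abies alba"),
    ("id1", "identifier", "Markus"),
    ("s2", "hasIdentification", "id2"),
    ("id2", "taxonName", "Abies alba"),
    ("id2", "identifier", "Gregor"),
    ("s3", "hasIdentification", "id3"),
    ("id3", "taxonName", "Abies alba"),   # no identifier recorded
    ("s4", "hasIdentification", "id4"),
    ("id4", "taxonName", "Picea abies"),
]

# Each triple pattern is one self-join alias (t1, t2, t3).
# Inner JOINs are mandatory patterns; the LEFT JOIN is an optional one,
# so specimens without an identifier still appear (with NULL).
QUERY = """
SELECT t1.subject AS specimen, t3.object AS identifier
FROM triples t1
JOIN triples t2
  ON t2.subject = t1.object AND t2.predicate = 'taxonName'
LEFT JOIN triples t3
  ON t3.subject = t1.object AND t3.predicate = 'identifier'
WHERE t1.predicate = 'hasIdentification'
  AND t2.object = 'Abies alba'
ORDER BY specimen
"""

def query_triples():
    """Load the triples into memory and run the self-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

print(query_triples())
```

The optional pattern's predicate test sits in the ON clause on purpose: moving it into WHERE would discard the NULL rows and silently turn the optional match back into a mandatory one.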
_______________________________________________ tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir_lists.tdwg.org
(Reply to Steven Perry <smperry@ku.edu>, 07/04/06)

Hi Steve/Markus,

I am going to be controversial and say the join should not be supported at all! The client should do the join itself. Clients should get a set of nicely defined determination objects (on the basis of a query on their literal properties == easy stuff) and then go back and get the specimen objects (on the basis of GUIDs == easy stuff) if they want more information. The determination objects may contain LSIDs of specimens and taxon concepts or names that are held by different data sources, so how can the data source make the join? Isn't this the nature of the system we are trying to build? Some day down the line a grid service may do the join for the client and cleverly cache stuff from different data sources, but that is a story for another day...

Also, this is an incredibly difficult problem to solve, so apply Brave Sir Robin's rule: run away!

It will all be fun to talk about at the meeting, but I suggest we hold off on discussions on the list for now, as some people will be in transit. Looking forward to seeing attendees next week, and I hope that we can feed discussions out to the email list as they happen, so the rest of you can join in.

All the best,
Roger
--
-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger@tdwg.org
+44 1578 722782
-------------------------------------
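A sketch of the client-side join described in Roger's reply, under stated assumptions: two in-memory structures stand in for two independent data sources, all GUIDs are placeholders, and search_determinations and resolve are hypothetical helpers, not TAPIR operations. Step 1 queries determinations on literal properties only; step 2 fetches specimens by GUID. Once the client holds the determination objects, it can choose either semantics from Markus's message itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Determination:
    guid: str
    specimen_guid: str  # may refer to a specimen held by another data source
    taxon_name: str
    identifier: str

# Two independent, made-up data sources; all GUIDs are placeholders.
DETERMINATION_SOURCE = [
    Determination("urn:det:1", "urn:spec:A", "Abies alba", "Markus"),
    Determination("urn:det:2", "urn:spec:B", "Abies alba", "Gregor"),
    Determination("urn:det:3", "urn:spec:B", "Picea abies", "Markus"),
]
SPECIMEN_SOURCE = {
    "urn:spec:A": {"guid": "urn:spec:A"},
    "urn:spec:B": {"guid": "urn:spec:B"},
}

def search_determinations(**literals):
    """Step 1: query determinations on literal properties only."""
    return [d for d in DETERMINATION_SOURCE
            if all(getattr(d, name) == value for name, value in literals.items())]

def resolve(guids):
    """Step 2: fetch specimen records by GUID, skipping unknown ones."""
    return {g: SPECIMEN_SOURCE[g] for g in guids if g in SPECIMEN_SOURCE}

# Way #1: both constraints must hold on the same determination.
same = {d.specimen_guid
        for d in search_determinations(taxon_name="Abies alba", identifier="Markus")}

# Way #2: two independent searches, joined client-side on specimen GUID.
independent = ({d.specimen_guid for d in search_determinations(taxon_name="Abies alba")}
               & {d.specimen_guid for d in search_determinations(identifier="Markus")})

print(sorted(same))
print(sorted(resolve(independent)))
```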
(Reply to Döring, Markus, 07/04/2006, 18:25)

Hi all,

I think WFS suffers from the same issue. I asked on the wfs-dev mailing list and have already got a response from the lat/lon people. They have a WFS server pretty similar to PyWrapper, but in Java. Markus Schneider proposed using indexes in the XPath to help the server know whether the terms you are asking for are related or not. It is not the most elegant solution, and it probably does not match XPath semantics, but maybe it could work. Imagine a filter like:

<filter>
  <and>
    <equals>
      <concept id="abcd:/Datasets.../Identifications/Identification[1]/ScientificName"/>
      <literal value="Abies alba"/>
    </equals>
    <equals>
      <concept id="abcd:/Datasets.../Identifications/Identification[1]/Identifier"/>
      <literal value="Markus"/>
    </equals>
  </and>
</filter>

This would create your first SQL statement. If the second concept in the filter (the identifier) did not have an index, it would create the second one...

But I have the impression that Digir2 would have serious problems dealing with this...

Javier.
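A rough sketch of how a server might interpret the proposed indexes when translating an AND filter into SQL. It assumes, hypothetically, that every concept maps to a column of a single identification table (the COLUMNS mapping and filter_to_sql are illustrative names), that the second-to-last path step is the repeatable element, and that the index acts only as a label meaning "same object", not with XPath's positional meaning. Conditions sharing an indexed element reuse one table alias (the first SQL statement); an unindexed condition gets its own alias (the second).

```python
import re

# Hypothetical mapping from ABCD leaf elements to identification columns.
COLUMNS = {"ScientificName": "taxon_name", "Identifier": "identifier"}
# Matches a path step such as "Identification" or "Identification[1]".
STEP = re.compile(r"^(?P<name>\w+)(\[(?P<index>\d+)\])?$")

def filter_to_sql(conditions):
    """Translate AND-ed (concept_path, literal) pairs into SQL plus parameters.

    Same element + same explicit index -> same table alias (way #1).
    No index -> a fresh alias for that condition alone (way #2).
    """
    aliases = {}  # (element name, index) -> alias
    joins, where, params = [], [], []
    for path, value in conditions:
        *_, element_step, leaf = path.split("/")
        m = STEP.match(element_step)
        if m is None:
            raise ValueError(f"unsupported path step: {element_step}")
        key = (m["name"], m["index"]) if m["index"] else None
        if key is not None and key in aliases:
            alias = aliases[key]
        else:
            alias = f"i{len(joins) + 1}"
            joins.append(f"JOIN identification {alias} ON {alias}.specimen_id = s.id")
            if key is not None:
                aliases[key] = alias
        where.append(f"{alias}.{COLUMNS[leaf]} = ?")
        params.append(value)
    sql = ("SELECT DISTINCT s.id FROM specimen s " + " ".join(joins)
           + " WHERE " + " AND ".join(where))
    return sql, params

NAME = "abcd:/Datasets.../Identifications/Identification[1]/ScientificName"
WHO_SAME = "abcd:/Datasets.../Identifications/Identification[1]/Identifier"
WHO_ANY = "abcd:/Datasets.../Identifications/Identification/Identifier"

print(filter_to_sql([(NAME, "Abies alba"), (WHO_SAME, "Markus")]))
print(filter_to_sql([(NAME, "Abies alba"), (WHO_ANY, "Markus")]))
```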
participants (4)
- "Döring, Markus"
- Javier de la Torre
- Roger Hyam
- Steven Perry