Re: [tdwg-tapir] ideas & TapirLite
Roger, don't get me wrong. I do like the TapirLite idea, because some easy option will be needed. On the other hand, I am concerned about it becoming too flexible.
Why do we want TAPIR at all, and why shouldn't we build everything on top of just HTTP calls?
The great advantage of a TAPIR service (as with BioCASE and DiGIR) is that you can construct your own queries. We could surely build our whole system on "simple" web services, but we would have to define the valid set of them, i.e. your API. And that set will change over time, as we can't anticipate all the questions people will want to ask in the future. Another reason is to provide data providers with ready-to-use software so that they don't have to program anything themselves. That's why we shouldn't really need a TAPIR wizard for .NET.
In my view, the only reason for having separate non-TAPIR services for TCS is performance on "complex" databases. But I am not even sure about that. I guess IPNI will not give their services direct access to their master database but will use a copy instead. That way they can denormalize things and bring the published database closer to TCS, for example via views. That does cost time, but definitely not as much as writing your own service, which has to be maintained and updated.
You were asking earlier about use cases and statistics for today's DiGIR/BioCASE queries. I am not sure about that, but I guess it is mainly portals and indexing queries that access our providers: very simple things that we could easily emulate with GET web services. So should we maybe get rid of TAPIR altogether? Or just have one big instance in front of the GBIF cache? Then why not issue SQL statements directly, since we know the schema of that database?
These are all questions that have come to mind over the past few days, but I don't really want to suggest anything yet.
But coming back to your three ways of dealing with TapirLite, my current favorite is full TAPIR services plus WSDL/SOAP services. What good is a capabilities response that only tells me "I don't understand you"?
It's starting to snow here in Berlin... Markus
-----Original Message----- From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of Roger Hyam Sent: Friday, 18 November 2005 13:35 To: Döring, Markus Cc: tdwg-tapir@lists.tdwg.org Subject: Re: [tdwg-tapir] ideas & TapirLite
Hi Markus,
The original motivations behind TapirLite were a reaction to the custom response model (which seemed very difficult to implement) and the perceived need for a simple, web-service-like system that could be implemented easily on top of complex 'legacy' systems. The fact that custom response models are now formally on more of an experimental footing has removed 50% of the motivation, but the simple-service motivation remains.
You are correct that TapirLite isn't really Tapir at all. It IS just a GET-based web service, the kind of thing that full Tapir implementations would have no problem imitating.
Currently (well, some of the time) I am trying to figure out a simple API for taxonomic data sources that will enable people like Donald to crawl them in a meaningful way. I can't assume that data providers will be happy to install (or write) a full Tapir implementation (what if you are in a .NET-only environment, etc.?). They have their own agendas, and the simpler the system they have to put in place, the more likely they are to do it. My hope was to keep them within the Tapir fold.
So the options are:
1. Define the API in terms of simple HTTP calls. Data providers can either write their own script or get a full Tapir provider to imitate the taxonomic API. The advantage is that it might actually be quick and easy both to define and to implement. The disadvantage is that in the long run it doesn't integrate with other Tapir providers: no metadata or capabilities responses.
2. Define the API in terms of templated Tapir calls and ensure that any script that is written makes the data provider look like a very limited Tapir provider (the TapirLite approach). The advantage is that it provides consistent metadata and other calls in line with other Tapir providers. The disadvantage is that it actually adds complexity to the Tapir protocol, by making too many things optional, and to the custom scripts.
3. Use another technology altogether, such as SOAP or XML-RPC, to expose the API. The advantage is that the organisations and individuals involved are familiar with these technologies, and it is easier to hire and outsource, etc. (Visual Studio doesn't yet provide a Tapir integration wizard!). The disadvantage is that it doesn't integrate with Tapir.
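Option 1's "get a full Tapir provider to imitate the taxonomic API" path can be pictured as a thin adapter that rewrites a simple HTTP call into a templated GET request. This is a minimal sketch only: the `names` call, the template URL, and the `op`/`template` parameter names are hypothetical placeholders, not part of any agreed Tapir API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical lookup from "simple" taxonomic API calls to query-template URLs.
# The call name and template URL are invented for this sketch.
TEMPLATES = {
    "names": "http://example.org/templates/name_search.xml",
}


def to_templated_url(provider_url: str, simple_request: str) -> str:
    """Rewrite a simple call such as '/names?genus=Poa' into a GET request
    against a full provider that uses a query template."""
    parts = urlsplit(simple_request)
    call = parts.path.strip("/")
    if call not in TEMPLATES:
        raise ValueError(f"unknown call: {call!r}")
    # 'op' and 'template' are assumed parameter names, not defined in the thread.
    params = [("op", "search"), ("template", TEMPLATES[call])]
    # The caller's own parameters are passed through unchanged as template parameters.
    params.extend(parse_qsl(parts.query))
    return f"{provider_url}?{urlencode(params)}"
```

The adapter holds no query logic of its own: each new simple call needs a new template, which is roughly where options 1 and 2 start to converge.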
As I write this, all three approaches look equally attractive, so I am not advocating anything, just rolling ideas around. I'd be grateful for any thoughts that help clarify this. If dropping option 2 above would get Tapir to version 1 more quickly, then that might be a good strategy. I assume Tapir's primary function is to unite DiGIR and BioCASE, and the notion of TapirLite probably should not get in the way of this.
Roger
Döring, Markus wrote:
I would like to go into the Lite idea in a bit more detail. Let's start with the list of expected "levels" of TAPIR-compliant services:
1- a full TAPIR service, including an experimental dynamic custom output model
2- a full TAPIR service restricted to certain output models, identified via a list of URLs pointing to the output model definition documents
3- a TAPIR Lite service that only accepts certain parameters for fixed queries. The main idea, as I see it, is to have a limited list (maybe only one?) of query templates here (reminder: QTs are filters plus a URL reference to an output model) that define the accepted parameters. I assume this service also only works via HTTP GET and not through XML messaging.
The difference between levels 1 and 2 is quite small (though not necessarily for implementations). The list of accepted output models simply goes into the capabilities of a provider, and a client can easily identify whether it is able to communicate with a data source service.
A level 3 TAPIR Lite service is quite different from the others. Essentially it's a regular GET-based web service that can be described by WSDL, because no serialised filter is allowed and the response model is fixed. If we really want to define this kind of simple service with the same protocol schema, what should its "capabilities" be?
- only HTTP GET invocation, no XML messaging
- the TAPIR envelope should be supported for responses
- ping, metadata and capabilities should work
- no inventory operation
- no (complex) filters or variables, only parameters
If only parameters are accepted, then this is not a real search. In the old protocol this was a distinct "view" operation, and what we want here is exactly that again: a service available only via HTTP GET and parameters. A list of accepted query templates would be enough, and no operators, variables and the like need to be supported. By the way, the current definition of capabilities only allows us to specify HTTP-GET-only services or the list of accepted query templates!
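The level-3 profile above boils down to a few checks on each incoming request. A minimal sketch, assuming hypothetical `op` and `template` parameter names and a single made-up template URL (none of these are fixed by the thread):

```python
from urllib.parse import parse_qs

# Hypothetical advertised template -> the only parameters it accepts.
LITE_TEMPLATES = {
    "http://example.org/templates/name_search.xml": {"genus", "species"},
}
# Operations a level-3 service still answers, per Markus's proposed profile.
SIMPLE_OPS = {"ping", "metadata", "capabilities"}


def check_lite_request(method: str, query: str):
    """Return (operation, template, params) or raise ValueError.

    'op' and 'template' are assumed parameter names for this sketch.
    """
    if method != "GET":
        raise ValueError("only HTTP GET is supported, no XML messaging")
    # Keep the first value of each parameter; blank values are dropped by parse_qs.
    params = {key: values[0] for key, values in parse_qs(query).items()}
    op = params.pop("op", None)
    if op in SIMPLE_OPS:
        return op, None, {}
    if op != "search":
        # Covers "inventory" and anything else outside the Lite profile.
        raise ValueError(f"unsupported operation: {op!r}")
    template = params.pop("template", None)
    if template not in LITE_TEMPLATES:
        raise ValueError("search requires one of the advertised query templates")
    undeclared = set(params) - LITE_TEMPLATES[template]
    if undeclared:
        # Serialised filters, variables, etc. are rejected like any other extra parameter.
        raise ValueError(f"undeclared parameters: {sorted(undeclared)}")
    return "search", template, params
```

Because every accepted search is fully determined by one template and a flat set of named parameters, this is exactly the part that a WSDL description could also capture.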
A new ad hoc idea: what about defining these three levels and allowing no other intermediate compliance? Then we could reduce the capabilities a lot, remove a lot of burden from clients and get more interoperability? Just a quick thought when looking at the above. We could make it as simple as this:
<FullService accept_custom_models="false">
  <supportedModels>
    <model location="URI" namespace="" />
    ...
  </supportedModels>
  <concepts>
    <concept id="..." />
    ...
  </concepts>
</FullService>
or
<LiteService>
  <supportedQueryTemplates>
    <template location="URI">
      <parameter name="" />
      ...
    </template>
    ...
  </supportedQueryTemplates>
</LiteService>
What do you think? I am really a bit afraid of ending up with different services that each implement only bits of the specification. We are about to move all the burden onto the clients, which I feel should be easy to create for a researcher with only basic programming knowledge. Sorry to raise this issue again, and especially for this drastic new suggestion. It came up while writing this mail, so don't take it as a well-thought-out idea. I just want to think a bit more about the problems involved in having variable and mutating TAPIR services. Markus
-----Original Message----- From: Roger Hyam [mailto:roger@tdwg.org] Sent: Friday, 18 November 2005 10:32 To: Donald Hobern Cc: Döring, Markus; tdwg-tapir@lists.tdwg.org Subject: Re: [tdwg-tapir] ideas & TapirLite
As TapirLite champion, I do have a problem with having to implement all the operators. I just want to be pretty and dumb! I don't have a problem with the concept of a named subset protocol, i.e. Tapir (the real thing) and TapirLite (the not-so-bright second cousin). Are there use cases somewhere listing what Tapir clients are expected to call, or some statistical breakdown of what kinds of queries are run against existing BioCASE and DiGIR providers? Roger
Donald Hobern wrote:
Markus,
Doesn't "all operations" imply that the provider must implement generic search operations? Isn't a large part of the reason for TAPIR Lite the need to support "databases" that cannot be mapped using the standard RDBMS mapping and which are just trying to emulate common views? I would say that these should be supported, but that each TDWG content subgroup needs to define a set of (web service) interfaces that must be supported by any compliant provider. If they can handle this set of views, they may appear as TAPIR providers. Or did I miss something?
Donald
--------------------------------------------------------------- Donald Hobern (dhobern@gbif.org) Programme Officer for Data Access and Database Interoperability Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480 ---------------------------------------------------------------
-----Original Message----- From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of "Döring, Markus" Sent: 17 November 2005 16:25 To: tdwg-tapir@lists.tdwg.org Subject: [tdwg-tapir] ideas & TapirLite
Hi,
talking to Anton today, we were wondering whether it makes sense to allow a TAPIR client to embed its own request ID in the TAPIR headers, for later identification of asynchronous and distributed messages. Currently we would need to identify a message by its send time (vague) and source. Does this make sense? Does anyone know how other people deal with this problem?
-----------
The other thoughts were about TapirLite. We both think it's a very bad idea to push all responsibility onto the client by allowing any TAPIR service to be very minimalistic. If a client has to be able to contact services that have different operators, operations and concepts, then I don't think we will get anything interoperable. I still prefer that these things must exist in the most basic TAPIR service.
Otherwise we should call it something different, maybe even TAPIR Lite, as a valid subset:
- all operations
- all logical operators
- the main COPs (<, =, >, like)
Concepts and response models can be optional without much problem, I think. What do you think? Should we sacrifice all this to have few clients but many providers? BTW, I think we didn't specify anywhere in the capabilities whether GET or XML messaging is supported. So the idea is to always have both for all services, right? Markus
_______________________________________________ tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir_lists.tdwg.org
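The proposed minimum subset (all logical operators plus the main comparison operators) is small enough to evaluate in a few lines. The sketch below uses a made-up nested-tuple encoding for filters, which is not the TAPIR filter syntax, and assumes "like" is a case-sensitive match with `*` as its only wildcard:

```python
import operator
import re


def _like(actual, pattern) -> bool:
    # Assumption: '*' is the only wildcard and matching is case-sensitive.
    regex = ".*".join(re.escape(part) for part in str(pattern).split("*"))
    return re.fullmatch(regex, str(actual)) is not None


# The comparison operators (COPs) in the proposed minimum.
COPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt, "like": _like}


def matches(record: dict, flt: tuple) -> bool:
    """Evaluate a filter against one record (a dict of concept -> value).

    Filters use a hypothetical nested-tuple encoding:
      ("and", f1, f2, ...), ("or", f1, f2, ...), ("not", f)
      (cop, concept, value) with cop in COPS
    """
    op, *args = flt
    if op == "and":
        return all(matches(record, f) for f in args)
    if op == "or":
        return any(matches(record, f) for f in args)
    if op == "not":
        return not matches(record, args[0])
    if op not in COPS:
        raise ValueError(f"operator outside the minimal subset: {op!r}")
    concept, value = args
    if concept not in record:
        # Assumption: since concepts are optional, an unmapped concept never matches.
        return False
    return COPS[op](record[concept], value)
```

A client that can build these few node types can talk to any provider implementing the subset, which is the interoperability argument being made here.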
Markus,
One excellent reason for NOT "building everything on top of just HTTP calls" is that this is a ridiculous amount of work in situations where a standard out-of-the-box TAPIR provider would work. If I can just tell a piece of software how my database is configured and how it relates to a set of interesting concepts, I don't have to do anything else to support an enormous range of different queries. TAPIR Lite would allow us to develop some standard basic access APIs that are really easy to provide (plain TAPIR) for a standard RDBMS and that can still be implemented for other systems (e.g. where the only access layer is through some set of business objects).
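To illustrate the point that, with a generic provider, the only provider-specific work is configuration: assuming a single table and invented concept identifiers (`tn:genus` and so on), a concept-to-column mapping is enough to turn any combination of simple conditions into parameterised SQL. This is a rough sketch, not how any existing TAPIR provider is actually configured.

```python
# Hypothetical provider configuration: concept identifiers (invented for this
# sketch) mapped to the columns that hold them in a local database.
CONCEPT_MAPPING = {
    "tn:genus": "names.genus_name",
    "tn:species": "names.species_epithet",
    "tn:year": "names.publication_year",
}
SQL_OPERATORS = {"=": "=", "<": "<", ">": ">", "like": "LIKE"}


def build_query(conditions):
    """Turn ANDed (concept, operator, value) triples into parameterised SQL."""
    clauses, params = [], []
    for concept, op, value in conditions:
        if concept not in CONCEPT_MAPPING:
            raise KeyError(f"concept not mapped by this provider: {concept}")
        if op not in SQL_OPERATORS:
            raise KeyError(f"unsupported operator: {op}")
        # Column names come only from the configuration; values travel as parameters.
        clauses.append(f"{CONCEPT_MAPPING[concept]} {SQL_OPERATORS[op]} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT * FROM names WHERE {where}", params
```

A new query shape then needs no new code, only a new combination of mapped concepts, and exposing another concept is a one-line configuration change.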
Donald
-----Original Message----- From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of "Döring, Markus" Sent: 18 November 2005 14:17 To: roger@tdwg.org Cc: tdwg-tapir@lists.tdwg.org Subject: Re: [tdwg-tapir] ideas & TapirLite
Hi Markus & others
just to say from the point of view of IPNI ...
We're very reluctant to produce a denormalised IPNI that could be wrapped via something flexible like TAPIR, partly because we simply don't have enough disk space, and partly because the amount of data we would have to keep pushing out to keep it up to date becomes a big management issue, with scripts to run, check, etc. One of the big points of IPNI is that it's completely live: the data is there as soon as it's submitted, and we want to maintain that (_especially_ if we're going to be running some sort of GUID resolution service, whereby names have to be visible as soon as they are submitted). OTOH, we are probably more capable than a lot of other potential wrapper writers of writing a web-service-type wrapper over an existing HTTP system without needing a pre-packaged wrapper like TAPIR. So we're pretty unusual, and I'd hate to think you ended up tailoring Tapir or TapirLite just for us :-)
What we can do very easily is respond to any set of field-value pairs (genus=x species=y), where the fields can be chosen freely from a predetermined set: you can have as many field-value pairs as you like, in whatever combination, and we can construct a query on them (and if you send us some unexpected fields, we will just ignore them). In this I don't suppose we're much different from any other web-based database that responds to a form where people fill in any number of search terms and returns pages of data...
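The kind of interface described here, any combination of known fields ANDed together with unknown fields silently dropped, can be sketched in a few lines. The table layout, column names and field whitelist are assumptions for illustration, not IPNI's actual schema:

```python
import sqlite3

# Hypothetical whitelist of form fields -> columns; any other field is ignored.
FORM_FIELDS = {"genus": "genus", "species": "species", "author": "author"}


def form_search(conn: sqlite3.Connection, form: dict) -> list:
    """AND together whichever known field-value pairs were supplied."""
    used = [(FORM_FIELDS[key], value) for key, value in form.items() if key in FORM_FIELDS]
    where = " AND ".join(f"{column} = ?" for column, _ in used) or "1=1"
    sql = f"SELECT genus, species, author FROM names WHERE {where} ORDER BY genus, species"
    return conn.execute(sql, [value for _, value in used]).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (genus TEXT, species TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO names VALUES (?, ?, ?)",
        [("Poa", "annua", "L."), ("Poa", "pratensis", "L."), ("Festuca", "rubra", "L.")],
    )
    # The unexpected "colour" field is silently ignored.
    print(form_search(conn, {"genus": "Poa", "colour": "green"}))
```

Whitelisting the field names is also what keeps this safe: only configured column names ever reach the SQL string, and all values are bound as parameters.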
What we can do fairly easily, with a bit of work, is serve up that data in a number of predetermined formats, as long as it makes sense to ask for IPNI data in that format (TCS, for instance, is fine; ABCD or DarwinCore would be a bit patchy because we don't really handle specimens; GML would be a bit pointless).
What would be hard, but not impossible, would be to serve up arbitrary bits of TCS (e.g. a question like "give me everything with genus=Poa and only send me the basionyms and the publication year").
What I can't see how to do (and maybe I'm not clever enough) is allow someone to ask for any combination of query terms and output fields, i.e. the sort of flexibility that a SQL-type wrapper gives you. This is because, without flattening IPNI out to a large degree, the results would make no sense even if we did have a SQL generator, for various reasons to do with historical data, etc.
I sense that emails are crossing as I type, so I'm going to send this as it is. I'm afraid I'm not immersed enough in Tapir to put it in Tapir terms, so I hope you can make sense of my ramblings in your own terms and see whether that helps with your dilemma.
Enjoy the snow, Markus. It's just F. freezing here. Sally
Roger, dont get me wrong. I do like the tapir lite idea cause some easy way will be needed. on the other hand I am concerned about becoming too flexible.
why do we want tapir at all and why shouldnt we build everything on top of just http calls?
The great advantage of a tapir service (so is biocase and digir) is that you can construct your own queries. we could surely build all our system on "simple" webservices, but we would have to define the valid set of them, your API. And this will change over time as we cant think of all questions people want to ask in the future. Another reason is to provide data providers with ready to use software so that they dont have to programm anything themselves. thats why we shouldnt really need a tapir wizard for .net
To my point the only reason for having separate non tapir services for TCS is the performance on "complex" databases. But I am not sure even about that. I guess IPNI will not grant their services direct access to their master db but use a copy instead. So they can denormalize things and bring the published db already closer to TCS via views for example. That does cost time, but definetely not as much as writing your own service which has to be maintained and updated.
You were asking earlier about use cases and statistics about digir/biocase queries today. I am not sure about that, but I guess there will be mainly portals and indexing queries accessing our providers. Very simple things we can actually easily emulate with get webservices. So should we maybe get rid of tapir at all? or just have one big instance in front of the gbif cache? then why not issue direct sql statements cause we know the schema of that db?
These are all questions that came to my mind over the past days, but I dont really wanna suggest anything yet.
But coming back to your 3 ways of dealing with tapir lite, my current favorite is full tapir services + wsdl soap services. what is a capability response good for that only tells me I dont understand you?
It starts snowing here in Berlin... Markus
-----Ursprüngliche Nachricht----- Von: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] Im Auftrag von Roger Hyam Gesendet: Freitag, 18. November 2005 13:35 An: Döring, Markus Cc: tdwg-tapir@lists.tdwg.org Betreff: Re: [tdwg-tapir] ideas & TapirLite
Hi Markus,
The original motivations behind TapirLite was a reaction to the custom response model (which seemed very difficult to implement) and the perceived need for a simple web service like system that could be implemented in a simple way on top of complex 'legacy' systems. The fact that the custom response models are now formally on more of an experimental footing has removed 50% of the motivation but the simple service motivation remains.
You are correct in that TapirLite isn't really Tapir at all. It IS just a GET based web service. The kind of thing that full Tapir implementations would have no problem in imitating.
Currently (well some of the time) I am trying to figure out a simple API for taxonomic data source that will enable people like Donald to crawl them in something of a meaningful way. I can't assume that data providers will be happy to install (or write) a full Tapir implementation (what if you are in a .Net only environment etc). They have their own agendas and the simpler the system they have to put in place the more likely they are to do it. My hope was to keep them within the Tapir fold.
So options are:
Define the API in terms of simple http calls. Data providers can either write their own script or they can get a full Tapir provider to imitate the taxonomic API. Advantage is it might actually be quick and easy both to define and implement. Disadvantage is it doesn't integrate with other Tapir providers in the long run - no metadata or capabilities responses.
Define the API in terms of templated Tapir calls and insure that any script that is written makes the data provider look like a very limited Tapir provider (the TapirLite approach). Advantage is that it provides consistent metadata and other calls in line with other Tapir providers. Disadvantage is that it actually adds complexity to the Tapir protocol by having too many things optional and adds complexity to the custom scripts.
Use another technology altogether such as SOAP or XML-RPC to expose the API. Advantage is that organisation and individuals involved are familiar with the technologies, easier to hire and outsource etc (VisualStudio doesn't yet provide a Tapir integration wizard!). Disadvantage is that it doesn't integrate with Tapir.
As I write this, all three approaches look equally attractive, so I am not advocating anything, just rolling ideas around. I'd be grateful for any thoughts that help clarify this. If dropping option 2 would get Tapir to version 1 quicker, then that might be a good strategy. I assume Tapir's primary function is to unite DiGIR and BioCASE, and the notion of TapirLite probably should not get in the way of this.
Roger
Döring, Markus wrote:
I would like to go into the Lite idea in a bit more detail. Let's start with the list of expected "levels" of TAPIR-compliant services:
1- a full TAPIR service, including an experimental dynamic custom output model
2- a full TAPIR service restricted to certain output models, identified via a list of URLs pointing to the output model definition documents
3- a TAPIR Lite service that only accepts certain parameters for fixed queries. The main idea, as I see it, is to have a limited list (maybe only 1?) of query templates here (reminder: QTs are filters & a URL reference to an output model) that define the accepted parameters. I assume this service also only works via HTTP GET and not through XML messaging.
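The three levels above can be sketched as a classification that places a service in exactly one level and rejects anything in between. This is a minimal Python sketch: the `Capabilities` fields are hypothetical simplifications for illustration, not the TAPIR capabilities schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Level(Enum):
    FULL_CUSTOM = 1  # full TAPIR incl. dynamic custom output models
    FULL_FIXED = 2   # full TAPIR restricted to listed output models
    LITE = 3         # GET only, fixed query templates, parameters only


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical, simplified view of what a provider declares.
    xml_messaging: bool
    filters: bool
    custom_models: bool
    output_models: FrozenSet[str] = frozenset()
    query_templates: FrozenSet[str] = frozenset()


def classify(c: Capabilities) -> Level:
    """Return the single level a service belongs to, or raise ValueError."""
    if c.xml_messaging and c.filters:
        if c.custom_models:
            return Level.FULL_CUSTOM
        if c.output_models:
            return Level.FULL_FIXED
        raise ValueError("a level 2 service must list its output models")
    if not c.xml_messaging and not c.filters and not c.custom_models:
        if not c.query_templates:
            raise ValueError("a Lite service must list its query templates")
        return Level.LITE
    raise ValueError("intermediate compliance is not allowed")
```

A client written against this rule only ever has to handle three cases, rather than probing every combination of optional features.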
The difference between levels 1 & 2 is quite small (though not necessarily for implementations). The list of accepted output models simply goes into the capabilities of a provider, and a client can easily identify whether it is able to communicate with a data source service.
A level 3 TAPIR Lite service is quite different from the others. Essentially it's a regular GET-based web service that can be described by a WSDL, because no serialised filter is allowed and the response model is fixed. If we really want to define this kind of simple service with the same protocol schema, what should its "capabilities" be?
- only HTTP GET invocation, no XML messaging
- the TAPIR envelope should be supported for responses
- ping, metadata and capabilities should work
- no inventory operation
- no (complex) filters or variables, only parameters
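The Lite capabilities listed above amount to a small set of request rules. Here is a minimal Python sketch of how a Lite provider might enforce them, assuming hypothetical operation names (`ping`, `metadata`, `capabilities`, `search`, `inventory`) and a hypothetical `filter` parameter carrying a serialised filter.

```python
from typing import Dict, Optional

# Operations a TAPIR Lite service would answer (names are assumptions).
LITE_OPERATIONS = {"ping", "metadata", "capabilities", "search"}


def check_lite_request(method: str, params: Dict[str, str]) -> Optional[str]:
    """Return None if a Lite service could accept the request,
    otherwise a short reason for rejecting it."""
    if method != "GET":
        return "only HTTP GET is supported, no XML messaging"
    op = params.get("op", "")
    if op == "inventory":
        return "inventory operation is not supported"
    if op not in LITE_OPERATIONS:
        return f"unknown operation: {op!r}"
    if "filter" in params:
        return "complex filters are not supported, only parameters"
    return None
```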
If only parameters are accepted, then this is not a real search. In the old protocol this was a distinct "view" operation, and what we want here is exactly that again: a service only available via HTTP GET with parameters. A list of accepted query templates would be enough, and no operators, variables and the like need to be supported.
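Read this way, a Lite service is essentially a lookup from query template to fixed query. Here is a minimal Python sketch, assuming a hypothetical template registry and SQL query. The service only binds named parameters; it never parses operators or variables.

```python
# Hypothetical registry: template URL -> (accepted parameters, fixed query).
TEMPLATES = {
    "http://example.org/templates/name_search.xml": (
        ("name",),
        "SELECT id, full_name FROM taxa WHERE full_name = ?",
    ),
}


def view(template_url: str, params: dict) -> tuple:
    """Map a template plus parameters onto a fixed, parameterised query."""
    if template_url not in TEMPLATES:
        raise KeyError(f"unsupported template: {template_url}")
    accepted, sql = TEMPLATES[template_url]
    unexpected = set(params) - set(accepted)
    if unexpected:
        raise ValueError(f"unexpected parameters: {sorted(unexpected)}")
    missing = [p for p in accepted if p not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    # Values are bound in the declared order; no filter is ever parsed.
    return sql, tuple(params[p] for p in accepted)
```

The returned pair could be passed straight to a DB-API `execute` call against a denormalised copy of the provider's database.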
By the way, the current definition of capabilities only allows specifying HTTP-GET-only services or the accepted list of query templates!
A new ad hoc idea: what about defining these 3 levels and allowing no other intermediate compliance? Then we could reduce the capabilities a lot, a lot of burden would be removed from clients, and we would get more interoperability? Just a quick thought when looking at the above.
We could make it as simple as this:
<FullService accept_custom_models="false">
  <supportedModels>
    <model location="URI" namespace="" />
    ...
  </supportedModels>
  <concepts>
    <concept id="..." />
    ...
  </concepts>
</FullService>
or
<LiteService>
  <supportedQueryTemplates>
    <template location="URI">
      <parameter name="" />
      ...
    </template>
    ...
  </supportedQueryTemplates>
</LiteService>
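To check that the proposal really does simplify clients, here is a minimal Python sketch of a client reading either form with the standard library's `xml.etree.ElementTree`. The element and attribute names follow Markus's draft above; the example URIs and the function's return shape are hypothetical.

```python
import xml.etree.ElementTree as ET


def describe_service(xml_text: str) -> dict:
    """Summarise a FullService or LiteService capabilities fragment."""
    root = ET.fromstring(xml_text.strip())
    if root.tag == "FullService":
        return {
            "kind": "full",
            "custom_models": root.get("accept_custom_models") == "true",
            "models": [m.get("location") for m in root.iter("model")],
            "concepts": [c.get("id") for c in root.iter("concept")],
        }
    if root.tag == "LiteService":
        return {
            "kind": "lite",
            # Map each template URI to the names of its accepted parameters.
            "templates": {
                t.get("location"): [p.get("name") for p in t.iter("parameter")]
                for t in root.iter("template")
            },
        }
    raise ValueError(f"unknown service element: {root.tag}")


LITE_EXAMPLE = """
<LiteService>
  <supportedQueryTemplates>
    <template location="http://example.org/templates/name_search.xml">
      <parameter name="name" />
    </template>
  </supportedQueryTemplates>
</LiteService>
"""
```

A client only branches on the root element. It never has to probe for individual operators or operations, which is the interoperability gain the proposal is after.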
What do you think? I am really a bit afraid of ending up with different services that implement only bits of the specification. We are about to move all the burden onto the clients, which, to my mind, should be easy to create for a researcher with just basic programming knowledge.
Sorry to raise this issue again, and especially for this drastic new suggestion. It came up while writing this mail, so don't take it as a well-thought-out idea. I just want to think a bit more about the problems involved in having variable and mutating TAPIR services.
Markus
-----Original Message-----
From: Roger Hyam [mailto:roger@tdwg.org]
Sent: Friday, 18 November 2005 10:32
To: Donald Hobern
Cc: Döring, Markus; tdwg-tapir@lists.tdwg.org
Subject: Re: [tdwg-tapir] ideas & TapirLite
As TapirLite champion, I do have a problem with having to implement all the operators. I just want to be pretty and dumb! I don't have a problem with the concept of a named subset protocol, i.e. Tapir (the real thing) and TapirLite (the not-so-bright second cousin).
Are there use cases somewhere listing what Tapir clients are expected to call, or some statistical breakdown of what kinds of queries are run against existing BioCASE and DiGIR providers?
Roger
Donald Hobern wrote:
Markus,
Doesn't "all operations" imply that the provider must implement generic search operations? Isn't a large part of the reason for TAPIR Lite the need to support "databases" that cannot be mapped using the standard RDBMS mapping and which are just trying to emulate common views? I would say that these should be supported, but that each TDWG content subgroup needs to define a set of (web service) interfaces that must be supported by any compliant provider. If they can handle this set of views, they may appear as TAPIR providers. Or did I miss something?
Donald
---------------------------------------------------------------
Donald Hobern (dhobern@gbif.org)
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480
---------------------------------------------------------------
-----Original Message-----
From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of "Döring, Markus"
Sent: 17 November 2005 16:25
To: tdwg-tapir@lists.tdwg.org
Subject: [tdwg-tapir] ideas & TapirLite
Hi,
talking to Anton today, we were wondering if it makes sense to allow a TAPIR client to embed its own request id into the TAPIR headers for later identification of asynchronous and distributed messages. Currently we would need to identify a message by its send time (vague) and source. Does this make sense? Does anyone know how other people deal with this problem?
-----------
The other thoughts were about TapirLite. We both think it's a very bad idea to push all responsibility to the client by allowing any TAPIR service to be very minimalistic. If a client should be able to contact services that have different operators, operations and concepts, then I don't think we will get anything interoperable. I still prefer that these things must exist in the most basic TAPIR service.
Otherwise we should call it something different, maybe even TAPIR Lite, as a valid subset:
- all operations
- all logical operators
- the main COPs (<=> like)
Concepts and response models can be optional without much problem, I think. What do you think? Should we sacrifice all this to have few clients but many providers?
BTW, I think we didn't specify anywhere in capabilities whether GET or XML messaging is supported. So the idea is to always have both for all services, right?
Markus
_______________________________________________
tdwg-tapir mailing list
tdwg-tapir@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tapir_lists.tdwg.org
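Markus's first question in that original message, letting a client embed its own request id in the TAPIR header, is easy to sketch. The minimal Python example below assumes a hypothetical `request_id` header field that the provider echoes back; it is not part of any TAPIR schema. The client tags each outgoing message with a UUID and matches asynchronous responses on it, rather than on send time and source.

```python
import uuid
from typing import Dict


class PendingRequests:
    """Track outgoing requests by a client-chosen id (hypothetical header field)."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    def send(self, provider_url: str) -> dict:
        """Build an outgoing message tagged with a fresh request id."""
        request_id = str(uuid.uuid4())
        self._pending[request_id] = provider_url
        # The provider is assumed to echo request_id in its response header.
        return {"header": {"request_id": request_id}, "to": provider_url}

    def receive(self, response: dict) -> str:
        """Return the provider a response belongs to; KeyError if unknown."""
        request_id = response["header"]["request_id"]
        return self._pending.pop(request_id)
```

Because the id is chosen by the client, responses can arrive in any order, or via an intermediate such as a portal or cache, and still be matched unambiguously.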
--
Roger Hyam Technical Architect Taxonomic Databases Working Group
http://www.tdwg.org roger@tdwg.org
+44 1578 722782
*** Sally Hinchcliffe *** Computer section, Royal Botanic Gardens, Kew *** tel: +44 (0)20 8332 5708 *** S.Hinchcliffe@rbgkew.org.uk
participants (3): "Döring, Markus", Donald Hobern, Sally Hinchcliffe