[tdwg-tapir] ideas & TapirLite
Hi, talking to Anton today, we were wondering whether it makes sense to allow a TAPIR client to embed its own request-id in the TAPIR headers, so that asynchronous and distributed messages can be identified later. Currently we would need to identify a message by its send time (vague) and source.
Does this make sense? Does anyone know how other people deal with this problem?
-----------
The other thoughts were about TapirLite. We both think it's a very bad idea to push all responsibility to the client by allowing any TAPIR service to be very minimalistic. If a client has to be able to contact services that have different operators, operations and concepts, then I don't think we will get anything interoperable.
I still prefer that these things must exist in the most basic TAPIR service. Otherwise we should call it something different, maybe even TAPIR Lite as a valid subset:
- all operations
- all logical operators
- the main COPs (<=> like)
Concepts and response models can be optional without much trouble, I think. What do you think? Should we sacrifice all this to have few clients but many providers?
BTW, I think we didn't specify anywhere in capabilities whether GET or XML messaging is supported. So the idea is to always have both for all services, right?
Markus
Markus,
Doesn't "all operations" imply that the provider must implement generic search operations? Isn't a large part of the reason for TAPIR Lite the need to support "databases" that cannot be mapped using the standard RDBMS mapping and which are just trying to emulate common views?
I would say that these should be supported but that each TDWG content subgroup needs to define a set of (web service) interfaces that must be supported by any compliant provider. If they can handle this set of views, they may appear as TAPIR providers.
Or did I miss something?
Donald
--------------------------------------------------------------- Donald Hobern (dhobern@gbif.org) Programme Officer for Data Access and Database Interoperability Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480 ---------------------------------------------------------------
_______________________________________________ tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir_lists.tdwg.org
As TapirLite champion I do have a problem with having to implement all the operators. I just want to be pretty and dumb! I don't have a problem with the concept of a named subset protocol i.e. Tapir (the real thing) and TapirLite (the not so bright second cousin).
Are there some use cases somewhere listing what TAPIR clients are expected to call, or some statistical breakdown of what kinds of queries are run against existing BioCASE and DiGIR providers?
Roger
I suppose we could implement an optional "conversation id" or "request id". I've used this kind of thing before when implementing some of the FIPA protocols for distributed intelligent agent systems. This will at least provide a hook to build on top of, so long as it's optional.
What bothers me about it is that it starts to make the protocol look transactional. Right now TAPIR is a simple, stateless protocol. That makes it easy for us to implement it over the top of HTTP. Once we start talking about distributed, or especially asynchronous processing of TAPIR messages, things get complicated.
If TAPIR becomes asynchronous, clients will need a mechanism to cancel a request before they get a response (meaning free up state and move on). To do this properly, TAPIR requests should be given a time to live so that downstream services that process requests can also cancel and free up state when necessary. We may also need to give the client a way to tell the service it's talking to that it should not pass requests down, or should only do so to a certain depth; in other words, the client should have some control over how its request is serviced.
In summary, we may need several additional mechanisms to make this a stateful asynchronous distributed protocol. I'm not against adding a request id attribute, but going much beyond that will require some serious thought as to the implications.
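As a rough illustration of the mechanisms listed above (an optional request id, a time to live, and a limit on forwarding depth), a request could carry metadata along these lines. This is only a sketch: the class name RequestEnvelope and the fields request_id, ttl_seconds and max_depth are hypothetical and are not part of TAPIR.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestEnvelope:
    """Hypothetical metadata a client could attach to a request."""
    source: str
    sent_at: float = field(default_factory=time.time)
    request_id: Optional[str] = None     # optional, chosen by the client
    ttl_seconds: Optional[float] = None  # None means the request never expires
    max_depth: Optional[int] = None      # None means no forwarding limit

    def is_expired(self, now: Optional[float] = None) -> bool:
        """True once the time to live has elapsed, so state can be freed."""
        if self.ttl_seconds is None:
            return False
        current = time.time() if now is None else now
        return current - self.sent_at > self.ttl_seconds

    def forwarded(self) -> "RequestEnvelope":
        """Copy to pass downstream, with one less level of depth allowed."""
        if self.max_depth is not None and self.max_depth <= 0:
            raise ValueError("request may not be passed further down")
        depth = None if self.max_depth is None else self.max_depth - 1
        return RequestEnvelope(self.source, self.sent_at, self.request_id,
                               self.ttl_seconds, depth)


def new_request(source: str, ttl_seconds: float = 30.0,
                max_depth: int = 1) -> RequestEnvelope:
    """Create an envelope with a fresh client-side id (a random UUID here)."""
    return RequestEnvelope(source=source, request_id=str(uuid.uuid4()),
                           ttl_seconds=ttl_seconds, max_depth=max_depth)
```

Leaving all three fields optional keeps a plain stateless request valid: a service that ignores them behaves exactly as it does today.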
-Steve
I tend to agree with Steve here, and I think we should be careful when trying to accommodate more different services with the same protocol. I'm not arguing about the usefulness of having services like:
* Give me a complete dump of this huge data source.
* Do a search in all these networks using some criteria, keep trying for "n" days if any provider is down, and then return to me a report.
The problem is that there are certainly more things that would need to be considered in the protocol so that we can safely cover new scenarios.
If our data provider implementations will remain stateless (which is the case right now), even if we only include a new "request-id" element in the protocol, it would only be used to interact with these new top-level services, not with the "real" providers.
Maybe TAPIR could be wrapped by another generic asynchronous protocol? Or maybe those different services could try to use TAPIR extension hooks for their specific purposes? (the request-id could be inside /header/custom for instance). Although personally I would prefer to use a ticket-based approach instead of making the client send a request-id.
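To make the extension-hook idea concrete, here is a minimal sketch of a request that carries a client-supplied id under /header/custom. The element names request, ping and request-id are illustrative assumptions, namespaces are left out, and this is not the actual TAPIR schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_request(request_id: Optional[str] = None) -> ET.Element:
    """Build a skeleton request; element names are illustrative only."""
    request = ET.Element("request")
    header = ET.SubElement(request, "header")
    if request_id is not None:
        custom = ET.SubElement(header, "custom")
        # Hypothetical extension element; a stateless provider can ignore it.
        ET.SubElement(custom, "request-id").text = request_id
    ET.SubElement(request, "ping")
    return request


def read_request_id(request: ET.Element) -> Optional[str]:
    """Return the client-supplied id, or None if the client sent none."""
    return request.findtext("header/custom/request-id")
```

Because the id lives in the custom section, a provider that knows nothing about it still sees a request it can process unchanged.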
So (currently) I would suggest keeping TAPIR completely focused on a stateless world.
-- Renato
On 17 Nov 2005 at 16:25, Döring, Markus wrote:
talking to Anton today we were wondering if it makes sense to allow a tapir client to embed its own request-id into the tapir headers for later identification of asynchronous and distributed messages. Currently we would need to identify a message by its sendtime (vague) and source.
Does this make sense? Does anyone know how other people deal with this problem?
Hi Markus,
I'm not sure that all client software will need to handle all possible types of TAPIR providers. When writing a client for a specific network, one could even assume the level of functionality available from all providers and rely on the network registry (UDDI, manual configuration file, or whatever) to point only to the compatible ones.
On the other hand, I do agree that the new "operators" section in the capabilities response is more complicated than necessary. I would prefer to see "operators" as an optional element (TapirLite would simply not have it), but then we could make all other elements inside it mandatory (except the custom ones). This way we could easily distinguish minimalistic implementations, and at the same time get rid of weird situations where a provider could offer the "and" operator but not the "or", and things like that. In other words, if "operators" is present, we could safely assume a lot of things without adding much complexity to clients.
-- Renato
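A client-side check under that rule could be as small as the sketch below. It assumes a capabilities document without namespaces in which the section is an element literally named "operators"; the contents of FULL_OPERATOR_SET (beyond "and" and "or") are placeholders, not names taken from the schema.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: if "operators" is present, everything in this
# set is guaranteed, so a client never checks operators one by one.
FULL_OPERATOR_SET = frozenset(
    {"and", "or", "not", "equals", "lessThan", "greaterThan", "like"}
)


def supported_operators(capabilities_xml: str) -> frozenset:
    """Return the full set if an "operators" element exists, else nothing."""
    root = ET.fromstring(capabilities_xml)
    if root.find(".//operators") is None:
        return frozenset()
    return FULL_OPERATOR_SET


def is_minimal(capabilities_xml: str) -> bool:
    """A provider without the "operators" section is treated as TapirLite."""
    return not supported_operators(capabilities_xml)
```

The point of the design is that the client makes one presence test instead of inspecting each operator separately.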
Good Morning All,
I've been giving this some thought as well, as I may be one of the people implementing client-side software. I share many of Markus's concerns: with the capabilities response as it stands, generic clients will need to be exceedingly intelligent to prevent portals from supporting only the lowest common denominator of the feature set of all of their providers.
That being said, Renato is correct in his assessment of how providers have been set up in the past. Most of the work with data sharing (at least in the US) has been through thematic networks (MaNIS, HerpNET, ORNIS and FishNET) with a small number of people configuring the data providers. For situations like these it is most likely the case that there will already be some minimum requirements that data providers must meet. It may be close to a full TAPIR implementation, or even TAPIR Lite with query templates for that particular thematic network. Either way, the clients for these networks will likely not require the intelligence that a generic TAPIR client will, because enough of the capabilities will be known ahead of time that a functional portal could be written.
While this is generally true, there are some obvious situations where a more generic portal (à la DiGIR, but with a friendlier, customizable user interface) would be desired. Take the case where MaNIS is upgraded and John decides that each of the providers must implement TAPIR with search and operations, while HerpNET is upgraded and decides to implement TAPIRLite with query templates. For each of those thematic networks the portal is fine, because its developers know the capabilities beforehand. But now assume a regional or institutional portal is being set up that uses both MaNIS and HerpNET: its developers will have to write a custom portal that understands both TAPIR and the HerpNET-specific implementation of TAPIRLite. Further, say a portal wants to integrate specimen and taxon concept data, but the specimens implement TAPIR and all TCS providers are TAPIRLite. Once again, the developers must write an entirely custom portal.
It would be nice if there was a baseline set of functionality that TAPIR providers must have rather than absolutely everything being optional and defined within the capabilities so that a generic portal could be written with a customizable user interface.
My thoughts on all of this are that, at minimum, search should be required of TAPIR implementations and that all operations except "in" should be required ("in" is not supported by RDF, but can be translated using boolean and comparative operators). The question at the core of all of the recent concerns seems to boil down to: "Are you a TAPIRLite implementation or a TAPIR implementation?"
Given that search and operations were mandatory for TAPIR implementations, it seems to me that the complexity of client-side code could be radically simplified if there was just a simple way for a provider to say "I'm a TAPIR Lite implementation." This very well could be as Renato suggested, that operations are disabled or even that search is disabled (given that it is mandatory for TAPIR implementations).
Assuming the above was present in the TAPIR protocol and a mixed network of TAPIR/TAPIR Lite implementations, the generic portal's basic logic might look something like the following:
If Provider x is TAPIRLite:
    Find all query templates that match my output model
    As user constructs query, remove those query templates that could not possibly be used (missing concept, etc.)
    /* Likely more logic here if > 1 query templates are left,
     * possibly with user interaction */
Else If Provider x is TAPIR:
    Ensure x supports my output model
    Can assume that search is available
    Can assume that operations (except maybe "in") are supported
    /* If client is ambitious, may try to use any query templates defined
     * within search capabilities (would be same alg. as TAPIRLite), but
     * is optional */
This seems, at least at first glance, to provide baseline functionality for both client- and server-side software that isn't just ping, capabilities and metadata. It should improve interoperability and define the minimum set of functionality that both client and server must support and that can be rigorously tested. Further, because the logic is vastly simplified, it would be feasible to compose a generic, customizable portal.
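The portal logic outlined above could be sketched in Python roughly as follows. The Provider and QueryTemplate classes, their fields, and the plan function are hypothetical stand-ins for whatever a client would build from a capabilities response; the sketch also assumes a provider declares directly whether it is a Lite implementation.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class QueryTemplate:
    name: str
    output_model: str
    concepts: Set[str]  # concepts the template can filter on


@dataclass
class Provider:
    name: str
    is_lite: bool
    output_models: Set[str] = field(default_factory=set)
    templates: List[QueryTemplate] = field(default_factory=list)


def usable_templates(provider: Provider, output_model: str,
                     query_concepts: Set[str]) -> List[QueryTemplate]:
    """Templates for the wanted model that cover every concept in the query."""
    return [t for t in provider.templates
            if t.output_model == output_model and query_concepts <= t.concepts]


def plan(provider: Provider, output_model: str,
         query_concepts: Set[str]) -> Tuple[str, List[str]]:
    """Return ("template", names), ("search", []) or ("skip", [])."""
    if provider.is_lite:
        names = [t.name for t in
                 usable_templates(provider, output_model, query_concepts)]
        return ("template", names) if names else ("skip", [])
    # Full provider: search and the operators are assumed to be available.
    if output_model in provider.output_models:
        return ("search", [])
    return ("skip", [])
```

If more than one template name comes back for a Lite provider, the portal would still need some further rule, or user interaction, to choose among them.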
Anyway, just some thoughts... - Rob
Hi Rob,
It seems we agree on everything, I just have some quick comments...
Further, say a portal wants to integrate specimen and taxon concept data, but the specimens implement TAPIR and all TCS providers are TAPIRLite. Once again, the developers must write an entirely custom portal.
This could become even more difficult if the TAPIR protocol leaves no room for TapirLite implementations. Then the TCS networks would probably be based on a completely different protocol, and the integration would be more complicated.
...
Assuming the above was present in the TAPIR protocol and a mixed network of TAPIR/TAPIR Lite implementations, the generic portal's basic logic might look something like the following:
If Provider x is TAPIRLite:
    Find all query templates that match my output model
    As user constructs query, remove those query templates that could not possibly be used (missing concept, etc.)
    /* Likely more logic here if > 1 query templates are left,
     * possibly with user interaction */
Else If Provider x is TAPIR:
    Ensure x supports my output model
    Can assume that search is available
    Can assume that operations (exception maybe "in") are supported
    /* If client is ambitious, may try to use any query templates defined
     * within search capabilities (would be same alg. as TAPIRLite), but
     * is optional */
That's an interesting exercise. In this scenario, it seems that the portal is configured to work with one specific output model (could be more than one if we want to complicate things, the user could choose one of them and the portal could know how to make transformations or to dynamically filter providers).
When you say "Ensure x supports my output model" that actually means "check that x supports custom output models and mapped the necessary concepts, otherwise check that x supports any query template associated to the output model I want", right?
By the way, looking at the capabilities response, is there any sense in having a provider that supports the search operation, does not accept custom output models, and does not define any query templates?
Regards, -- Renato
Hi Renato,
My comments inline...
Renato De Giovanni wrote:
Hi Rob,
It seems we agree on everything, I just have some quick comments...
Further, say a portal wants to integrate specimen and taxon concept data, but the specimens implement TAPIR and all TCS providers are TAPIRLite. Once again, the developers must write an entirely custom portal.
This could become even more difficult if the TAPIR protocol leaves no room for TapirLite implementations. Then the TCS networks would probably be based on a completely different protocol, and the integration would be more complicated.
...
I'm definitely not arguing for defining TAPIR so that there is no room for TAPIRLite. My point is that anything that is not TAPIRLite must at minimum support both search and operations, so that there is at least a baseline of useful functionality (ping, metadata and capabilities aside) that all clients can take advantage of. Otherwise clients need a large body of code just to handle the logic of understanding the capabilities, and we run the risk that there is no interoperability, which is the concern Markus raised. If I missed a similar argument on this list, my apologies; I only recall discussions about which filter operations should or should not be required, but nothing about TAPIR operations.
That's an interesting exercise. In this scenario, it seems that the portal is configured to work with one specific output model (could be more than one if we want to complicate things, the user could choose one of them and the portal could know how to make transformations or to dynamically filter providers).
It would certainly complicate things depending on the output models. There are a couple of scenarios I see here. Scenario 1) Portals that support multiple models but of the same data object (ex. both DwC and ABCD) and 2) Portals that support multiple models of conceptually different data objects (ex. DwC and TCS).
In the first case, I see there are potentially three ways to handle such a situation. 1) The portal is customizable by having a plug-in architecture that can handle transformations between two different schemas via XSLT/code. 2) The providers must support both output models. 3) The experimental custom output model mode is used on those providers that support it.
There are benefits and drawbacks to all. In option 1) the portal's architecture and configuration are a bit more complicated, but because the transformation can be accomplished by XSLT or code or a combination, it can take semantic transformations into account. In option 2) there is a risk that relevant data cannot be accessed because a provider does not support the output models the portal desires, but this would be the safest route. In option 3) we might run into some of the problems discussed in Madrid.
In the second case where the portal is integrating different types of information, the portal really has multiple sets of providers that are able to be queried depending on the output models and concepts those providers have available. In the example there is the pool of providers that serve specimen data and the pool that serve taxonomic data. One (probably naive) way this could be handled is that the portal is configured with some sort of mapping that says something like (in English because I haven't really sat down to sketch this out in pseudocode):
If User queries dwc:/ScientificName, also query providers with an output model of TCS using tcs:/DataSet/TaxonName/Simple
Anything that is unmapped would result in queries only being transmitted to those providers that support the output model from which the concept in the filter originated. So dwc:/YearCollected would only query those providers in the pool supporting an output model of DwC and tcs:/DataSet/TaxonConcept/AccordingTo/Simple would only query those providers in the pool supporting an output model of TCS.
What I haven't really thought much about is how to prevent (or what to do when it does happen) someone creating a filter that uses concepts from each supported output model that are unmapped (as above, not in terms of TAPIR mappings). Ex. dwc:/YearCollected = 1978 and tcs:/DataSet/TaxonConcept/AccordingTo/Simple = Jones. I guess one way to handle it would be to run each query on its pool of providers and then merge the results back based on the mappings. With the above example, the results after merging may contain only those specimens whose scientific name is equal to the TaxonName/Simple of those concepts authored by Jones. (Not sure if that is clear at all.)
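A small sketch may make the routing and merging idea easier to follow. It is only one possible reading of the scheme described above: the names CONCEPT_MAP, pools_to_query and merge are invented for illustration, the mapping is one-directional, and result records are modelled as plain dictionaries keyed by concept.

```python
from typing import Dict, List, Set

# Portal-level mapping between equivalent concepts in different models
# (not a TAPIR mapping). The single pair below is the one from the text.
CONCEPT_MAP = {"dwc:/ScientificName": "tcs:/DataSet/TaxonName/Simple"}


def model_of(concept: str) -> str:
    """Return the model prefix, e.g. 'dwc' for 'dwc:/YearCollected'."""
    return concept.split(":", 1)[0]


def pools_to_query(concept: str) -> Set[str]:
    """Models whose provider pools should receive a filter on this concept."""
    models = {model_of(concept)}
    if concept in CONCEPT_MAP:
        models.add(model_of(CONCEPT_MAP[concept]))
    return models


def merge(specimens: List[Dict[str, str]],
          taxa: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep specimens whose scientific name matches a returned taxon name."""
    names = {t["tcs:/DataSet/TaxonName/Simple"] for t in taxa}
    return [s for s in specimens if s["dwc:/ScientificName"] in names]
```

For the mixed filter in the example, the portal would send the year condition to the DwC pool and the AccordingTo condition to the TCS pool, then pass both result lists to merge.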
When you say "Ensure x supports my output model" that actually means "check that x supports custom output models and mapped the necessary concepts, otherwise check that x supports any query template associated to the output model I want", right?
Hmmm...ok, I just took a look at the schema again to refresh my memory. I thought there was a place in the capabilities to define the output models that are supported (either that or I was influenced by the <FullService>/<LiteService> recommendation Markus made). I see now that the only place this is defined is in the query templates. So the portal would need to iterate through the query templates to find out which ones support the output model the portal is configured to be interested in. For TAPIR implementations the portal would most likely favor those query templates that have no predefined filter, so it wouldn't have to reconcile the filter being defined by the user with the filter within the query template. For a generic client/portal I don't know that supporting custom output models would be the route I would go, but as with TAPIR we could probably leave a hook for it.
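The template scan described here might look like the following sketch. TemplateInfo and its fields are hypothetical; in practice they would be filled in from the query templates listed in a provider's capabilities response.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateInfo:
    name: str
    output_model: str
    predefined_filter: Optional[str] = None  # None: template has no filter


def pick_templates(templates: List[TemplateInfo],
                   wanted_model: str) -> List[TemplateInfo]:
    """Templates for the wanted output model, unfiltered ones first."""
    matching = [t for t in templates if t.output_model == wanted_model]
    # sorted() is stable, so templates keep their advertised order
    # within the "no filter" and "has filter" groups.
    return sorted(matching, key=lambda t: t.predefined_filter is not None)
```

Putting unfiltered templates first means the portal only has to reconcile a user filter with a template filter when nothing simpler is on offer.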
By the way, looking at the capabilities response, is there any sense in having a provider that supports the search operation, does not accept custom output models, and does not define any query templates?
I can't see how any provider that fit this description would be useful. If it does not support custom output models and doesn't define any query templates the provider would have no idea into what structure it should be marshaling the response of the query.
Anyway, just some more slowly coalescing thoughts bouncing around the gray matter void,
- rob
I'm afraid that I have been way too busy to keep up over the last few days, but I'd like to add my input one more time on TAPIRLite and TAPIR.
I really see these as part of a continuum. There are two points on this continuum to which we have given names:
TAPIRLite = ping + metadata + capabilities + a set of predefined parameterised scan or search (view) operations which have been defined somewhere using standard TAPIR scan and search definitions
TAPIR = all of the above plus free-format scan and search operations and ability to handle any properly formed TAPIR request
I can see really good uses for both of these with the TDWG standards and in the GBIF network, and quite frankly I will be in serious trouble if one of the two is not supported. I have a preference for keeping the name TAPIR somehow attached to both so that we emphasise what is common in our approaches.
However I am not convinced that it is worth our while taking time to exclude any functional subset from what we will regard as "TAPIR". It will always be possible to define sets of capabilities which include search and scan behaviour but which are functionally useless (essential elements not being searchable, etc.). I suggest that we leave it to different communities (i.e. different TDWG content subgroups) to develop application profiles for the use of TAPIR with their data. For example, there should be a TDWG standard defining which elements should be searchable within a DwC data set, and perhaps documenting the minimal acceptable set of views which all specimen data sets MUST be able to support if they are to be accepted as TAPIRLite specimen or taxon concept providers. There could be contexts in which a community decides to make use of TAPIR solely with scan requests for some purpose, or solely with search requests. Why should we try to prevent this?
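One way to picture such an application profile is as a checklist that a provider's capabilities are compared against. The sketch below is an assumption-laden illustration: ApplicationProfile, missing_for_profile and the view name used in the usage note are invented, and no TDWG subgroup has defined these sets here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ApplicationProfile:
    """Hypothetical community profile: what a compliant provider must offer."""
    name: str
    required_searchable: FrozenSet[str]  # concepts that must be searchable
    required_views: FrozenSet[str]       # predefined views that must exist


def missing_for_profile(profile: ApplicationProfile,
                        searchable: Set[str], views: Set[str]) -> Set[str]:
    """Everything the profile requires that the provider does not offer."""
    return (set(profile.required_searchable - searchable)
            | set(profile.required_views - views))
```

A specimen profile could, for example, require dwc:/ScientificName and dwc:/YearCollected to be searchable plus a made-up view such as "specimens_by_name"; a provider is acceptable for that community when missing_for_profile returns an empty set.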
Thanks for working so hard on all this.
Donald
-----Original Message----- From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of Robert Gales Sent: 21 November 2005 23:25 To: Renato De Giovanni Cc: tdwg-tapir@lists.tdwg.org Subject: Re: [tdwg-tapir] TapirLite
Hi Renato,
My comments inline...
Renato De Giovanni wrote:
Hi Rob,
It seems we agree on everything, I just have some quick comments...
Further, say a portal wants to integrate specimen and taxon concept data, but the specimens implement TAPIR and all TCS providers are TAPIRLite. Once again, the developers must write an entirely custom portal.
This could become even more difficult if the TAPIR protocol leaves no room for TapirLite implementations. Then the TCS networks would probably be based on a completely different protocol, and the integration would be more complicated.
...
I'm definitely not arguing for making TAPIR leave no room for TAPIRLite. My point is that anything that is not TAPIRLite must at minimum support both search and operations, so there is at least a baseline of useful functionality (ping, metadata and capabilities aside) that all clients can take advantage of. Otherwise clients need a large body of code just to handle the logic of understanding the capabilities, and we run the risk that there is no interoperability, as Markus has shown concern about. If I missed a similar argument on this list, my apologies; I only recall discussions about which filter operations should or should not be required, but nothing about TAPIR operations.
Assuming the above was present in the TAPIR protocol and a mixed network of TAPIR/TAPIR Lite implementations, the generic portal's basic logic might look something like the following:
If Provider x is TAPIRLite:
    Find all query templates that match my output model
    As user constructs query, remove those query templates that
    could not possibly be used (missing concept, etc.)
    /* Likely more logic here if > 1 query templates are left
     * - possibly with user interaction */
Else If Provider x is TAPIR:
    Ensure x supports my output model
    Can assume that search is available
    Can assume that operations (exception maybe "in") are supported
    /* If client is ambitious, may try to use any query templates defined
     * within search capabilities (would be same alg. as TAPIRLite),
     * but is optional */
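The portal logic above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Provider` and `QueryTemplate` classes, the `is_lite` flag, and the `output_models` attribute are hypothetical names invented for the sketch and are not part of the TAPIR schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryTemplate:
    name: str
    output_model: str
    concepts: frozenset  # concepts this template lets a client filter on


@dataclass
class Provider:
    name: str
    is_lite: bool
    templates: list = field(default_factory=list)
    # Hypothetical: output models a full TAPIR provider can serve directly.
    output_models: set = field(default_factory=set)


def usable_templates(provider, output_model, query_concepts):
    """Templates for the wanted model that cover every concept in the query."""
    wanted = set(query_concepts)
    return [t for t in provider.templates
            if t.output_model == output_model and wanted <= t.concepts]


def plan_query(provider, output_model, query_concepts):
    """Return (strategy, templates); strategy is "search", "template" or "skip"."""
    if not provider.is_lite and output_model in provider.output_models:
        # Full TAPIR: search and the filter operators are assumed available.
        return ("search", [])
    # TAPIRLite, or the optional template fallback for a full provider.
    templates = usable_templates(provider, output_model, query_concepts)
    return ("template", templates) if templates else ("skip", [])
```

When more than one template survives, the caller still has to choose among them, possibly with user interaction, as the pseudocode comment notes.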
That's an interesting exercise. In this scenario, it seems that the portal is configured to work with one specific output model (could be more than one if we want to complicate things, the user could choose one of them and the portal could know how to make transformations or to dynamically filter providers).
It would certainly complicate things depending on the output models. There are a couple of scenarios I see here. Scenario 1) Portals that support multiple models but of the same data object (ex. both DwC and ABCD) and 2) Portals that support multiple models of conceptually different data objects (ex. DwC and TCS).
In the first case, I see there are potentially three ways to handle such a situation. 1) The portal is customizable by having a plug-in architecture that can handle transformations between two different schemas via XSLT/code. 2) The providers must support both output models. 3) The experimental custom output model mode is used on those providers that support it.
There are benefits and drawbacks to all. In option 1) the portal's architecture and configuration are a bit more complicated, but because the transformation can be accomplished by XSLT, code, or a combination, it can take semantic transformations into account. In option 2) there is a risk that relevant data cannot be accessed because a provider does not support the output models the portal desires, but this would be the safest route. In option 3) we might run into some of the problems discussed in Madrid.
In the second case where the portal is integrating different types of information, the portal really has multiple sets of providers that are able to be queried depending on the output models and concepts those providers have available. In the example there is the pool of providers that serve specimen data and the pool that serve taxonomic data. One (probably naive) way this could be handled is that the portal is configured with some sort of mapping that says something like (in English because I haven't really sat down to sketch this out in pseudocode):
If User queries dwc:/ScientificName, also query providers with an output model of TCS using tcs:/DataSet/TaxonName/Simple
Anything that is unmapped would result in queries only being transmitted to those providers that support the output model from which the concept in the filter originated. So dwc:/YearCollected would only query those providers in the pool supporting an output model of DwC and tcs:/DataSet/TaxonConcept/AccordingTo/Simple would only query those providers in the pool supporting an output model of TCS.
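A minimal sketch of that routing rule, assuming the portal keeps a small hand-written mapping table. The names `CONCEPT_MAP`, `MODEL_OF_PREFIX` and `route` are made up for illustration, and the only mapping included is the one mentioned above.

```python
# Hypothetical portal configuration: which concepts also fan out to another model.
CONCEPT_MAP = {
    "dwc:/ScientificName": {"TCS": "tcs:/DataSet/TaxonName/Simple"},
}

# Hypothetical lookup from concept prefix to the output model it belongs to.
MODEL_OF_PREFIX = {"dwc": "DwC", "tcs": "TCS"}


def route(concept):
    """Return {output_model: concept_to_use} for one concept in a user filter."""
    prefix = concept.split(":", 1)[0]
    # A concept always goes to the pool of the model it originated from.
    targets = {MODEL_OF_PREFIX[prefix]: concept}
    # Mapped concepts are additionally sent to the other pool, translated.
    targets.update(CONCEPT_MAP.get(concept, {}))
    return targets
```

With this table, `route("dwc:/ScientificName")` targets both pools, while `route("dwc:/YearCollected")` targets only the DwC pool.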
What I haven't really thought much about is how to prevent (or what to do when it does happen) someone creating a filter that uses concepts from each supported output model that are unmapped (as above, not in terms of TAPIR mappings). Ex. dwc:/YearCollected = 1978 and tcs:/DataSet/TaxonConcept/AccordingTo/Simple = Jones. I guess one way to handle it would be to run each query on its pool of providers, then merge the results back based on the mappings. With the above example, the results after merging might contain only those specimens whose scientific name is equal to the TaxonName/Simple of those concepts authored by Jones. (Not sure if that is clear at all.)
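One way to picture that merge step, assuming each pool has already been queried and returned its records as plain dictionaries. The record keys `ScientificName` and `TaxonName` and the function `merge_by_name` are hypothetical stand-ins for whatever the mapped concepts resolve to in a real response.

```python
def merge_by_name(specimens, concepts,
                  specimen_key="ScientificName", concept_key="TaxonName"):
    """Keep only specimens whose name matches a name among the concept results."""
    names = {c[concept_key] for c in concepts}
    return [s for s in specimens if s[specimen_key] in names]


# Results of dwc:/YearCollected = 1978 on the specimen pool (made-up records).
specimens_1978 = [
    {"ScientificName": "Aus bus", "YearCollected": 1978},
    {"ScientificName": "Cus dus", "YearCollected": 1978},
]
# Results of AccordingTo/Simple = Jones on the TCS pool (made-up records).
concepts_jones = [{"TaxonName": "Aus bus", "AccordingTo": "Jones"}]

merged = merge_by_name(specimens_1978, concepts_jones)
```

This is an exact string join on the mapped name, so it sidesteps the harder question of what to do with names that differ only in spelling or authorship.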
When you say "Ensure x supports my output model" that actually means "check that x supports custom output models and mapped the necessary concepts, otherwise check that x supports any query template associated to the output model I want", right?
Hmmm...ok, I just took a look at the schema again to refresh my memory. I thought there was a place in the capabilities to define the output models that are supported (either that or I was influenced by the <FullService>/<LiteService> recommendation Markus made). I see now that the only place this is defined is in the query templates. So the portal would need to iterate through the query templates to find out which ones support the output model the portal is configured to be interested in. For TAPIR implementations the portal would most likely favor those query templates that have no filter already defined, so it wouldn't have to reconcile the filter defined by the user with the filter within the query template. For a generic client/portal I don't know that supporting custom output models would be the route I would go, but like with TAPIR we could probably leave a hook for it.
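The template scan described here might look like the sketch below. It assumes the capabilities response has already been parsed into simple records; `Template`, its fields, and `pick_template` are hypothetical names, not elements of the TAPIR schema.

```python
from typing import NamedTuple, Optional


class Template(NamedTuple):
    location: str           # where the query template lives (illustrative)
    output_model: str       # output model the template produces
    filter: Optional[str]   # pre-defined filter, or None if there is none


def pick_template(templates, wanted_model):
    """Return the best template for wanted_model, preferring filter-free ones."""
    matching = [t for t in templates if t.output_model == wanted_model]
    # False sorts before True, so templates without a filter come first;
    # sorted() is stable, so ties keep their order from the capabilities.
    ranked = sorted(matching, key=lambda t: t.filter is not None)
    return ranked[0] if ranked else None
```

Returning `None` signals that the provider offers nothing for the configured output model and can be left out of the query.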
By the way, looking at the capabilities response, is there any sense in having a provider that supports the search operation, does not accept custom output models, and does not define any query templates?
I can't see how any provider that fits this description would be useful. If it does not support custom output models and doesn't define any query templates, the provider would have no idea into what structure it should be marshaling the response of the query.
Regards,
Renato
Anyway, just some more slowly coalescing thoughts bouncing around the gray matter void,
- rob
participants (6)
- "Döring, Markus"
- Donald Hobern
- Renato De Giovanni
- Robert Gales
- Roger Hyam
- Steven Perry