Re: [tdwg-tapir] TapirLite
Hello, reading Donald's message I had yet another idea. Donald said that we have parameterised inventories, but that's not true. We can only use the filter parameter with scans; there are no predefined filters (similar to query templates). But it would be great and straightforward to have them.
So I was thinking: why didn't we keep filters and output models entirely separate from each other? Why not request a specific output model in a search and also request a certain filter-parameter template? This could then be reused for inventories, where we don't need any output model.
It would probably also look better in capabilities if the output model is listed separately for searches (see Rob's portal example) and the query templates (maybe better: filter templates?) are a distinct section in capabilities that can be used in inventories and searches.
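A minimal sketch of that separation, assuming a request simply names a filter template and, for searches only, an output model. The class, the function and the template name "by_scientific_name" are hypothetical illustrations, not part of the TAPIR schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    operation: str                  # "search" or "inventory"
    filter_template: Optional[str]  # name of a predefined, parameterised filter
    output_model: Optional[str]     # only meaningful for searches


def build_request(operation, filter_template=None, output_model=None):
    """Combine a filter template with an output model only where one is needed."""
    if operation == "search" and output_model is None:
        raise ValueError("a search needs an output model")
    if operation == "inventory" and output_model is not None:
        raise ValueError("an inventory does not take an output model")
    return Request(operation, filter_template, output_model)


# The same filter template is reused by both kinds of operation.
search = build_request("search", "by_scientific_name", "dwc")
inventory = build_request("inventory", "by_scientific_name")
```

Because the filter template carries no output model, nothing in it has to change when it is reused for an inventory.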
It currently sounds like a good idea to me! Markus
-----Original Message----- From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of Donald Hobern Sent: Monday, 21 November 2005 23:43 To: rgales@ku.edu; 'Renato De Giovanni' Cc: tdwg-tapir@lists.tdwg.org Subject: Re: [tdwg-tapir] TapirLite
I'm afraid that I have been way too busy to keep up over the last few days, but I'd like to add my input one more time on TAPIRLite and TAPIR.
I really see these as part of a continuum. There are two points on this continuum to which we have given names:
TAPIRLite = ping + metadata + capabilities + a set of predefined parameterised scan or search (view) operations which have been defined somewhere using standard TAPIR scan and search definitions
TAPIR = all of the above plus free-format scan and search operations and ability to handle any properly formed TAPIR request
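The two named points on the continuum can be read as nested capability sets. A rough sketch, assuming a provider's capabilities have already been parsed into a dictionary; the keys "operations", "views" and "free_format" are made up for this illustration and do not come from the capabilities schema:

```python
CORE_OPERATIONS = {"ping", "metadata", "capabilities"}


def classify(provider):
    """Place a provider on the TAPIRLite/TAPIR continuum described above."""
    operations = set(provider.get("operations", ()))
    if not CORE_OPERATIONS <= operations:
        return "not TAPIR"
    if provider.get("free_format", False):
        # Accepts free-format scan and search requests.
        return "TAPIR"
    if provider.get("views"):
        # Only predefined, parameterised scan or search views.
        return "TAPIRLite"
    return "core only"


lite = {"operations": ["ping", "metadata", "capabilities"],
        "views": ["specimens_by_name"]}
full = {"operations": ["ping", "metadata", "capabilities"],
        "views": ["specimens_by_name"], "free_format": True}
```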
I can see really good uses for both of these with the TDWG standards and in the GBIF network, and quite frankly I will be in serious trouble if one of the two is not supported. I have a preference for keeping the name TAPIR somehow attached to both so that we emphasise what is common in our approaches.
However I am not convinced that it is worth our while taking time to exclude any functional subset from what we will regard as "TAPIR". It will always be possible to define sets of capabilities which include search and scan behaviour but which are functionally useless (essential elements not being searchable, etc.). I suggest that we leave it to different communities (i.e. different TDWG content subgroups) to develop application profiles for the use of TAPIR with their data. For example, there should be a TDWG standard defining which elements should be searchable within a DwC data set, and perhaps documenting the minimal acceptable set of views which all specimen data sets MUST be able to support if they are to be accepted as TAPIRLite specimen or taxon concept providers. There could be contexts in which a community decides to make use of TAPIR solely with scan requests for some purpose, or solely with search requests. Why should we try to prevent this?
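An application profile of this kind could be checked mechanically. The sketch below assumes a profile is just a list of concepts that must be searchable plus a list of views that must be offered; the dictionary layout and the view name are hypothetical:

```python
def missing_from_profile(profile, provider):
    """List what a provider lacks relative to a community's application profile."""
    return {
        key: sorted(set(profile[key]) - set(provider.get(key, ())))
        for key in ("searchable", "views")
    }


def conforms(profile, provider):
    """A provider conforms when nothing required by the profile is missing."""
    return not any(missing_from_profile(profile, provider).values())


specimen_profile = {
    "searchable": ["dwc:/ScientificName", "dwc:/YearCollected"],
    "views": ["specimens_by_name"],
}
provider = {
    "searchable": ["dwc:/ScientificName"],
    "views": ["specimens_by_name"],
}
```

The protocol itself stays neutral here: whether a given subset of operations is acceptable is decided by the profile, not by TAPIR.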
Thanks for working so hard on all this.
Donald
--------------------------------------------------------------- Donald Hobern (dhobern@gbif.org) Programme Officer for Data Access and Database Interoperability Global Biodiversity Information Facility Secretariat Universitetsparken 15, DK-2100 Copenhagen, Denmark Tel: +45-35321483 Mobile: +45-28751483 Fax: +45-35321480 ---------------------------------------------------------------
-----Original Message----- From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of Robert Gales Sent: 21 November 2005 23:25 To: Renato De Giovanni Cc: tdwg-tapir@lists.tdwg.org Subject: Re: [tdwg-tapir] TapirLite
Hi Renato,
My comments inline...
Renato De Giovanni wrote:
Hi Rob,
It seems we agree on everything, I just have some quick comments...
Further, say a portal wants to integrate specimen and taxon concept data, but the specimens implement TAPIR and all TCS providers are TAPIRLite. Once again, the developers must write an entirely custom portal.
This could become even more difficult if the TAPIR protocol leaves no room for TapirLite implementations. Then the TCS networks would probably be based on a completely different protocol, and the integration would be more complicated.
...
I'm definitely not arguing for making TAPIR so that there is no room for TAPIRLite, but rather that anything that is not TAPIRLite must at minimum support both search and operations, so there is at least a baseline of useful functionality (ping, metadata and capabilities aside) that all clients can take advantage of. Otherwise clients need a large body of code just to handle the logic of understanding the capabilities, and we run the risk that there is no interoperability, as Markus has shown concern about. If I missed a similar argument on this list, my apologies; I only recall discussions about which filter operations should or should not be required, but nothing about TAPIR operations.
Assuming the above was present in the TAPIR protocol and a mixed network of TAPIR/TAPIR Lite implementations, the generic portal's basic logic might look something like the following:
If Provider x is TAPIRLite:
    Find all query templates that match my output model
    As user constructs query, remove those query templates that
    could not possibly be used (missing concept, etc.)
    /* Likely more logic here if > 1 query templates are left
     * - possibly with user interaction */
Else If Provider x is TAPIR:
    Ensure x supports my output model
    Can assume that search is available
    Can assume that operations (exception maybe "in") are supported
    /* If client is ambitious, may try to use any query templates defined
     * within search capabilities (would be same alg. as TAPIRLite),
     * but is optional */
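The portal logic above can be sketched in Python as follows. This is only one possible reading: providers are assumed to be plain dictionaries, and the keys "kind", "query_templates", "output_models" and "concepts" are invented for the example rather than taken from the capabilities schema.

```python
def usable_templates(provider, output_model, query_concepts):
    """Keep templates for the wanted output model that cover every queried concept."""
    wanted = set(query_concepts)
    return [
        t for t in provider.get("query_templates", [])
        if t["output_model"] == output_model and wanted <= set(t["concepts"])
    ]


def plan(provider, output_model, query_concepts):
    """Decide how a generic portal would query one provider."""
    if provider["kind"] == "TAPIRLite":
        names = [t["name"] for t in
                 usable_templates(provider, output_model, query_concepts)]
        # More than one survivor may need further logic or user interaction.
        return ("template", names) if names else ("skip", [])
    if provider["kind"] == "TAPIR":
        # Search and the filter operations are assumed to be available.
        if output_model in provider.get("output_models", []):
            return ("search", [])
        return ("skip", [])
    raise ValueError("unknown provider kind: %r" % provider["kind"])


lite = {"kind": "TAPIRLite", "query_templates": [
    {"name": "by_name", "output_model": "dwc",
     "concepts": ["dwc:/ScientificName"]},
    {"name": "by_year", "output_model": "dwc",
     "concepts": ["dwc:/YearCollected"]},
]}
full = {"kind": "TAPIR", "output_models": ["dwc"]}
```

The optional step for ambitious clients (trying query templates on full TAPIR providers too) would reuse usable_templates unchanged.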
That's an interesting exercise. In this scenario, it seems that the portal is configured to work with one specific output model (could be more than one if we want to complicate things, the user could choose one of them and the portal could know how to make transformations or to dynamically filter providers).
It would certainly complicate things depending on the output models. There are a couple of scenarios I see here. Scenario 1) Portals that support multiple models but of the same data object (ex. both DwC and ABCD) and 2) Portals that support multiple models of conceptually different data objects (ex. DwC and TCS).
In the first case, I see there are potentially three ways to handle such a situation. 1) The portal is customizable by having a plug-in architecture that can handle transformations between two different schemas via XSLT/code. 2) The providers must support both output models. 3) The experimental custom output model mode is used on those providers that support it.
There are benefits and drawbacks to all. In option 1) the portal's architecture and configuration are a bit more complicated, but because the transformation can be accomplished by XSLT or code or a combination, it can take semantic transformations into account. In option 2) there is a risk that relevant data cannot be accessed because a provider does not support the output models the portal desires, but this would be the safest route. Option 3) might run into some of the problems discussed in Madrid.
In the second case where the portal is integrating different types of information, the portal really has multiple sets of providers that are able to be queried depending on the output models and concepts those providers have available. In the example there is the pool of providers that serve specimen data and the pool that serve taxonomic data. One (probably naive) way this could be handled is that the portal is configured with some sort of mapping that says something like (in English because I haven't really sat down to sketch this out in pseudocode):
If User queries dwc:/ScientificName, also query providers with an output model of TCS using tcs:/DataSet/TaxonName/Simple
Anything that is unmapped would result in queries only being transmitted to those providers that support the output model from which the concept in the filter originated. So dwc:/YearCollected would only query those providers in the pool supporting an output model of DwC and tcs:/DataSet/TaxonConcept/AccordingTo/Simple would only query those providers in the pool supporting an output model of TCS.
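A small sketch of that routing rule, assuming the portal keeps a one-directional mapping table like the single example given above and that the prefix before the colon identifies the output model. The function names are hypothetical:

```python
# Portal-side mapping, configured by hand (not a TAPIR concept mapping).
MAPPINGS = {
    "dwc:/ScientificName": ["tcs:/DataSet/TaxonName/Simple"],
}


def model_of(concept):
    """Take the prefix before the first colon as the output model name."""
    return concept.split(":", 1)[0]


def route(concept, mappings=MAPPINGS):
    """Return {output model: concept to use when querying that provider pool}."""
    targets = {model_of(concept): concept}
    for mapped in mappings.get(concept, []):
        targets[model_of(mapped)] = mapped
    return targets
```

An unmapped concept yields a single entry, so the query only goes to the pool whose output model the concept came from.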
What I haven't really thought much about is how to prevent (or what to do when it does happen) someone creating a filter that uses concepts from each supported output model that are unmapped (unmapped in the sense above, not in terms of TAPIR mappings). Ex. dwc:/YearCollected = 1978 and tcs:/DataSet/TaxonConcept/AccordingTo/Simple = Jones. I guess one way to handle it would be to run each query on its pool of providers and then merge the results based on the mappings. With the above example, the results after merging might contain only those specimens whose scientific name is equal to the TaxonName/Simple of those concepts authored by Jones. (Not sure if that is clear at all.)
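The merge step might look roughly like this, assuming each pool has already been queried with its own part of the filter and the results arrive as lists of dictionaries. The records and field names are invented placeholders, not real data:

```python
def merge(specimens, concepts):
    """Keep specimens whose scientific name matches a returned taxon concept name."""
    names = {c["TaxonName"] for c in concepts}
    return [s for s in specimens if s["ScientificName"] in names]


# Result of dwc:/YearCollected = 1978 on the DwC pool (made-up records).
specimens_1978 = [
    {"id": "s1", "ScientificName": "Aus bus"},
    {"id": "s2", "ScientificName": "Cus dus"},
]
# Result of AccordingTo = Jones on the TCS pool (made-up records).
concepts_by_jones = [
    {"id": "c1", "TaxonName": "Aus bus"},
]

merged = merge(specimens_1978, concepts_by_jones)
```

This is a plain join on the mapped pair dwc:/ScientificName and tcs:/DataSet/TaxonName/Simple; exact string equality is a simplification.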
When you say "Ensure x supports my output model" that actually means "check that x supports custom output models and mapped the necessary concepts, otherwise check that x supports any query template associated to the output model I want", right?
Hmmm...ok, I just took a look at the schema again to refresh my memory. I thought there was a place in the capabilities to define the output models that are supported (either that or I was influenced by the <FullService>/<LiteService> recommendation Marcus made). I see now that the only place this is defined is in the query templates. So the portal would need to iterate through the query templates to find out which ones supported the output model the portal was configured to be interested in. For TAPIR implementations the portal would most likely favor those query templates where there was no filter already existing so it wouldn't have to deal with reconciliation of the filter being defined by the user and the filter within the query template. For a generic client/portal I don't know that supporting custom output models would be the route I would go, but like with TAPIR we could probably leave a hook for it.
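The iteration over query templates could be sketched like this, assuming each advertised template has been parsed into a dictionary with an "output_model" and an optional "filter" (both key names are assumptions made for the example):

```python
def pick_template(templates, output_model):
    """Pick a template for the output model, preferring one without a filter."""
    candidates = [t for t in templates if t["output_model"] == output_model]
    # False sorts before True, so unfiltered templates come first;
    # sorted() is stable, so the advertised order is otherwise kept.
    candidates = sorted(candidates, key=lambda t: t.get("filter") is not None)
    return candidates[0] if candidates else None


templates = [
    {"name": "dwc_1978", "output_model": "dwc",
     "filter": "dwc:/YearCollected = 1978"},
    {"name": "dwc_all", "output_model": "dwc", "filter": None},
    {"name": "tcs_all", "output_model": "tcs", "filter": None},
]
```

Preferring the unfiltered template avoids having to reconcile the user's filter with one already baked into the template.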
By the way, looking at the capabilities response, is there any sense in having a provider that supports the search operation, does not accept custom output models, and does not define any query templates?
I can't see how any provider that fits this description would be useful. If it does not support custom output models and doesn't define any query templates, the provider would have no idea into what structure it should be marshaling the response of the query.
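A client or validator could flag that combination directly. A minimal sketch, assuming the search section of a capabilities response has been parsed into a dictionary; the keys "custom_output_models" and "query_templates" are hypothetical names:

```python
def search_is_contradictory(capabilities):
    """True when search is advertised but no response structure can be chosen."""
    search = capabilities.get("search")
    if search is None:
        return False  # no search operation, so nothing to contradict
    accepts_custom = search.get("custom_output_models", False)
    has_templates = bool(search.get("query_templates"))
    return not accepts_custom and not has_templates
```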
Regards,
Renato
tdwg-tapir mailing list tdwg-tapir@lists.tdwg.org http://lists.tdwg.org/mailman/listinfo/tdwg-tapir_lists.tdwg.org
Anyway, just some more slowly coalescing thoughts bouncing around the gray matter void,
- rob
When I wrote this, I thought you did not support scan with TAPIRLite, but I included it because I would need at least to be able to fake some aspects of scan using search views.
As for your idea, this seems fine to me. Making things modular is always good.
Donald
-----Original Message----- From: "Döring, Markus" [mailto:m.doering@BGBM.org] Sent: 22 November 2005 09:33 To: Donald Hobern; rgales@ku.edu; Renato De Giovanni Cc: tdwg-tapir@lists.tdwg.org Subject: AW: [tdwg-tapir] TapirLite
Hi Markus and everybody,
So I was thinking: why didn't we keep filters and output models entirely separate from each other? Why not request a specific output model in a search and also request a certain filter-parameter template? This could then be reused for inventories, where we don't need any output model.
Output models and filters are already separated. The outputModelGroup is made of structure, indexingElement and mapping. What brings output models and filters together are the query templates.
It would probably also look better in capabilities if the output model is listed separately for searches (see Rob's portal example) and the query templates (maybe better: filter templates?) are a distinct section in capabilities that can be used in inventories and searches.
I still don't see how separate pre-defined filters can address pre-canned inventory operations. If I understood correctly, Donald wants to be able to use query templates (output model + parameterised filter) associated with inventory operations, not searches. That's something we haven't discussed. A query template for an inventory operation would look different from the one we have for searches.
I think this goes more in the direction of making all operation elements belong to an abstract substitution group called "operation". Requests could be a choice between any concrete operation or any template operation. In requests, a template operation would be specified with a URI that should match one of the available templates advertised by the provider (unless the provider understands the basic underlying operation well enough to dynamically parse and process the template definition).
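On the provider side, resolving such a request might look like the sketch below. It assumes a request is either a concrete operation or a template URI, and that the provider keeps a table of the templates it advertises; the dictionary layout and the example.org URI are placeholders, not protocol definitions:

```python
def resolve(request, advertised, can_parse_templates=False):
    """Turn a request into the operation the provider should execute."""
    if "operation" in request:
        return request["operation"]          # concrete operation, used as-is
    uri = request["template"]
    if uri in advertised:
        return advertised[uri]               # pre-canned operation template
    if can_parse_templates:
        # Provider understands the underlying operation and can process
        # an external template definition on the fly.
        return {"fetch_and_parse": uri}
    raise LookupError("template not advertised: " + uri)


advertised = {
    "http://example.org/templates/names": {"type": "inventory",
                                           "concept": "dwc:/ScientificName"},
}
```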
So operation templates would be defined externally also according to the protocol schema. In principle their body could be a search template or an inventory template (defined by a search template group or an inventory template group - the same structures used to validate and specify standard non-template operations).
This really makes me feel that the current "query templates" (which I'm now calling operation templates in a more generic sense) should not belong to "search" in the capabilities response. So TapirLite providers could say that they do not support search and inventory, but they do support some "operation templates". On the other hand, the search section under capabilities must provide a clear distinction between providers that support one or more fixed output models (like DiGIR2) and providers that support dynamic output models, leaving no room for the contradiction pointed out in a previous message.
I won't try to sort out more details unless this sounds like a good direction for all of us.
After all the changes that we recently proposed, it seems we still need more effort to reach another "point of stability". Past experience shows that the process usually works like that, so I think we just need enough patience and more discussion. Hopefully the next point of stability is not so distant...
Best regards, -- Renato