Re: [tdwg-tapir] TapirLite

21 Nov 2005

Hi Renato,
My comments inline...
Renato De Giovanni wrote:
...
Hi Rob,
It seems we agree on everything, I just have some quick comments...
...
Further, say a
portal wants to integrate specimen and taxon concept data, but the
specimens implement TAPIR and all TCS providers are TAPIRLite.  Once
again, the developers must write an entirely custom portal.
This could become even more difficult if the TAPIR protocol leaves no 
room for TapirLite implementations. Then the TCS networks would 
probably be based on a completely different protocol, and the 
integration would be more complicated.
...
I'm definitely not arguing for defining TAPIR so that there is no room
for TAPIRLite.  My point is that anything that is not TAPIRLite must at
minimum support both search and operations, so that there is a baseline
of useful functionality (ping, metadata and capabilities aside) that all
clients can rely on.  Otherwise every client needs a large body of code
just to handle the logic of understanding the capabilities, and we run
the risk that there is no interoperability, which Marcus has shown
concern about.  If I missed a similar argument on this list, my
apologies; I only recall discussions about which filter operations
should or should not be required, but nothing about TAPIR operations.
...
...
Assuming the above was present in the TAPIR protocol and a mixed
network of TAPIR/TAPIR Lite implementations, the generic portal's
basic logic might look something like the following:
If Provider x is TAPIRLite:
  Find all query templates that match my output model
  As user constructs query, remove those query templates that could
  not possibly be used (missing concept, etc.)
  /* Likely more logic here if > 1 query templates are left,
     possibly with user interaction */
Else If Provider x is TAPIR:
  Ensure x supports my output model
  Can assume that search is available
  Can assume that operations (exception maybe "in") are supported
  /* If client is ambitious, may try to use any query templates
     defined within search capabilities (would be same alg. as
     TAPIRLite), but is optional */
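The pseudocode above could be sketched in Python roughly as follows. This is only an illustration, not part of the TAPIR protocol: the `Provider` and `QueryTemplate` classes, their fields, and the `plan_query` function are all hypothetical names, and the sketch assumes that a provider's supported output models can be read off its query templates.

```python
from dataclasses import dataclass, field


@dataclass
class QueryTemplate:
    # Hypothetical structure: a template is tied to one output model
    # and can filter on a fixed set of concepts.
    name: str
    output_model: str
    concepts: frozenset


@dataclass
class Provider:
    name: str
    is_lite: bool  # True for a TAPIRLite provider
    templates: list = field(default_factory=list)


def usable_templates(provider, output_model, needed_concepts):
    """Templates for the wanted output model that cover every concept
    the user has put into the query so far."""
    return [
        t for t in provider.templates
        if t.output_model == output_model and needed_concepts <= t.concepts
    ]


def plan_query(provider, output_model, needed_concepts):
    """Return (mode, candidate_templates) for one provider."""
    candidates = usable_templates(provider, output_model, needed_concepts)
    if provider.is_lite:
        # TAPIRLite: only predefined query templates can be used. If more
        # than one candidate is left, further logic (possibly user
        # interaction) would be needed to choose.
        return ("template", candidates)
    # Full TAPIR: search and filter operations are assumed available,
    # as long as the provider supports the output model at all.
    supports_model = any(
        t.output_model == output_model for t in provider.templates
    )
    if not supports_model:
        return ("skip", [])
    # Using the templates on a full provider is optional.
    return ("search", candidates)
```

A portal would call `plan_query` once per provider each time the user changes the query, dropping providers whose mode is "skip" or whose candidate list is empty in "template" mode.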
That's an interesting exercise. In this scenario, it seems that the 
portal is configured to work with one specific output model (could be 
more than one if we want to complicate things, the user could choose 
one of them and the portal could know how to make transformations or 
to dynamically filter providers).
It would certainly complicate things depending on the output models. 
There are a couple of scenarios I see here.  Scenario 1)  Portals that 
support multiple models but of the same data object (ex. both DwC and 
ABCD) and 2)  Portals that support multiple models of conceptually 
different data objects (ex. DwC and TCS).
In the first case, I see there are potentially three ways to handle such 
a situation.  1)  The portal is customizable by having a plug-in 
architecture that can handle transformations between two different 
schemas via XSLT/code.  2)  The providers must support both output 
models.  3)  The experimental custom output model mode is used on those 
providers that support it.
There are benefits and drawbacks to all.  In option 1) the portal's
architecture and configuration are a bit more complicated, but because
the transformation can be accomplished by XSLT or code or a combination,
it can take semantic transformations into account.  In option 2) there
is a risk that relevant data cannot be accessed because a provider does
not support the output models the portal desires, but this would be the
safest route.  In option 3) the portal might run into some of the
problems discussed in Madrid.
In the second case where the portal is integrating different types of 
information, the portal really has multiple sets of providers that are 
able to be queried depending on the output models and concepts those 
providers have available.  In the example there is the pool of providers 
that serve specimen data and the pool that serve taxonomic data.  One 
(probably naive) way this could be handled is that the portal is 
configured with some sort of mapping that says something like (in 
English because I haven't really sat down to sketch this out in pseudocode):
If User queries dwc:/ScientificName, also query providers with an output 
model of TCS using tcs:/DataSet/TaxonName/Simple
Anything that is unmapped would result in queries only being transmitted 
to those providers that support the output model from which the concept 
in the filter originated.  So dwc:/YearCollected would only query those 
providers in the pool supporting an output model of DwC and 
tcs:/DataSet/TaxonConcept/AccordingTo/Simple would only query those 
providers in the pool supporting an output model of TCS.
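A minimal sketch of that routing rule in Python, under stated assumptions: the concept mapping is the single example given above, while the provider pool names and the `route` function are hypothetical and only there to make the example runnable.

```python
# Portal-level mapping between concepts of different output models
# (not a TAPIR mapping). The single entry is the example from the text.
CONCEPT_MAP = {
    "dwc:/ScientificName": "tcs:/DataSet/TaxonName/Simple",
}

# Hypothetical provider pools, keyed by output model prefix.
POOLS = {
    "dwc": ["specimen-provider-1", "specimen-provider-2"],
    "tcs": ["taxon-provider-1"],
}


def model_of(concept):
    """Output model prefix of a concept, e.g. 'dwc' for 'dwc:/YearCollected'."""
    return concept.split(":", 1)[0]


def route(concept):
    """Return {output_model: (concept_to_query, providers)} for one
    concept used in the user's filter."""
    targets = {model_of(concept): concept}
    mapped = CONCEPT_MAP.get(concept)
    if mapped is not None:
        # Mapped concept: also query the other pool with its own concept.
        targets[model_of(mapped)] = mapped
    return {m: (c, POOLS.get(m, [])) for m, c in targets.items()}
```

With this, `route("dwc:/ScientificName")` targets both pools, while `route("dwc:/YearCollected")` targets only the DwC pool.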
What I haven't really thought much about is how to prevent (or how to
handle) the case where someone creates a filter that uses concepts from
each supported output model that are unmapped (in the sense above, not
in terms of TAPIR mappings).  Ex.  dwc:/YearCollected = 1978 and
tcs:/DataSet/TaxonConcept/AccordingTo/Simple = Jones.  I guess one way
to handle it would be to run each query on its own pool of providers and
then merge the results based on the mappings.  With the above example,
the merged results might contain only those specimens whose scientific
name is equal to the TaxonName/Simple of those concepts authored by
Jones.  (Not sure if that is clear at all).
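To make the merge step a bit more concrete, here is a rough Python sketch. It assumes each pool has already been queried separately and that results come back as flat dictionaries; the record layout, the sample records, and the `merge_on_mapping` function are all made up for illustration.

```python
def merge_on_mapping(specimens, concepts):
    """Keep only the specimens whose ScientificName equals the TaxonName
    of at least one returned taxon concept (the mapped pair of concepts)."""
    names = {c["TaxonName"] for c in concepts}
    return [s for s in specimens if s["ScientificName"] in names]


# Hypothetical result of dwc:/YearCollected = 1978 on the specimen pool.
specimens_1978 = [
    {"ScientificName": "Puma concolor", "YearCollected": 1978},
    {"ScientificName": "Felis catus", "YearCollected": 1978},
]

# Hypothetical result of AccordingTo/Simple = Jones on the taxon pool.
concepts_by_jones = [
    {"TaxonName": "Puma concolor", "AccordingTo": "Jones"},
]

merged = merge_on_mapping(specimens_1978, concepts_by_jones)
```

Here `merged` holds only the specimen named "Puma concolor", since no concept authored by Jones carries the other name.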
...
When you say "Ensure x supports my output model" that actually means 
"check that x supports custom output models and mapped the necessary 
concepts, otherwise check that x supports any query template 
associated to the output model I want", right?
Hmmm...ok, I just took a look at the schema again to refresh my memory. 
  I thought there was a place in the capabilities to define the output 
models that are supported (either that or I was influenced by the 
<FullService>/<LiteService> recommendation Marcus made).  I see now that 
the only place this is defined is in the query templates.  So the portal 
would need to iterate through the query templates to find out which ones 
supported the output model the portal was configured to be interested 
in.  For TAPIR implementations the portal would most likely favor those
query templates that carry no predefined filter, so it wouldn't have to
reconcile the filter being defined by the user with the filter within
the query template.  For a generic
client/portal I don't know that supporting custom output models would be 
the route I would go, but like with TAPIR we could probably leave a hook 
for it.
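Iterating through the templates might look something like this in Python. It is only a sketch: representing a query template as a dictionary with `output_model` and `filter` keys is an assumption made for the example, not the structure of the actual capabilities response.

```python
def pick_templates(templates, output_model):
    """Return the templates for the wanted output model, with the ones
    that have no predefined filter first."""
    matching = [t for t in templates if t["output_model"] == output_model]
    # sorted() is stable and False sorts before True, so templates without
    # a filter come first and the original order is otherwise preserved.
    return sorted(matching, key=lambda t: t["filter"] is not None)


# Hypothetical templates advertised by one provider.
templates = [
    {"name": "by-year", "output_model": "dwc", "filter": "YearCollected"},
    {"name": "open", "output_model": "dwc", "filter": None},
    {"name": "names", "output_model": "tcs", "filter": None},
]

preferred = pick_templates(templates, "dwc")
```

A full TAPIR client could simply take the first entry of `preferred` and apply the user's filter to it, falling back to the filtered templates only if none without a filter exists.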
...
By the way, looking at the capabilities response, is there any sense
in having a provider that supports the search operation, does not
accept custom output models, and does not define any query templates?
I can't see how any provider that fits this description would be useful.
If it does not support custom output models and doesn't define any
query templates, the provider would have no idea what structure it
should marshal the response of the query into.
...
Regards,
Renato

Anyway, just some more slowly coalescing thoughts bouncing around the 
gray matter void,
- rob

Robert Gales