AW: [PyWrapper-devel] [tdwg-tapir] RE: WG: tapir: capabilities
I agree that logging via GUID doesn't help in the many cases where the provider wants to know what was searched for.
But searching a portal cache, finding data from 20 different providers in one search, and then sending off 20 log requests could also be annoying. On top of that, the portal carries the burden of checking the registry to see whether a provider really wants logging.
The most efficient approach is probably portal-specific logging, as Donald suggests. But then providers would have a hard time aggregating the logging data from several totally different portals.
To get comparable logging across different portals, though, it seems to me that Renato's suggestions are worth a try. It would definitely need guidelines so that portal developers know when to use a log request and how to use it. How to treat paging and map data are good examples where there is no obvious correct behaviour.
-- Markus
-----Original Message-----
From: tdwg-tapir-bounces@lists.tdwg.org [mailto:tdwg-tapir-bounces@lists.tdwg.org] On Behalf Of John R. WIECZOREK
Sent: Monday, 24 July 2006 18:28
To: roger@tdwg.org
Cc: PyWrapper Developers mailing list; tdwg-tapir@lists.tdwg.org
Subject: Re: [PyWrapper-devel] [tdwg-tapir] RE: WG: tapir: capabilities
Logging that a GUID was used isn't sufficient; it doesn't tell how the data were used. What I'm after is to log the actual query that would have had to go to the provider to produce the results used. The case in your second example shouldn't happen; the query should specify a record limit per provider.
On 7/24/06, Roger Hyam roger@tdwg.org wrote:
I thought this sounded like a good idea but since reading Renato's message I am now confused.
If a user does a search on a portal, gets 100 results and looks at the first 10, does the provider of the 11th record get notified? Their data has been used because it has been included in the count. Another example would be if a portal gave a distribution map in 10km squares based on data from multiple providers. Each data point is made from several suppliers' data, and removal of any one supplier's data may not change the map. Do we notify them all?
I could just about envisage a GUID-based system. The call to the log function would basically say "Someone has accessed the data that I got from you that you tag with this GUID", but I can't see how this would work on a search-based system. The log call would mean "Someone searched for something that made use of data I got from a search I did on you once".
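To make the paging question concrete, here is a minimal Python sketch of two possible notification policies. The function name, the policy names "viewed" and "counted", and the (provider_id, record_id) layout of the cached hits are all assumptions made for illustration; nothing in TAPIR defines them.

```python
def providers_to_notify(hits, page_start, page_size, policy="viewed"):
    """Pick the providers that would receive a log request.

    hits is the ordered result list of a cached search, given as
    (provider_id, record_id) tuples (a hypothetical layout).
    """
    if policy == "viewed":
        # Only records on the page the user actually looked at.
        selected = hits[page_start:page_start + page_size]
    elif policy == "counted":
        # Every record that contributed to the result count.
        selected = hits
    else:
        raise ValueError("unknown policy: %r" % policy)
    # One log request per provider, however many records it supplied.
    return sorted({provider for provider, _record in selected})
```

With 100 hits of which the first 10 come from one provider, "viewed" notifies that provider only, while "counted" notifies everyone. Which of the two is right is exactly the open question.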
So really the only service we need is a GUID based one. Perhaps extending the LSID resolution spec would be more appropriate?
Roger
Renato De Giovanni wrote:
Hi John,
Implementing the log request was never a problem. We discussed it again during the Madrid meeting, and only after that was a feature freeze suggested. It's true that PyWrapper is being adjusted now to conform to the new specs, and considering that DiGIR2 (or wasabi) postponed implementation of TAPIR, I suppose it should not be a big problem to make additional changes if necessary.
The main problem I had with the log request was that it would probably not solve the issue behind it, which is to track usage by data aggregators. I still have the same feeling, and I can easily imagine situations where it would not be easy or even possible to translate searches on top of cached databases into TAPIR requests.
But maybe I'm wrong, and if you all think it's a good feature then we can try to include it. However, I do think that providers should be able to advertise as part of their capabilities whether or not they want to receive log requests.
To me it also sounds like a new operation, especially if it's only related to search. It could make sense for view, inventory and metadata operations. Maybe capabilities too. But it doesn't make sense for ping. Well, maybe it could make sense for ping if the data aggregator monitors provider status and accepts similar requests on top of its results...
So, yes, it could be a new attribute "logOnly" as part of the operationRequestGroup with an answer <received/> (just after the response header). And we could add an attribute "acceptLogRequests" in the <operations> element in capabilities responses. The other option would be to include a new operation, but maybe it's better to just have it as an optional attribute for all operations.
Best Regards,
-- Renato
On 19 Jul 2006 at 10:32, John R. WIECZOREK wrote:
I appreciate that you will consider this request. I always thought it would be trivial to implement. Your simulation mode sounds very much like what I had in mind. I hadn't thought it necessary to get a response from a log request, but if there was a simple response, it could be used as a ping, or it could be used to retry logging until the provider did respond. So, something like <log request received>. I think the addition of a logOnly attribute is a good one, and could apply to every request type.
Javi, I don't disagree that portals SHOULD log the data usage, especially to cover the situations where a provider doesn't respond. I also think that having the information logged at the provider is a responsible course of action, since they will have immediate access to the usage statistics that way. It will be much easier for a portal builder to send log requests than to build the infrastructure and interfaces to logs; therefore it is more likely to actually get done.
On 7/18/06, Javier de la Torre jatorre@mncn.csic.es wrote:
I am not sure about this. I still think that portals should be gathering this data and making it available to data providers...
But in any case, if you like it then I agree with Markus that the best option is to include another parameter in the operationRequestGroup.
I haven't checked, but what happens if you do an extension there with an attribute that is implementation-specific? A qualified attribute. Will this still validate against our schema? You were discussing qualification of attributes before, no?
Javi.
On 18/07/2006, at 10:41, Döring, Markus wrote:
> John,
> all changes going on with TAPIR right now are really only changes in terminology or removing inconsistencies we did not detect before we started the documentation and final implementation.
>
> But nevertheless I would support your request. Especially from the implementation point of view this is a trivial change to the code. So why not add it?
> Just some additional thoughts:
>
> - I've added a "simulation" mode already to my code where no SQL gets executed but just logged. So you can test configurations without risking sending off killer statements. That's similar to logOnly I guess, returning nothing but diagnostics. What would you suggest be returned for a logOnly request? Just the empty TAPIR envelope? Nothing? <OK>?
>
> - Would this log-only request not be needed for all requests? At least for inventories? So it would be easiest to have a new logOnly parameter in the header or "request element" just after the header? Something like <search logOnly="true">
>
> -- Markus
>
>> -----Original Message-----
>> From: pywrapper-devel-bounces@lists.sourceforge.net [mailto:pywrapper-devel-bounces@lists.sourceforge.net] On Behalf Of John R. WIECZOREK
>> Sent: Monday, 17 July 2006 23:30
>> To: Renato De Giovanni
>> Cc: pywrapper-devel@lists.sourceforge.net; tdwg-tapir@lists.tdwg.org
>> Subject: Re: [PyWrapper-devel] [tdwg-tapir] RE: WG: tapir: capabilities
>>
>> A little off topic, but it occurs to me that a great deal of work is still ongoing with TAPIR, which suggests to me that it may be warranted to restate my request for a simple message type - a log request. This request would be the same as a search request, except that the caller doesn't need a response. Providers would use this type of request to log data usage if the data were retrieved from a cache elsewhere. I remember talking about this in Berlin, at which time there was supposed to be a feature freeze. Clearly we've gone beyond that, so I'm requesting it again.
>>
>> On 7/17/06, Renato De Giovanni renato@cria.org.br wrote:
>>
>> >>Hi,
>> >>
>> >>If I remember well, the "view" operation was re-included in the protocol just to handle query templates, specifically for TapirLite providers.
>> >>So if someone wants to query a provider using some external output model that should be dynamically parsed, then the "search" operation must be used instead (using either XML or a simple GET request). View operations are really bound to query templates, and they are not allowed to specify "filter" or "partial" parameters.
>> >>--
>> >>Renato
>> >>
>> >>On 17 Jul 2006 at 21:26, "Döring, Markus" wrote:
>> >>
>> >>> I was just about to edit the schema and realized that output models are only specified for searches. But what about views? They use query templates, yes, but only the ones listed in capabilities? We should have dynamic ones here as well, I think. And they link back to static/dynamic models.
>> >>>
>> >>> So should models maybe become a separate section not tied to search/view operations? I am going to modify the schema nevertheless already to accommodate the changes below - ignoring views for now.
>> >>>
>> >>> Markus
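For readers following the thread, the two attributes proposed in it (logOnly on the operation element, acceptLogRequests on <operations> in a capabilities response) could be handled roughly as in the Python sketch below. Only those two attribute names and the <search logOnly="true"> shape come from the discussion; the <request> and <capabilities> wrapper elements, the missing namespaces and both function names are assumptions, so this is not the actual TAPIR schema.

```python
import xml.etree.ElementTree as ET


def build_request(operation, log_only=False):
    """Build a bare request, optionally flagged as log-only.

    The <request> wrapper is a placeholder, not the real TAPIR envelope.
    """
    request = ET.Element("request")
    op = ET.SubElement(request, operation)
    if log_only:
        # Attribute proposed in the thread: <search logOnly="true">
        op.set("logOnly", "true")
    return ET.tostring(request, encoding="unicode")


def accepts_log_requests(capabilities_xml):
    """Check for acceptLogRequests="true" on an <operations> element."""
    root = ET.fromstring(capabilities_xml)
    ops = root if root.tag == "operations" else root.find(".//operations")
    return ops is not None and ops.get("acceptLogRequests") == "true"
```

A portal that wanted to honour the capabilities flag would call accepts_log_requests() on the cached capabilities response before sending build_request("search", log_only=True).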
_______________________________________________
tdwg-tapir mailing list
tdwg-tapir@lists.tdwg.org
http://lists.tdwg.org/mailman/listinfo/tdwg-tapir
--
Roger Hyam Technical Architect Taxonomic Databases Working Group
http://www.tdwg.org roger@tdwg.org +44 1578 722782
Why make this so hard? We're talking about a portal sending out one message to n providers and not having to wait for a response. Right now we would have to send out a message and wait for all of the responses (up to a timeout). This is a net improvement.
Why do we need metadata from providers to know if they want logging requests? We don't ask them if they want metadata requests, or how often. Just send the requests. If they want to log them, they will configure their provider to do so. Otherwise the provider will ignore them.
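A minimal Python sketch of that "send and don't wait" behaviour, under stated assumptions: the function names are made up, and the actual transport is passed in as a `send` callable because the thread does not settle how a log request travels (XML POST or simple GET).

```python
from concurrent.futures import ThreadPoolExecutor

# Small shared pool so the portal's request thread never blocks on providers.
_pool = ThreadPoolExecutor(max_workers=8)


def _safe_send(send, url, payload):
    """Run one log request; swallow failures so a dead provider is harmless."""
    try:
        send(url, payload)
        return True
    except Exception:
        # Provider is down or ignores log requests: nothing to do.
        return False


def notify_providers(provider_urls, payload, send):
    """Queue one log request per provider and return immediately.

    The returned futures can be ignored; they exist only for callers
    (or tests) that want to inspect the outcome later.
    """
    return [_pool.submit(_safe_send, send, url, payload)
            for url in provider_urls]
```

The portal answers its user without waiting on any of the n providers, and a provider that is not configured for logging simply drops the message.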
In the meantime, always log on the portal side. Not only will that give providers with flaky connections a place to see usage statistics, but it will also generate information that will be interesting on its own - summary information about how the portal is used that you wouldn't get from the providers.
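As a rough illustration of portal-side logging, the Python sketch below keeps per-provider and per-query counters in memory. The class name, method names and the in-memory storage are assumptions; a real portal would persist this somewhere.

```python
from collections import Counter


class PortalUsageLog:
    """In-memory usage log kept by the portal itself (hypothetical)."""

    def __init__(self):
        self.by_provider = Counter()
        self.by_query = Counter()

    def record(self, query, provider_ids):
        """Record one search and the providers whose cached data it used."""
        self.by_query[query] += 1
        for provider_id in set(provider_ids):
            # Count each provider once per search, not once per record.
            self.by_provider[provider_id] += 1

    def summary(self, top=3):
        """Portal-wide view that no single provider could produce."""
        return {
            "top_queries": self.by_query.most_common(top),
            "provider_hits": dict(self.by_provider),
        }
```

The per-provider counts are what a provider with an unreliable connection could look up, and the query ranking is the kind of portal-level summary mentioned above.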
To me, this usage business is important enough that I would even go so far as to certify portals as being in compliance with meeting this social contract. That way a provider could release access to certified portals and disallow access for those who don't abide by the contract. Remember, the semblance of control is really important to a lot of our providers. If you don't think so, have someone do a survey of existing providers to see if they would want it or not. It would be a sample biased against needing logging (since they are already doing without). If that survey turned up interest in logging anyway, then it's worth doing. My feeling is that it is so easy to implement (if you don't try to get unnecessarily fancy) that it should just be done - it would be easier than conducting a survey about it.
On 7/25/06, "Döring, Markus" m.doering@bgbm.org wrote:
[...]
participants (2)
- "Döring, Markus"
- John R. WIECZOREK