[Tdwg-guid] Throttling searches

Mon Jun 19 11:34:15 CEST 2006

How can we pass a token with an LSID?
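One possible answer is to leave the LSID alone and send the token next to it, for example as an HTTP header on the GET binding, along the lines of the token exchange over HTTP suggested below. Here is a minimal Python sketch. The endpoint URL and the `X-Client-Token` header name are hypothetical; neither comes from the LSID specification.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Hypothetical HTTP GET data endpoint. A real resolver would take this
# from the service description (WSDL) returned by the LSID authority.
DATA_ENDPOINT = "http://lsid.example.org/authority/data"


def build_data_request(lsid: str, token: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request for the data behind an LSID.

    The LSID goes into the query string unchanged, so it is identical for
    every caller. The optional client token travels separately in a
    hypothetical 'X-Client-Token' header that the server may use for
    throttling.
    """
    url = DATA_ENDPOINT + "?lsid=" + quote(lsid, safe="")
    headers = {"X-Client-Token": token} if token else {}
    return Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` would send it. A client that sends no token still gets a valid request for the same LSID, so naive software keeps working.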

> 
> I think the only way to throttle in these situations is to have some 
> notion of who the client is and the only way to do that is to have some 
> kind of token exchange over HTTP saying who they are. Basically you have 
> to have some kind of client registration system or you can never 
> distinguish between a call from a new client and a repeat call. The use 
> of IP address is a pain because so many people are now behind some kind 
> of NAT gateway.
> 
> How about this for a plan:
> 
> You could give a degraded service to people who don't pass a token (a
> 5-second delay, perhaps) and offer a quicker service to registered users
> who pass a token (but then perhaps limit the number of calls they make).
> This would mean you could offer a universal service even to those with 
> naive client software but a better service to those who play nicely. You 
> could also get better stats on who is using the service.
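The two-tier plan above can be sketched in Python as follows. This is an illustration under stated assumptions: the set of registered tokens stands in for a hypothetical registration system, and the daily quota of 1000 is an arbitrary example. The 5-second delay comes from the suggestion itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# The 5 s delay follows the suggestion above; the quota is a made-up example.
ANONYMOUS_DELAY_S = 5.0
REGISTERED_DAILY_LIMIT = 1000


@dataclass
class TieredThrottle:
    # Tokens issued by a (hypothetical) client registration system.
    registered: Set[str]
    # Calls per token today; doubles as simple usage statistics.
    usage: Dict[str, int] = field(default_factory=dict)
    day: int = -1

    def delay_for(self, token: Optional[str], now: float) -> Optional[float]:
        """Seconds to wait before serving, or None if the call is refused."""
        today = int(now // 86400)  # UTC day number from a Unix timestamp
        if today != self.day:      # new day: reset every quota
            self.day, self.usage = today, {}
        if token is None or token not in self.registered:
            return ANONYMOUS_DELAY_S  # degraded but still available
        used = self.usage.get(token, 0)
        if used >= REGISTERED_DAILY_LIMIT:
            return None               # registered client over its quota
        self.usage[token] = used + 1
        return 0.0                    # fast path for well-behaved clients
```

The server would sleep for the returned delay before answering, or refuse the call (for example with HTTP 429) when it gets `None`. Unknown tokens are treated as anonymous, so a made-up token earns no advantage.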
> 
> So there are ways that this could be done. I expect people will come up 
> with a host of different ways. It is outside LSIDs though.
> 
> Roger
> 
> Sally Hinchcliffe wrote:
> > It's not an LSID issue per se, but LSIDs will make it harder to slow
> > searches down. For instance, Google restricts use of its spell
> > checker to 1,000 requests a day by means of a key that is passed in
> > with each request. Obviously this can't be done with LSIDs, because
> > then an LSID would no longer be the same for every user.
> > The other reason it's relevant to LSIDs is simply that providing
> > a list of all relevant IPNI LSIDs (not necessary for the LSID
> > implementation, but nice to have for caching and lookups by other
> > systems using our LSIDs) also makes life easier for the data
> > scrapers.
> >
> > Also, I thought: here's a list full of clever people, perhaps they
> > will have some suggestions.
> >
> > Sally
> >
> >   
> >> Is this an LSID issue? LSIDs essentially provide a binding service between
> >> a name and one or more web services (we default to HTTP GET bindings).
> >> It isn't really up to the LSID authority to administer any policies 
> >> regarding the web service but simply to point at it. It is up to the web 
> >> service to do things like throttling, authentication and authorization.
> >>
> >> Imagine, for example, that the different services had different 
> >> policies. It may be reasonable not to restrict the getMetadata() calls 
> >> but to restrict the getData() calls.
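Per-call policies like this could be sketched in Python as below. The sketch assumes a fixed-window counter per client and method. The limits are made-up examples that leave getMetadata() open and cap getData(); only the method names come from the message.

```python
from typing import Dict, Optional, Tuple

# Hypothetical limits in calls per client per minute; None = unrestricted.
# Method names are the LSID calls mentioned above, the numbers are made up.
POLICIES: Dict[str, Optional[int]] = {
    "getMetadata": None,
    "getData": 60,
}


class MethodPolicy:
    """Fixed-window call counter applied per (client, method)."""

    def __init__(self, policies: Dict[str, Optional[int]]):
        self.policies = policies
        # (client, method, minute) -> calls so far. A real service would
        # also discard windows that have passed.
        self.counts: Dict[Tuple[str, str, int], int] = {}

    def allow(self, client: str, method: str, minute: int) -> bool:
        limit = self.policies[method]  # unknown methods raise KeyError
        if limit is None:
            return True
        key = (client, method, minute)
        used = self.counts.get(key, 0)
        if used >= limit:
            return False
        self.counts[key] = used + 1
        return True
```

This keeps the policy in the web service layer, as suggested. The LSID authority only points at the service and never sees these counters.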
> >>
> >> The use of LSIDs does not create any new problems that weren't there 
> >> with web page scraping - or scraping of any other web service.
> >>
> >> Just my thoughts...
> >>
> >> Roger
> >>
> >>
> >> Ricardo Scachetti Pereira wrote:
> >>>     Sally,
> >>>
> >>>     You raised a really important issue that we had not really addressed 
> >>> at the meeting. Thanks for that.
> >>>
> >>>     I would say that we should not constrain the resolution of LSIDs if
> >>> we expect our LSID infrastructure to work. LSIDs will be the basis of
> >>> our architecture, so we had better have good support for that.
> >>>
> >>>     However, that is certainly a limiting factor. Also, server efficiency will
> >>> likely vary quite a lot, depending on underlying system optimizations
> >>> and so on.
> >>>
> >>>     So I think that the solution to this problem lies in caching LSID
> >>> responses in the server's LSID stack. Basically, after resolving an LSID
> >>> once, your server should be able to resolve it again and again really
> >>> quickly, until something in the metadata related to that ID changes.
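A minimal Python sketch of that kind of cache follows. It assumes the underlying database can cheaply report a version or last-modified stamp per record. The `get_version` and `render` hooks are hypothetical and not part of any LSID software stack.

```python
from typing import Callable, Dict, Tuple


class LsidResponseCache:
    """Cache resolved LSID responses until the record's version changes.

    `get_version` and `render` are hypothetical hooks into the underlying
    database: a cheap lookup of a last-modified stamp, and the expensive
    call that builds the full metadata response.
    """

    def __init__(self, get_version: Callable[[str], int],
                 render: Callable[[str], str]):
        self.get_version = get_version
        self.render = render
        self._store: Dict[str, Tuple[int, str]] = {}

    def resolve(self, lsid: str) -> str:
        version = self.get_version(lsid)
        hit = self._store.get(lsid)
        if hit is not None and hit[0] == version:
            return hit[1]          # unchanged since last time: serve cached copy
        body = self.render(lsid)   # first request or record changed: rebuild
        self._store[lsid] = (version, body)
        return body
```

This only helps with repeat requests. A crawler that walks the full list visits each LSID once, so each of its requests is still a cache miss. Caching therefore complements throttling rather than replacing it.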
> >>>
> >>>     I haven't looked at this aspect of the LSID software stack, but 
> >>> maybe others can say something about it. In any case I'll do some 
> >>> research on it and get back to you.
> >>>
> >>>     Again, thanks for bringing it up.
> >>>
> >>>     Cheers,
> >>>
> >>> Ricardo
> >>>
> >>>
> >>> Sally Hinchcliffe wrote:
> >>>> There are enough discontinuities in IPNI IDs that enumerating 1, 2, 3...
> >>>> would quickly run into the sand. I agree it's not a new problem; I just
> >>>> hate to think I'm making life easier for the data scrapers.
> >>>> Sally
> >>>>
> >>>>
> >>>>> It can be a problem, but I'm not sure there is a simple solution... and how different is the LSID crawler scenario from an http://www.ipni.org/ipni/plantsearch?id=1,2,3,4,5 ... 9999999 scenario?
> >>>>>
> >>>>> Paul
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: tdwg-guid-bounces at mailman.nhm.ku.edu
> >>>>> [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu]On Behalf Of Sally
> >>>>> Hinchcliffe
> >>>>> Sent: 15 June 2006 12:08
> >>>>> To: tdwg-guid at mailman.nhm.ku.edu
> >>>>> Subject: [Tdwg-guid] Throttling searches
> >>>>>
> >>>>>
> >>>>> Hi all,
> >>>>> Another question that has come up here:
> >>>>>
> >>>>> As discussed at the meeting, we're thinking of providing a complete
> >>>>> list of all IPNI LSIDs plus a label (name and author, probably),
> >>>>> which will be produced annually as a download.
> >>>>>
> >>>>> Most people will play nice and just resolve one or two LSIDs as
> >>>>> required, but by providing a complete list, we're making it very easy
> >>>>> for someone to write a crawler that hits every LSID in turn and
> >>>>> basically brings our server to its knees.
> >>>>>
> >>>>> Does anybody know of a good way of enforcing more polite behaviour? We can
> >>>>> make the download available only under a data supply agreement that
> >>>>> includes a clause limiting hit rates, or we could limit by IP address
> >>>>> (but this would ultimately block out services like Rod's simple
> >>>>> resolver). I believe Google's spell checker uses a key which has to
> >>>>> be passed in as part of the query; obviously we can't do that with
> >>>>> LSIDs.
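Limiting by IP address can be combined with Roger's registration idea as a per-client token bucket. The token bucket is a standard rate-limiting technique, added here as an illustration; nobody in the thread proposed it by name. The key used is the registered key when the client sends one, and the IP address otherwise. Users behind a shared NAT gateway, or a proxy such as Rod's resolver, therefore count as one client only when they do not identify themselves. The rate, capacity, and key format are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Bucket:
    tokens: float  # requests currently allowed without waiting
    last: float    # time of the previous refill, in seconds


class TokenBucketLimiter:
    """Per-client token bucket: `rate` requests/s sustained, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, Bucket] = {}

    @staticmethod
    def client_key(ip: str, api_key: Optional[str]) -> str:
        # Prefer a registered key: many users behind one NAT gateway share an IP.
        return "key:" + api_key if api_key else "ip:" + ip

    def allow(self, key: str, now: float) -> bool:
        b = self.buckets.get(key)
        if b is None:  # first request from this client starts with a full bucket
            b = self.buckets[key] = Bucket(tokens=self.capacity, last=now)
        # Refill in proportion to elapsed time, capped at capacity.
        b.tokens = min(self.capacity, b.tokens + (now - b.last) * self.rate)
        b.last = now
        if b.tokens >= 1.0:
            b.tokens -= 1.0
            return True
        return False
```

A client that resolves "one or two LSIDs as required" never notices the limit. A crawler that walks the full list is held to the sustained rate once its initial burst is used up.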
> >>>>>
> >>>>> Any thoughts? Anyone think this is a problem? 
> >>>>>
> >>>>> Sally
> >>>>> *** Sally Hinchcliffe
> >>>>> *** Computer section, Royal Botanic Gardens, Kew
> >>>>> *** tel: +44 (0)20 8332 5708
> >>>>> *** S.Hinchcliffe at rbgkew.org.uk
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> TDWG-GUID mailing list
> >>>>> TDWG-GUID at mailman.nhm.ku.edu
> >>>>> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
> >>>
> >> -- 
> >>
> >> -------------------------------------
> >>  Roger Hyam
> >>  Technical Architect
> >>  Taxonomic Databases Working Group
> >> -------------------------------------
> >>  http://www.tdwg.org
> >>  roger at tdwg.org
> >>  +44 1578 722782
> >> -------------------------------------
> >>
> >>
> >
> 

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk