[Tdwg-guid] Throttling searches [ Scanned for viruses ]

Roger Hyam roger at tdwg.org
Mon Jun 19 10:33:58 CEST 2006


Is this an LSID issue? LSIDs essentially provide a binding service between 
a name and one or more web services (we default to HTTP GET bindings). 
It isn't really up to the LSID authority to administer any policies 
regarding the web service, but simply to point at it. It is up to the web 
service to do things like throttling, authentication and authorization.
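As a rough illustration of throttling living in the web service rather than in the LSID authority, here is a minimal per-client token-bucket sketch in Python. The class names, keying clients by a string such as an IP address, and the capacity/rate numbers are all assumptions for illustration; nothing here is part of any existing LSID software stack.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Permits bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so a new client can make a short burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ClientThrottle:
    """Keeps one bucket per client key (hypothetical: e.g. the caller's IP address)."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        if client not in self.buckets:
            self.buckets[client] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[client].allow()


# Illustrative numbers only: bursts of 5, then one request per second per client.
throttle = ClientThrottle(capacity=5, rate=1.0)
```

A request handler would call `allow()` before doing any real work and, when it returns False, answer with HTTP 429 (Too Many Requests) instead of resolving. Keying by IP address has the drawback raised in the thread: one shared resolver fronting many users looks like a single heavy client.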

Imagine, for example, that the different services had different 
policies. It may be reasonable not to restrict the getMetadata() calls 
but to restrict the getData() calls.
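A minimal sketch of such per-operation policies, assuming a hypothetical service that dispatches on the operation names getMetadata and getData: metadata calls pass straight through, while data calls are capped by a sliding-window counter per client. The `OperationPolicy` class and the limits in `POLICIES` are placeholders, not recommendations.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple

# Hypothetical policy table: None means "no limit";
# otherwise (max_calls, window_seconds) per client.
POLICIES: Dict[str, Optional[Tuple[int, float]]] = {
    "getMetadata": None,       # unrestricted
    "getData": (10, 60.0),     # at most 10 calls per client per 60 s
}


class OperationPolicy:
    def __init__(self, policies: Dict[str, Optional[Tuple[int, float]]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.policies = policies
        self.clock = clock
        # Timestamps of recent calls, per (client, operation) pair.
        self.history: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, client: str, operation: str) -> bool:
        # Operations missing from the table are treated as unlimited in this sketch.
        limit = self.policies.get(operation)
        if limit is None:
            return True
        max_calls, window = limit
        now = self.clock()
        calls = self.history[(client, operation)]
        # Drop timestamps that have slid out of the window.
        while calls and now - calls[0] >= window:
            calls.popleft()
        if len(calls) >= max_calls:
            return False
        calls.append(now)
        return True
```

Keeping the policy in a table rather than in code makes it easy to tighten getData without touching getMetadata, which is the separation suggested above.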

The use of LSIDs does not create any new problems that weren't there 
with web page scraping - or scraping of any other web service.

Just my thoughts...

Roger


Ricardo Scachetti Pereira wrote:
>     Sally,
>
>     You raised a really important issue that we had not really addressed 
> at the meeting. Thanks for that.
>
>     I would say that we should not constrain the resolution of LSIDs if 
> we expect our LSID infrastructure to work. LSIDs will be the basis of 
> our architecture, so we had better have good support for them.
>
>     However, that is surely a limiting factor. Server efficiency will also 
> likely vary quite a lot, depending on underlying system optimizations 
> and so on.
>
>     So I think the solution to this problem lies in caching LSID 
> responses in the server-side LSID stack. Basically, after resolving an LSID 
> once, your server should be able to resolve it again and again very 
> quickly, until something in the metadata related to that ID changes.
>
>     I haven't looked at this aspect of the LSID software stack, but 
> maybe others can say something about it. In any case I'll do some 
> research on it and get back to you.
>
>     Again, thanks for bringing it up.
>
>     Cheers,
>
> Ricardo
>
>
> Sally Hinchcliffe wrote:
>   
>> There are enough discontinuities in IPNI IDs that 1,2,3 would quickly 
>> run into the sand. I agree it's not a new problem; I just hate to 
>> think I'm making life easier for the data scrapers.
>> Sally
>>
>>
>>   
>>     
>>> It can be a problem but I'm not sure if there is a simple solution ... and how different is the LSID crawler scenario from an http://www.ipni.org/ipni/plantsearch?id= 1,2,3,4,5 ... 9999999 scenario?
>>>
>>> Paul
>>>
>>> -----Original Message-----
>>> From: tdwg-guid-bounces at mailman.nhm.ku.edu
>>> [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu]On Behalf Of Sally
>>> Hinchcliffe
>>> Sent: 15 June 2006 12:08
>>> To: tdwg-guid at mailman.nhm.ku.edu
>>> Subject: [Tdwg-guid] Throttling searches [ Scanned for viruses ]
>>>
>>>
>>> Hi all
>>> Another question that has come up here.
>>>
>>> As discussed at the meeting, we're thinking of providing a complete 
>>> download of all IPNI LSIDs plus a label (name and author, probably), 
>>> produced annually.
>>>
>>> Most people will play nice and just resolve one or two LSIDs as 
>>> required, but by providing a complete list, we're making it very easy 
>>> for someone to write a crawler that hits every LSID in turn and 
>>> basically brings our server to its knees.
>>>
>>> Does anybody know of a good way of enforcing more polite behaviour? We 
>>> could make the download available only under a data supply agreement that 
>>> includes a clause limiting hit rates, or we could limit by IP address 
>>> (but this would ultimately block services like Rod's simple 
>>> resolver). I believe Google's spell checker uses a key which has to 
>>> be passed in as part of the query; obviously we can't do that with 
>>> LSIDs.
>>>
>>> Any thoughts? Anyone think this is a problem? 
>>>
>>> Sally
>>> *** Sally Hinchcliffe
>>> *** Computer section, Royal Botanic Gardens, Kew
>>> *** tel: +44 (0)20 8332 5708
>>> *** S.Hinchcliffe at rbgkew.org.uk
>>>
>>>
>>> _______________________________________________
>>> TDWG-GUID mailing list
>>> TDWG-GUID at mailman.nhm.ku.edu
>>> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
>>>
>
>
>   
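Ricardo's caching suggestion (resolve an LSID once, then answer repeat requests from memory until the related metadata changes) could be sketched as a small cache with explicit invalidation. `LsidResponseCache` and the `resolve` callable are hypothetical names, not part of any existing LSID library, and the identifier in the test is illustrative only.

```python
from typing import Callable, Dict


class LsidResponseCache:
    """Caches resolved LSID responses until the underlying record changes.

    `resolve` is a hypothetical stand-in for the expensive lookup the
    LSID stack performs (e.g. a database query that builds the metadata).
    """

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self.resolve = resolve
        self.entries: Dict[str, str] = {}

    def get(self, lsid: str) -> str:
        if lsid not in self.entries:
            # First request for this LSID: do the real work once.
            self.entries[lsid] = self.resolve(lsid)
        return self.entries[lsid]

    def invalidate(self, lsid: str) -> None:
        # Call this from whatever code edits the record behind `lsid`.
        self.entries.pop(lsid, None)
```

The dictionary here is unbounded, so a real deployment would need a size limit. Python's `functools.lru_cache` bounds size, but it only offers `cache_clear()` for wholesale invalidation, not per-key eviction, which is why this sketch keeps its own dictionary.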


-- 

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger at tdwg.org
 +44 1578 722782
-------------------------------------



More information about the tdwg-tag mailing list