[Tdwg-guid] Throttling searches [ Scanned for viruses ]

Dave Vieglais vieglais at ku.edu
Mon Jun 19 20:28:09 CEST 2006


A somewhat related issue: does the LSID spec provide guidelines for when a
resolver is not accessible, for example when it is overloaded? (I would read the
spec myself, but I can't seem to access the OMG site this morning.)
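One common answer, independent of whatever the LSID spec itself says, is for clients to treat HTTP 503 (Service Unavailable) or 429 (Too Many Requests) from an overloaded resolver as a signal to back off and retry later. The sketch below assumes that convention. `resolve_with_backoff` and the injected `fetch` callable are hypothetical names, and the sketch honours only the delay-in-seconds form of `Retry-After`, not the HTTP-date form.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical abstraction: a fetch callable returns (HTTP status, headers, body).
# Header names are assumed to be normalised by the caller (e.g. "Retry-After").
Fetch = Callable[[str], Tuple[int, Mapping[str, str], bytes]]


def resolve_with_backoff(
    url: str,
    fetch: Fetch,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Retry a resolver request while the server reports it is overloaded."""
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status == 200:
            return body
        if status not in (429, 503):
            return None  # hard failure: retrying will not help
        if attempt == max_attempts - 1:
            break  # no attempts left, so do not sleep again
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)  # the server said how long to wait
        else:
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # with full jitter so many clients do not retry in lockstep.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    return None
```

Injecting `fetch` and `sleep` keeps the retry policy separate from any particular HTTP library, so it can be exercised without touching a live resolver.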

Dave V.

Roger Hyam said the following on 6/19/2006 8:33 PM:
> 
> Is this an LSID issue? LSIDs essentially provide a binding service between
> a name and one or more web services (we default to HTTP GET bindings).
> It isn't really up to the LSID authority to administer any policies
> regarding the web service but simply to point at it. It is up to the web
> service to do things like throttling, authentication and authorization.
> 
> Imagine, for example, that the different services had different
> policies. It may be reasonable not to restrict the getMetadata() calls
> but to restrict the getData() calls.
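Roger's point that getMetadata() and getData() can carry different policies can be sketched with one token bucket per operation. This is a minimal illustration under assumed limits; `TokenBucket` and `OperationPolicy` are hypothetical names, and a real service would probably keep buckets per client as well as per operation.

```python
import time
from typing import Callable, Dict, Optional


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a short burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class OperationPolicy:
    """Per-operation throttling: operations with no bucket are unrestricted."""

    def __init__(self, buckets: Dict[str, TokenBucket]):
        self.buckets = buckets

    def allow(self, operation: str) -> bool:
        bucket: Optional[TokenBucket] = self.buckets.get(operation)
        return True if bucket is None else bucket.allow()
```

For example, `OperationPolicy({"getData": TokenBucket(rate=1.0, capacity=2)})` leaves getMetadata() calls unthrottled while allowing getData() bursts of two followed by one call per second.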
> 
> The use of LSIDs does not create any new problems that weren't there
> with web page scraping - or scraping of any other web service.
> 
> Just my thoughts...
> 
> Roger
> 
> 
> Ricardo Scachetti Pereira wrote:
>>     Sally,
>>
>>     You raised a really important issue that we had not really addressed 
>> at the meeting. Thanks for that.
>>
>>     I would say that we should not constrain the resolution of LSIDs if 
>> we expect our LSID infrastructure to work. LSIDs will be the basis of 
>> our architecture, so we had better have good support for them.
>>
>>     However, that is certainly a limiting factor. Server efficiency will 
>> also likely vary quite a lot, depending on underlying system 
>> optimizations and so on.
>>
>>     So I think that the solution for this problem is in caching LSID 
>> responses on the server LSID stack. Basically, after resolving an LSID 
>> once, your server should be able to resolve it again and again really 
>> quickly, until something in the metadata related to that id changes.
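A minimal sketch of the caching Ricardo describes, assuming the data store can cheaply report a version or last-modified value for each record. `MetadataCache`, `load_version` and `render` are hypothetical names, not part of any existing LSID software stack.

```python
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Reuse rendered LSID metadata until the underlying record changes.

    load_version(lsid): hypothetical cheap lookup, e.g. a last-modified column.
    render(lsid): hypothetical expensive step that builds the metadata response.
    """

    def __init__(self, load_version: Callable[[str], str],
                 render: Callable[[str], bytes]):
        self.load_version = load_version
        self.render = render
        self.entries: Dict[str, Tuple[str, bytes]] = {}  # lsid -> (version, body)

    def get(self, lsid: str) -> bytes:
        version = self.load_version(lsid)
        cached = self.entries.get(lsid)
        if cached is not None and cached[0] == version:
            return cached[1]  # record unchanged since it was last rendered
        body = self.render(lsid)  # first request, or the record has changed
        self.entries[lsid] = (version, body)
        return body
```

The design trades a cheap version lookup on every request for skipping the expensive rendering step. The dictionary here grows without bound, so a production version would likely add an eviction policy such as LRU.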
>>
>>     I haven't looked at this aspect of the LSID software stack, but 
>> maybe others can say something about it. In any case I'll do some 
>> research on it and get back to you.
>>
>>     Again, thanks for bringing it up.
>>
>>     Cheers,
>>
>> Ricardo
>>
>>
>> Sally Hinchcliffe wrote:
>>   
>>> There are enough discontinuities in IPNI ids that a 1,2,3... crawl would 
>>> quickly run into the sand. I agree it's not a new problem; I just hate to 
>>> think I'm making life easier for the data scrapers.
>>> Sally
>>>
>>>
>>>   
>>>     
>>>> It can be a problem, but I'm not sure there is a simple solution... and how different is the LSID crawler scenario from an http://www.ipni.org/ipni/plantsearch?id=1,2,3,4,5 ... 9999999 scenario?
>>>>
>>>> Paul
>>>>
>>>> -----Original Message-----
>>>> From: tdwg-guid-bounces at mailman.nhm.ku.edu
>>>> [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu]On Behalf Of Sally
>>>> Hinchcliffe
>>>> Sent: 15 June 2006 12:08
>>>> To: tdwg-guid at mailman.nhm.ku.edu
>>>> Subject: [Tdwg-guid] Throttling searches [ Scanned for viruses ]
>>>>
>>>>
>>>> Hi all
>>>> another question that has come up here. 
>>>>
>>>> As discussed at the meeting, we're thinking of providing a complete 
>>>> download of all IPNI LSIDs plus a label (name and author, probably), 
>>>> which will be available as an annually produced download.
>>>>
>>>> Most people will play nice and just resolve one or two LSIDs as 
>>>> required, but by providing a complete list, we're making it very easy 
>>>> for someone to write a crawler that hits every LSID in turn and 
>>>> basically brings our server to its knees.
>>>>
>>>> Does anybody know of a good way of enforcing more polite behaviour? We can 
>>>> make the download available only under a data supply agreement that 
>>>> includes a clause limiting hit rates, or we could limit by IP address 
>>>> (but this would ultimately block out services like Rod's simple 
>>>> resolver). I believe Google's spell checker uses a key which has to 
>>>> be passed in as part of the query; obviously we can't do that with 
>>>> LSIDs.
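Limiting by IP address, the second option Sally mentions, can be sketched as a per-client sliding window. To avoid locking out an aggregating service like Rod's resolver, which makes requests on behalf of many users from one address, the sketch accepts an allowlist of exempt addresses. The class name, limits and addresses are all hypothetical policy choices, not anything defined by the LSID spec.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterable


class PerClientLimiter:
    """Allow at most `max_hits` requests per `window` seconds for each client IP."""

    def __init__(self, max_hits: int, window: float, exempt: Iterable[str] = (),
                 clock: Callable[[], float] = time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.exempt = set(exempt)  # hypothetical allowlist, e.g. a known resolver
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        if client_ip in self.exempt:
            return True
        now = self.clock()
        recent = self.hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # the caller would answer with HTTP 503 or 429
        recent.append(now)
        return True
```

For instance, `PerClientLimiter(max_hits=10, window=1.0, exempt={"192.0.2.10"})` would give each address ten lookups per second while never throttling the (documentation-range, illustrative) exempt address. The per-address map grows with every distinct client, so a long-running server would need to prune idle entries.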
>>>>
>>>> Any thoughts? Anyone think this is a problem? 
>>>>
>>>> Sally
>>>> *** Sally Hinchcliffe
>>>> *** Computer section, Royal Botanic Gardens, Kew
>>>> *** tel: +44 (0)20 8332 5708
>>>> *** S.Hinchcliffe at rbgkew.org.uk
>>>>
>>>>
>>>> _______________________________________________
>>>> TDWG-GUID mailing list
>>>> TDWG-GUID at mailman.nhm.ku.edu
>>>> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
>>>>
>>
>>
> 
> 
> -- 
> 
> -------------------------------------
>  Roger Hyam
>  Technical Architect
>  Taxonomic Databases Working Group
> -------------------------------------
>  http://www.tdwg.org
>  roger at tdwg.org
>  +44 1578 722782
> -------------------------------------
> 
> 


More information about the tdwg-tag mailing list