[Tdwg-guid] Throttling searches
Dave Vieglais
vieglais at ku.edu
Mon Jun 19 20:28:09 CEST 2006
A somewhat related issue: does the LSID spec provide guidelines for when a
resolver is not accessible, such as when it is overloaded? (I would read the
spec myself, but I can't seem to access the OMG site this morning.)
Dave V.
Roger Hyam said the following on 6/19/2006 8:33 PM:
>
> Is this an LSID issue? LSIDs essentially provide a binding service between
> a name and one or more web services (we default to HTTP GET bindings).
> It isn't really up to the LSID authority to administer any policies
> regarding the web service but simply to point at it. It is up to the web
> service to do things like throttling, authentication and authorization.
>
> Imagine, for example, that the different services had different
> policies. It may be reasonable not to restrict the getMetadata() calls
> but to restrict the getData() calls.
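A minimal sketch of the per-operation policy described above, assuming a token-bucket limiter on the web-service side. The names `TokenBucket`, `POLICIES` and `is_allowed`, and the rates chosen, are hypothetical and are not part of any LSID software stack; only the operation names getMetadata and getData come from the message.

```python
import time


class TokenBucket:
    """Allow roughly `rate` calls per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical policy table: getMetadata is unrestricted (None),
# getData is limited to about one call per second with a burst of two.
POLICIES = {
    "getMetadata": None,
    "getData": TokenBucket(rate=1.0, capacity=2),
}


def is_allowed(operation):
    """Return True if a call to `operation` may proceed right now."""
    bucket = POLICIES.get(operation)
    # Operations with no bucket (including unknown ones) are not throttled.
    return True if bucket is None else bucket.allow()
```

The point of the design is that the limit lives with each web service, not with the LSID authority, so each operation can carry its own policy.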
>
> The use of LSIDs does not create any new problems that weren't there
> with web page scraping - or scraping of any other web service.
>
> Just my thoughts...
>
> Roger
>
>
> Ricardo Scachetti Pereira wrote:
>> Sally,
>>
>> You raised a really important issue that we had not really addressed
>> at the meeting. Thanks for that.
>>
>> I would say that we should not constrain the resolution of LSIDs if
>> we expect our LSID infrastructure to work. LSIDs will be the basis of
>> our architecture, so we had better have good support for that.
>>
>> However, that is certainly a limiting factor. Also, server efficiency will
>> likely vary quite a lot, depending on underlying system optimizations
>> and all.
>>
>> So I think that the solution to this problem is caching LSID
>> responses in the server's LSID stack. Basically, after resolving an LSID
>> once, your server should be able to resolve it again and again really
>> quickly, until something in the metadata related to that ID changes.
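The caching idea above could be sketched roughly as follows. This is an illustration under assumptions, not a description of the actual LSID software stack: `CachingResolver`, `get_metadata`, `invalidate` and the backend callable are all hypothetical names.

```python
class CachingResolver:
    """Cache resolved metadata per LSID until it is explicitly invalidated."""

    def __init__(self, resolve):
        # `resolve` stands in for the slow backend lookup (hypothetical).
        self._resolve = resolve
        self._cache = {}

    def get_metadata(self, lsid):
        # Only the first request for an LSID reaches the backend.
        if lsid not in self._cache:
            self._cache[lsid] = self._resolve(lsid)
        return self._cache[lsid]

    def invalidate(self, lsid):
        # Call this when metadata related to the LSID changes.
        self._cache.pop(lsid, None)
```

A crawler that requests the same LSID repeatedly then costs one backend lookup instead of many. A crawler that walks every LSID once gains nothing from the cache, so caching complements throttling and does not replace it.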
>>
>> I haven't looked at this aspect of the LSID software stack, but
>> maybe others can say something about it. In any case I'll do some
>> research on it and get back to you.
>>
>> Again, thanks for bringing it up.
>>
>> Cheers,
>>
>> Ricardo
>>
>>
>> Sally Hinchcliffe wrote:
>>
>>> There are enough discontinuities in IPNI ids that 1,2,3 would quickly
>>> run into the sand. I agree it's not a new problem - I just hate to
>>> think I'm making life easier for the data scrapers.
>>> Sally
>>>
>>>
>>>
>>>
>>>> It can be a problem but I'm not sure if there is a simple solution ... and how different is the LSID crawler scenario from an http://www.ipni.org/ipni/plantsearch?id= 1,2,3,4,5 ... 9999999 scenario?
>>>>
>>>> Paul
>>>>
>>>> -----Original Message-----
>>>> From: tdwg-guid-bounces at mailman.nhm.ku.edu
>>>> [mailto:tdwg-guid-bounces at mailman.nhm.ku.edu]On Behalf Of Sally
>>>> Hinchcliffe
>>>> Sent: 15 June 2006 12:08
>>>> To: tdwg-guid at mailman.nhm.ku.edu
>>>> Subject: [Tdwg-guid] Throttling searches [ Scanned for viruses ]
>>>>
>>>>
>>>> Hi all
>>>> another question that has come up here.
>>>>
>>>> As discussed at the meeting, we're thinking of providing a complete
>>>> download of all IPNI LSIDs plus a label (name and author, probably)
>>>> which will be available as an annually produced download
>>>>
>>>> Most people will play nice and just resolve one or two LSIDs as
>>>> required, but by providing a complete list, we're making it very easy
>>>> for someone to write a crawler that hits every LSID in turn and
>>>> basically brings our server to its knees
>>>>
>>>> Anybody know of a good way of enforcing more polite behaviour? We can
>>>> make the download only available under a data supply agreement that
>>>> includes a clause limiting hit rates, or we could limit by IP address
>>>> (but this would ultimately block out services like Rod's simple
>>>> resolver). I believe Google's spell checker uses a key which has to
>>>> be passed in as part of the query - obviously we can't do that with
>>>> LSIDs
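As an illustration of the limit-by-IP option mentioned above, here is a minimal sliding-window sketch. The class name `PerClientLimiter`, its parameters and the example limits are assumptions made for this sketch and do not describe anything IPNI runs.

```python
import time
from collections import defaultdict, deque


class PerClientLimiter:
    """Allow at most `max_hits` requests per client in any `window` seconds."""

    def __init__(self, max_hits, window, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.clock = clock                  # injectable for testing
        self._hits = defaultdict(deque)     # client IP -> recent hit times

    def allow(self, client_ip):
        now = self.clock()
        hits = self._hits[client_ip]
        # Forget hits that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_hits:
            return False                    # over the limit: refuse
        hits.append(now)
        return True
```

Because the limit is keyed on the address alone, a shared resolver that forwards requests for many users counts as a single client, which is the drawback noted above.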
>>>>
>>>> Any thoughts? Anyone think this is a problem?
>>>>
>>>> Sally
>>>> *** Sally Hinchcliffe
>>>> *** Computer section, Royal Botanic Gardens, Kew
>>>> *** tel: +44 (0)20 8332 5708
>>>> *** S.Hinchcliffe at rbgkew.org.uk
>>>>
>>>>
>>>> _______________________________________________
>>>> TDWG-GUID mailing list
>>>> TDWG-GUID at mailman.nhm.ku.edu
>>>> http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
>>>>
>
>
> --
>
> -------------------------------------
> Roger Hyam
> Technical Architect
> Taxonomic Databases Working Group
> -------------------------------------
> http://www.tdwg.org
> roger at tdwg.org
> +44 1578 722782
> -------------------------------------
>