[Tdwg-guid] Throttling searches [ Scanned for viruses ]

Kevin Richards RichardsK at landcareresearch.co.nz
Mon Jun 19 23:59:38 CEST 2006


The LSID framework has provision for basic security (HTTP credentials),
so this could be used for identifying end users and controlling the
degree of their activity.
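As a rough illustration of Kevin's suggestion, here is a minimal sketch that assumes the credentials arrive as a standard HTTP Basic `Authorization` header on each resolution request. The function names and the in-memory counter are hypothetical and are not part of any LSID library. The sketch only identifies the caller. A real service must also verify the password, and because Basic credentials are only base64-encoded, not encrypted, they should travel over HTTPS.

```python
import base64
import binascii
from collections import Counter
from typing import Optional


def basic_auth_user(authorization: Optional[str]) -> Optional[str]:
    """Return the user name from an HTTP Basic 'Authorization' header value,
    or None if the header is absent, uses another scheme, or is malformed."""
    if not authorization:
        return None
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, sep, _password = decoded.partition(":")
    # Identification only: a real server must check _password against its user store.
    return user if sep and user else None


# Hypothetical per-user activity counter; callers without credentials are pooled.
calls_per_user: Counter = Counter()


def record_call(authorization: Optional[str]) -> str:
    """Attribute one request to a user (or 'anonymous') and count it."""
    user = basic_auth_user(authorization) or "anonymous"
    calls_per_user[user] += 1
    return user
```

With per-user counts like these, the service can decide how much activity each identified user is allowed, which is the "controlling the degree of their activity" part of the suggestion.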
Kevin

>>> Roger Hyam <roger at tdwg.org> 19/06/2006 9:30 p.m. >>>


I think the only way to throttle in these situations is to have some
notion of who the client is and the only way to do that is to have some
kind of token exchange over HTTP saying who they are. Basically you have
to have some kind of client registration system or you can never
distinguish between a call from a new client and a repeat call. The use
of IP address is a pain because so many people are now behind some kind
of NAT gateway.

How about this for a plan:

You could give a degraded service to people who don't pass a token (a
5 second delay perhaps) and offer a quicker service to registered users
who pass a token (but then perhaps limit the number of calls they make).
This would mean you could offer a universal service even to those with
naive client software but a better service to those who play nicely. You
could also get better stats on who is using the service.
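Roger's two-tier plan could be sketched roughly as follows. This is a minimal illustration, not a recommended implementation. It assumes that registered tokens are held in an in-memory set and that a "limited number of calls" means a per-token daily quota. The class name is hypothetical. The 1000-per-day default borrows the figure Sally quotes for Google's spell checker purely as an illustrative value.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Set, Tuple

ANONYMOUS_DELAY_S = 5.0  # "a 5 second delay perhaps" for callers without a token
DAILY_LIMIT = 1000       # hypothetical per-token quota


@dataclass
class TieredThrottle:
    """Hypothetical two-tier policy: slow but universal service for anonymous
    callers, fast but quota-limited service for registered tokens."""
    registered_tokens: Set[str]
    daily_limit: int = DAILY_LIMIT
    anonymous_delay_s: float = ANONYMOUS_DELAY_S
    _usage: Dict[Tuple[str, date], int] = field(default_factory=dict)

    def decide(self, token: Optional[str], today: date) -> Tuple[bool, float]:
        """Return (allowed, delay_seconds) for one request."""
        if token is None or token not in self.registered_tokens:
            # Naive clients are still served, just slowly.
            return True, self.anonymous_delay_s
        key = (token, today)
        used = self._usage.get(key, 0)
        if used >= self.daily_limit:
            return False, 0.0  # quota exhausted until the next day
        self._usage[key] = used + 1  # per-token usage also feeds usage statistics
        return True, 0.0
```

A request handler would call `decide()` and, when the request is allowed, wait `delay_seconds` before responding. Note that a blocking `time.sleep` ties up a worker for the whole delay, so an asynchronous wait (for example `asyncio.sleep`) suits this better. A long-running service would also prune `_usage` entries from past days.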

So there are ways that this could be done. I expect people will come up
with a host of different ways. It is outside LSIDs though.

Roger
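Two service-side ideas from the quoted messages below can be combined in one sketch. Roger notes that getMetadata() and getData() might carry different policies, and Ricardo suggests caching LSID responses so that a repeat resolution is fast until the related metadata changes. The minimal sketch below assumes a hypothetical `LsidService` class with an in-memory cache invalidated by a per-record version counter. Its `get_metadata()` and `get_data()` methods only mirror the operation names used in the thread. The invalidation scheme and the per-client limit (no time window, for brevity) are illustrative assumptions, not features of any existing LSID software stack.

```python
from typing import Callable, Dict, Optional, Tuple


class LsidService:
    """Hypothetical web service behind an LSID authority.

    get_metadata(): unrestricted, served from a cache that is invalidated
    whenever the record's version changes.
    get_data(): limited to max_data_calls per client.
    """

    def __init__(self, load_metadata: Callable[[str], str], max_data_calls: int = 100):
        self._load_metadata = load_metadata             # expensive lookup, e.g. a database query
        self._versions: Dict[str, int] = {}             # lsid -> current record version
        self._cache: Dict[str, Tuple[int, str]] = {}    # lsid -> (version, cached response)
        self._data_calls: Dict[str, int] = {}           # client id -> getData calls so far
        self.max_data_calls = max_data_calls

    def mark_changed(self, lsid: str) -> None:
        """Call when the record behind `lsid` is edited; its cached response becomes stale."""
        self._versions[lsid] = self._versions.get(lsid, 0) + 1

    def get_metadata(self, lsid: str) -> str:
        version = self._versions.get(lsid, 0)
        hit = self._cache.get(lsid)
        if hit is not None and hit[0] == version:
            return hit[1]                               # resolved before and unchanged since
        response = self._load_metadata(lsid)
        self._cache[lsid] = (version, response)
        return response

    def get_data(self, lsid: str, client_id: str) -> Optional[str]:
        used = self._data_calls.get(client_id, 0)
        if used >= self.max_data_calls:
            return None                                 # caller maps this to an error status
        self._data_calls[client_id] = used + 1
        return f"data for {lsid}"                       # placeholder payload
```

Caching makes repeat resolutions cheap but does not slow a crawler that requests every LSID once, so it complements rather than replaces the per-client limit on the expensive operation.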

Sally Hinchcliffe wrote:

It's not an LSID issue per se, but LSIDs will make it harder to slow
searches down. For instance, Google restricts use of its spell checker
to 1000 a day by use of a key which is passed in with each request.
Obviously this can't be done with LSIDs, as then they wouldn't be the
same for each user. The other reason why it's relevant to LSIDs is
simply that providing a list of all relevant IPNI LSIDs (not necessary
to the LSID implementation but a nice-to-have for caching / lookups for
other systems using our LSIDs) also makes life easier for the data
scrapers to operate. Also I thought ... here's a list full of clever
people; perhaps they will have some suggestions.

Sally

Is this an LSID issue? LSIDs essentially provide a binding service
between a name and one or more web services (we default to HTTP GET
bindings). It isn't really up to the LSID authority to administer any
policies regarding the web service but simply to point at it. It is up
to the web service to do things like throttling, authentication and
authorization.

Imagine, for example, that the different services had different
policies. It may be reasonable not to restrict the getMetadata() calls
but to restrict the getData() calls.

The use of LSIDs does not create any new problems that weren't there
with web page scraping - or scraping of any other web service.

Just my thoughts...

Roger

Ricardo Scachetti Pereira wrote:

Sally,

You raised a really important issue that we had not really addressed at
the meeting. Thanks for that.

I would say that we should not constrain the resolution of LSIDs if we
expect our LSID infrastructure to work. LSIDs will be the basis of our
architecture, so we had better have good support for that.

However, that is surely a limiting factor. Also, server efficiency will
likely vary quite a lot, depending on underlying system optimizations
and so on.

So I think that the solution to this problem is caching LSID responses
in the server LSID stack. Basically, after resolving an LSID once, your
server should be able to resolve it again and again really quickly,
until something in the metadata related to that ID changes.

I haven't looked at this aspect of the LSID software stack, but maybe
others can say something about it. In any case I'll do some research on
it and get back to you.

Again, thanks for bringing it up.

Cheers,
Ricardo

Sally Hinchcliffe wrote:

There are enough discontinuities in IPNI IDs that 1,2,3 would quickly
run into the sand. I agree it's not a new problem - I just hate to think
I'm making life easier for the data scrapers.

Sally

It can be a problem but I'm not sure if there is a simple solution ...
and how different is the LSID crawler scenario from an
http://www.ipni.org/ipni/plantsearch?id= 1,2,3,4,5 ... 9999999
scenario?

Paul

-----Original Message-----
From: tdwg-guid-bounces at mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of Sally Hinchcliffe
Sent: 15 June 2006 12:08
To: tdwg-guid at mailman.nhm.ku.edu
Subject: [Tdwg-guid] Throttling searches [ Scanned for viruses ]

Hi all,

Another question that has come up here. As discussed at the meeting,
we're thinking of providing a complete download of all IPNI LSIDs plus a
label (name and author, probably), which will be available as an
annually produced download.

Most people will play nice and just resolve one or two LSIDs as
required, but by providing a complete list, we're making it very easy
for someone to write a crawler that hits every LSID in turn and
basically brings our server to its knees.

Anybody know of a good way of enforcing more polite behaviour? We can
make the download only available under a data supply agreement that
includes a clause limiting hit rates, or we could limit by IP address
(but this would ultimately block out services like Rod's simple
resolver). I believe Google's spell checker uses a key which has to be
passed in as part of the query - obviously we can't do that with LSIDs.

Any thoughts? Anyone think this is a problem?

Sally

*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk

_______________________________________________
TDWG-GUID mailing list
TDWG-GUID at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid
-- 
-------------------------------------
Roger Hyam
Technical Architect
Taxonomic Databases Working Group
-------------------------------------
http://www.tdwg.org
roger at tdwg.org
+44 1578 722782
-------------------------------------

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
WARNING: This email and any attachments may be confidential and/or
privileged. They are intended for the addressee only and are not to be read,
used, copied or disseminated by anyone receiving them in error.  If you are
not the intended recipient, please notify the sender by return email and
delete this message and any attachments.

The views expressed in this email are those of the sender and do not
necessarily reflect the official views of Landcare Research.  

Landcare Research
http://www.landcareresearch.co.nz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
