I think the only way to throttle in these situations is to have some notion of who the client is, and the only way to do that is to have some kind of token exchange over HTTP saying who they are. Basically, you need some kind of client registration system, or you can never distinguish between a call from a new client and a repeat call. Using IP addresses is a pain because so many people are now behind some kind of NAT gateway, so many distinct clients appear to share a single address.

How about this for a plan:

You could give a degraded service to people who don't pass a token (a 5 second delay perhaps) and offer a quicker service to registered users who pass a token (but then perhaps limit the number of calls they make). This would mean you could offer a universal service even to those with naive client software, but a better service to those who play nicely. You could also get better stats on who is using the service.
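
A rough sketch of how such a two-tier scheme might look on the web service side is given below. Everything in it is an assumption made for illustration: the `Throttle` class, the `decide` method, the daily cap of 1000 calls, and the choice to treat an unrecognised token the same as no token are all hypothetical, and nothing here is part of the LSID specification or any existing resolver.

```python
import time

ANON_DELAY_SECONDS = 5.0   # delay for callers without a token (the "5 second delay" idea)
DAILY_LIMIT = 1000         # hypothetical daily cap per registered token


class Throttle:
    """Illustrative two-tier throttle: slow lane for anonymous, capped fast lane for registered."""

    def __init__(self, registered_tokens, daily_limit=DAILY_LIMIT, clock=time.time):
        self.registered = set(registered_tokens)
        self.daily_limit = daily_limit
        self.clock = clock          # injectable so the behaviour can be tested
        self.counts = {}            # (token, day number) -> calls made on that day

    def decide(self, token=None):
        """Return (allowed, delay_seconds) for one incoming request."""
        # No token, or a token we do not know: serve, but slowly.
        if token is None or token not in self.registered:
            return True, ANON_DELAY_SECONDS
        # Registered client: no delay, but count calls per day.
        day = int(self.clock() // 86400)
        key = (token, day)
        used = self.counts.get(key, 0)
        if used >= self.daily_limit:
            return False, 0.0       # over quota; the service would refuse this call
        self.counts[key] = used + 1
        return True, 0.0
```

The web service would call `decide()` once per request, sleep for the returned delay before answering, and refuse the call when `allowed` is false. The per-token counts double as the usage stats mentioned above. A real deployment would also need to discard counts for past days, which this sketch does not do.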

So there are ways that this could be done. I expect people will come up with a host of different ways. It is outside LSIDs though.

Roger

Sally Hinchcliffe wrote:

It's not an LSID issue per se, but LSIDs will make it harder to slow 
searches down. For instance, Google restricts use of its spell 
checker to 1000 a day by use of a key which is passed in with each 
request. Obviously this can't be done with LSIDs as then they 
wouldn't be the same for each user.
The other reason why it's relevant to LSIDs is simply that providing 
a list of all relevant IPNI LSIDs (not necessary to the LSID 
implementation, but a nice-to-have for caching / lookups for other 
systems using our LSIDs) also makes life easier for the datascrapers 
to operate.

Also I thought ... here's a list full of clever people; perhaps they 
will have some suggestions.

Sally

Is this an LSID issue? LSIDs essentially provide a binding service between 
a name and one or more web services (we default to HTTP GET bindings). 
It isn't really up to the LSID authority to administer any policies 
regarding the web service but simply to point at it. It is up to the web 
service to do things like throttling, authentication and authorization.

Imagine, for example, that the different services had different 
policies. It may be reasonable not to restrict the getMetadata() calls 
but to restrict the getData() calls.
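
As a minimal sketch of that idea, the web service could keep a small table of per-operation limits. The operation names `getMetadata` and `getData` come from the discussion above; the `OperationLimiter` class, the limit of 100, and the notion of a resettable counting window are assumptions made purely for illustration.

```python
# Hypothetical per-operation policy table: None means "do not restrict".
POLICIES = {
    "getMetadata": None,
    "getData": 100,      # illustrative maximum calls per client per window
}


class OperationLimiter:
    """Applies a different call limit to each web service operation."""

    def __init__(self, policies):
        self.policies = dict(policies)
        self.counts = {}             # (client_id, operation) -> calls in current window

    def allow(self, client_id, operation):
        """Return True if this client may make one more call to this operation."""
        limit = self.policies[operation]   # raises KeyError for an unknown operation
        if limit is None:
            return True                    # unrestricted operation, nothing to count
        key = (client_id, operation)
        used = self.counts.get(key, 0)
        if used >= limit:
            return False
        self.counts[key] = used + 1
        return True

    def reset_window(self):
        """Start a new counting window, e.g. once a day."""
        self.counts.clear()
```

The LSID authority itself is untouched by any of this; the check sits entirely inside the web service that the authority points at.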

The use of LSIDs does not create any new problems that weren't there 
with web page scraping - or scraping of any other web service.

Just my thoughts...

Roger


Ricardo Scachetti Pereira wrote:

    Sally,

    You raised a really important issue that we had not really addressed 
at the meeting. Thanks for that.

    I would say that we should not constrain the resolution of LSIDs if 
we expect our LSID infrastructure to work. LSIDs will be the basis of 
our architecture, so we had better have good support for that.

    However, that is certainly a limiting factor. Also, server efficiency will 
likely vary quite a lot, depending on underlying system optimizations 
and all.

    So I think that the solution for this problem is in caching LSID 
responses on the server LSID stack. Basically, after resolving an LSID 
once, your server should be able to resolve it again and again really 
quickly, until something in the metadata that is related to that id changes.
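
To make the caching idea concrete, here is a minimal in-memory sketch. It is not taken from the LSID software stack: the `MetadataCache` class, its `get` and `invalidate` methods, and the `resolve` callback standing in for the slow database lookup are all hypothetical names chosen for illustration.

```python
class MetadataCache:
    """Caches resolved metadata per LSID until that entry is explicitly invalidated."""

    def __init__(self, resolve):
        self._resolve = resolve   # slow function: LSID string -> metadata
        self._cache = {}

    def get(self, lsid):
        """Return metadata for an LSID, hitting the slow resolver only on a cache miss."""
        if lsid not in self._cache:
            self._cache[lsid] = self._resolve(lsid)
        return self._cache[lsid]

    def invalidate(self, lsid):
        """Drop the cached entry; call this when the underlying record changes."""
        self._cache.pop(lsid, None)
```

The first `get()` for an identifier pays the full resolution cost and later calls are served from memory. The part that needs care in practice is calling `invalidate()` whenever the underlying record is edited, which assumes the database can notify the cache of changes.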

    I haven't looked at this aspect of the LSID software stack, but 
maybe others can say something about it. In any case I'll do some 
research on it and get back to you.

    Again, thanks for bringing it up.

    Cheers,

Ricardo


Sally Hinchcliffe wrote:

There are enough discontinuities in IPNI ids that 1,2,3 would quickly 
run into the sand. I agree it's not a new problem - I just hate to 
think I'm making life easier for the data scrapers
Sally

It can be a problem but I'm not sure if there is a simple solution ... and how different is the LSID crawler scenario from an http://www.ipni.org/ipni/plantsearch?id= 1,2,3,4,5 ... 9999999 scenario?

Paul

-----Original Message-----
From: tdwg-guid-bounces@mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces@mailman.nhm.ku.edu]On Behalf Of Sally
Hinchcliffe
Sent: 15 June 2006 12:08
To: tdwg-guid@mailman.nhm.ku.edu
Subject: [Tdwg-guid] Throttling searches [ Scanned for viruses ]

Hi all
another question that has come up here. 

As discussed at the meeting, we're thinking of providing a complete 
download of all IPNI LSIDs plus a label (name and author, probably) 
which will be available as an annually produced download

Most people will play nice and just resolve one or two LSIDs as 
required, but by providing a complete list, we're making it very easy 
for someone to write a crawler that hits every LSID in turn and 
basically brings our server to its knees

Anybody know of a good way of enforcing more polite behaviour? We can 
make the download only available under a data supply agreement that 
includes a clause limiting hit rates, or we could limit by IP address 
(but this would ultimately block out services like Rod's simple 
resolver). I believe Google's spell checker uses a key which has to 
be passed in as part of the query; obviously we can't do that with 
LSIDs.

Any thoughts? Anyone think this is a problem? 

Sally
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe@rbgkew.org.uk

_______________________________________________
TDWG-GUID mailing list
TDWG-GUID@mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid


-- 

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 http://www.tdwg.org
 roger@tdwg.org
 +44 1578 722782
-------------------------------------

