[Tdwg-guid] Throttling searches [ Scanned for viruses ]

Paul Kirk p.kirk at cabi.org
Thu Jun 15 13:21:46 CEST 2006


It can be a problem, but I'm not sure there is a simple solution ... and how different is the LSID crawler scenario from someone requesting http://www.ipni.org/ipni/plantsearch?id= with 1, 2, 3, 4, 5 ... 9999999 in turn?

Paul

-----Original Message-----
From: tdwg-guid-bounces at mailman.nhm.ku.edu
[mailto:tdwg-guid-bounces at mailman.nhm.ku.edu] On Behalf Of Sally Hinchcliffe
Sent: 15 June 2006 12:08
To: tdwg-guid at mailman.nhm.ku.edu
Subject: [Tdwg-guid] Throttling searches [ Scanned for viruses ]


Hi all
Another question that has come up here.

As discussed at the meeting, we're thinking of providing a complete 
list of all IPNI LSIDs plus a label (name and author, probably), 
which will be available as an annually produced download.

Most people will play nice and just resolve one or two LSIDs as 
required, but by providing a complete list, we're making it very easy 
for someone to write a crawler that hits every LSID in turn and 
basically brings our server to its knees.

Anybody know of a good way of enforcing more polite behaviour? We can 
make the download available only under a data supply agreement that 
includes a clause limiting hit rates, or we could limit by IP address 
(but this would ultimately block out services like Rod's simple 
resolver). I believe Google's spell checker uses a key which has to 
be passed in as part of the query; obviously we can't do that with 
LSIDs.
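
To make the "limit by IP address" option concrete, here is a minimal sketch of a per-client token bucket. It is only an illustration under stated assumptions: the class name TokenBucket, the rate and capacity values, and the idea of keying on the caller's IP address are all hypothetical and not anything IPNI actually runs. A real deployment would also need to decide what to do about shared resolvers, which is exactly the drawback noted above.

```python
import time


class TokenBucket:
    """Hypothetical per-client limiter: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second (assumed value)
        self.capacity = capacity  # maximum burst size (assumed value)
        self.clock = clock        # injectable clock, so the limiter can be tested
        self.state = {}           # client key -> (tokens left, time of last update)

    def allow(self, client):
        """Return True if `client` may make a request now, False if it should back off."""
        now = self.clock()
        tokens, last = self.state.get(client, (self.capacity, now))
        # Refill in proportion to the time elapsed, never exceeding the burst size.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.state[client] = (tokens - 1, now)
            return True
        self.state[client] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = TokenBucket(rate=1.0, capacity=5)
    # Hypothetical client key; in practice this might be the caller's IP address.
    for i in range(7):
        print(i, limiter.allow("192.0.2.1"))
```

A client that resolves one or two LSIDs never notices the limit, while a crawler working through the full list is slowed to the refill rate rather than blocked outright.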

Any thoughts? Anyone think this is a problem? 

Sally
*** Sally Hinchcliffe
*** Computer section, Royal Botanic Gardens, Kew
*** tel: +44 (0)20 8332 5708
*** S.Hinchcliffe at rbgkew.org.uk


_______________________________________________
TDWG-GUID mailing list
TDWG-GUID at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid