<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<br>

You don't! The LSID resolves to the binding for the getMetadata() method,
which is a plain old-fashioned URL. At that point the LSID authority
has done its duty and we are simply making a plain HTTP GET call, so you can
do whatever you can do with any regular HTTP GET. You could stipulate
another header field or, more simply, give priority service to those
who append a valid user id to the URL (&amp;user_id=12345).<br>

<br>

So there is no throttle on resolving the LSID to the getMetadata
binding (which is cheap), but there is a throttle on the actual call to
the getMetadata method. You really need to do it this way, because bad people
may be able to tell from the URL how to scrape the source and bypass
the LSID resolver after the first call anyhow. This is especially true
if the URL contains the IPNI record ID, which is likely.<br>
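<br>
A minimal sketch of what "throttle the call, not the resolution" could look like on the server side, assuming Python and a client token passed as user_id. The class name, the limits and the shared "anonymous" bucket for callers without a token are illustrative assumptions, not part of any LSID software stack; resolving the LSID to the getMetadata() binding would simply never pass through it.<br>
<pre wrap="">
```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_calls metadata requests per client in any `window` seconds."""

    def __init__(self, max_calls: int, window: float) -> None:
        self.max_calls = max_calls
        self.window = window
        # client id (the user_id token, or "anonymous") -> times of recent calls
        self.calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.calls[client_id]
        # Forget calls that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_calls:
            return False  # over the limit: reject or delay this metadata call
        q.append(now)
        return True


def client_id_for(params: Dict[str, str]) -> str:
    # Hypothetical: callers that send no user_id share one "anonymous" bucket.
    return params.get("user_id") or "anonymous"
```
</pre>
Whether a call over the limit gets an error or is merely slowed down is a policy choice for the data provider.<br>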

<br>

Here is an example using Rod's tester.<br>

<br>

<a class="moz-txt-link-freetext" href="http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:ubio.org:namebank:11815">http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:ubio.org:namebank:11815</a><br>

<br>

The getMetadata() method for this LSID:<br>

<br>

&nbsp;urn:lsid:ubio.org:namebank:11815<br>

<br>

is bound to this URL:<br>

<br>

<a class="moz-txt-link-freetext" href="http://names.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815">http://names.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815</a><br>

<br>

So ubio would just have to give preferential service to calls like
this:<br>

<br>

<a class="moz-txt-link-freetext" href="http://names.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815&amp;user_id=rogerhyam1392918790">http://names.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815&amp;user_id=rogerhyam1392918790</a><br>

<br>

If rogerhyam had paid his membership fees this year.<br>
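<br>
On the client side, adding the token is ordinary query-string handling. A small Python sketch, assuming the token travels as the user_id parameter shown above (the function name is hypothetical):<br>
<pre wrap="">
```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_token(binding_url: str, user_id: str) -> str:
    """Append a user_id token to a getMetadata() HTTP GET binding URL."""
    parts = urlsplit(binding_url)
    query = parse_qsl(parts.query)  # keep existing parameters such as lsid=...
    query.append(("user_id", user_id))
    # safe=":" keeps the LSID readable (urn:lsid:...) instead of escaping ":" as %3A
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))
```
</pre>
Called with the ubio binding URL and rogerhyam1392918790, it returns the preferential-service URL above; the LSID resolver itself is never involved.<br>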

<br>

Does this make sense?<br>

<br>

Roger<br>

P.S. You could do this on the web pages as well, with a clever little
mechanism that writes dynamic tokens into the links, so that it doesn't degrade
the regular browsing experience and only stops scrapers, but that is
beyond my remit at the moment ;)<br>
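<br>
One possible way to do the dynamic-token trick, sketched in Python under stated assumptions: each link carries an expiry time plus an HMAC signature over the page path, and the server serves only requests whose token is valid and unexpired, so a scraper that constructs URLs by hand has no token. The secret, the token format and the one-hour lifetime are hypothetical choices, not anything IPNI or ubio is known to use.<br>
<pre wrap="">
```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"change-me"  # hypothetical server-side secret; load from config in practice


def _sign(path: str, expires: int) -> str:
    msg = f"{path}|{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_link_token(path: str, ttl: int = 3600, now: Optional[float] = None) -> str:
    """Return 'expiry.signature', to be written into a link for this page."""
    now = time.time() if now is None else now
    expires = int(now) + ttl
    return f"{expires}.{_sign(path, expires)}"


def check_link_token(path: str, token: str, now: Optional[float] = None) -> bool:
    """Accept only unexpired tokens that were issued for this exact path."""
    now = time.time() if now is None else now
    try:
        expires_str, sig = token.split(".", 1)
        expires = int(expires_str)
    except ValueError:
        return False  # malformed token
    if now > expires:
        return False  # expired: a stale scraped link stops working
    return hmac.compare_digest(sig.encode(), _sign(path, expires).encode())
```
</pre>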

<br>

P.P.S. You could wrap this in HTTPS if you were paranoid about people
stealing tokens, but I believe this is highly unlikely.<br>

<br>

Sally Hinchcliffe wrote:

<blockquote cite="mid44967DA7.11065.77BC38@localhost" type="cite">

  <pre wrap="">How can we pass a token with an LSID?

  </pre>

  <blockquote type="cite">

    <pre wrap="">I think the only way to throttle in these situations is to have some 

notion of who the client is and the only way to do that is to have some 

kind of token exchange over HTTP saying who they are. Basically you have 

to have some kind of client registration system or you can never 

distinguish between a call from a new client and a repeat call. The use 

of IP address is a pain because so many people are now behind some kind 

of NAT gateway.

How about this for a plan:

You could give a degraded service to people who don't pass a token (a 5-second
delay, perhaps) and offer a quicker service to registered users who pass a
token (but then perhaps limit the number of calls they make). This would mean
you could offer a universal service even to those with naive client software,
but a better service to those who play nicely. You could also get better
stats on who is using the service.
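
A rough Python sketch of that tiered policy, assuming the server knows the set
of registered tokens and how many calls each has made today. The 5-second delay
comes from the plan above; the default quota of 1000 calls a day is an
assumption borrowed from the Google spell checker example, and falling back to
the degraded service (rather than refusing) keeps the service universal.

```python
from typing import Optional, Set

ANONYMOUS_DELAY = 5.0  # seconds: the degraded service for clients without a valid token


def service_delay(token: Optional[str], registered: Set[str],
                  calls_today: int, daily_limit: int = 1000) -> float:
    """Seconds to wait before answering a request (hypothetical policy).

    - no token, or an unknown token: degraded but still available service
    - registered token within its daily quota: answer immediately
    - registered token over its quota: fall back to the degraded service
    """
    if token is None or token not in registered:
        return ANONYMOUS_DELAY
    if calls_today >= daily_limit:
        return ANONYMOUS_DELAY
    return 0.0
```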

So there are ways that this could be done. I expect people will come up 

with a host of different ways. It is outside LSIDs though.

Roger

Sally Hinchcliffe wrote:

    </pre>

    <blockquote type="cite">

      <pre wrap="">It's not an LSID issue per se, but LSIDs will make it harder to slow 

searches down. For instance, Google restricts use of its spell 

checker to 1000 a day by use of a key which is passed in with each 

request. Obviously this can't be done with LSIDs as then they 

wouldn't be the same for each user.

The other reason why it's relevant to LSIDs is simply that providing
a list of all relevant IPNI LSIDs (not necessary to the LSID
implementation, but a nice-to-have for caching / lookups by other
systems using our LSIDs) also makes life easier for the data scrapers.

Also, I thought: here's a list full of clever people, perhaps they
will have some suggestions.

Sally

      </pre>

      <blockquote type="cite">

        <pre wrap="">Is this an LSID issue? LSIDs essentially provide a binding service between
a name and one or more web services (we default to HTTP GET bindings).

It isn't really up to the LSID authority to administer any policies 

regarding the web service but simply to point at it. It is up to the web 

service to do things like throttling, authentication and authorization.

Imagine, for example, that the different services had different 

policies. It may be reasonable not to restrict the getMetadata() calls 

but to restrict the getData() calls.

The use of LSIDs does not create any new problems that weren't there 

with web page scraping - or scraping of any other web service.

Just my thoughts...

Roger

Ricardo Scachetti Pereira wrote:

        </pre>

        <blockquote type="cite">

          <pre wrap="">    Sally,

    You raised a really important issue that we had not really addressed 

at the meeting. Thanks for that.

    I would say that we should not constrain the resolution of LSIDs if 

we expect our LSID infrastructure to work. LSIDs will be the basis of 

our architecture so we better have good support for that.

    However, that is surely a limiting factor. Also, server efficiency will
likely vary quite a lot, depending on underlying system optimizations
and so on.

    So I think the solution to this problem lies in caching LSID
responses in the server's LSID stack. Basically, after resolving an LSID
once, your server should be able to resolve it again and again really
quickly, until something in the metadata related to that ID changes.
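
A minimal Python sketch of that kind of caching, assuming the server can
cheaply look up a modification stamp for each record; the class and its
callbacks are hypothetical and not the API of any existing LSID stack.

```python
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Cache rendered getMetadata() responses per LSID until the record changes."""

    def __init__(self, last_modified: Callable[[str], str],
                 render: Callable[[str], str]) -> None:
        self.last_modified = last_modified  # cheap lookup, e.g. a timestamp column
        self.render = render                # expensive: builds the metadata response
        self._entries: Dict[str, Tuple[str, str]] = {}  # lsid -> (stamp, body)

    def get(self, lsid: str) -> str:
        stamp = self.last_modified(lsid)
        entry = self._entries.get(lsid)
        if entry is not None and entry[0] == stamp:
            return entry[1]  # unchanged since last time: serve from cache
        body = self.render(lsid)
        self._entries[lsid] = (stamp, body)
        return body
```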

    I haven't looked at this aspect of the LSID software stack, but 

maybe others can say something about it. In any case I'll do some 

research on it and get back to you.

    Again, thanks for bringing it up.

    Cheers,

Ricardo

Sally Hinchcliffe wrote:

          </pre>

          <blockquote type="cite">

            <pre wrap="">There are enough discontinuities in IPNI ids that 1,2,3 would quickly 

run into the sand. I agree it's not a new problem - I just hate to 

think I'm making life easier for the data scrapers

Sally

            </pre>

            <blockquote type="cite">

              <pre wrap="">It can be a problem but I'm not sure if there is a simple solution ... and how different is the LSID crawler scenario from an <a class="moz-txt-link-freetext" href="http://www.ipni.org/ipni/plantsearch?id=">http://www.ipni.org/ipni/plantsearch?id=</a> 1,2,3,4,5 ... 9999999 scenario?

Paul

-----Original Message-----

From: <a class="moz-txt-link-abbreviated" href="mailto:tdwg-guid-bounces@mailman.nhm.ku.edu">tdwg-guid-bounces@mailman.nhm.ku.edu</a>

[<a class="moz-txt-link-freetext" href="mailto:tdwg-guid-bounces@mailman.nhm.ku.edu">mailto:tdwg-guid-bounces@mailman.nhm.ku.edu</a>]On Behalf Of Sally

Hinchcliffe

Sent: 15 June 2006 12:08

To: <a class="moz-txt-link-abbreviated" href="mailto:tdwg-guid@mailman.nhm.ku.edu">tdwg-guid@mailman.nhm.ku.edu</a>

Subject: [Tdwg-guid] Throttling searches [ Scanned for viruses ]

Hi all

Another question that has come up here.

As discussed at the meeting, we're thinking of providing a complete
download of all IPNI LSIDs plus a label (name and author, probably),
which will be available as an annually produced download.

Most people will play nice and just resolve one or two LSIDs as
required, but by providing a complete list, we're making it very easy
for someone to write a crawler that hits every LSID in turn and
basically brings our server to its knees.

Anybody know of a good way of enforcing more polite behaviour? We can
make the download available only under a data supply agreement that
includes a clause limiting hit rates, or we could limit by IP address
(but this would ultimately block out services like Rod's simple
resolver). I believe Google's spell checker uses a key which has to
be passed in as part of the query; obviously we can't do that with
LSIDs.

Any thoughts? Anyone think this is a problem? 

Sally

*** Sally Hinchcliffe

*** Computer section, Royal Botanic Gardens, Kew

*** tel: +44 (0)20 8332 5708

*** <a class="moz-txt-link-abbreviated" href="mailto:S.Hinchcliffe@rbgkew.org.uk">S.Hinchcliffe@rbgkew.org.uk</a>

_______________________________________________

TDWG-GUID mailing list

<a class="moz-txt-link-abbreviated" href="mailto:TDWG-GUID@mailman.nhm.ku.edu">TDWG-GUID@mailman.nhm.ku.edu</a>

<a class="moz-txt-link-freetext" href="http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid">http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid</a>


              </pre>

            </blockquote>

            <pre wrap="">

            </pre>

          </blockquote>

          <pre wrap="">

          </pre>

        </blockquote>

        <pre wrap="">-- 

-------------------------------------

 Roger Hyam

 Technical Architect

 Taxonomic Databases Working Group

-------------------------------------

 <a class="moz-txt-link-freetext" href="http://www.tdwg.org">http://www.tdwg.org</a>

 <a class="moz-txt-link-abbreviated" href="mailto:roger@tdwg.org">roger@tdwg.org</a>

 +44 1578 722782

-------------------------------------

        </pre>

      </blockquote>

      <pre wrap="">

      </pre>

    </blockquote>

    <pre wrap="">


    </pre>

  </blockquote>

  <pre wrap="">

  </pre>

</blockquote>

<br>

<br>

<pre class="moz-signature" cols="72">

</pre>

</body>

</html>