<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
  <title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
Hi Chuck,<br>
<br>
The 'stack' I have in mind (which isn't really a stack but a
'slider', because the layers don't wrap each other) goes like this:<br>
<ol>
  <li>Resolution = have identifier, get object = LSID (and, to a lesser
extent, URLs for things we don't care about so much, such as logos).</li>
  <li>Harvest = give me what has changed since... = OAI? (This is
still to be fully investigated, but it is a halfway house to a full-blown
query language.)</li>
  <li>Query = ask whatever you like = BioCASe, TAPIR, DiGIR or SPARQL.</li>
</ol>
We have unified on 1. We don't disagree on 2, but it may not be
necessary. We are moving towards unifying 3 by unifying the vocabulary
used by all the protocols. Different query protocols will probably
always be needed for different purposes, but the query terms should map
onto the same shared vocabulary. RDF will figure large going forward
because it is the default return type for LSID metadata, so we need to
be able to express all our objects in it. We may also need to express
them in other ways, such as GML.<br>
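<br>
As an illustration of layer 2 only, here is a minimal Python sketch of
the "give me what has changed since..." idea. It is not OAI-PMH itself.
It assumes a hypothetical in-memory store that maps each LSID (under the
placeholder authority example.org) to a last-modified timestamp, and it
returns the identifiers changed at or after a cut-off date. A real
harvest service would also page through results and report the
timestamps in its responses.<br>
<pre>
```python
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical record store: LSID -> last-modified timestamp (UTC).
# "example.org" is a placeholder authority, not a real LSID authority.
RECORDS: Dict[str, datetime] = {
    "urn:lsid:example.org:names:1": datetime(2006, 6, 1, tzinfo=timezone.utc),
    "urn:lsid:example.org:names:2": datetime(2006, 6, 15, tzinfo=timezone.utc),
    "urn:lsid:example.org:names:3": datetime(2006, 6, 19, tzinfo=timezone.utc),
}


def changed_since(since: datetime) -> List[str]:
    """Harvest layer: LSIDs of records modified at or after `since`, sorted."""
    return sorted(lsid for lsid, modified in RECORDS.items() if modified >= since)


if __name__ == "__main__":
    cutoff = datetime(2006, 6, 15, tzinfo=timezone.utc)
    print(changed_since(cutoff))
```
</pre>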
<br>
Appendix B of the TAG-1 report summarizes the current technology.<br>
<br>
<a class="moz-txt-link-freetext" href="http://wiki.tdwg.org/twiki/pub/TAG/TagMeeting1Report/TAG-1_Report_Final.pdf">http://wiki.tdwg.org/twiki/pub/TAG/TagMeeting1Report/TAG-1_Report_Final.pdf</a><br>
<br>
Hope this helps,<br>
<br>
Roger<br>
<br>
<br>
Chuck Miller wrote:
<blockquote
 cite="mid769697AE3E25EF4FBC0763CD91AB1B02DA8C45@MBGMail01.mobot.org"
 type="cite">
  <pre wrap="">It sounds to me as though we have a multi-layer communications protocol stack in development here, but we aren't spelling out the layers very well. Discussing LSIDs in the context of biodiversity systems/databases without a clear definition of the necessary underlying layers is confusing me.
 
Can someone give a fuller explanation of the complete LSID/RDF protocol stack? What exactly are we proposing to standardize on, besides just the syntax of an LSID?
 
Chuck

________________________________

From: Roger Hyam [<a class="moz-txt-link-freetext" href="mailto:roger@tdwg.org">mailto:roger@tdwg.org</a>]
Sent: Mon 6/19/2006 9:37 AM
To: David Remsen
Cc: <a class="moz-txt-link-abbreviated" href="mailto:tdwg-guid@mailman.nhm.ku.edu">tdwg-guid@mailman.nhm.ku.edu</a>
Subject: Re: [Tdwg-guid] Throttling searches



Yes, it would be violating the LSID ethos to use the version number, as a different version number means a different LSID. Also, what would happen if the LSID already had a version number? Really, this stuff has nothing to do with the LSID 'layer' at all; it concerns the web services the LSIDs resolve to. There may be all sorts of authentication and authorization wrapped around the web services, and, in my opinion, we don't want to try to lever that into the GUID technology.
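
A minimal Python sketch of the LSID syntax makes both objections concrete. It assumes the form urn:lsid:authority:namespace:object with an optional trailing revision, and it is a simplified illustration rather than the reference LSID client code. A token placed in the revision slot produces a different identifier, and an LSID that already carries a revision has no slot left for a token:

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or not all(parts):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


plain = parse_lsid("urn:lsid:ubio.org:namebank:11815")
tokened = parse_lsid("urn:lsid:ubio.org:namebank:11815:rogerhyam1392918790")
print(plain == tokened)  # False: the "token" has become part of the identity
# An LSID that already has a revision cannot take a token as well:
# parse_lsid("urn:lsid:ubio.org:namebank:11815:2:rogerhyam1392918790") raises ValueError.
```
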

Roger


David Remsen wrote: 


        We already do some of this with our web services: our SOAP methods require a keycode. We use the code so that we have a contact in case we need to send a message out, and so that we can give sources a better account of how we pass on their content. Patrick (uBio programmer and nice guy) asked why not use the LSID version number as a way to pass a token. If it's not passed, you fall back to one level of processing; otherwise you give the request the extra-special treatment associated with that userID. Or is this violating something sacred in the LSID ethos?

        David Remsen


        On Jun 19, 2006, at 6:07 AM, Roger Hyam wrote:



                You don't! The LSID resolves to the binding for the getMetadata() method, which is a plain old-fashioned URL. At that point the LSID authority has done its duty and we are just making a plain HTTP GET call, so you can do whatever you can do with any regular HTTP GET. You could stipulate an extra header field or (more simply) give priority service to those who append a valid user ID to the URL (&amp;user_id=12345).
                
                So there is no throttle on resolving the LSID to the getMetadata() binding (which is cheap), but there is a throttle on the actual call to the getMetadata() method. Really, you need to do this anyway, because bad people may be able to tell from the URL how to scrape the source and bypass the LSID resolver after the first call. This is especially true if the URL contains the IPNI record ID, which is likely.
                
                Here is an example using Rod's tester.
                
                <a class="moz-txt-link-freetext" href="http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:ubio.org:namebank:11815">http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/?q=urn:lsid:ubio.org:namebank:11815</a>
                
                The getMetadata() method for this LSID:
                
                 urn:lsid:ubio.org:namebank:11815
                
                is bound to this URL:
                
                <a class="moz-txt-link-freetext" href="http://names.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815">http://names.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815</a>
                
                So uBio would just have to give preferential service to calls like this:
                
                <a class="moz-txt-link-freetext" href="http://names.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815&amp;user_id=rogerhyam1392918790">http://names.ubio.org/authority/metadata.php?lsid=urn:lsid:ubio.org:namebank:11815&amp;user_id=rogerhyam1392918790</a>
                
                If rogerhyam had paid his membership fees this year.
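
                A minimal Python sketch of both sides of this arrangement, for illustration only. The endpoint is the getMetadata() binding quoted above and the user_id parameter name follows the example URL; the PAID_MEMBERS registry is hypothetical. The client appends its ID to the plain HTTP GET URL, and the server chooses a service tier from it:

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# getMetadata() binding quoted above for urn:lsid:ubio.org:namebank:11815.
METADATA_ENDPOINT = "http://names.ubio.org/authority/metadata.php"

# Hypothetical registry of members whose fees are paid; a real service
# would look this up in its own user database.
PAID_MEMBERS = {"rogerhyam1392918790"}


def metadata_url(lsid: str, user_id: Optional[str] = None) -> str:
    """Client side: build the plain HTTP GET URL, optionally with a user_id."""
    params = {"lsid": lsid}
    if user_id is not None:
        params["user_id"] = user_id
    # safe=":" keeps the URN's colons unescaped, as in the URLs above.
    return METADATA_ENDPOINT + "?" + urlencode(params, safe=":")


def service_tier(url: str) -> str:
    """Server side: preferential service only for a recognised user_id."""
    user_id = parse_qs(urlparse(url).query).get("user_id", [None])[0]
    return "priority" if user_id in PAID_MEMBERS else "standard"


url = metadata_url("urn:lsid:ubio.org:namebank:11815", "rogerhyam1392918790")
print(url)
print(service_tier(url))  # priority
```

                Note that the LSID itself never changes; only the HTTP call that the binding points at carries the extra parameter.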
                
                Does this make sense?
                
                Roger
                P.S. You could do this on the web pages as well, with a clever little mechanism that writes dynamic tokens into the links, so that it doesn't degrade the regular browsing experience and only stops scrapers; but that is beyond my remit at the moment ;)
                
                P.P.S. You could wrap this in HTTPS if you were paranoid about people stealing tokens, but I believe that is highly unlikely to be a problem.
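
                One possible way to write dynamic tokens into links, sketched in Python under stated assumptions: the server signs each link's path and issue time with a secret key (HMAC-SHA256), and it rejects tokens that are forged, attached to a different path, or older than a chosen lifetime. The key, the 300-second lifetime, the token layout and the example path are all hypothetical choices; nothing in this thread specifies them:

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical values: the secret never appears in the pages themselves,
# and the lifetime is an arbitrary illustrative choice.
SECRET_KEY = b"replace-with-a-server-side-secret"
TOKEN_LIFETIME = 300  # seconds


def _sign(path: str, issued: int) -> str:
    message = f"{path}|{issued}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(path: str, now: Optional[float] = None) -> str:
    """Token written into a link when the page is generated."""
    issued = int(time.time() if now is None else now)
    return f"{issued}.{_sign(path, issued)}"


def check_token(path: str, token: str, now: Optional[float] = None) -> bool:
    """Accept the token only for this path and only within its lifetime."""
    now = time.time() if now is None else now
    try:
        issued_text, signature = token.split(".", 1)
        issued = int(issued_text)
    except ValueError:
        return False  # malformed token
    # Constant-time comparison of the presented and expected signatures.
    if not hmac.compare_digest(signature.encode(), _sign(path, issued).encode()):
        return False
    return now >= issued and TOKEN_LIFETIME >= now - issued


link_path = "/ipni/plantsearch?id=1"  # hypothetical link path
token = make_token(link_path)
print(check_token(link_path, token))  # True while the token is fresh
```

                A scraper that constructs URLs itself has no valid token, while a browser simply follows links that already carry one. A scraper that fetches the pages can still collect fresh tokens, so this raises the cost of scraping rather than preventing it.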
                
                Sally Hinchcliffe wrote: 

                        How can we pass a token with an LSID?
                        
                        
                          

                                I think the only way to throttle in these situations is to have some
                                notion of who the client is, and the only way to do that is to have some
                                kind of token exchange over HTTP saying who they are. Basically, you have
                                to have some kind of client registration system, or you can never
                                distinguish between a call from a new client and a repeat call. Using
                                IP addresses is a pain because so many people are now behind some kind
                                of NAT gateway.
                                
                                How about this for a plan:
                                
                                You could give a degraded service to people who don't pass a token (a
                                5-second delay, perhaps) and offer a quicker service to registered users
                                who pass a token (but then perhaps limit the number of calls they make).
                                This would mean you could offer a universal service even to those with
                                naive client software, but a better service to those who play nicely. You
                                could also get better stats on who is using the service.
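
                                A minimal Python sketch of this plan, assuming a hypothetical in-memory set of registered tokens and a hypothetical cap of 1000 calls per token per day (the same figure as the Google spell-checker key mentioned elsewhere in this thread). Requests without a recognised token are still served, but only after the 5-second delay. Registered tokens are served immediately until the daily cap is reached, and the per-token counters double as raw material for usage statistics:

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional

ANONYMOUS_DELAY = 5.0          # seconds: the 5-second delay suggested above
REGISTERED_DAILY_LIMIT = 1000  # hypothetical per-token cap per day


class TieredThrottle:
    """Degraded service without a token; faster but capped service with one."""

    def __init__(self, registered_tokens: Iterable[str]):
        self.registered = set(registered_tokens)
        # Calls per registered token today; also usable for usage stats.
        self.calls_today: Dict[str, int] = defaultdict(int)

    def delay_for(self, token: Optional[str]) -> Optional[float]:
        """Seconds to wait before serving, or None to refuse the request."""
        if token not in self.registered:
            return ANONYMOUS_DELAY  # missing or unknown token: served, slowly
        self.calls_today[token] += 1
        if self.calls_today[token] > REGISTERED_DAILY_LIMIT:
            return None  # registered client has used up today's allowance
        return 0.0

    def reset_day(self) -> None:
        """Call once a day (e.g. from a scheduler) to restart the quotas."""
        self.calls_today.clear()


throttle = TieredThrottle(["token-for-a-registered-user"])  # hypothetical token
print(throttle.delay_for(None))                           # 5.0
print(throttle.delay_for("token-for-a-registered-user"))  # 0.0
```
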
                                
                                So there are ways that this could be done. I expect people will come up 
                                with a host of different ways. It is outside LSIDs though.
                                
                                Roger
                                
                                Sally Hinchcliffe wrote:
                                    

                                        It's not an LSID issue per se, but LSIDs will make it harder to slow
                                        searches down. For instance, Google restricts use of its spell
                                        checker to 1000 requests a day by means of a key that is passed in
                                        with each request. Obviously this can't be done with LSIDs, as they
                                        would then not be the same for each user.
                                        The other reason why it's relevant to LSIDs is simply that providing
                                        a list of all relevant IPNI LSIDs (not necessary for the LSID
                                        implementation, but a nice-to-have for caching and lookups by other
                                        systems using our LSIDs) also makes life easier for the data
                                        scrapers.
                                        
                                        Also, I thought: here's a list full of clever people, perhaps they
                                        will have some suggestions.
                                        
                                        Sally
                                        
                                          
                                              

                                        Is this an LSID issue? LSIDs essentially provide a binding service between
                                        a name and one or more web services (we default to HTTP GET bindings).
                                        It isn't really up to the LSID authority to administer any policies
                                        regarding the web service, but simply to point at it. It is up to the web
                                        service to do things like throttling, authentication and authorization.
                                        
                                        Imagine, for example, that the different services had different 
                                        policies. It may be reasonable not to restrict the getMetadata() calls 
                                        but to restrict the getData() calls.
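
                                        A minimal Python sketch of per-method policies. It assumes a hypothetical policy table in which getMetadata() is unrestricted and getData() is limited to 60 calls per client per minute, and it uses a simple fixed-window counter keyed by client and method. The client identifier could be an IP address or a token; the limits and window length are illustrative only:

```python
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical policy table: maximum calls per client per window,
# with None meaning "not restricted". Only getData() is limited here.
POLICIES: Dict[str, Optional[int]] = {"getMetadata": None, "getData": 60}


class MethodLimiter:
    """Fixed-window rate limiter applied per (client, method) pair."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        # (client, method, window number) -> calls seen; this sketch never
        # purges old windows, which a long-running service would need to do.
        self.counts: Dict[Tuple[str, str, int], int] = defaultdict(int)

    def allow(self, client: str, method: str, now: Optional[float] = None) -> bool:
        limit = POLICIES.get(method)
        if limit is None:
            return True  # unrestricted (or unknown) method
        now = time.time() if now is None else now
        key = (client, method, int(now // self.window))
        self.counts[key] += 1
        return limit >= self.counts[key]


limiter = MethodLimiter()
print(limiter.allow("client-a", "getMetadata"))  # True: never limited
```
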
                                        
                                        The use of LSIDs does not create any new problems that weren't there 
                                        with web page scraping - or scraping of any other web service.
                                        
                                        Just my thoughts...
                                        
                                        Roger
                                        
                                        
                                        Ricardo Scachetti Pereira wrote:
                                            
                                                

                                            Sally,
                                        
                                            You raised a really important issue that we had not really addressed 
                                        at the meeting. Thanks for that.
                                        
                                            I would say that we should not constrain the resolution of LSIDs if
                                        we expect our LSID infrastructure to work. LSIDs will be the basis of
                                        our architecture, so we had better have good support for that.
                                        
                                            However, that is certainly a limiting factor. Server efficiency will
                                        also likely vary quite a lot, depending on underlying system
                                        optimizations and so on.
                                        
                                            So I think that the solution to this problem lies in caching LSID
                                        responses in the server's LSID stack. Basically, after resolving an LSID
                                        once, your server should be able to resolve it again and again really
                                        quickly, until something in the metadata related to that ID changes.
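
                                        A minimal Python sketch of such a cache. It assumes the server can cheaply read a last-modified stamp for each record (the hypothetical last_modified function), while rendering the full metadata response (the hypothetical render function) is the expensive step. A cached response is reused until the stamp changes:

```python
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Reuse a rendered getMetadata() response until the record changes."""

    def __init__(self, last_modified: Callable[[str], str],
                 render: Callable[[str], str]):
        self.last_modified = last_modified  # cheap check, e.g. a timestamp column
        self.render = render                # expensive: builds the RDF response
        self.entries: Dict[str, Tuple[str, str]] = {}
        self.renders = 0                    # counts cache misses, for illustration

    def get(self, lsid: str) -> str:
        stamp = self.last_modified(lsid)
        cached = self.entries.get(lsid)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # unchanged since last time: no re-rendering
        self.renders += 1
        body = self.render(lsid)
        self.entries[lsid] = (stamp, body)
        return body
```

                                        Each request still pays for the stamp lookup; a time-based expiry would avoid even that, at the cost of briefly serving stale metadata.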
                                        
                                            I haven't looked at this aspect of the LSID software stack, but 
                                        maybe others can say something about it. In any case I'll do some 
                                        research on it and get back to you.
                                        
                                            Again, thanks for bringing it up.
                                        
                                            Cheers,
                                        
                                        Ricardo
                                        
                                        
                                        Sally Hinchcliffe wrote:
                                          
                                              
                                                  

                                        There are enough discontinuities in the IPNI IDs that simply counting
                                        1, 2, 3... would quickly run into the sand. I agree it's not a new problem;
                                        I just hate to think I'm making life easier for the data scrapers.
                                        Sally
                                        
                                        
                                          
                                            
                                                
                                                    

                                        It can be a problem, but I'm not sure there is a simple solution... and how different is the LSID crawler scenario from an <a class="moz-txt-link-freetext" href="http://www.ipni.org/ipni/plantsearch?id=">http://www.ipni.org/ipni/plantsearch?id=</a> 1, 2, 3, 4, 5 ... 9999999 scenario?
                                        
                                        Paul
                                        
                                        -----Original Message-----
                                        From: <a class="moz-txt-link-abbreviated" href="mailto:tdwg-guid-bounces@mailman.nhm.ku.edu">tdwg-guid-bounces@mailman.nhm.ku.edu</a>
                                        [<a class="moz-txt-link-freetext" href="mailto:tdwg-guid-bounces@mailman.nhm.ku.edu">mailto:tdwg-guid-bounces@mailman.nhm.ku.edu</a>]On Behalf Of Sally
                                        Hinchcliffe
                                        Sent: 15 June 2006 12:08
                                        To: <a class="moz-txt-link-abbreviated" href="mailto:tdwg-guid@mailman.nhm.ku.edu">tdwg-guid@mailman.nhm.ku.edu</a>
                                        Subject: [Tdwg-guid] Throttling searches
                                        
                                        
                                        Hi all
                                        another question that has come up here. 
                                        
                                        As discussed at the meeting, we're thinking of providing a complete
                                        download of all IPNI LSIDs plus a label (probably name and author),
                                        which will be available as an annually produced download.
                                        
                                        Most people will play nicely and just resolve one or two LSIDs as
                                        required, but by providing a complete list, we're making it very easy
                                        for someone to write a crawler that hits every LSID in turn and
                                        basically brings our server to its knees.
                                        
                                        Does anybody know of a good way of enforcing more polite behaviour? We
                                        could make the download available only under a data supply agreement
                                        that includes a clause limiting hit rates, or we could limit by IP
                                        address (but this would ultimately block out services like Rod's simple
                                        resolver). I believe Google's spell checker uses a key that has to be
                                        passed in as part of the query; obviously we can't do that with LSIDs.
                                        
                                        Any thoughts? Anyone think this is a problem? 
                                        
                                        Sally
                                        *** Sally Hinchcliffe
                                        *** Computer section, Royal Botanic Gardens, Kew
                                        *** tel: +44 (0)20 8332 5708
                                        *** <a class="moz-txt-link-abbreviated" href="mailto:S.Hinchcliffe@rbgkew.org.uk">S.Hinchcliffe@rbgkew.org.uk</a>
                                        
                                        
                                        _______________________________________________
                                        TDWG-GUID mailing list
                                        <a class="moz-txt-link-abbreviated" href="mailto:TDWG-GUID@mailman.nhm.ku.edu">TDWG-GUID@mailman.nhm.ku.edu</a>
                                        <a class="moz-txt-link-freetext" href="http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid">http://mailman.nhm.ku.edu/mailman/listinfo/tdwg-guid</a>
                                        
                                        
                                          
                                              
                                                  

                                        -- 
                                        
                                        -------------------------------------
                                         Roger Hyam
                                         Technical Architect
                                         Taxonomic Databases Working Group
                                        -------------------------------------
                                         <a class="moz-txt-link-freetext" href="http://www.tdwg.org">http://www.tdwg.org</a>
                                         <a class="moz-txt-link-abbreviated" href="mailto:roger@tdwg.org">roger@tdwg.org</a>
                                         +44 1578 722782
                                        -------------------------------------
                                        
                                        
                                            
                                                



        _______________________________________________

        David Remsen

        uBio Project Manager

        Marine Biological Laboratory

        Woods Hole, MA 02543

        508-289-7632



        
          



  </pre>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">-- 

-------------------------------------
 Roger Hyam
 Technical Architect
 Taxonomic Databases Working Group
-------------------------------------
 <a class="moz-txt-link-freetext" href="http://www.tdwg.org">http://www.tdwg.org</a>
 <a class="moz-txt-link-abbreviated" href="mailto:roger@tdwg.org">roger@tdwg.org</a>
 +44 1578 722782
-------------------------------------
</pre>
</body>
</html>