Hi Roger,
I do think it's worth adding your changes related to the "fopen" workaround. Please don't hesitate.
I should say that I'm actually surprised that TapirLink requires only 10M with such a complex RDF output model. I wonder how many records were being returned in your request?
I also don't have much experience with profiling, but I'm sure there's room for improvement, since I didn't pay much attention to optimization. By the way, the main new feature in the next version will be caching. I'm expecting significant performance improvements, since query templates, output models, and response structures will all be cached by default as serialized PHP. When TapirLink can use cached content, I suppose this will also reduce memory use. However, the first time it receives a particular output model in a request it will still need the same amount of memory, so we would have to make additional optimizations if we want it to run below the 8M limit.
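For illustration, here is a minimal sketch of the kind of serialized-PHP caching I have in mind (the function and path names are hypothetical, not the actual TapirLink code):

    <?php
    // Hypothetical sketch: cache a parsed output model as serialized PHP
    // so later requests can skip the expensive XML parsing step.
    function get_output_model($url) {
        $cache_file = '/tmp/tapir_cache/' . md5($url) . '.ser';
        if (file_exists($cache_file)) {
            // Cache hit: unserializing is much cheaper than re-parsing.
            return unserialize(file_get_contents($cache_file));
        }
        // Cache miss: parse the model (the expensive, memory-hungry step)
        // and store the result for next time.
        $model = parse_output_model($url); // hypothetical parser
        file_put_contents($cache_file, serialize($model));
        return $model;
    }
    ?>

Of course, the first request for a given model still pays the full parsing cost, which is why caching alone doesn't solve the 8M problem.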
Anyway, I also wonder how many people and organizations will need to run TAPIR provider software under the conditions you described (an external ISP with such a low memory limit). I've never heard of any such case in our community (maybe someone from GBIF could give us a better picture?).
Best Regards, -- Renato
On 1 Aug 2007 at 14:11, Roger Hyam wrote:
Hi All
I spent a couple of hours this morning adapting TAPIRLink so that it uses a workaround for fopen(), because many ISPs will not allow opening remote files (there is a PHP config option, allow_url_fopen, that disables it).
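For the record, the workaround is along these lines (a sketch only - the helper name is made up and the committed code may differ): when remote fopen() is disabled, fall back to the cURL extension, which is not affected by allow_url_fopen.

    <?php
    // Sketch: fetch a remote file even when allow_url_fopen is off.
    function fetch_remote_file($url) {
        if (ini_get('allow_url_fopen')) {
            return file_get_contents($url); // remote fopen() is allowed
        }
        if (function_exists('curl_init')) {
            // The cURL extension does not go through the fopen wrappers.
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            $content = curl_exec($ch);
            curl_close($ch);
            return $content;
        }
        return false; // neither mechanism is available on this host
    }
    ?>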
Anyhow, I got past this, only to find that my output model still wouldn't run on my ISP account (Easyspace.com), although it would run on my local machine. After a while I found that it was running out of memory.
My ISP limits memory to 8 meg per running script. That seems pretty tight until you imagine having a hundred scripts running simultaneously. It was the default setting prior to PHP 5.2, when it jumped to 128 meg! This may explain why ISPs are very slow to migrate to PHP 5.*.
The TAPIRLink request was using almost 10 meg under PHP 5.2, according to memory_get_peak_usage(), to parse the rather complex TaxonOccurrence output model. (There is no peak-usage function in earlier PHP versions.) Ten meg seems quite reasonable considering the cost of RAM these days - but there you have it.
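For anyone who wants to reproduce the measurement, this is roughly what I did (a sketch; since memory_get_peak_usage() only exists from PHP 5.2, older versions can only report the current, not the peak, figure):

    <?php
    // Report memory use after handling a request.
    if (function_exists('memory_get_peak_usage')) {
        $bytes = memory_get_peak_usage(); // PHP 5.2+
        $label = 'peak';
    } else {
        $bytes = memory_get_usage();      // current allocation only
        $label = 'current';
    }
    printf("Memory (%s): %.1f MB\n", $label, $bytes / 1048576);
    ?>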
Anyhow, I am nervous, because this means deployers might need to mess with php.ini to get the scripts running, which in turn means shared servers may be problematic for deployments that use complex output models. Basically, TAPIRLink won't run everywhere PHP is available, only where PHP with a particular configuration is available. It also means that if you are being crawled by a ten-threaded robot you will be using close to 100 meg to service the requests, plus the overhead of memory allocation and deallocation.
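"Messing with php.ini" on a shared host usually comes down to one of two things, neither of which is guaranteed to work (a sketch; whether either is honoured depends entirely on the host's configuration): a runtime ini_set(), or, under mod_php, a php_value line in .htaccess such as "php_value memory_limit 16M".

    <?php
    // Sketch: try to raise the per-script limit at runtime. Many
    // shared hosts disable this, so check whether it took effect.
    ini_set('memory_limit', '16M');
    if (ini_get('memory_limit') !== '16M') {
        error_log('Could not raise memory_limit; current limit is '
            . ini_get('memory_limit'));
    }
    ?>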
It all seems pretty trivial if you have a newer machine, and even more so if you install PHP 5.2+ on it, but it does mean that the old departmental webserver with an old install of PHP on it may not run the RDF-based output models out of the box.
I've never done any profiling with PHP and wouldn't like to get into it. I guess Python will have a similar memory footprint, as it is doing a similar job, but the install scenario is different for PyWrapper - you really need shell access.
It may not be worth adding the fopen() workaround if the deployment environment is going to require access to php.ini anyway.
Would be grateful for your thoughts on this.
Roger