What's even more frustrating is that the common method of reading a file line by line causes huge memory leaks.
Here are my findings and solutions (feel free to correct me if I'm wrong, even though this worked for me):
Common method fopen+fgets fails:
Using fopen() with fgets() to read line by line on a file that contains almost a million lines causes a crazy memory leak. It takes only about 10 seconds before the script consumes pretty much 100% of the system memory and the machine goes into swap.
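For reference, this is the pattern I mean; a minimal sketch (the filename matches the examples below, and the parsing step is just a placeholder comment):

$fh = fopen('blah.xml', 'r');
while (($line = fgets($fh)) !== false) {
    // parse $line using simplexml, collect results, etc.
}
fclose($fh);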
My solution:
use "head -1000 {file} | tail -1000" is much less memory intensive. The exact number of lines to process varies depending on the system speed. I had it set to 2000 and was running very smoothly.
Garbage Collector fails:
PHP's garbage collector fails to clean up memory after each loop iteration even if unset() is used (or the variables are set to null). The memory just keeps piling up. Unfortunately gc_collect_cycles(), which forces a garbage collection cycle to run, is only available in the PHP 5.3 branch (see the sketch after the example below).
Example Code:
for ($i = 2000; $i <= 1000000; $i += 2000) {
    // grab the next 2000-line chunk of the file
    $data = explode("\n", shell_exec("head -$i blah.xml | tail -2000"));
    // parse using simplexml
    unset($data); // memory still keeps piling up despite this
}
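If you are on PHP 5.3, forcing a collection cycle after each chunk might be worth trying; a minimal sketch (untested here, since I'm not on the 5.3 branch):

gc_enable(); // the cycle collector is on by default in 5.3, but be explicit
for ($i = 2000; $i <= 1000000; $i += 2000) {
    $data = explode("\n", shell_exec("head -$i blah.xml | tail -2000"));
    // parse using simplexml
    unset($data);
    gc_collect_cycles(); // force a garbage collection cycle
}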
My solution:
You can FORCE the garbage collector to run by wrapping the work in a function, because PHP does clean up a function's local variables after each call returns. If the code example above is rewritten this way, memory usage happily hovers around 0.5% constantly.
Example Code:
for ($i = 2000; $i <= 1000000; $i += 2000) {
    $data = shell_exec("head -$i blah.xml | tail -2000");
    process($data);
    unset($data);
}

function process($data) {
    $data = explode("\n", $data);
    // parse using simplexml
    unset($data); // the local copy is freed when the function returns
}