PHP file_exists() Performance
For one of our projects we need a large amount of cheap storage, and an NFS-mounted NAS suits us. But we also want to load the most recently used files from the server’s local disk, so that the notoriously slow NFS protocol doesn’t get the better of us during peak times. We’ll simply run a cron job each hour to mirror our most recently used files to the local disk. Think of it as a very simplified reverse proxy cache.
Now, we could store the current location of the files in the database (e.g. file1 is on the NAS, file2 is on the NAS and the local disk), but that would mean more work and an extra query on the database.
Databases are generally more expensive to scale than other website components, so we are simply going to check if the file exists on the local disk and if it does then serve it from there, and if not then load it from the NAS directory.
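The lookup described above could be sketched roughly as follows. This is only a minimal illustration, not the code we run: the directory paths and the function name resolve_media_path() are hypothetical, since the real layout isn't given here.

```php
<?php
// Hypothetical directory layout; the real paths are not given in this post.
const LOCAL_CACHE_DIR = '/var/cache/media';
const NAS_DIR = '/mnt/nas/media';

/**
 * Return the path a file should be served from:
 * the local disk copy if one exists, otherwise the NAS copy.
 */
function resolve_media_path(
    string $filename,
    string $localDir = LOCAL_CACHE_DIR,
    string $nasDir = NAS_DIR
): string {
    // basename() drops directory components, so "../" cannot escape the two directories.
    $name = basename($filename);
    $local = $localDir . '/' . $name;

    // Cheap check against the local disk first.
    if (file_exists($local)) {
        return $local;
    }

    // Fall back to the NAS. No existence check here, to avoid an extra NFS round trip.
    return $nasDir . '/' . $name;
}
```

A caller would then do something like readfile(resolve_media_path('12345.wmv')) and handle a missing NAS file as an ordinary read failure.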
This raised the question of whether checking if the file exists on the local disk takes too long, especially when checking several million times per day. The answer is no. Using PHP to check whether 10 million different files exist on the local disk takes approximately 40 seconds. That’s about 4 millionths of a second per check, which is acceptable for our needs of 100,000 image loads per day.
Here’s the code we used to check:
<?php
// Baseline: time an empty loop of 10 million iterations.
$x = 0;
$start_time = time();
while ($x < 10000000) {
    $x++;
}
print "loop without checks took ";
echo time() - $start_time;
print " seconds <br /><br />";

// Same loop, but check for a differently named file on each iteration.
$x = 0;
$start_time = time();
while ($x < 10000000) {
    $name = $x . ".wmv";
    if (file_exists($name)) {
        // do nothing
    }
    $x++;
}
print "loop with checks took ";
echo time() - $start_time;
print " seconds <br /><br />";
?>
And here was the output from the script:
loop without checks took 1 seconds
loop with checks took 41 seconds
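Since time() only has one-second resolution, a finer-grained version of the same measurement might look like the sketch below. It is not the script we ran. The function name average_file_exists_cost() and the smaller iteration count are assumptions made for illustration, and the numbers it prints will vary by machine and filesystem.

```php
<?php
/**
 * Time $iterations calls to file_exists() and return the average cost
 * per call in seconds, after subtracting the cost of the bare loop.
 * The ".wmv" names mirror the script above; $prefix is a hypothetical
 * directory prefix for the files being checked.
 */
function average_file_exists_cost(int $iterations, string $prefix = ''): float
{
    // Baseline: same loop and string building, without the check.
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $name = $prefix . $i . '.wmv';
    }
    $baseline = microtime(true) - $start;

    // Same loop with the file_exists() call added.
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $name = $prefix . $i . '.wmv';
        file_exists($name);
    }
    $withChecks = microtime(true) - $start;

    // max() guards against timer noise producing a negative difference.
    return max(0.0, $withChecks - $baseline) / $iterations;
}

$perCheck = average_file_exists_cost(100000);
printf("about %.2f microseconds per check\n", $perCheck * 1e6);
// Projected total at 100,000 image loads per day:
printf("about %.3f seconds of checks per day\n", $perCheck * 100000);
```

With the figure measured above of roughly 4 millionths of a second per check, 100,000 loads per day works out to about 0.4 seconds of checking per day.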
Obviously the next step to increase performance would be to make the “local disk” that caches the files a RAM disk. However, we’re many millions of image loads per day away from needing to do that. For now we’ve solved our limited storage issues without taking a performance hit.