PHP: How to Write a High-Performance Script

PHP may be a scripting language that is often criticized for being slow, but it remains one of the most widely used server-side web languages. Over time, the major web companies that rely on PHP have worked to optimize the engine. The introduction of a Garbage Collector in version 5.3 was a notable advance. But there are many other ways to optimize the performance of a script.

PHP is mainly used for simple page generation, where script-level optimizations bring some gain but are rarely visible to the user. Optimizing databases, using a web server such as Nginx, or switching to an alternative engine such as HHVM or even HippyVM is a simpler way to speed up page rendering and query response times. Even so, we sometimes write PHP scripts that perform heavy processing, and there the web server cannot help.

In this post, I will detail three areas of optimization that I recently applied to a script that had to process CSV or XLS files containing a large amount of information. Memory usage easily reached 1 GB, and a run could last more than half an hour.

Before I present the three PHP notions that can help you optimize both execution time and memory usage, keep in mind that, before blaming language X, the algorithms in your script account for a large part of your processing cost. C++ may be far faster than PHP, but if your C++ algorithms are bad, switching languages will not solve your problems.

Deallocate your variables, limit memory consumption

Before PHP 5.3, PHP's problem was excessive memory consumption. I have often heard that the memory consumption of a PHP script could never be reduced… If that was true for a while, fortunately it no longer is. Memory management now relies on the Garbage Collector, which we will look at a little further down.

A good habit lost...

In some languages, freeing memory by hand is mandatory. In PHP, the habit has been completely forgotten! Yet the small native construct unset() is extremely useful. It plays a role similar to free() in C (or delete in C++): it destroys the variable, and the memory it used is released as soon as nothing else references the value. This first relieves PHP of unused variables. It also has another benefit: it helps us structure our algorithms better. Of course, when a function or method ends, its local variables are deallocated automatically, but as you will see below, freeing them yourself reduces the time the Garbage Collector spends working, or simply does its job when it is deactivated. So there is a speed gain as well. And believe me, the GC consumes a lot of resources to free up memory.

As I said in the previous paragraph, PHP removes local variables at the end of a function or method. But in a script that processes data in bulk, one function often orchestrates the others. As with a main function in other languages, you will probably store the values returned by one function before passing them to the next. You can quickly end up holding a large amount of data. As soon as a variable is no longer used, there is no point in carrying it around: deallocate it.

Recently, in a script that loaded an entire file larger than 15 MB, I deleted the variable holding it as soon as I had finished with it, which freed up memory.
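As a rough illustration of that habit, here is a minimal sketch. The loadRows() helper and its sizes are hypothetical stand-ins for reading a large file; only unset() and memory_get_usage() are native PHP. Exact figures will vary with the PHP version and platform.

```php
<?php
// Hypothetical helper: builds a large array standing in for a file's contents.
function loadRows(int $count): array
{
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = str_repeat('x', 100) . $i;
    }
    return $rows;
}

$before = memory_get_usage();

$rows = loadRows(100000);
$loaded = memory_get_usage();

// Use the data once...
$total = count($rows);

// ...then release it as soon as it is no longer needed.
unset($rows);
$after = memory_get_usage();

printf(
    "before: %d KB, loaded: %d KB, after unset: %d KB\n",
    $before / 1024,
    $loaded / 1024,
    $after / 1024
);
```

Memory is released here because nothing else references the array; if another variable still pointed to the same data, unset() would only remove this one name.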

Understanding the Garbage Collector

It was while researching memory management that I came across a colleague's article. In this article, which is now a bit old, Pascal Martin explains what was then a novelty: the Garbage Collector.

What is the Garbage Collector?

For those who do not know it, or who have heard of it but never had the opportunity to use it: the Garbage Collector ("ramasse-miettes" in French, literally "crumb picker") is a low-level feature that, at regular intervals, pauses the running process and scans the memory allocated by the script to detect variables that are no longer accessible and delete them.

I don't understand: earlier I was told that PHP destroys variables at the end of a method.
This is not entirely true. As in C or C++ (PHP's ancestors), when a method returns, PHP loses the reference it held on the object. In C, only the notion of pointer exists; in C++, pointers and references coexist.

If nothing else references the value, PHP's reference counting frees it immediately. But when values still reference each other (a parent and its child, for example), they survive the end of the method. Once the Garbage Collector runs, it determines that they are no longer accessible and destroys them.
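The difference can be observed with a small sketch. The Tracked class and its counter are hypothetical, introduced only for this illustration; the sketch assumes the collector has not been disabled in php.ini. It relies on PHP's reference counting freeing an unreferenced object right away, while two objects that point at each other wait for the cycle collector.

```php
<?php
// Hypothetical class that counts how many instances are still alive.
class Tracked
{
    public static $alive = 0;
    public $peer = null;

    public function __construct()
    {
        self::$alive++;
    }

    public function __destruct()
    {
        self::$alive--;
    }
}

// No cycle: the object is destroyed as soon as the function returns.
function noCycle(): void
{
    $t = new Tracked();
}

// Cycle: each object keeps the other alive after the function returns.
function withCycle(): void
{
    $a = new Tracked();
    $b = new Tracked();
    $a->peer = $b;
    $b->peer = $a;
}

noCycle();
$afterNoCycle = Tracked::$alive;   // 0: freed immediately

withCycle();
$afterCycle = Tracked::$alive;     // 2: still in memory, unreachable

gc_collect_cycles();
$afterCollect = Tracked::$alive;   // 0: the collector destroyed the cycle

echo "$afterNoCycle $afterCycle $afterCollect\n";
```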

This concept is very present in Java, as well as in other more recent and therefore higher-level languages.

What did the Garbage Collector bring?

The GC brought considerable flexibility to development. Developing in C is obviously more technical than in Java, PHP or similar languages. As a result, some skills such as memory management are now forgotten, and it is sometimes our code that suffers. And while the GC made development more flexible, it also slowed down the execution of our programs.

With these tools, we must not forget that we can still control memory to some extent. It is a question of will, or of necessity…

Finally, I think tools of this kind are very practical, but they cannot replace explicit instructions from the developer. Personally, I find the C++ notion of shared_ptr much more interesting, and it could offer significant performance gains in the future if Zend decided to use that approach in some cases.

And in PHP?

PHP 5.3 added four functions:

  • gc_enable
  • gc_enabled
  • gc_disable
  • gc_collect_cycles

I will let you read the documentation, but in short they let you activate the GC, check whether it is active, deactivate it, and start a collection manually.
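A minimal sketch of the four functions together might look like this. The Pair class is a hypothetical helper used only to create a circular reference; the exact number returned by gc_collect_cycles() depends on the PHP version, so the sketch only checks that something was collected.

```php
<?php
// Hypothetical class used only to build a circular reference.
class Pair
{
    public $other = null;
}

gc_enable();                    // activate the collector
$wasEnabled = gc_enabled();     // true

gc_disable();                   // no more automatic collection
$nowEnabled = gc_enabled();     // false

// Build a cycle, then drop the only variables pointing at it.
$a = new Pair();
$b = new Pair();
$a->other = $b;
$b->other = $a;
unset($a, $b);

// Start a collection by hand; the return value counts what was freed.
$collected = gc_collect_cycles();

gc_enable();                    // restore the default behaviour

var_dump($wasEnabled, $nowEnabled, $collected > 0);
```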

In the post mentioned above, Pascal Martin published a script with which he ran a battery of tests. The test report clearly shows that the version without GC reaches enormous memory consumption, even hitting the limit and crashing.

Benchmark

Here is the code I ran. It is taken from Pascal Martin's article, with a few adjustments noted in the comments.

// Tree node; parent and child reference each other
class Node {
    public $parentNode;
    public $childNodes = array();
    public $nodeValue;

    // Adjusted: __construct replaces the old-style Node() constructor
    public function __construct() {
        $this->nodeValue = str_repeat('0123456789', 128);
    }
}

// Creates two nodes linked in a cycle, then loses both variables
function createRelationship() {
    $parent = new Node();
    $child = new Node();
    $parent->childNodes[] = $child;
    $child->parentNode = $parent;
}

// Assumption: $options is set per test run; the original listing does not define it
$options = array('gc_manual' => false);

for ($i = 0; $i < 500000; $i++) {
    createRelationship();
    // This part is executed during the second test
    if ($options['gc_manual']) {
        gc_collect_cycles();
    }
}

I ran three separate tests:

  1. With the default options. The Garbage Collector is enabled and PHP runs it at regular intervals. The script took 7.55 seconds and memory usage reached 20.98 MB.
  2. With the Garbage Collector disabled, calling it manually on each loop iteration. The script took 3.79 seconds and memory usage peaked at 244.77 KB.
  3. With the Garbage Collector disabled and no manual collection at all, so memory fills up quickly. The script took 4.46 seconds and memory usage reached 1.98 GB.

This test, deliberately taken to the extreme, clearly shows the importance of memory management. The Garbage Collector does a lot to limit memory consumption, but it tends to slow down script execution. This is obvious when comparing the first test (with GC) and the third (without any management).

The real performance gain is in test 2: no automatic memory management, only manual management tuned for this script. Processing is almost twice as fast (7.55 s / 3.79 s = 1.99). Memory is also limited to 245 KB, against 21 MB with automatic management: a factor of almost 88 ((20.98 × 1024) / 244.77 = 87.77).
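For readers who want to take similar measurements on their own scripts, here is one possible sketch. The measure() helper and the sample workload are hypothetical and are not the harness used for the figures above; microtime() and memory_get_peak_usage() are native PHP functions.

```php
<?php
// Hypothetical helper: runs a task and reports the elapsed wall-clock time
// and the peak memory reached by the script so far.
function measure(callable $task): array
{
    $start = microtime(true);
    $task();

    return [
        'seconds' => microtime(true) - $start,
        'peak_mb' => memory_get_peak_usage() / (1024 * 1024),
    ];
}

// Sample workload standing in for the real processing.
$result = measure(function () {
    $data = range(1, 100000);
    return array_sum($data);
});

printf("%.4f s, %.2f MB peak\n", $result['seconds'], $result['peak_mb']);
```

Because memory_get_peak_usage() covers the whole script, each scenario is best run in its own process, as the three tests above were run separately.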

Conclusion

Although the example above pushes memory management to the limit, it shows just how important it can be to study memory management in our scripts. The gains can be impressive in some cases: processing time, memory, and everything that depends on them.

Understand and use references

As I mentioned earlier, PHP supports passing variables by reference, which is more or less equivalent to passing by pointer.

  • Understanding pass-by-reference
  • Understanding pointer passing

This is an important topic, not only for PHP, since the notion existed long before it, but for computing in general. By default, when you declare a function, its arguments are passed by copy.

    // Pass by copy
    public function foo($arg) {…}

This has pros and cons depending on your code. The value is copied, which means the allocated memory grows by the size of the argument passed. That is negligible for a boolean, but can be significant for an array, for example. (In practice, PHP delays the actual copy of an array until the function modifies it.)

function foo() {
    $a = 1;
    bar($a);
    echo $a; // $a is 1
}

// Receives a copy: the caller's variable is not modified
function bar($arg) {
    $arg = 2;
}

Here the value has not been modified. This is useful when $a serves as a base value that must never be changed. It is similar to the notion of a constant variable (const), which does not exist in PHP.

function foo() {
    $a = 1;
    $a = bar($a);
    echo $a; // $a is 2
}

// Receives a copy, changes it, and returns the new value
function bar($arg) {
    $arg = 2;
    return $arg;
}

I'm sure you have already written methods like this. In this case, it is clearly a mistake. The syntax is perfectly valid, but memory is allocated for the copy of the parameter and then a variable is deallocated: expensive memory operations and unnecessary processing time. The form below is equivalent but much faster.

function foo() {
    $a = 1;
    bar($a);
    echo $a; // $a is 2
}

// Receives a reference: the caller's variable is modified directly
function bar(&$arg) {
    $arg = 2;
}

Here we performed exactly the same processing, but faster and with less memory.
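The same idea applied to an array, where the size of the argument matters more, could be sketched as follows. Both function names are hypothetical, and the sketch makes no claim about the size of the gain, which depends on the data and the PHP version.

```php
<?php
// Works on a copy and returns it: the caller's array is left untouched.
function trimRowsByCopy(array $rows): array
{
    foreach ($rows as $i => $row) {
        $rows[$i] = trim($row);
    }
    return $rows;
}

// Works directly on the caller's array through a reference.
function trimRowsInPlace(array &$rows): void
{
    foreach ($rows as $i => $row) {
        $rows[$i] = trim($row);
    }
}

$data = ['  alpha ', ' beta'];

$copy = trimRowsByCopy($data);
$untouched = $data;          // still ['  alpha ', ' beta']

trimRowsInPlace($data);      // $data is now ['alpha', 'beta']

var_dump($copy, $untouched, $data);
```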

This optimization is very simple to apply and can even make your code easier to read. There are several forms of reference syntax to know:

  1. Passing a parameter by reference (which we have just seen).
  2. Returning by reference from a function (not detailed here, because the PHP engine optimizes this part and I did not notice any convincing gain).
  3. Assigning a variable by reference: rarely needed, but powerful.

I will let you read the PHP documentation, which is extremely detailed. You could easily learn things that I have omitted 🙂
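For the third form, a minimal sketch might look like this. The variable names are made up for the example; the =& operator is standard PHP syntax.

```php
<?php
$config = ['retries' => 1];

// Assignment by reference: $alias and $config name the same value.
$alias = &$config;
$alias['retries'] = 3;       // also visible through $config

// Ordinary assignment: $copy is independent of $config.
$copy = $config;
$copy['retries'] = 5;        // $config is not affected

echo $config['retries'], ' ', $copy['retries'], "\n";
```

Once $alias is no longer needed, unset($alias) removes that name only and leaves $config intact.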

Other optimizations

With the three optimizations above, you should have no trouble speeding up your processing. There are other techniques, however. Here are some interesting articles on further optimizations you can make.

  • PHP String Concat vs Array Implode

Conclusion

The three techniques I have just described are, to my taste, a good way to optimize your scripts and avoid rewriting them in a faster, compiled language.

I applied all three to one of my scripts and, among other things, managed to divide memory consumption by about 3 and double the speed.
