The concept of persistent database connections is a question mark for many PHP developers. When exactly is it a wise decision to use them?
by John Coggeshall
Published February 2006
Over the years, in my work as a technical consultant specializing in PHP, I have been asked about the appropriateness of persistent database connections more times than I can remember. After all, many of the most popular database extensions for PHP have functionality to establish such a connection.
To really understand the answer to this question, you need to understand the lifecycle of PHP script execution and how your Web server affects that lifecycle—and that's exactly where I'll begin this recipe.
The most common PHP setup is PHP running as a module within an Apache Web server—generally, Apache 1.3.x (although PHP will run fine in the latest versions of Apache as well)—as long as you're running newer Apache 2.0+ versions in pre-fork mode. The seemingly unrelated fact that PHP must run in pre-fork mode is the framework for understanding the pros and cons of persistent database connections. Not sure what I mean? Let's take a look at the way Apache works.
When a client such as a Web browser connects to a Web site served by Apache, a process is dispatched to handle that request. In Apache 1.3.x, each of these processes (called "child" processes) are mutually independent. That is to say, unlike a thread-based model, each process has a completely independent memory and execution space in the server. This approach has long been considered a benefit to Apache because it makes the entire server incredibly stable—if a particular request causes Apache to crash, it will be far more likely to crash only the particular, individual child that handled the request.
This process is illustrated in Figure 1, where red indicates a segmentation fault caused by a particular HTTP request.
Figure 1 Segmentation fault in action
While this architecture is clearly a benefit for ensuring a stable Web server environment, the side effect of this "sandboxing" of each individual HTTP request is that everything regarding the request is by necessity also isolated from other requests. Because most PHP setups (by recommendation) are done running PHP as a module within Apache (mod_php), PHP scripts running in this pre-fork model themselves are subject to the same mutual isolation of requests. Thus, all variables and resources available within PHP are allocated and destroyed within the context of a single Apache child—and this is where persistent Oracle connections start coming into play.
Because all PHP resources are defined within the context of a single Apache child, when PHP establishes a database connection to an Oracle database the only place that connection exists is within that specific child. Thus, when a executing the same PHP script that makes a database connection concurrently for multiple different clients, each child must establish its own connection. (See Figure 2.)
Figure 1 Concurrent database connections across multiple clients
If each child process is executing the same PHP script, you can imagine the inefficiencies the pre-fork Apache model imposes on PHP as well as Oracle, as theoretically only one database connection was needed for all four children. Furthermore, even with four individual connections to the database using the standard connection methods provided by the Oracle APIs for PHP, this connection is broken and re-established for every single request. Even if each child must have its own connection, re-establishing the connection every request seems quite unnecessary for the vast majority of cases—and thus PHP attempts to address this issue through the notion of persistent database connections. This approach, unfortunately, only solves part of the problem—while persistent connections prevent the connection from being broken on every request, the PHP resource that is persisted across requests still only exists within a single child.
(Hint: If you are curious about how many total connections can occur to your Oracle database, take a look at the MaxClients configuration directive in your Apache Web server. Assuming your PHP scripts make X connections to your database per request, Oracle should be prepared to accept MaxClients * X database connections under peak load.)
With the knowledge of how PHP, Apache, and Oracle interact, you are now equipped to make the most of every single connection to the database, as establishing one can easily be the most expensive operation when working with the database per request.
At first glance, apparently not much can be done to improve performance other than to persist the connection within a single child. However, making this assumption would be a mistake—the trick isn't to optimize PHP, but rather your server architecture itself.
Unless you are running some sort of remote procedure call (RPC) server that handles nothing but RPC requests, your Apache servers are probably doing more then just serving PHP-enabled pages that make use of the database connection. In fact, it's almost guaranteed that for every single "page" request (loading the index.php Web page from the browser), the browser may make 10, 20, or even 30 individual requests to the Web server for images, flash files, or other embedded documents within frames (and so on). Depending on how your servers are configured, this fact can render any benefit of using persistent connections effectively moot. Fine-tuning your Web servers running PHP to make the most of every single database connection request is critical, so let's take a look at the relevant Apache 1.3.x configuration settings: KeepAlive, MaxRequestsPerChild, and MaxSpareServers.
Now that you have an idea about what's going on with the Web server as it relates to PHP and persistent connections, let's take a look at how you can improve things. The basic goal here is to maximize the number of times a persistent connection is used before it is closed implicitly by Apache itself.
To begin, take a look at the MaxRequestsPerChild configuration directive. This is an important directive when working with persistent connections, because it defines, as its name implies, the maximum number of HTTP requests a single Apache child can serve before the process is killed and restarted. While under ideal circumstances this setting is set to zero (meaning each child accepts an unlimited number of connections), if anything running within your Web server is unstable, leaking memory, and so on, this mechanism must be used to keep the server in top shape. Thus, if your children are being killed after X requests, then X-1 is the total number requests by which you can take advantage of a persistent database connection. This is the first directive you should look for, as it has the most direct impact on your persistent connection performance.
Similarly, the MaxSpareServers configuration directive is designed to "throttle back" the number of resources used on your Web server during low load times by killing off unnecessary Apache child processes that are not actually handling requests. Depending on the type of load patterns you experience on your site, Apache—although it does tend to manage this process somewhat intelligently—may very well be killing off children, and hence database connections, prematurely. For instance, for Web servers that experience frequent spikes in traffic, it makes sense to have significantly more spare child processes running, while servers that experience a heavy sustained followed by lulls for extended periods can kill them off to free up resources for other tasks.
Now that we've established the two primary means by which Apache kills off its children, and thus any persistent connections you may have established, let's now take a look at ways to optimize this behavior. While it is not always possible to simply not kill off children (especially if your child processes tend to leak memory over time), properly configuring Apache can make a big difference. Here are a few things you can do to improve performance.
Zend Core for Oracle (which includes a refactoring of the OCI extension for PHP) offers a number of new management tools for making the most of your persistent connections. These tools come in the form of a number of PHP configuration directives that you can use in the php.ini file.
You now have everything you need to make the most of each and every connection made to your Oracle database—and when to use (or not use) persistent connections. While the information here is specific to database connections, any persistent resource following the same pattern can be optimized using the techniques described. John Coggeshall is a Senior Technical Consultant for Zend Technologies where he provides professional services to clients around the world. He got started with PHP in 1997 and is the author of three published books and over 100 articles on PHP technologies with some of the biggest names in the industry such as Sams Publishing, Apress, and O'Reilly. John also is a active contributor to the PHP core as the author of the tidy extension, a member of the Zend Education Advisory Board, and frequent speaker at PHP-related conferences worldwide.