Dynamic management and supervision of PHP processes.
PHP-LRPM provides a framework for building all kinds of concurrent PHP applications. Application examples include, but are not limited to: message queue consumers, web crawlers, parallel database replication…
On a surface level, LRPM is similar in function to generic process supervisors like daemontools or supervisord, but is configured through PHP and easily and smoothly integrates with existing PHP codebases.
LRPM provides an easy to use yet versatile framework that can vastly simplify many use cases where dynamically orchestrating PHP CLI SAPI processes through a general-purpose process supervisor (e.g. systemd) or scheduler (e.g. cron) would be cumbersome.
While LRPM could be used to build a "prefork"-style HTTP server, and thus could be used to build web services, this is not the primary motivation behind its design — good specialized frameworks for building HTTP applications in PHP are readily available. The primary motivation behind LRPM is to provide a framework for building PHP CLI SAPI applications using process-level concurrency, in a way that works with many existing PHP libraries and extensions that are not fork safe.
This talk by Giorgio Sironi given at the Dutch PHP Conference in 2017 covers some problems and challenges when implementing long-running PHP processes.
LRPM dynamically gets its configuration from a custom source implemented in PHP (for example, user-provided PHP code might read configuration from a database), checks for changes in the configuration and starts/stops "worker" processes accordingly.
PHP-LRPM is built on top of POSIX process management facilities. PHP-LRPM starts its subprocesses via fork(2)
and subprocesses don’t daemonize.
The operating system signals PHP-LRPM immediately when a process terminates. PHP-LRPM will in turn attempt to restart the terminated process.
PHP-LRPM is able to take advantage of the OPcache PHP extension, allowing worker processes to store compiled bytecode in shared memory, which reduces the total memory usage of the process farm.
On Linux systems, PHP-LRPM is also able to take advantage of the fact that the fork(2)
system call is implemented through the use of copy-on-write pages. This reduces the memory usage of Linux deployments even further.
PHP-LRPM makes minimal assumptions about its use cases. It assumes no more than:
-
that worker processes will need to be provided with some sort of configuration
-
that they will need to run some code when they are started
-
that after initialization they might want to run in an endless loop.
PHP-LRPM uses exponential backoff when attempting to restart a process; some process managers will try several times in quick succession and fail the process until operator intervention.
PHP-LRPM configuration is reactive with regard to both process state changes and configuration changes. The process farm configuration is updated through a user-provided ConfigurationSource
, that can either be polled in order to pull fresh configuration periodically, or block on an event (e.g. receiving a message from some message bus) in order to push fresh configuration in reaction to an event.
It is easy to spread a PHP-LRPM process farm across multiple machines. Add-on package php-lrpm-cluster provides clustering support.
PHP-LRPM is a stable platform that works equally well with existing and legacy PHP codebases and with newly developed code. Concurrent services built on top of PHP-LRPM have been reliably running in production for years.
To achieve a high level of stability, PHP-LRPM by design separates trusted framework code from untrusted user code, by running all user code in sandbox processes.
LRPM is designed to be easily extended without modifying the framework itself. Examples include, but are not limited to:
-
Clustering support (php-lrpm-cluster) is implemented as an extension.
-
The php-symplib synchronous message passing framework, that PHP-LRPM uses internally, can also be used for communication between user processes, or even for workers communicating with the supervisor process.
-
Facilities like detecting memory leaks in user code (LRPM itself is stable in this regard) can easily be implemented by developing specialized worker processes.
-
Currently, LRPM is not intended to run as PID 1, nor was it tested in this role.
-
Using main entry points other than
bin/lrpm
, i.e. usingProcessManager
as a library and instantiating it from custom user programs, is not recommended and not supported.
-
Implement one or more
Worker
classes. ThePHPLRPM\Worker
interface exposes two public methods,start()
andcycle()
. The child process will callstart()
, passing the worker configuration as an argument, and then enter an endless loop, where it callscycle()
, and then checks for a shutdown condition (usually, the termination of the LRPM supervisor process). Backoff should be implemented at the end of thecycle()
method, as we dispatch signal handlers right aftercycle()
returns. Be careful not to run a tight loop at times when cycles are not doing actual work! -
Implement the
ConfigurationSource
class. ThePHPLRPM\ConfigurationSource
interface exposes a public methodloadConfiguration()
, that returns an associative array containing the configuration for all the workers (see example and full documentation in theJob configuration
subsection). -
Run
bin/lrpm
with your configuration class as an argument, e.g.bin/lrpm '\MyNamespace\MyConfigurationSource'
Configuration is a simple associative array of jobs to run. Here’s an example configuration consisting of a single job, with all the mandatory fields described in comments:
$config = [
42 => [ // Unique job id (string or int)
'name' => 'Job 42', // Descriptive name (string)
'workerClass' => '\PHPLRPM\Test\MockWorker', // Class implementing the Worker interface (string)
'mtime' => 1629121362, // Time when this job config was last modified, as UTC Unix timestamp (int)
'workerConfig' => [] // Array of additional configuration specific to the Worker implementation (array)
]
]
Class ConfigurationValidator
is used for config validation internally, and you can also use it to test your ConfigurationSource
implementation.
If a job’s mtime
returned by the ConfigurationSource
is newer than mtime
from previous poll, that job will be restarted with the new configuration.
See example.php
for a full running example with more details.
Minimum number of seconds a process is expected to run; if the process terminates earlier then this, it will be restarted with backoff
Mandatory: no
Default value: 5
Instead of relying on periodic polling of the ConfigurationSource
(default interval between polls is 30 seconds), it is possible to push configuration updates in response to an event. Here’s how:
-
Make the
loadConfiguration()
method block waiting for a configuration change event -
Set the configuration poll interval to 0, e.g.
bin/lrpm --interval=0 '\MyNamespace\MyPushConfigurationSource'
Unless you are sure that your blocking call can get interrupted by a SIGTERM, set a short wait limit, e.g. less than 5 seconds, in order to help the service shut down cleanly.
The LRPM supervisor process installs signal handlers for SIGCHLD (child processes termination notifications), SIGUSR1 (configuration process readiness notification), SIGTERM and SIGINT (shut down request).
The configuration process (the process running user-provided ConfiguratinoSource
class) installs a signal handler for SIGHUP, that will reset the internal configuration poll timer, effectively making LRPM reload configuration immediately.
Worker processes (processes running user-provided Worker
classes) install default signal handlers for SIGTERM and SIGINT. Signal handlers are dispatched between loop cycles, and these default handlers will terminate the Worker.
You can implement and install your own signal handlers inside your Worker implementation, but make sure that your Worker process shuts down cleanly after receiving SIGTERM, otherwise the LRPM supervisor will consider it unresponsive and follow up with a SIGKILL.
If for some reason you need to implement a custom entry point for LRPM, this is possible, but not recommended and not supported.
The code in the entry point runs in the supervisor (parent) process, while Worker
classes run in child processes fork(2)
-ed from the supervisor. Ideally the entry point should do no more than set up the autoloader and run the ProcessManager
. Any open file descriptors apart from stdin/stdout/stderr should be closed before entering the event loop (ProcessManager→run()
). Sharing open sockets between parent and children through fork(2)
is not safe! Worker processes should connect to wherever they need to connect to only after they have been spawned.
For these and other reasons it is recommended to use the provided bin/lrpm
entry point.
It is recommended to run LRPM as a normal system service. Its main process stays in the foreground and logs to stdout and stderr.
For LRPM to be able to listen for control messages, it needs to create a Unix domain socket in the /run/php-lrpm
directory — make sure that this directory is writable by the main LRPM process. As a fallback, LRPM will attempt to create a socket in /run/user/<euid>/php-lrpm
. If a socket cannot be created, LRPM wil run with control messaging disabled.
Place the bin/lrpmctl
tool into your PATH (either by adding vendor/bin
to the PATH, or symlinking lrpmctl
to e.g. /usr/local/bin
) and use it to query the running instance for status, or to restart a process on demand. Type lrpmctl -h
for more detailed usage instructions.
To take advantage of caching precompiled bytecode in shared memory, you need to explicitly enable using the OPcache extension in the CLI SAPI, and make sure that it’s configured to store the cache in shared memory. Minimal recommended config is:
opcache.enable=1 opcache.enable_cli=1 opcache.file_cache_only=0
LRPM was designed for stability and robustness. Many existing 3rd party PHP extensions are not fork safe, and can fail ungracefully in a process-level concurrent application. To work around this, LRPM runs user code in sandbox processes, providing a pristine code execution environment that benefits 3rd party libraries and extensions that rely on the "happy path".
In the following architecture diagram, green rectangles represent sandbox processes for running untrusted user code (including any extensions and drivers specific to user code). Yellow rectangles represent processes running trusted framework code. While it is possible to use the trusted code as a library, this is not recommended, and supporting such use cases is a non-goal.
PHP-LRPM keeps metadata in an associative array. For efficient lookups by PID, a separate index is maintained.
This functionality was offloaded to a generic library Array with Secondary Keys, that wraps a hash map and maintains secondary indexes (similar to how secondary keys in an SQL database work). Implementing this particular collection lead to the creation of Cardinal Collections, a PHP toolkit for building collections.
Included is the lrpmctl
tool, which uses the SyMPLib library to exchange messages with a running instance of LRPM over a Unix domain socket connection. Some examples of messages include getting the status
of all workers (see screenshot above), and requesting a restart
of a worker process.
Wait for children to terminate after sending SIGTERM, follow up with SIGKILL if child doesn’t respond to SIGTERM after some time.
Implemented blocking shutdown loop that makes sure all children are terminated on shutdown, including processes that may be unresponsive.
Made ConfigurationSource
run in a process separate from the supervisor. This is to prevent Worker
processes inheriting sockets opened by ConfigurationSource
code (e.g. persistent database connections). The supervisor process and the config process are using the SyMPLib library to exchange messages over a Unix domain socket connection.