We have tried multi-threading in PHP to speed up execution time; the results are good, but far from perfect. Is there another way we can improve PHP's performance?
In the previous post, we gave an overview of 1BRC, tried to push the limits of PHP when discussing performance optimization, and ran our best PHP script on an EC2 instance.
The results were not bad, but not noteworthy either: 17.0636 seconds (the fastest Java code took 1.535 seconds).
So what are we supposed to do? Call it a day and get on with our lives? No, obviously not!
We could "cheat" our way to a better score by borrowing one of Python's winning strategies: letting external libraries do the heavy lifting!
Foreign Function Interface
One of the ways to optimize an interpreted language is to move the slow operations into an external module, usually written in a low-level language.
In PHP you can write system-wide modules and enable them in php.ini; this is useful for generic functions or for code that is not specific to one application.
PHP 7.4 introduced a new feature: the Foreign Function Interface (FFI).
FFI is a way of calling external libraries from your PHP code, without changing the global PHP configuration.
This method is more flexible than dealing with modules, but configuring it could be a bit daunting at first.
Let's try to wrap a Rust solution of 1BRC in a PHP script (yes, ok, we are definitely cheating).
The Rust solution
To keep things simple we need a Rust solution that:
- is fast
- is written in a clear way
- consists of only a few files
There's no need to explain point 1; points 2 and 3 are needed because we are going to modify the code to make it work as a module.
I love Rust, but I'm not a Rust programmer, so the simpler the code the better.
I chose the solution written by Flavio Bizzarri: https://github.com/newfla/1brc_rust
Compiling Rust module
First of all, we clone the repository, then we edit the Cargo.toml file to add some options:
[profile.release]
lto = true
strip = true
panic = "abort"
debug = false
opt-level = 3
codegen-units = 1
[lib]
crate-type = ["cdylib", "lib"]
bench = false
In the [profile.release] section, we enable additional performance optimizations (debug, opt-level, codegen-units); we also add the [lib] section, where we specify that we want to compile the source as a cdylib library (a shared library that can be linked into external programs).
The main.rs file is used just to call adv::process; we remove this file and add a run() function in lib.rs:
#[no_mangle]
pub extern "C" fn run(filename: *const c_char) -> *mut c_char {
    let c_str = unsafe {
        assert!(!filename.is_null());
        CStr::from_ptr(filename)
    };
    let path = c_str.to_str().unwrap().to_string();
    adv::process(path)
}
#[no_mangle] disables name mangling (in short: it keeps the function's name unchanged in the exported library) and marks this function as "to export".
We are cheating, but in a responsible way: from the PHP code, we pass the weather data filename to the Rust module. Then the Rust module returns the stations' aggregated data to be displayed.
PHP is a loosely-typed language, while Rust is a strongly-typed language, so moving data between the two can be a bit of a challenge within the challenge. We need the libc crate and ffi::CStr from std.
The code needed to convert a PHP string to a Rust string slice has been taken from "The Rust FFI Omnibus"; in its words:
Getting a Rust string slice (&str) requires a few steps:
- We need to make sure that the C pointer is not NULL as Rust references are not allowed to be NULL.
- Use std::ffi::CStr to wrap the pointer. CStr will compute the string's length based on the terminating NULL. This requires an unsafe block as we will be dereferencing a raw pointer, which the Rust compiler cannot verify meets all the safety guarantees so the programmer must do it instead.
- Ensure the C string is valid UTF-8 and convert it to a Rust string slice.
- Use the string slice.
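The steps quoted above can be sketched as a small standalone helper. This is only an illustration: the name c_str_to_slice is hypothetical, it uses std::os::raw::c_char instead of the libc crate to stay dependency-free, and it returns None where the run() function shown earlier would panic.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper (not part of the original module): borrow a C string
/// as a Rust `&str`, returning `None` instead of panicking on bad input.
///
/// # Safety
/// `ptr` must be NULL or point to a NUL-terminated string that stays alive
/// and unmodified for as long as the returned slice is used.
unsafe fn c_str_to_slice<'a>(ptr: *const c_char) -> Option<&'a str> {
    // Step 1: Rust references can never be NULL, so reject a NULL pointer.
    if ptr.is_null() {
        return None;
    }
    // Step 2: wrap the pointer; CStr scans for the terminating NUL byte.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    // Step 3: check that the bytes are valid UTF-8 and get a string slice.
    c_str.to_str().ok()
}
```

Returning an Option lets the caller decide what to do with a bad filename, instead of aborting the whole PHP process from inside the library.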
In adv.rs we use this code:
let json_string = CString::new(serde_json::to_string(&cities).unwrap()).unwrap();
json_string.into_raw()
to return a JSON string to the PHP script.
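One detail worth keeping in mind: CString::into_raw() hands ownership of the buffer to the caller, so Rust never frees it unless the pointer is given back through CString::from_raw(). For a short-lived CLI script this hardly matters, but a long-running process would leak one buffer per call. Below is a minimal sketch of how the pair could look; to_c_string and free_string are hypothetical names, not functions from the original module, and std::os::raw::c_char is used here instead of the libc crate.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical helper: hand a Rust string to the caller as a C string.
/// `into_raw` transfers ownership, so Rust will not free this memory itself.
fn to_c_string(s: &str) -> *mut c_char {
    // CString::new fails if `s` contains an interior NUL byte;
    // in that case we return NULL instead of panicking.
    match CString::new(s) {
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Hypothetical companion function: take the pointer back and free it.
/// To call it from PHP it would need to be exported like `run()` and
/// declared in the C signature passed to FFI::cdef().
///
/// # Safety
/// `ptr` must be NULL or a pointer previously returned by `to_c_string`,
/// and it must not be used again after this call.
pub unsafe extern "C" fn free_string(ptr: *mut c_char) {
    if ptr.is_null() {
        return;
    }
    // Rebuild the CString so that its destructor releases the buffer.
    drop(unsafe { CString::from_raw(ptr) });
}
```

On the PHP side, the script would copy the data with FFI::string() first and only then pass the pointer to the freeing function.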
That's it for Rust; we can compile the library with:
cargo build --release
The PHP script
On the PHP side, first of all we need a class to manage the input and output of the Rust module. Let's create a file called libonebrc.php:
<?php
final class LibOneBrc {
    private static $ffi = null;

    public function __construct() {
        if (is_null(self::$ffi)) {
            self::$ffi = FFI::cdef("char* run(const char* str);", "rust/libonebrc.so");
        }
    }

    public function run($filename) {
        $resultPtr = self::$ffi->run($filename);
        return FFI::string($resultPtr);
    }
}
The constructor uses FFI::cdef() to import the Rust function from the rust/libonebrc.so file.
Here we have to declare the extern function's signature using C code, so the Rust c_char pointers become char*.
NOTE: it's also possible to use a .h header file to specify the function(s) that PHP needs to know about; since we only need one simple function, it is easier to declare it inline in PHP code.
The run() method invokes the run function of the Rust module (self::$ffi->run($filename)). We gave this wrapper method and the Rust function the same name (run()); this is only a coincidence (...or a lack of imagination); it's not mandatory.
FFI::string converts the pointer to a string usable in PHP.
We also need an index.php file to instantiate this LibOneBrc class and to print the results:
<?php
require_once "libonebrc.php";

$libOneBrc = new LibOneBrc();
$filename = "./rust/measurements.txt";
$result = json_decode($libOneBrc->run($filename));

echo "{" . PHP_EOL;
$isFirstRow = true;
foreach ($result as $key => $value) {
    if (!$isFirstRow) {
        echo "," . PHP_EOL;
    } else {
        $isFirstRow = false;
    }
    echo "\t" . $value->city . '=' . $value->max / 10 . '/' . $value->min / 10 . '/' . round($value->sum / $value->count / 10, 1);
}
echo PHP_EOL . "}" . PHP_EOL;
Nothing interesting here: we call our run() method, passing it the measurements filename.
The JSON string from Rust contains temperatures as integers, so we need to divide them by 10 and calculate the average temperature for each station.
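To make the arithmetic explicit, here is the same calculation as a small sketch, written in Rust only to keep it next to the module's code. The Station struct is hypothetical: its fields simply mirror the JSON keys read by index.php, and I am assuming all values are integer tenths of a degree, as described above.

```rust
/// Hypothetical struct: aggregated readings for one station, stored as
/// integer tenths of a degree (so 12.3 degrees is stored as 123).
struct Station {
    min: i64,
    max: i64,
    sum: i64,
    count: i64,
}

/// Convert a value in tenths of a degree into degrees.
fn tenths_to_degrees(tenths: i64) -> f64 {
    tenths as f64 / 10.0
}

/// Mean temperature in degrees, rounded to one decimal place.
/// The mean is rounded while still in tenths (halves round away from zero)
/// and only then scaled down to degrees.
fn mean_degrees(station: &Station) -> f64 {
    let mean_tenths = station.sum as f64 / station.count as f64;
    mean_tenths.round() / 10.0
}
```

For example, a station with a sum of 1000 tenths over 3 readings has a mean of 333.33 tenths, which is reported as 33.3 degrees.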
The benchmark
Let's run this code on the EC2 instance. The configuration is the same as last time: an m6a.8xlarge with 32 vCPUs and 128GB of memory. For the hard disk, I opted for a 200GB io1 volume (to reach 10,000 IOPS).
We run it with:
perf stat -o 1B-ffi.log -r 10 -d php app/index.php
and these are the results:
Performance counter stats for 'php app/index.php' (10 runs):
58802.93 msec task-clock # 29.718 CPUs utilized ( +- 0.26% )
4736 context-switches # 80.191 /sec ( +- 3.80% )
57 cpu-migrations # 0.965 /sec ( +- 13.37% )
52703 page-faults # 892.378 /sec ( +- 1.33% )
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
<not supported> L1-dcache-loads
<not supported> L1-dcache-load-misses
<not supported> LLC-loads
<not supported> LLC-load-misses
1.9787 +- 0.0197 seconds time elapsed ( +- 1.00% )
1.9787 seconds!
This is a surprising result, considering the overhead of calling an external module and the fact that we are still making some calculations on the PHP side of the app.
Conclusions
After this two-part journey we can say that:
- PHP is slow, but the performance improves significantly when using threads
- Performance tuning is a game of trade-offs: you can improve the speed of a task by saturating all the CPU cores, but your system will become unresponsive. In PHP this is a problem if your application needs to accept more than one connection at a time
- For heavy tasks, you can delegate to optimized external libraries
The full code is available on GitHub:
https://github.com/gfabrizi/1brc-php-ffi
I hope you enjoyed the post!