-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
handle apache DocumentRoot cyrillic encoding #17083
Conversation
When DocumentRoot contains cyrillic characters like DocumentRoot /home/hans/web/cyrillicрф.ratma.net/public_html and PHP is invoked with SetHandler (*PS not applicable to ProxySetMatch, the problem occurs with SetHandler specifically) like DocumentRoot /home/hans/web/cyrillicрф.ratma.net/public_html <FilesMatch \.php$> SetHandler "proxy:unix:/run/php/php8.3-fpm-cyrillicрф.ratma.net.sock" </FilesMatch> then apache will url-encode the cyrillic characters before sending it to fpm, so env_script_filename will contain /home/hans/web/cyrillic%D1%80%D1%84.ratma.net/public_html/index.php and we need to url-decode it to /home/hans/web/cyrillicрф.ratma.net/public_html/index.php otherwise we hit that zlog(ZLOG_DEBUG, "Primary script unknown"); SG(sapi_headers).http_response_code = 404; PUTS("File not found.\n"); error code path.
hmm interesting, the patch seems to break 2 tests, both related to ProxyPass (which I know handles cyrillic differently than SetHandler 🤔 ):
|
I will check this later next week. |
Found a way to not trigger
Nice. PS if you're checking this issue on a linux system and your
before starting. (Even Ubuntu24.04 server installer does not default to utf-8 locale for some reason :'( ) |
ping @bukka got time? |
Ok so it took me a little bit to realise that this is most likely that bug in httpd that just got fixed: apache/httpd#470 (it's part of 2.4.x branch so it will be released in the next httpd version). It's kind of different variant of #15246 . This is not really a PHP bug and changing that is in theory a BC break (even though quite unlikely to hit anyone). Maybe we could consider add this to master in case something similar happen in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said maybe it makes sense to add this as a feature and protection for future failures. It needs also a test though - look to other apache tests in fpm/tests for inspiration.
sapi/fpm/fpm/fpm_main.c
Outdated
ptr = &pt[len]; // php_raw_url_decode() writes a trailing null byte, &pt[len] is that null byte. | ||
goto apache_cyrillic_jump; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite hacky and not sure if it's going to work for another loop part. You should maybe alloc new space and replace pt in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite hacky
yeah well, i couldn't really find a better way without jump🤔
for example we could do
bool skip_first=false;
if(apache_was_here && memchr(pt, '%', len)) {
len = php_raw_url_decode(pt, len);
ptr = &pt[len]; // php_raw_url_decode() writes a trailing null byte, &pt[len] is that null byte.
skip_first=true;
}
while (skip_first || (ptr = strrchr(pt, '/')) || (ptr = strrchr(pt, '\\'))){
skip_first=false;
- now we check a flag with every iteration, and we disable a flag with every iteration, a flag we don't need at all if we use jmp
or we could do
if(apache_was_here && memchr(pt, '%', len)) {
len = php_raw_url_decode(pt, len);
ptr = &pt[len]; // php_raw_url_decode() writes a trailing null byte, &pt[len] is that null byte.
} else {
(ptr = strrchr(pt, '/')) || (ptr = strrchr(pt, '\\'))
}
do{...}
while((ptr = strrchr(pt, '/')) || (ptr = strrchr(pt, '\\'));
now we avoid the goto and the overhead-per-iteration and the flag but now we have to duplicate the (ptr = strrchr(pt, '/')) || (ptr = strrchr(pt, '\\'))
logic
not sure if it's going to work for another loop part.
i don't understand what you meant, what may not work?
You should maybe alloc new space and replace pt in this case.
not needed, php_raw_url_decode will never lengthen the string. it may shrink the string, or leave it with the same length, but never grow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry it's fine
I just thought about it and I think we should not change it even in master because httpd will not be encoding it so it would make impossible to have some paths with % accesible. It would really make sense if httpd kept encoding the path but this going to change soon... |
didn't know this before testing it, but this patch does not break documentRoots with
probably been broken for years, but nobody(*) encountered it
hmm that is concerning 🤔 |
% is also affected. I suspect a great deal of other characters are affected as well
So there was a bug in httpd that changed it from not encoding the path to encoding the path. This was also backported to stable versions of various distros as I think it might have been a security fix. Are you able to test this with latest 2.4.x branch if this is fixed there or you can still see the issue there? Also can you check with tag 2.4.59 and check if you can see issue there? If you haven't compiled httpd before here are my build steps (spend a bit of time to figure it out so sharing it with out to make it quicker - you need to change /path/to/apr ofc):
|
EDIT: ~~hmmm tested with apache2.4.59 with mod_fcgid-2.3.9, and this patch fixes both the #!/bin/bash
set -e
set -x
wget 'https://archive.apache.org/dist/httpd/httpd-2.4.59.tar.bz2'
tar xfv httpd-2.4.59.tar.bz2;
cd httpd-2.4.59/;
svn co http://svn.apache.org/repos/asf/apr/apr/trunk srclib/apr;
./buildconf;
./configure --prefix=/usr/local/apache2 --enable-so --enable-ssl --with-mpm=event --enable-mods-shared=all --with-included-apr;
make -j$(nproc);
sudo make install;
wget 'https://dlcdn.apache.org/httpd/mod_fcgid/mod_fcgid-2.3.9.tar.bz2'
tar xfv mod_fcgid-2.3.9.tar.bz2;
cd mod_fcgid-2.3.9/;
APXS=/usr/local/apache2/bin/apxs ./configure.apxs;
make -j$(nproc);
sudo make install;
service apache2 restart; here php-fpm fail with both cyrillic and % without this PR, but succeed with both when this PR is applied. Also tested applying the diff from apache/httpd#470 , unfortunately
so I had to partially apply 470 manually, which leaves room for error, here is my httpd-2.4.59.tar.bz2 + 470.diff
and with that compiled in.. everything works with php-master, even without this PR 😮 well i'm surprised |
Edit: this whole answer seems wrong. done a bunch more tests, see next message <.< okay, re-did the test with vanilla php-src and vanilla httpd2.4.59+mod_fcgid-2.3.9 using the
script again, and neither cyrillic nor % worked.
and.. cyrillic and so.. it seems apache/httpd#470 does not affect this PR on apache2.4.59 with mod_fcgid-2.3.9 , or my testing was insufficient.. anyway, testing this stuff manually was exhausting, ideally should write some automated tests to run whatever apache versions and apache PR's seems relevant, but not sure if i'm up for that task |
... did a bunch more testing, and this time wrote a script to automate all the testing, and it turns out that apache2.4.59 does NOT url-encode special characters, contrary to what my previous tests showed (wtf?), and that this PR would break compatibility with apache2.4.59. here is the test script i came up with, but the important part is that it has 6 tests, and all 6 of them pass (i expected only the first 2 to pass, but no, all 6 of them pass), <?php
declare(strict_types=1);
error_reporting(E_ALL);
set_error_handler(function ($errno, $errstr, $errfile, $errline) {
if (error_reporting() & $errno) {
throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
}
});
function quote(string $str): string
{
// escapeshellarg() does not handle unicode correctly: https://3v4l.org/Hkv7h
if (str_contains($str, "\x00")) {
throw new RuntimeException("unix shell arguments cannot contain null bytes");
}
return "'" . str_replace("'", "'\\''", $str) . "'";
}
function e(string $cmd)
{
echo $cmd . PHP_EOL;
passthru($cmd, $ret);
if ($ret !== 0) {
throw new RuntimeException("Command returned non-zero $ret: $cmd");
}
}
function fsleep(float $seconds)
{
$int = (int)$seconds;
$frac = (int) (($seconds - $int) * 1e9);
time_nanosleep($int, $frac);
}
function sh(string $cmd)
{
e("/bin/bash -c " . quote($cmd));
}
if (DIRECTORY_SEPARATOR !== '/') {
throw new RuntimeException("Must be run on Unix");
}
function compile_apache_string(): string
{
$sh = <<<'SH'
#!/bin/bash
set -e
set -x
set -o pipefail
cd httpd-2.4.59/
mkdir -p install_dir
# CFLAGS='-DBIG_SECURITY_HOLE'
./configure --enable-so --enable-ssl --with-mpm=event --enable-mods-shared=all --with-included-apr --enable-proxy-fcgi --enable-proxy-http --prefix=$(pwd)/install_dir
make -j$(nproc)
make install
cd ..
SH;
return $sh;
}
function compile_apache(): void
{
sh(compile_apache_string());
}
function generatehttpdConf(string $relativeDocumentRoot): void
{
$httpdConf = <<<'EOF'
ErrorLog "$(pwd)/apache_error.log"
ServerRoot "$(pwd)/httpd-2.4.59/install_dir"
LoadModule unixd_module modules/mod_unixd.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_fcgi_module modules/mod_proxy_fcgi.so
LoadModule proxy_http_module modules/mod_proxy_http.so
# authz host
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule authz_host_module modules/mod_authz_host.so
# if unixd is loaded
<IfModule unixd_module>
# User $(whoami)
# Group $(whoami)
User nobody
Group nogroup
</IfModule>
<IfModule !mpm_netware_module>
PidFile "$(pwd)/httpd.pid"
</IfModule>
# Funfact: apache does not support listening on unix sockets :(
# requested this in 2013: https://bz.apache.org/bugzilla/show_bug.cgi?id=55898
# 9123: random port hopefully not in use
Listen 127.0.0.1:9123
DocumentRoot "$(pwd)/htdocs/$relativeDocumentRoot$"
<Directory "$(pwd)/htdocs/$relativeDocumentRoot$">
Options Indexes FollowSymLinks
AllowOverride All
Require all granted
</Directory>
# php-fpm
<FilesMatch \.php$>
SetHandler "proxy:unix:$(pwd)/php-fpm.sock|fcgi://localhost/"
</FilesMatch>
EOF;
$httpdConf = strtr($httpdConf, [
'$(pwd)' => getcwd(),
'$relativeDocumentRoot$' => $relativeDocumentRoot,
]);
var_dump($httpdConf);
file_put_contents("httpd.conf", $httpdConf, LOCK_EX);
$relative = "htdocs/$relativeDocumentRoot/test.txt";
touch($relative);
$realpath = realpath($relative);
file_put_contents($realpath, $realpath, LOCK_EX);
file_put_contents("htdocs/$relativeDocumentRoot/test.php", "<?php echo(__FILE__);", LOCK_EX);
}
class httpd
{
public $apacheDescriptorSpec = [
0 => ['pipe', 'rb'], // default is to inherit our stdin, we don't want that.
//1 => ['pipe', 'wb'],
//2 => ['pipe', 'wb'],
];
public $apachePipes = [];
public $apacheProc;
private function start()
{
$this->apacheProc = proc_open('httpd-2.4.59/install_dir/bin/httpd -f ' . quote(getcwd() . "/httpd.conf") . ' -X', $this->apacheDescriptorSpec, $this->apachePipes);
if ($this->apacheProc === false) {
throw new RuntimeException("Failed to start apache");
}
fclose($this->apachePipes[0]);
unset($this->apachePipes[0]);
assert(empty($this->apachePipes));
fsleep(1);
}
public function __construct()
{
$this->start();
}
public function __destruct()
{
$status = proc_get_status($this->apacheProc);
if ($status['running']) {
proc_terminate($this->apacheProc);
}
proc_close($this->apacheProc);
}
public function restart()
{
$status = proc_get_status($this->apacheProc);
$pid1 = $status['pid'];
$pid2 = (int)file_get_contents("httpd.pid");
if ($status['running']) {
proc_terminate($this->apacheProc, SIGTERM);
}
if(posix_kill($pid2, 0)) {
posix_kill($pid2, SIGTERM);
}
echo "Waiting for apache pid1 ($pid1) to exit...";
while (($status = proc_get_status($this->apacheProc))['running']) {
fsleep(0.1);
echo ".";
}
echo "Waiting for apache pid2 ($pid2) to exit...";
while (posix_kill($pid2, 0)) {
fsleep(0.1);
echo ".";
}
proc_close($this->apacheProc);
echo "Restarting apache...";
$this->start();
}
};
$sh = <<<'SH'
#!/bin/bash
set -e
set -x
set -o pipefail
if [ -d "php-src" ]; then
echo "php-src directory already exists. Skipping cloning."
else
git clone 'https://github.com/php/php-src.git' --depth 1
cd php-src
./buildconf
./configure --disable-all --enable-fpm
make -j$(nproc)
cd ..
fi
# if php-fpm.pid exists, kill the process
if [ -f "php-fpm.pid" ]; then
pid=$(cat php-fpm.pid)
kill $pid || true
# wait for the process to exit..
while kill -0 $pid 2>/dev/null; do
sleep 0.1
done
fi
# make a basic php-fpm.conf
# warning: php-fpm.conf is hardcoded to relative directory /usr/local/var
# for some unknown reason, we need to specify the full paths (not even ./ relative works)
cat > php-fpm.conf <<EOF
[global]
pid = $(pwd)/php-fpm.pid
error_log = $(pwd)/php-fpm.log
daemonize = no
[www]
; user and group is "optional" according to docs, but it gives a warning if not set..
user = $(whoami)
group = $(whoami)
; unix socket
listen = $(pwd)/php-fpm.sock
listen.owner = $(whoami)
listen.group = $(whoami)
listen.mode = 0777
pm = static
pm.max_children = 2
EOF
# kill apache if it's running
if [ -f "httpd.pid" ]; then
pid=$(cat httpd.pid)
kill $pid || true
# wait for the process to exit..
while kill -0 $pid 2>/dev/null; do
printf .
sleep 0.1
done
# unlike php-fpm, apache does not clean up its pid file after exiting
rm -f httpd.pid
fi
if [ -d "httpd-2.4.59" ]; then
echo "httpd-2.4.59 directory already exists. Skipping cloning."
else
wget 'https://archive.apache.org/dist/httpd/httpd-2.4.59.tar.bz2'
tar xfv httpd-2.4.59.tar.bz2
cd httpd-2.4.59/
mkdir -p install_dir
svn co http://svn.apache.org/repos/asf/apr/apr/trunk srclib/apr
%compile_apache_string%
fi
# make a basic apache config
mkdir -p htdocs
SH;
$sh = strtr($sh, [
'%compile_apache_string%' => compile_apache_string(),
]);
sh($sh);
$fpmDescriptorSpec = [
0 => ['pipe', 'rb'], // default is to inherit our stdin, we don't want that.
//1 => ['pipe', 'wb'],
//2 => ['pipe', 'wb'],
];
$fpmPipes = [];
$fpmProc = proc_open('php-src/sapi/fpm/php-fpm -y php-fpm.conf -c php-src/php.ini-development --allow-to-run-as-root --nodaemonize --force-stderr', $fpmDescriptorSpec, $fpmPipes);
if ($fpmProc === false) {
throw new RuntimeException("Failed to start php-fpm");
}
fclose($fpmPipes[0]);
var_dump(proc_get_status($fpmProc));
$apacheDescriptorSpec = [
0 => ['pipe', 'rb'], // default is to inherit our stdin, we don't want that.
//1 => ['pipe', 'wb'],
//2 => ['pipe', 'wb'],
];
generatehttpdConf("");
register_shutdown_function(function () use ($fpmProc, &$httpd) {
echo "Shutting down..." . PHP_EOL;
$status = proc_get_status($fpmProc);
if ($status['running']) {
proc_terminate($fpmProc);
}
proc_close($fpmProc);
unset($httpd); // should force __destruct...
echo "Done." . PHP_EOL;
});
generatehttpdConf("");
$httpd = new httpd();
fsleep(0.1); // initialize sleep
function httpget(string $url): string
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
if ($response === false) {
throw new RuntimeException("curl failed: " . curl_error($ch));
}
curl_close($ch);
return $response;
}
function return_var_dump(...$args)
{
ob_start();
var_dump(...$args);
return ob_get_clean();
}
echo "first just a simple test to check that apache+php-fpm is working\n";
$tests = array(
"http://localhost:9123/test.txt" => getcwd() . "/htdocs/test.txt",
"http://localhost:9123/test.php" => getcwd() . "/htdocs/test.php",
);
foreach ($tests as $url => $expected) {
$response = httpget($url);
var_dump(["status" => ($response === $expected) ? "OK" : "FAIL", "url" => $url, "expected" => $expected, "response" => $response]);
}
echo "Testing cyrillic docroot: cyrillicрф\n";
@mkdir("htdocs/cyrillicрф", 0777, true);
generatehttpdConf("cyrillicрф/");
$httpd->restart();
fsleep(0.1); // initialize sleep
$tests = array(
"http://localhost:9123/test.txt" => getcwd() . "/htdocs/cyrillicрф/test.txt",
"http://localhost:9123/test.php" => getcwd() . "/htdocs/cyrillicрф/test.php",
);
foreach ($tests as $url => $expected) {
$response = httpget($url);
var_dump(["status" => ($response === $expected) ? "OK" : "FAIL", "url" => $url, "expected" => $expected, "response" => $response]);
}
echo "testing % docroot\n";
@mkdir("htdocs/%", 0777, true);
generatehttpdConf("%/");
$httpd->restart();
fsleep(0.1); // initialize sleep
$tests = array(
"http://localhost:9123/test.txt" => getcwd() . "/htdocs/%/test.txt",
"http://localhost:9123/test.php" => getcwd() . "/htdocs/%/test.php",
);
foreach ($tests as $url => $expected) {
$response = httpget($url);
var_dump(["status" => ($response === $expected) ? "OK" : "FAIL", "url" => $url, "expected" => $expected, "response" => $response]);
} and the important output is DocumentRoot "/hans/cyrillic/htdocs/"
first just a simple test to check that apache+php-fpm is working
array(4) {
["status"]=>
string(2) "OK"
["url"]=>
string(30) "http://localhost:9123/test.txt"
["expected"]=>
string(30) "/hans/cyrillic/htdocs/test.txt"
["response"]=>
string(30) "/hans/cyrillic/htdocs/test.txt"
}
array(4) {
["status"]=>
string(2) "OK"
["url"]=>
string(30) "http://localhost:9123/test.php"
["expected"]=>
string(30) "/hans/cyrillic/htdocs/test.php"
["response"]=>
string(30) "/hans/cyrillic/htdocs/test.php"
}
Testing cyrillic docroot: cyrillicрф
DocumentRoot "/hans/cyrillic/htdocs/cyrillicрф/"
array(4) {
["status"]=>
string(2) "OK"
["url"]=>
string(30) "http://localhost:9123/test.txt"
["expected"]=>
string(43) "/hans/cyrillic/htdocs/cyrillicрф/test.txt"
["response"]=>
string(43) "/hans/cyrillic/htdocs/cyrillicрф/test.txt"
}
array(4) {
["status"]=>
string(2) "OK"
["url"]=>
string(30) "http://localhost:9123/test.php"
["expected"]=>
string(43) "/hans/cyrillic/htdocs/cyrillicрф/test.php"
["response"]=>
string(43) "/hans/cyrillic/htdocs/cyrillicрф/test.php"
}
testing % docroot
DocumentRoot "/hans/cyrillic/htdocs/%/"
array(4) {
["status"]=>
string(2) "OK"
["url"]=>
string(30) "http://localhost:9123/test.txt"
["expected"]=>
string(32) "/hans/cyrillic/htdocs/%/test.txt"
["response"]=>
string(32) "/hans/cyrillic/htdocs/%/test.txt"
}
array(4) {
["status"]=>
string(2) "OK"
["url"]=>
string(30) "http://localhost:9123/test.php"
["expected"]=>
string(32) "/hans/cyrillic/htdocs/%/test.php"
["response"]=>
string(32) "/hans/cyrillic/htdocs/%/test.php"
} |
Thanks for testing. So just to clarify that patch got arleady applied to httpd 2.4.x branch which is kind of the main thing to test to see if it's going to work fine there and in the future httpd versions. But in general you should see the same results as with 2.4.59 because the main logic should be the same. I have got some automated test in my testing tool: https://github.com/wstool/wst-php-fpm/blob/563e061dbb7aa6fbe43979e78869b9b107c1b1b2/spec/instances/httpd-proxy-fcgi-handler-uds-basic.yaml . I will extend it later with your scenarios so it's covered too. |
This reverts commit 5136018. This makes us go back to using SetHandler instead of ProxyPassMatch The SetHandler unicode problems appears to have been fixed in upstream apache httpd 2.4.59 , so the solution to running domains with unicode/cyrillic characters should just be to update to apache>=2.4.59 ref php/php-src#17083 (comment) The ProxyPassMatch solution added new problems with OpenCart like https://forum.hestiacp.com/t/php-file-not-found-but-it-exists/16694/4 AH01136: Unescaped URL path matched ProxyPass; ignoring unsafe nocanon [proxy:error] [pid 204010:tid 204023] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:8000 (localhost:8000) failed
This reverts commit 5136018. This makes us go back to using SetHandler instead of ProxyPassMatch The SetHandler unicode problems appears to have been fixed in upstream apache httpd 2.4.59 , so the solution to running domains with unicode/cyrillic characters should just be to update to apache>=2.4.59 ref php/php-src#17083 (comment) The ProxyPassMatch solution added new problems with OpenCart like https://forum.hestiacp.com/t/php-file-not-found-but-it-exists/16694/4 AH01136: Unescaped URL path matched ProxyPass; ignoring unsafe nocanon [proxy:error] [pid 204010:tid 204023] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:8000 (localhost:8000) failed
When Apache's DocumentRoot contains cyrillic characters like
and PHP is invoked with SetHandler (*PS not applicable to ProxyPassMatch, the problem occurs with SetHandler specifically) like
then apache will url-encode the cyrillic characters before sending it to fpm, so env_script_filename will contain
and we need to url-decode it to
otherwise we hit that
php-src/sapi/fpm/fpm/fpm_main.c
Lines 1843 to 1845 in c5d9c7d
error code path.