Changing the on-disk Phar after mapPhar() reads incorrect files with the phar:// protocol #17125

Open
TimWolla opened this issue Dec 12, 2024 · 3 comments

@TimWolla
Member

Description

The following code:

<?php

// Build a.phar. Creating or modifying a phar requires phar.readonly=0.
$a = new Phar(__DIR__ . '/a.phar');
$a->addFromString(
	'foo.php',
	<<<'EOT'
		<?php
		echo "this is foo.php\n";
		EOT
);
$a->addFromString(
	'bar.php',
	<<<'EOT'
		<?php
		echo "this is bar.php\n";
		EOT
);
$a->setStub(
	<<<'EOT'
		<?php
		Phar::mapPhar('a.phar');

		include 'phar://a.phar/foo.php';
		rename('b.phar', 'a.phar');
		include 'phar://a.phar/bar.php';
		__HALT_COMPILER(); ?>
		EOT
);

// Build b.phar with the same file names but a different layout ($a is reused).
$a = new Phar(__DIR__ . '/b.phar');
$a->addFromString(
	'bar.php',
	<<<'EOT'
		<?php
		/* Shift the offsets compared to a.phar. */
		echo "this is bar.php\n";
		EOT
);
$a->addFromString(
	'foo.php',
	<<<'EOT'
		<?php
		echo "this is foo.php\n";
		EOT
);
$a->setStub(
	<<<'EOT'
		<?php
		Phar::mapPhar('b.phar');

		include 'phar://b.phar/foo.php';
		rename('a.phar', 'b.phar');
		include 'phar://b.phar/bar.php';
		__HALT_COMPILER(); ?>
		EOT
);

Running php a.phar resulted in this output:

this is foo.php
PHP Parse error:  Unterminated comment starting line 2 in phar:///*redacted*/a.phar/bar.php on line 2

But I expected this output instead:

this is foo.php
this is bar.php

The reason appears to be that the file offsets and lengths are cached during mapPhar(), but each time the phar is accessed via phar://X.phar, the file is reopened from disk by path. Nothing validates that it is still the same phar, and no cached file descriptor is reused, so the cached offsets and lengths end up applied to a different file.
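The suspected mechanism can be illustrated with a toy model. This is only a sketch: `buildArchive()` and `readEntry()` are hypothetical helpers that model an archive as a flat string plus an offset/length table, and they do not reflect the phar extension's real data structures. It shows how a length cached from the old archive could cut a longer replacement entry short, which would be consistent with the "Unterminated comment" error above.

```php
<?php
// Toy model only: an "archive" is a flat blob plus a manifest of
// [offset, length] per entry. Hypothetical helpers, not ext/phar internals.
function buildArchive(array $files): array
{
    $blob = '';
    $manifest = [];
    foreach ($files as $name => $contents) {
        $manifest[$name] = ['offset' => strlen($blob), 'length' => strlen($contents)];
        $blob .= $contents;
    }
    return [$blob, $manifest];
}

// Read an entry from $blob using whatever manifest the caller passes in.
function readEntry(string $blob, array $manifest, string $name): string
{
    $entry = $manifest[$name];
    return substr($blob, $entry['offset'], $entry['length']);
}

// "Old" archive: its manifest is what gets cached.
[$oldBlob, $oldManifest] = buildArchive([
    'bar.php' => "<?php\necho \"this is bar.php\\n\";",
    'foo.php' => "<?php\necho \"this is foo.php\\n\";",
]);

// "New" archive: bar.php is longer, so every length and later offset differs.
[$newBlob, $newManifest] = buildArchive([
    'bar.php' => "<?php\n/* Shift the offsets compared to a.phar. */\necho \"this is bar.php\\n\";",
    'foo.php' => "<?php\necho \"this is foo.php\\n\";",
]);

// Stale read: cached (old) manifest applied to the replaced (new) blob.
$stale = readEntry($newBlob, $oldManifest, 'bar.php');
// Consistent read: new manifest applied to the new blob.
$fresh = readEntry($newBlob, $newManifest, 'bar.php');

var_dump($stale, $fresh);
```

In this model the stale read returns only as many bytes as the old bar.php had, so the comment opener is present but its terminator is not.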

openat(AT_FDCWD, "*redacted*/a.phar", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=343, ...}) = 0
lseek(4, 0, SEEK_CUR)                   = 0
lseek(4, 0, SEEK_SET)                   = 0
read(4, "<?php\nPhar::mapPhar('a.phar');\n\n"..., 8192) = 343
read(4, "", 8192)                       = 0
lseek(4, 144, SEEK_SET)                 = 144
read(4, " ?>\r\nX\0\0\0\2\0\0\0\21\0\0\0\1\0\0\0\0\0\0\0\0\0\7\0\0\0b"..., 8192) = 199
lseek(4, 149, SEEK_SET)                 = 149
read(4, "X\0\0\0\2\0\0\0\21\0\0\0\1\0\0\0\0\0\0\0\0\0\7\0\0\0bar.ph"..., 8192) = 194
lseek(4, -8, SEEK_END)                  = 335
read(4, "\3\0\0\0GBMB", 8192)           = 8
lseek(4, -40, SEEK_END)                 = 303
read(4, "\250\267o[\342\272\367\303\221>\361\307kf&\257\376\335\360;\356=`\244\270#\335\353\277\253\345$"..., 8192) = 40
lseek(4, 0, SEEK_SET)                   = 0
read(4, "<?php\nPhar::mapPhar('a.phar');\n\n"..., 8192) = 343
ioctl(3, TCGETS, 0x7ffc358e61c0)        = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFREG|0664, st_size=343, ...}) = 0
fstat(3, {st_mode=S_IFREG|0664, st_size=343, ...}) = 0
read(3, "<?php\nPhar::mapPhar('a.phar');\n\n"..., 4096) = 343
lseek(4, 272, SEEK_SET)                 = 272
read(4, "<?php\necho \"this is foo.php\\n\";\250"..., 8192) = 71
close(4)                                = 0
write(1, "this is foo.php\n", 16this is foo.php
)       = 16
rename("b.phar", "a.phar")              = 0
newfstatat(AT_FDCWD, "*redacted*/a.phar", {st_mode=S_IFREG|0664, st_size=387, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "*redacted*", {st_mode=S_IFDIR|0775, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "*redacted*", {st_mode=S_IFDIR|0775, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "*redacted*", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "*redacted*", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "*redacted*/a.phar", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=387, ...}) = 0
lseek(4, 0, SEEK_CUR)                   = 0
lseek(4, 241, SEEK_SET)                 = 241
read(4, "<?php\n/* Shift the offsets compa"..., 8192) = 146
close(4)                                = 0
write(2, "PHP Parse error:  Unterminated c"..., 122PHP Parse error:  Unterminated comment starting line 2 in phar://*redacted*/a.phar/bar.php on line 2
) = 122
close(3)                                = 0

This is an issue for self-updating phars where the autoloader needs to load a class after the update has been performed.
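Until this is resolved, one possible userland workaround for a self-updating phar is to force the autoloader to load every class the post-update code path needs before the file on disk is replaced. The sketch below assumes the updater knows that list of class names; `preloadClasses()` is a hypothetical helper, not part of PHP or the phar extension.

```php
<?php
/**
 * Trigger autoloading for each name and report the ones that could not be
 * loaded. Hypothetical helper: call it before replacing the phar on disk.
 *
 * @param  string[] $classNames fully qualified class/interface/trait names
 * @return string[] names that are still undefined after autoloading
 */
function preloadClasses(array $classNames): array
{
    $missing = [];
    foreach ($classNames as $name) {
        // class_exists() with autoload=true invokes the registered autoloaders.
        // If the name turned out to be an interface or trait, the autoloader
        // has already run, so the follow-up checks skip autoloading.
        $loaded = class_exists($name, true)
            || interface_exists($name, false)
            || trait_exists($name, false);
        if (!$loaded) {
            $missing[] = $name;
        }
    }
    return $missing;
}
```

A self-updater could call this with the classes it uses after the rename and abort the update if the returned list is non-empty. It does not help with files that are included by path after the update.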

PHP Version

PHP 8.3.14

Operating System

Ubuntu 24.04

@nielsdos
Member

Right, and if I understand correctly: to fix this we should only use the cached file offsets if the file is still the same. The alternative of unconditionally using the cached offsets together with the cached phar would not work because that will still not allow auto-updating phars to work.

@TimWolla
Member Author

> The alternative of unconditionally using the cached offsets together with the cached phar would not work because that will still not allow auto-updating phars to work.

Do you mean because Windows does not allow deleting or renaming a file that is still open?

Because my expectation would be that the state of the files within a Phar is internally consistent, independent of whether I load a file before or after the Phar is replaced on disk. It should not silently use a new version (an error would be fine, caching the actual contents would be preferred).

If I want to access the new contents, I would loadPhar() them.

@nielsdos
Member

I'm getting confused about this ticket, but I think I'm starting to understand your perspective. Let's make sure we're on the same page.

I understand that there is a bug here in that there's a mismatch between the cached offsets and when the contents are being read, resulting in the contents being garbled.

I saw two possible fixes:

  1. "Detect" that the file has changed and throw away the cached offsets, rely only on the disk contents
  2. Only rely on cached data, regardless of the change on disk
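To make option 1 concrete, here is a minimal userland sketch of what "detecting" a replaced file could look like, assuming a comparison of `stat()` fields is an acceptable signal. `fileFingerprint()` and `hasChanged()` are hypothetical names; the actual fix would live inside the phar extension and may use a different check entirely.

```php
<?php
/**
 * Capture identifying stat() fields for a path.
 * Hypothetical helper; the chosen fields are an assumption of this sketch.
 */
function fileFingerprint(string $path): array
{
    // Drop PHP's stat cache so a replaced file is not reported with old data.
    clearstatcache(true, $path);
    $st = @stat($path);
    if ($st === false) {
        throw new RuntimeException("Cannot stat $path");
    }
    return [
        'dev'   => $st['dev'],
        'ino'   => $st['ino'],   // may be 0 on Windows, hence size and mtime too
        'size'  => $st['size'],
        'mtime' => $st['mtime'],
    ];
}

/** True if the file at $path no longer matches the earlier fingerprint. */
function hasChanged(array $fingerprint, string $path): bool
{
    return fileFingerprint($path) !== $fingerprint;
}
```

Under this approach, a fingerprint would be taken when the manifest is cached and compared before each later read; a mismatch would mean the cached offsets can no longer be trusted. Whether to then re-read the manifest or raise an error is exactly the design question discussed in this thread.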

I thought the proper fix would be option 1, but you seem to imply the proper fix is option 2.
I suppose that option 1 is not viable for self-updating phars because the updater still needs to run the old code?

I would have to recheck how this works again internally, but I'm not so sure that the phar file contents are cached (e.g. if a particular file in the phar was never read before).
