Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support musl gettext (libintl) functions in tests #17168

Closed
wants to merge 5 commits into from

Conversation

orlitzky
Copy link
Contributor

Work around several musl quirks to get the test suite passing:

  • bindtextdomain(..., NULL) returns NULL rather than a default path if no binding exists
  • bind_textdomain_codeset(..., NULL) always returns NULL
  • The gettext family of functions ignores any codeset suffixes on the directories that correspond to locale names, e.g. only en_US works, and not en_US.UTF-8.

The commit messages go into more detail about each workaround. Copying the message data from en_US.UTF-8 to en_US in ./configure is pretty hacky but it gets the job done (suggestions for improvement are welcome).

Refs:

@orlitzky
Copy link
Contributor Author

@arnaud-lb the Alpine CI should run without GNU gettext installed after this

* say that MUSL_LOCPATH will be ignored in the
* scenarios where secure_getenv() will return
* NULL rather than the value of the variable. */
musl_locpath = secure_getenv("MUSL_LOCPATH");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be guarded for MacOS and Windows to build, ideally move the decls for btd_result and musl_locpath into this guard to prevent unused warnings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a ./configure check for secure_getenv and the corresponding #if. It won't matter on musl, but now a fallback to getenv() is there in case in case someone write a new libc with this bug that is also missing secure_getenv().

ext/gettext/gettext.c Outdated Show resolved Hide resolved
@arnaud-lb
Copy link
Member

arnaud-lb commented Dec 16, 2024

Most failures in #13925 were caused by too much reliance on Glibc behavior. In most cases Musl was not at fault, was actually Posix compliant, and the fix improved code quality.

In this case however, Musl is diverging from the standard.

Can some of these issues be fixed in Musl instead? This would be better, IMHO.

Otherwise, can we update the Alpine package so that it depends on gettext explicitly? This would fix the issue as well.

@orlitzky
Copy link
Contributor Author

Most failures in #13925 were caused by too much reliance on Glibc behavior. In most cases Musl was not at fault, was actually Posix compliant, and the fix improved code quality.

In this case however, Musl is diverging from the standard.

Can some of these issues be fixed in Musl instead? This would be better, IMHO.

It's only in one of the three bullet points that musl is obviously wrong, when bindtextdomain(..., NULL) returns NULL instead of a default path for an unbound domain. And I think that everyone (even the authors of musl) agrees that it should be fixed in musl. But, other implementations can return NULL on error, so PHP should handle the NULL rather than segfault in any case.

The additional MUSL_LOCPATH case is harmless IMO and fixes the test suite with current versions of musl, but I could have it return false immediately instead of querying the environment variable? Ultimately, musl exists, so PHP should handle this case somehow to avoid segfaulting.

Otherwise, can we update the Alpine package so that it depends on gettext explicitly? This would fix the issue as well.

I think if you want to go this route, it would be better to have ext/gettext/config.m4 detect musl and throw an error saying to install gettext. Otherwise you will overlook other distros (Gentoo, Void) and people who compile their own PHP on Alpine.

@orlitzky
Copy link
Contributor Author

FWIW the only test that fails if we ignore MUSL_LOCPATH and instead return false is

var_dump(is_string(bindtextdomain('foo', null)));

in ext/gettext/tests/bug53251.phpt, because you get false and not a string. But that line is not really relevant to the fixed bug in the first place, it's moreso checking POSIX conformance of your bindtextdomain().

@arnaud-lb
Copy link
Member

It's only in one of the three bullet points that musl is obviously wrong

Right. My first impression was that bind_textdomain_codeset(domainname, NULL) was wrong too, but looking closely, it's probably ok (php/doc-en#4311 (comment)).

The additional MUSL_LOCPATH case is harmless IMO and fixes the test suite with current versions of musl, but I could have it return false immediately instead of querying the environment variable? Ultimately, musl exists, so PHP should handle this case somehow to avoid segfaulting.

You are right that we need to handle the NULL case, as the function can return NULL on error.

Regarding MUSL_LOCPATH, are you sure it's relevant? Looking at the code I can't find where it's used in gettext when no binding is found. Should we consider that bindtextdomain(domainanme, NULL) returning NULL is an error (even if errno is not set), and just return false?

Comment on lines 30 to 34
dnl If libintl.h is provided by libc, it's possible that libc is musl.
dnl The gettext family of functions under musl ignores the codeset
dnl suffix on directories like "en_US.UTF-8"; instead they look only
dnl in "en_US". To accomodate that, we copy some test cases from one
dnl to the other.
AC_MSG_NOTICE([copying en_US.UTF-8 messages to en_US in case you are on musl])
mkdir -p "${srcdir%/}"/ext/gettext/tests/locale/en_US
cp -r "${srcdir%/}"/ext/gettext/tests/locale/en_US.UTF-8/* "${srcdir%/}"/ext/gettext/tests/locale/en_US/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible issue with this, is that new files in en_US.UTF-8/ will not be reflected to en_US/ until ./configure is executed again. This may be confusing during test development.

An alternative would be to move this to a Makefile rule, but I'm not sure that we can efficiently trigger the copy only if the contents of en_US.UTF-8/ changed.

An other one would be to symlink en_US/ to en_US.UTF-8/. I don't know how well symlinks are supported on various systems, but since the en_US/ path will be used only on Musl this might be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easiest of all would be to symlink them in the source tree and commit the symlink, but I don't have any Windows machines around to test on.

Copy link
Member

@arnaud-lb arnaud-lb Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An other alternative would to be create the symlink in phpt files directly, and only on Musl. This way we avoid committing a symlink as well as potential Windows issues. We can create a .inc file in ext/opcache/tests that we include in all relevant tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest force-push, I changed the cp -r to an ln -s. That should avoid issues with newly-added test files, and is probably safe: the ln -s is only performed if your libc provides gettext(), and I doubt that will be true on any system where symlinks don't work.

@orlitzky
Copy link
Contributor Author

Regarding MUSL_LOCPATH, are you sure it's relevant? Looking at the code I can't find where it's used in gettext when no binding is found. Should we consider that bindtextdomain(domainanme, NULL) returning NULL is an error (even if errno is not set), and just return false?

When no binding is found, I think it falls back to the current locale, and the current locale is mapped to a file under MUSL_LOCPATH when setlocale() is called. But you are right that this is not really relevant to gettext() -- you can't just drop .mo files in there and have them work -- but it's the closest thing to a POSIX-compliant return value I could think of.

I'll rewrite this later to just return false on musl without trying to fix the NULL.

@nielsdos
Copy link
Member

This is a bugfix, please target 8.3.
Coincidentally a fuzzer just stumbled across not handling the NULL return today

@orlitzky
Copy link
Contributor Author

I just force-pushed a new version that returns false as soon as bindtextdomain(..., NULL) returns NULL rather than messing with MUSL_LOCPATH. It's easy enough to work around that in the one test where it matters. And in that test, I made the expected output a little more precise by doing,

$expected = [true, true, false, "UTF-8", "UTF-8"];
$expected_musl = [false, true, false, false, false];

so that one false interspersed in the middle of the usual expected output will not go unnoticed.

@orlitzky
Copy link
Contributor Author

This is a bugfix, please target 8.3. Coincidentally a fuzzer just stumbled across not handling the NULL return today

You want me to change the target branch? (I guess you forward-port bug fixes from older branches rather than back-porting them from master?)

@nielsdos
Copy link
Member

This is a bugfix, please target 8.3. Coincidentally a fuzzer just stumbled across not handling the NULL return today

You want me to change the target branch? (I guess you forward-port bug fixes from older branches rather than back-porting them from master?)

Yes and yes

According to POSIX, bindtextdomain() returns "the implementation-
defined default directory pathname used by the gettext family of
functions" when its second parameter is NULL (i.e. when you are
querying the directory corresponding to some text domain and that
directory has not yet been set). Its PHP counterpart is feeding
that result direclty to RETURN_STRING, but this can go wrong in
two ways:

  1. If an error occurs, even POSIX-compliant implementations
     may return NULL.

  2. At least one non-compliant implementation (musl) lacks
     a default directory and returns NULL whenever the domain
     has not yet been bound.

In either of those cases, PHP segfaults on the NULL string. In this
commit we check for the NULL, and RETURN_FALSE when it happens rather
than crashing.

This partially addresses GH php#13696
Musl has two quirks that are leading to failed internationalization
tests. First is that the return value of bindtextdomain(..., NULL)
will always be false, rather than an "implementation-defined default
directory," because musl does not have an implementation-defined
default directory. One test needs a special case for this.

Second is that the musl implementation of bind_textdomain_codeset()
always returns NULL. The POSIX-correctness of this is debatable, but
it is roughly equivalent to correct, because musl only support UTF-8,
so the NULL value indicating that the codeset is unchanged from the
locale's codeset (UTF-8) is accurate.

PHP's bind_textdomain_codeset() function however treats NULL as
failure, unconditionally:

  * php/doc-en#4311
  * php#17163

This unfortunately causes false to be returned consistently on musl --
even when nothing unexpected has happened -- and naturally this is
affecting several tests. For now we change two tests to accept "false"
in addition to "UTF-8" so that they may pass on musl. If PHP's
bind_textdomain_codeset() is updated to differentiate between NULL and
NULL-with-errno-set, these tests can also be updated once again to
reject the NULL-with-errno result.

This partially addresses GH php#13696
The gettext() family of functions under musl does not support codeset
suffixes like ".UTF-8", because the only codeset it understands is
UTF-8. (Yes, it is annoying that it doesn't support the suffix for the
codeset that it does understand; no, I am not in charge.) Thanks to
this, we have six failing tests on musl,

  * FAIL Gettext basic test with en_US locale that should be on nearly
    every system
    [ext/gettext/tests/gettext_basic-enus.phpt]

  * FAIL Test if bindtextdomain() returns string id if no directory path
    is set( if directory path is 'null')
    [ext/gettext/tests/gettext_bindtextdomain-cwd.phpt]

  * FAIL Test dcgettext() functionality
    [ext/gettext/tests/gettext_dcgettext.phpt]

  * FAIL Test dgettext() functionality
    [ext/gettext/tests/gettext_dgettext.phpt]

  * FAIL Test if dngettext() returns the correct translations
    (optionally plural).
    [ext/gettext/tests/gettext_dngettext-plural.phpt]

  * FAIL Test ngettext() functionality
    [ext/gettext/tests/gettext_ngettext.phpt]

These are all fixed by symlinking the en_US.UTF-8 message data to en_US,
where musl is able to find it.

This does not make the situation any better for developers (who don't
know what libc their users will be running), but that problem is
inhereted from C and is not the fault of the gettext extension.

This partially addresses GH php#13696
If gettext() et al. come from musl libc, ./configure will symlink
ext/gettext/tests/locale/en_US.UTF-8 to en_US (without the ".UTF-8"
suffix). We want to ignore the symlink.
@orlitzky orlitzky force-pushed the musl-gettext-support branch from 4334502 to 385a91f Compare December 19, 2024 14:49
@orlitzky orlitzky changed the base branch from master to PHP-8.3 December 19, 2024 14:50
@orlitzky
Copy link
Contributor Author

This is a bugfix, please target 8.3. Coincidentally a fuzzer just stumbled across not handling the NULL return today

Done. There are merge conflicts porting back to master, but nothing serious.

arnaud-lb added a commit that referenced this pull request Dec 19, 2024
arnaud-lb added a commit that referenced this pull request Dec 19, 2024
* PHP-8.3:
  NEWS for GH-17168
  ext/gettext/config.m4: symlink en_US.UTF-8 test bits to en_US for musl
  ext/gettext/tests: fix libintl return values under musl
  ext/gettext/gettext.c: handle NULLs from bindtextdomain()
arnaud-lb added a commit that referenced this pull request Dec 19, 2024
* PHP-8.4:
  NEWS for GH-17168
  ext/gettext/config.m4: symlink en_US.UTF-8 test bits to en_US for musl
  ext/gettext/tests: fix libintl return values under musl
  ext/gettext/gettext.c: handle NULLs from bindtextdomain()
@arnaud-lb
Copy link
Member

@arnaud-lb arnaud-lb closed this Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault ext/gettext/gettext.c bindtextdomain()
5 participants