-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support musl gettext (libintl) functions in tests #17168
Conversation
@arnaud-lb the Alpine CI should run without GNU gettext installed after this |
ext/gettext/gettext.c
Outdated
* say that MUSL_LOCPATH will be ignored in the | ||
* scenarios where secure_getenv() will return | ||
* NULL rather than the value of the variable. */ | ||
musl_locpath = secure_getenv("MUSL_LOCPATH"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to be guarded for MacOS and Windows to build, ideally move the decls for btd_result
and musl_locpath
into this guard to prevent unused warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a ./configure
check for secure_getenv
and the corresponding #if
. It won't matter on musl, but now a fallback to getenv()
is there in case in case someone write a new libc with this bug that is also missing secure_getenv()
.
Most failures in #13925 were caused by too much reliance on Glibc behavior. In most cases Musl was not at fault, was actually Posix compliant, and the fix improved code quality. In this case however, Musl is diverging from the standard. Can some of these issues be fixed in Musl instead? This would be better, IMHO. Otherwise, can we update the Alpine package so that it depends on |
It's only in one of the three bullet points that musl is obviously wrong, when The additional
I think if you want to go this route, it would be better to have |
FWIW the only test that fails if we ignore var_dump(is_string(bindtextdomain('foo', null))); in |
Right. My first impression was that
You are right that we need to handle the Regarding |
ext/gettext/config.m4
Outdated
dnl If libintl.h is provided by libc, it's possible that libc is musl. | ||
dnl The gettext family of functions under musl ignores the codeset | ||
dnl suffix on directories like "en_US.UTF-8"; instead they look only | ||
dnl in "en_US". To accomodate that, we copy some test cases from one | ||
dnl to the other. | ||
AC_MSG_NOTICE([copying en_US.UTF-8 messages to en_US in case you are on musl]) | ||
mkdir -p "${srcdir%/}"/ext/gettext/tests/locale/en_US | ||
cp -r "${srcdir%/}"/ext/gettext/tests/locale/en_US.UTF-8/* "${srcdir%/}"/ext/gettext/tests/locale/en_US/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible issue with this, is that new files in en_US.UTF-8/
will not be reflected to en_US/
until ./configure
is executed again. This may be confusing during test development.
An alternative would be to move this to a Makefile rule, but I'm not sure that we can efficiently trigger the copy only if the contents of en_US.UTF-8/
changed.
An other one would be to symlink en_US/
to en_US.UTF-8/
. I don't know how well symlinks are supported on various systems, but since the en_US/
path will be used only on Musl this might be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easiest of all would be to symlink them in the source tree and commit the symlink, but I don't have any Windows machines around to test on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An other alternative would to be create the symlink in phpt files directly, and only on Musl. This way we avoid committing a symlink as well as potential Windows issues. We can create a .inc
file in ext/opcache/tests
that we include in all relevant tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the latest force-push, I changed the cp -r
to an ln -s
. That should avoid issues with newly-added test files, and is probably safe: the ln -s
is only performed if your libc provides gettext()
, and I doubt that will be true on any system where symlinks don't work.
When no binding is found, I think it falls back to the current locale, and the current locale is mapped to a file under I'll rewrite this later to just return false on musl without trying to fix the |
This is a bugfix, please target 8.3. |
394fe12
to
4334502
Compare
I just force-pushed a new version that returns $expected = [true, true, false, "UTF-8", "UTF-8"];
$expected_musl = [false, true, false, false, false]; so that one |
You want me to change the target branch? (I guess you forward-port bug fixes from older branches rather than back-porting them from master?) |
Yes and yes |
According to POSIX, bindtextdomain() returns "the implementation- defined default directory pathname used by the gettext family of functions" when its second parameter is NULL (i.e. when you are querying the directory corresponding to some text domain and that directory has not yet been set). Its PHP counterpart is feeding that result direclty to RETURN_STRING, but this can go wrong in two ways: 1. If an error occurs, even POSIX-compliant implementations may return NULL. 2. At least one non-compliant implementation (musl) lacks a default directory and returns NULL whenever the domain has not yet been bound. In either of those cases, PHP segfaults on the NULL string. In this commit we check for the NULL, and RETURN_FALSE when it happens rather than crashing. This partially addresses GH php#13696
Musl has two quirks that are leading to failed internationalization tests. First is that the return value of bindtextdomain(..., NULL) will always be false, rather than an "implementation-defined default directory," because musl does not have an implementation-defined default directory. One test needs a special case for this. Second is that the musl implementation of bind_textdomain_codeset() always returns NULL. The POSIX-correctness of this is debatable, but it is roughly equivalent to correct, because musl only support UTF-8, so the NULL value indicating that the codeset is unchanged from the locale's codeset (UTF-8) is accurate. PHP's bind_textdomain_codeset() function however treats NULL as failure, unconditionally: * php/doc-en#4311 * php#17163 This unfortunately causes false to be returned consistently on musl -- even when nothing unexpected has happened -- and naturally this is affecting several tests. For now we change two tests to accept "false" in addition to "UTF-8" so that they may pass on musl. If PHP's bind_textdomain_codeset() is updated to differentiate between NULL and NULL-with-errno-set, these tests can also be updated once again to reject the NULL-with-errno result. This partially addresses GH php#13696
The gettext() family of functions under musl does not support codeset suffixes like ".UTF-8", because the only codeset it understands is UTF-8. (Yes, it is annoying that it doesn't support the suffix for the codeset that it does understand; no, I am not in charge.) Thanks to this, we have six failing tests on musl, * FAIL Gettext basic test with en_US locale that should be on nearly every system [ext/gettext/tests/gettext_basic-enus.phpt] * FAIL Test if bindtextdomain() returns string id if no directory path is set( if directory path is 'null') [ext/gettext/tests/gettext_bindtextdomain-cwd.phpt] * FAIL Test dcgettext() functionality [ext/gettext/tests/gettext_dcgettext.phpt] * FAIL Test dgettext() functionality [ext/gettext/tests/gettext_dgettext.phpt] * FAIL Test if dngettext() returns the correct translations (optionally plural). [ext/gettext/tests/gettext_dngettext-plural.phpt] * FAIL Test ngettext() functionality [ext/gettext/tests/gettext_ngettext.phpt] These are all fixed by symlinking the en_US.UTF-8 message data to en_US, where musl is able to find it. This does not make the situation any better for developers (who don't know what libc their users will be running), but that problem is inhereted from C and is not the fault of the gettext extension. This partially addresses GH php#13696
If gettext() et al. come from musl libc, ./configure will symlink ext/gettext/tests/locale/en_US.UTF-8 to en_US (without the ".UTF-8" suffix). We want to ignore the symlink.
4334502
to
385a91f
Compare
Done. There are merge conflicts porting back to master, but nothing serious. |
* PHP-8.3: NEWS for GH-17168 ext/gettext/config.m4: symlink en_US.UTF-8 test bits to en_US for musl ext/gettext/tests: fix libintl return values under musl ext/gettext/gettext.c: handle NULLs from bindtextdomain()
* PHP-8.4: NEWS for GH-17168 ext/gettext/config.m4: symlink en_US.UTF-8 test bits to en_US for musl ext/gettext/tests: fix libintl return values under musl ext/gettext/gettext.c: handle NULLs from bindtextdomain()
Work around several musl quirks to get the test suite passing:
bindtextdomain(..., NULL)
returnsNULL
rather than a default path if no binding existsbind_textdomain_codeset(..., NULL)
always returnsNULL
en_US
works, and noten_US.UTF-8
.The commit messages go into more detail about each workaround. Copying the message data from
en_US.UTF-8
toen_US
in./configure
is pretty hacky but it gets the job done (suggestions for improvement are welcome).Refs: