SIGSEGV on start if LC_ALL is unset #22375

Closed
egorenar opened this issue Jul 4, 2024 · 10 comments

Comments

@egorenar

egorenar commented Jul 4, 2024

Description
Perl crashes at startup on an OpenWrt ARM router if LC_ALL is unset.
OpenWRT issue: openwrt/packages#24512

Steps to Reproduce
Just run perl.

The error seems to be triggered by code in locale.c

Perl_croak(aTHX_ "%s: %" LINE_Tf ": panic: %s%s%s\n",
                     __FILE__, immediate_caller_line,
                     msg, errno_text, called_by);

This calls Perl_sv_vcatpvfn_flags, which crashes when it tries to interpret the passed line number as a string pointer.
Probably because Perl_sv_vcatpvfn_flags cannot handle the U32uf specifier?

Crash backtrace with gdb on ARM:

Program received signal SIGSEGV, Segmentation fault.
0xb6fd8d18 in strlen (s=<optimized out>, s@entry=0x5c8 <error: Cannot access memory at address 0x5c8>) at src/string/strlen.c:17
17              for (w = (const void *)s; !HASZERO(*w); w++);
   0xb6fd8d0c <strlen+52>:      e308e080        movw    lr, #32896      @ 0x8080
   0xb6fd8d10 <strlen+56>:      e34f4efe        movt    r4, #65278      @ 0xfefe
   0xb6fd8d14 <strlen+60>:      e348e080        movt    lr, #32896      @ 0x8080
=> 0xb6fd8d18 <strlen+64>:      e593c000        ldr     r12, [r3]
   0xb6fd8d1c <strlen+68>:      e1a02003        mov     r2, r3
   0xb6fd8d20 <strlen+72>:      e2833004        add     r3, r3, #4
   0xb6fd8d24 <strlen+76>:      e08c1004        add     r1, r12, r4
   0xb6fd8d28 <strlen+80>:      e1c1100c        bic     r1, r1, r12
   0xb6fd8d2c <strlen+84>:      e111000e        tst     r1, lr
   0xb6fd8d30 <strlen+88>:      0afffff8        beq     0xb6fd8d18 <strlen+64>
   0xb6fd8d34 <strlen+92>:      e1a03002        mov     r3, r2
(gdb) bt
#0  0xb6fd8d18 in strlen (s=<optimized out>, s@entry=0x5c8 <error: Cannot access memory at address 0x5c8>) at src/string/strlen.c:17
#1  0xb6e3265c in Perl_sv_vcatpvfn_flags (my_perl=my_perl@entry=0xb6ca1020, sv=sv@entry=0xb6c9ea40, pat=pat@entry=0xb6e87e48 "%s: %: panic: %s%s%s\n", patlen=patlen@entry=0x15, args=<optimized out>,
    args@entry=0xbefffbe0, svargs=<optimized out>, svargs@entry=0xb6e76764 <Perl_vmess+84>, sv_count=<optimized out>, sv_count@entry=0xbefffbe0, maybe_tainted=<optimized out>,
    maybe_tainted@entry=0x0, flags=<optimized out>, flags@entry=0x0) at sv.c:12669
#2  0xb6e4984c in Perl_sv_vsetpvfn (my_perl=my_perl@entry=0xb6ca1020, sv=sv@entry=0xb6c9ea40, pat=pat@entry=0xb6e87e48 "%s: %: panic: %s%s%s\n", patlen=0x15, args=args@entry=0xbefffbe0,
    svargs=svargs@entry=0x0, sv_count=sv_count@entry=0x0, maybe_tainted=maybe_tainted@entry=0x0) at sv.c:11295
#3  0xb6e76764 in Perl_vmess (my_perl=my_perl@entry=0xb6ca1020, pat=0xb6e87e48 "%s: %: panic: %s%s%s\n", pat@entry=0xb6ca1020 "`\300\377\266", args=0xbefffbe0, args@entry=0xbefffbc0) at util.c:1684
#4  0xb6e75ee0 in Perl_vcroak (my_perl=my_perl@entry=0xb6ca1020, pat=pat@entry=0xb6ca1020 "`\300\377\266", args=args@entry=0xbefffbc0) at util.c:1899
#5  0xb6e76af8 in Perl_croak (my_perl=my_perl@entry=0xb6ca1020, pat=0xb6e87e48 "%s: %: panic: %s%s%s\n") at util.c:1952
#6  0xb6d3e38c in Perl_locale_panic (msg=0xb6ffffb0 "'C.UTF-8;C;C;C;C;C' needs an '=' to split name=value\n", immediate_caller_line=immediate_caller_line@entry=0x5c8,
    higher_caller_file=<optimized out>, higher_caller_line=0x1001, higher_caller_line@entry=0xb6d40b98) at locale.c:1113
#7  0xb6d40a34 in S_parse_LC_ALL_string (my_perl=0xb6e880c8, my_perl@entry=0xb6ca1020, string=string@entry=0xb6fffcf0 "C.UTF-8;C;C;C;C;C", output=0xb6ffff98, output@entry=0xbefffc84,
    always_use_full_array=0x20, always_use_full_array@entry=0x1, panic_on_error=0x1, caller_line=0x1001, override=override_if_ignored) at locale.c:1480
#8  0xb6d40b98 in S_new_LC_ALL (my_perl=my_perl@entry=0xb6ca1020, lc_all=lc_all@entry=0xb6fffcf0 "C.UTF-8;C;C;C;C;C", force=force@entry=0x1) at locale.c:4090
#9  0xb6d439b8 in S_give_perl_locale_control (caller_line=0x23fd, lc_all_string=0xb6fffcf0 "C.UTF-8;C;C;C;C;C", my_perl=0xb6ca1020) at locale.c:8575
#10 Perl_init_i18nl10n (my_perl=0xb6ca1020, printwarn=printwarn@entry=0x1) at locale.c:9213
#11 0xb6cfc95c in perl_construct (my_perl=<optimized out>) at perl.c:464
#12 0x0001091c in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at perlmain.c:102
(gdb)

Expected behavior
No crash.

Perl configuration

❯ perl -V
Summary of my perl5 (revision 5 version 40 subversion 0) configuration:

  Platform:
    osname=linux
    osvers=3.18.19
    archname=arm-linux-musl
    uname='Linux OpenWrt 3.18.19 #1 SMP Thu Jan 1 12:00:00 CEST 2015 arm GNU/Linux'
    config_args='-der -Dusethreads'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=undef
    use64bitall=undef
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='arm-openwrt-linux-muslgnueabi-gcc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -Os -pipe -fno-caller-saves -fno-plt -fhonour-copts -mfloat-abi=hard -D_LARGEFILE64_SOURCE -I/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/usr/include -I/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/include -I/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/include/fortify'
    optimize='-O2'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -Os -pipe -fno-caller-saves -fno-plt -fhonour-copts -mfloat-abi=hard -D_LARGEFILE64_SOURCE -I/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/usr/include -I/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/include -I/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/include/fortify'
    ccversion=''
    gccversion='13.3.0'
    gccosandvers=''
    intsize=4
    longsize=4
    ptrsize=4
    doublesize=8
    byteorder=1234
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=8
    longdblkind=0
    ivtype='long'
    ivsize=4
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='arm-openwrt-linux-muslgnueabi-gcc'
    ldflags ='-L/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/usr/lib -L/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/lib -fuse-ld=bfd -znow -zrelro'
    libpth=/home/egorenar/Repositories/openwrt-rel/staging_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/lib /home/egorenar/Repositories/openwrt-rel/staging_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/usr/lib
    libs=-lpthread -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-fPIC -rdynamic -Wl,-rpath,/usr/lib/perl5/5.40/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -L/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/usr/lib -L/home/egorenar/Repositories/openwrt-rel/staging_dir/toolchain-arm_cortex-a15+neon-vfpv4_gcc-13.3.0_musl_eabi/lib -fuse-ld=bfd -znow -zrelro'


Characteristics of this binary (from libperl):
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_ZAPHOD32
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Built under linux
  @INC:
    /usr/lib/perl5/5.40

Workaround
Setting LC_ALL=C fixes the problem.

@mauke
Contributor

mauke commented Jul 4, 2024

This is at least two different issues.

The first issue is that the locale code is throwing an error (presumably because LC_ALL is set to an invalid value, or at least what it considers to be invalid).

The second issue is that the code that tries to throw the error crashes. I suspect U32uf is set to "" for some reason.

What does perl -V:u32uformat say on your platform?

@egorenar
Author

egorenar commented Jul 4, 2024

What does perl -V:u32uformat say on your platform?

$ perl -V:u32uformat
u32uformat='UNKNOWN';

@mauke
Contributor

mauke commented Jul 4, 2024

Huh.
Did you compile this perl yourself? If so, can you post the generated config.sh and config.h files?

@egorenar
Author

egorenar commented Jul 4, 2024

Huh. Did you compile this perl yourself? If so, can you post the generated config.sh and config.h files?

It was built as part of OpenWrt router image for my ARM router.
There was a version bump recently to 5.40. Since then the problem has appeared.

config.h:

#define U32uf               /**/

config.h.gz
config.sh.gz

@mauke
Contributor

mauke commented Jul 4, 2024

OK, I'm confused.

U32uf used to be defined entirely in perl.h (based on preprocessor heuristics). These tests were moved to Configure in commit a503b74 (in 2022), which was first released with perl 5.38.0. Now U32uf is defined in config.h based on the value of the shell variable u32uformat (as recorded in config.sh).

However, in your case this variable doesn't exist (which makes no sense) and so the macro expands to nothing (which is bad because it breaks all format strings the macro is used in).

Another anomaly is that your config.sh contains definitions for ansi2knr and d_bcmp. The last traces of these symbols were removed in commits c9db53f and e5d7f4e (both in 2017). The last perl release that had these was 5.26.3 (in 2018).

It looks like somehow you're using (parts of?) the perl configure infrastructure from 2018 (or before) together with perl headers from 2024, so you're missing all kinds of symbols.

mauke added a commit to mauke/perl5 that referenced this issue Jul 4, 2024
If any of these format strings are empty, things can go very wrong at
runtime, from garbage output to segfaults (e.g. see Perl#22375).

This is a static check, so it could be placed in any source file. I
chose util.c because according to the comment at the top, it is the home
of "any stuff that people couldn't think of a better place for".
@egorenar
Author

egorenar commented Jul 4, 2024

Thanks for looking into this.
I think you are right: OpenWrt seems to generate a custom config.sh file based on
https://github.com/openwrt/packages/blob/master/lang/perl/files/base.config
which explains ansi2knr 😄
So OpenWrt's Perl package configuration is broken at the moment.

mauke added a commit that referenced this issue Jul 5, 2024
If any of these format strings are empty, things can go very wrong at
runtime, from garbage output to segfaults (e.g. see #22375).

This is a static check, so it could be placed in any source file. I
chose util.c because according to the comment at the top, it is the home
of "any stuff that people couldn't think of a better place for".
@mauke
Contributor

mauke commented Jul 6, 2024

The current state of things: The OpenWRT perl is cross-compiled with a custom config.sh. This config.sh was originally created for perl v5.26 or earlier; it has since been manually updated with a few new values and symbols, but it has not been kept in sync with the state of Configure in the perl source tree.

The net result is that OpenWRT's config.sh still defines symbols that are no longer used by perl (like ansi2knr), which is harmless, but it is also missing a bunch of other symbols that perl implicitly relies on existing. In particular, several format string definitions are missing, which results in a miscompiled perl v5.38 and v5.40.

As far as I can tell, there is nothing for us to do here: Our Configure creates the symbols we expect. OpenWRT needs to update its config.sh template to add the symbols that perl v5.40 expects to exist (and maybe should delete the old symbols that are no longer used). As a safeguard against this specific kind of misconfiguration, I have added a check that the format strings in question are not empty (commit c5df4fd).¹

Therefore I am going to close this ticket until something else comes up.


¹ Now that I think about it, the problem is not that these strings are empty; it's that they don't exist. So instead of tripping a static assertion, this code would be an undeclared symbol error on current OpenWRT (util.c:125:27: error: ‘I32df’ undeclared here). But in either case, the net result is the same: The compiler will produce an error instead of creating a miscompiled perl executable.

mauke closed this as not planned Jul 6, 2024
@egorenar
Author

egorenar commented Jul 6, 2024

I fixed all the outdated symbols and added the new ones, and now I get this:

$ perl
locale.c: 1480: panic: 'C.UTF-8;C;C;C;C;C' needs an '=' to split name=value
; errno=0
Called by locale.c: 4097

Any hint as to what exactly Perl expects here?
LC_ALL looks okay to me 😕

From locale.c:

#  ifndef PERL_LC_ALL_USES_NAME_VALUE_PAIRS

        if (! name_value) {
            /* Get the index of the category in this position */
            index = map_LC_ALL_position_to_index[component_number++];
        }
        else

#  endif

        {   /* Get the category part when each component is the
             * 'category=locale' form */

            category_end = strchr(s, '=');

            /* The '=' terminates the category name.  If no '=', is improper
             * form */
            if (! category_end) {
                error = no_equals;
                goto failure;
            }

It seems PERL_LC_ALL_USES_NAME_VALUE_PAIRS is defined?

That would mean d_perl_lc_all_uses_name_value_pairs='define' in config.sh.

@khwilliamson
Contributor

I haven't been closely following this conversation, but this value is defined from a Configure probe. I skimmed the conversation and see mention of cross compilation. Could it be that this is the result of a mismatch between the configuration used for the build and the system perl runs on? Platforms use two different methods to denote LC_ALL when not all locale categories have the same locale. The system you ran perl on uses the positional notation, but the configuration that is being used expects name=value notation. Here are some relevant lines from a FreeBSD box that has positional notation:

/*#define PERL_LC_ALL_USES_NAME_VALUE_PAIRS     / **/
#define  PERL_LC_ALL_SEPARATOR "/"      /**/
#define  PERL_LC_ALL_CATEGORY_POSITIONS_INIT  { 1, 2, 3, 4, 5, 6 }      /**/

In your case the separator is a semi-colon instead of a slash. It is likely that the 1..6 is correct for your box.

If you changed your config.h to correspond, locale.c would not generate this error. But I don't know why the incorrect values were generated.

@egorenar
Author

egorenar commented Jul 6, 2024

Yep, figured it out, OpenWrt's d_perl_lc_all_* and corresponding perl_lc_all_* weren't set properly in config.sh.
After fixing it everything works as expected.
Thanks!
