Failure with LTO enabled in GCC #10

Open
jeremyd2019 opened this issue Jul 1, 2021 · 9 comments

@jeremyd2019

I ran this locally, and I'm seeing heap corruption:

WinDbgX analysis
Critical error detected c0000374
(218.4ec): Break instruction exception - code 80000003 (first chance)
ntdll!RtlReportCriticalFailure+0x56:
00007ff9`aa02f122 cc              int     3
0:000> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************


KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 1718

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 22192

    Key  : Analysis.Init.CPU.mSec
    Value: 499

    Key  : Analysis.Init.Elapsed.mSec
    Value: 27069

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 52

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 92809

    Key  : Timeline.Process.Start.DeltaSec
    Value: 26

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Timestamp
    Value: 2019-12-06T14:06:00Z

    Key  : WER.OS.Version
    Value: 10.0.19041.1

    Key  : WER.Process.Version
    Value: 3.9.6000.1013


NTGLOBALFLAG:  70

PROCESS_BAM_CURRENT_THROTTLED: 0

PROCESS_BAM_PREVIOUS_THROTTLED: 0

APPLICATION_VERIFIER_FLAGS:  0

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ff9aa02f122 (ntdll!RtlReportCriticalFailure+0x0000000000000056)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 0000000000000000

FAULTING_THREAD:  000004ec

PROCESS_NAME:  python.exe

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_CODE_STR:  80000003

EXCEPTION_PARAMETER1:  0000000000000000

ADDITIONAL_DEBUG_TEXT:  Enable Pageheap/AutoVerifer ; Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]

STACK_TEXT:  
00000000`00000000 00000000`00000000 heap_corruption!python.exe+0x0


STACK_COMMAND:  ** Pseudo Context ** ManagedPseudo ** Value: ffffffff ** ; kb

SYMBOL_NAME:  heap_corruption!python.exe

MODULE_NAME: heap_corruption

IMAGE_NAME:  heap_corruption

FAILURE_BUCKET_ID:  HEAP_CORRUPTION_80000003_heap_corruption!python.exe

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {ba8abe75-133c-bc8e-6aca-284ac0a108d6}

Followup:     MachineOwner
---------

I will try enabling PageHeap/AutoVerifier, as WinDbgX recommends, and building with debug symbols.

@jeremyd2019
Author

I'm still fighting to get usable debug symbols, but I was pointed to the vicinity of PyInit_nt, so that may be a lead.

@jeremyd2019
Author

#0  0x00007ff965cf9e68 in initialize_members.lto_priv.0 (
    desc=0x7ff965ec1940 <stat_result_desc>, members=0x1d75d9bad50,
    n_members=19) at ../Python-3.9.6/Objects/structseq.c:375
#1  0x00007ff965d03e4e in PyStructSequence_NewType (
    desc=desc@entry=0x7ff965ec1940 <stat_result_desc>)
    at ../Python-3.9.6/Objects/structseq.c:465
#2  0x00007ff965d03b6d in posixmodule_exec (m=0x1d75b07c090)
    at ../Python-3.9.6/Modules/posixmodule.c:15501
#3  0x00007ff965d008f6 in PyModule_ExecDef (module=0x1d75b07c090,
    def=<optimized out>) at ../Python-3.9.6/Objects/moduleobject.c:399

@jeremyd2019
Author

So, PyStructSequence has a 'magic value', PyStructSequence_UnnamedField (https://github.com/msys2-contrib/cpython-mingw/blob/mingw-v3.9.6/Objects/structseq.c#L21). structseq.c compares this supposed "string" (const char * const) against the names in the PyStructSequence_Field array by pointer (==), not by contents. Apparently, in the course of LTO, the pointer to that string used in initialize_members.lto_priv.0 is not equal to the pointer used elsewhere, resulting in this crash.
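A minimal sketch of the pattern described above, using simplified, hypothetical names (`UNNAMED`, `DUPLICATE_STORAGE`, `struct field`, `is_unnamed_by_pointer`) rather than CPython's real declarations. The sentinel is recognized only by pointer identity, so a second object holding the same text at a different address (which is what the LTO build appears to produce) is treated as an ordinary name. The content comparison with `strcmp` is shown only for contrast.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PyStructSequence_UnnamedField. */
static const char UNNAMED_STORAGE[] = "unnamed field";
static const char *const UNNAMED = UNNAMED_STORAGE;

/* A distinct array with identical contents, modelling the duplicated
   string seen in the LTO build. Distinct arrays always have distinct
   addresses in C. */
static const char DUPLICATE_STORAGE[] = "unnamed field";

struct field {
    const char *name;
    const char *doc;
};

/* Identity check, the approach described above: only this exact pointer counts. */
static int is_unnamed_by_pointer(const struct field *f)
{
    assert(f != NULL);
    return f->name == UNNAMED;
}

/* Content check, for contrast: matches any copy of the same text. */
static int is_unnamed_by_contents(const struct field *f)
{
    assert(f != NULL && f->name != NULL);
    return strcmp(f->name, UNNAMED) == 0;
}
```

A field whose name points at `DUPLICATE_STORAGE` fails `is_unnamed_by_pointer` but passes `is_unnamed_by_contents`.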

PyStructSequence_NewType calls count_members https://github.com/msys2-contrib/cpython-mingw/blob/mingw-v3.9.6/Objects/structseq.c#L325-L336, which finds 19 members, 3 of them unnamed. It then allocates room for n_members - n_unnamed_members + 1 entries https://github.com/msys2-contrib/cpython-mingw/blob/mingw-v3.9.6/Objects/structseq.c#L460, but when initialize_members fills them in, the unnamed members suddenly appear named to it https://github.com/msys2-contrib/cpython-mingw/blob/mingw-v3.9.6/Objects/structseq.c#L368, so it writes more members than were allocated and overruns the heap buffer.
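The arithmetic can be modelled in a few lines. This is a hypothetical simplification (`count_unnamed`, `slots_allocated`, `slots_written` are illustrative names, not CPython functions), assuming one entry is reserved per named member plus a terminating entry written after the loop. The counting pass compares against the real sentinel pointer, while the initialization pass is handed a different pointer to the same text, as appears to happen in `initialize_members.lto_priv.0`. With 19 members of which 3 are unnamed, 17 entries are allocated, but 20 are written.

```c
#include <assert.h>
#include <stddef.h>

/* Count fields whose name is the sentinel, comparing by pointer. */
static size_t count_unnamed(const char *const *names, size_t n,
                            const char *sentinel)
{
    size_t unnamed = 0;
    assert(names != NULL);
    for (size_t i = 0; i < n; i++)
        if (names[i] == sentinel)
            unnamed++;
    return unnamed;
}

/* Entries reserved: one per named member plus a terminating entry. */
static size_t slots_allocated(size_t n_members, size_t n_unnamed)
{
    return n_members - n_unnamed + 1;
}

/* Entries the initialization pass writes when it compares against
   `sentinel_seen`: every member it considers named, plus the terminator. */
static size_t slots_written(const char *const *names, size_t n,
                            const char *sentinel_seen)
{
    return n - count_unnamed(names, n, sentinel_seen) + 1;
}
```

When both passes see the same sentinel pointer the two counts agree; when they see different pointers, the difference is the number of entries written past the end of the allocation (3 here).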

@jeremyd2019
Author

/cc @lhmouse for the GCC LTO issue. Does BFD support identical COMDAT folding?

@jeremyd2019
Author

jeremyd2019 commented Jul 1, 2021

Dump of assembler code for function initialize_members.lto_priv.0:
   0x00007ff966339e30 <+0>:     push   %rsi
   0x00007ff966339e31 <+1>:     push   %rbx
   0x00007ff966339e32 <+2>:     lea    0x1e1913(%rip),%rsi        # 0x7ff96651b74c <__func__.16+6092>
...

(gdb) p desc->fields[7]
$10 = {name = 0x7ff966554867 <opstrings+3431> "unnamed field",
  doc = 0x7ff966559cd0 <statvfs_result.doc__+432> "integer time of last access"}

(gdb) p (char *)0x7ff96651b74c
$12 = 0x7ff96651b74c <__func__.16+6092> "unnamed field"

@jeremyd2019
Author

Interesting... from the Python docs:

const char * const PyStructSequence_UnnamedField
Special value for a field name to leave it unnamed.

Changed in version 3.9: The type was changed from char *.

So maybe that type change is why this bug didn't happen in Python 3.8?
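A sketch contrasting the two declaration forms, using hypothetical names `unnamed_38` and `unnamed_39` in place of `PyStructSequence_UnnamedField`. A single translation unit like this is not expected to reproduce the failure; it only illustrates one plausible reading of the question above. With `const char * const`, the pointer's value is a known constant, so the compiler is free to substitute the literal's address directly at each use instead of loading it from one variable, which may give LTO more room to end up with separate copies of the string. That mechanism is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Python 3.8 style (hypothetical name): a modifiable pointer object with
   external linkage. Since another translation unit could change it, code
   generally has to load its value from this one variable. */
char *unnamed_38 = "unnamed field";

/* Python 3.9 style (hypothetical name): a const pointer to a literal.
   Its value is fixed, so the compiler may fold the literal's address
   into each use. */
const char *const unnamed_39 = "unnamed field";

static int is_unnamed_38(const char *name)
{
    assert(name != NULL);
    return name == unnamed_38;
}

static int is_unnamed_39(const char *name)
{
    assert(name != NULL);
    return name == unnamed_39;
}
```

In both forms the comparison is by pointer; only how the compiler is allowed to obtain the sentinel's address differs.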

@jeremyd2019
Author

Does BFD support identical COMDAT folding?

I tried GCC's -fmerge-all-constants, but that didn't help.

@lhmouse

lhmouse commented Jul 2, 2021

I don't know many details about LTO. In my experience there are a few strange issues with it (some of which exist on Linux as well), so I am not sure whether it is a good idea to enable it in production environments.

@ssbssa

ssbssa commented Oct 31, 2022

So, PyStructSequence has a 'magic value' PyStructSequence_UnnamedField (https://github.com/msys2-contrib/cpython-mingw/blob/mingw-v3.9.6/Objects/structseq.c#L21). It compares this supposed "string" (const char * const) with entries in the PyStructSequence_Field array by pointer (==). Apparently in the course of LTO, the pointer to that string in initialize_members.lto_priv.0 is not equal to the pointer elsewhere, resulting in this crash.

Looks a lot like this gcc bug to me.
