Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C#] Tests fail on MacOS arm64 with a stack overflow #41397

Closed
CurtHagenlocher opened this issue Apr 26, 2024 · 15 comments
Closed

[C#] Tests fail on MacOS arm64 with a stack overflow #41397

CurtHagenlocher opened this issue Apr 26, 2024 · 15 comments
Assignees
Labels
Component: C# Priority: Blocker Marks a blocker for the release Type: bug
Milestone

Comments

@CurtHagenlocher
Copy link
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

Importing pyarrow into PythonNET seems to trigger a stack overflow. At first it seemed that this was triggered by the move to .NET 8 but the failure can be repro'd locally against the maint-15.0.0 branch (which uses 7.0) so there's clearly something else going on.

The active test run was aborted. Reason: Test host process crashed : Stack overflow.
   at Python.Runtime.Runtime.PyImport_ImportModule(System.String)
   at Python.Runtime.PyModule.Import(System.String)
   at Python.Runtime.Py.Import(System.String)
   at Apache.Arrow.Tests.CDataSchemaPythonTest.ImportRecordBatchFromBuffer()
   at System.RuntimeMethodHandle.InvokeMethod(System.Object, Void**, System.Signature, Boolean)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(System.Object, System.Reflection.BindingFlags)
   at Xunit.Sdk.TestInvoker`1[[System.__Canon, System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].CallTestMethod(System.Object)

Component(s)

C#

@raulcd
Copy link
Member

raulcd commented May 3, 2024

I am marking this as a blocker for 16.1.0. If I can't get any news around the issue in the next 3-4 days I will remove all C# commits from the 16.1.0 branch in order to unblock the release. Hopefully this is reasonable. Do let me know if I should wait more time. Thanks!

@CurtHagenlocher
Copy link
Contributor Author

I don't think this should be a blocker because I get the same error when running against the 15.0.0 branch. To the best of my ability to tell, the stack overflow is coming from inside the kernel so maybe this reflects some kind of OS change? This would also explain why we don't see the same error on e.g. Linux.

Unfortunately, most of my hardcore debugging experience is on Windows and the toolset is just entirely different. I'm currently approaching this by trying to create a minimal repro.

@raulcd
Copy link
Member

raulcd commented May 8, 2024

The problem is that at the moment the nuget job and all the verification jobs for the 16.1.0 release that involves C# are failing. We can't generate new packages and can't verify the release with this failure. We can't release at the moment.
See the nuget failure here: https://github.com/ursacomputing/crossbow/actions/runs/8985979418/job/24718232619
and verification job failures here: #41572 (comment)

@raulcd
Copy link
Member

raulcd commented May 8, 2024

On a different note, I can see that currently non of the C# verification nor the nuget jobs are failing on the nightly tasks on main so we might just have to find what is the issue that solved it on main and cherry-pick for the 16.1.0 release.

@raulcd
Copy link
Member

raulcd commented May 8, 2024

Ok, cherry-picking 48a9639 seems to fix the issue on the maintenance branch:
#41437 (comment)
and
#41437 (comment)
I will validate once the Release Candidate 1 is created.

@raulcd
Copy link
Member

raulcd commented May 9, 2024

I've validated that RC1 jobs are successful with the commit mentioned above. I am closing the issue as this is not happening on main for nuget or verification jobs.

@raulcd raulcd closed this as completed May 9, 2024
@CurtHagenlocher
Copy link
Contributor Author

I'm still seeing this both in PR builds and locally so I'm reopening. See #41603 for a recent PR build that exhibits this problem.

@raulcd raulcd modified the milestones: 16.1.0, 17.0.0 May 14, 2024
@adamreeve
Copy link
Contributor

Would it make sense to downgrade the runner image to macos-13 or one of the other non-ARM images to keep the tests passing until this can be fixed? It appears that this issue only started when GitHub changed macos-latest to be an ARM image. I tested downgrading to macos-13 and that does seem to fix the issue; I ran the C# actions a few times on my fork without any failures.

@adamreeve
Copy link
Contributor

Also, this has been reported as a bug in PythonNET at pythonnet/pythonnet#2396, so this doesn't look specific to pyarrow.

And I should add that I didn't test macos-14-large as that needs a paid account, so it's not clear whether it was the upgrade to MacOS 14 that broke things or the change to ARM.

@CurtHagenlocher
Copy link
Contributor Author

It definitely worked on my M1 Mac before the OS upgrade.

Are the OS images shared between tests or can the runner be downgraded just for the C# runs?

@assignUser
Copy link
Member

We could downgrade the C# jobs only, the only thing we need to keep in mind is the binary jobs for the releases and if want/need to build nuget binaries on arm64 mac.

@CurtHagenlocher
Copy link
Contributor Author

The build environment shouldn't matter when building the packages as they're all purely MSIL; it would only matter when testing.

@adamreeve
Copy link
Contributor

This is the change I tested, which only affected the C# jobs: adamreeve@449a8b3. Looking more closely at the failures in #41397 (comment), those appear to be a different issue and it makes sense that those were fixed by 48a9639, so I think maybe the verification jobs aren't actually affected by this.

@CurtHagenlocher
Copy link
Contributor Author

Thanks @adamreeve! This looks like the best thing to do in the near term.

kou pushed a commit that referenced this issue Jun 3, 2024
#41934)

### What changes are included in this PR?

Downgrades the macOS test image for C# to use an older operating system. This works around pythonnet/pythonnet#2396.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #41397

Authored-by: Curt Hagenlocher <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@kou
Copy link
Member

kou commented Jun 3, 2024

Issue resolved by pull request 41934
#41934

@kou kou closed this as completed Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: C# Priority: Blocker Marks a blocker for the release Type: bug
Projects
None yet
Development

No branches or pull requests

5 participants