grpcio >=1.51 python stack is segfaults on osx (x86-64) #281
Comments
I just wanted to chime in with a finding here. We ran into this issue with the arrow nightly builds and were able to confirm that gRPC v1.54.0 appears to alleviate this crash. See apache/arrow#35090 for more info. For now we have to bump back down a version, but if v1.54 can get released on conda-forge, arrow would switch to that. Thanks! |
Thanks. Could you try 1.53 as well? 1.54 is pretty fresh and we'd have to either skip 1.53 or migrate twice in a row. |
Now that I see that 1.53 broke a version pattern scheme that's held for a while, I might go to 1.54 directly with the next migration... |
Still broken for me (and seems so for arrow as well: apache/arrow#35089) |
grpc 1.54 has been in conda-forge for a while now (there's also 1.55 already, but that will need more time due to breaking changes around protobuf). I'd still be very interested in knowing what broke and how! |
I think the crash is fixed, so this can be closed, but requests still aren't being sent correctly. I'd be interested in this portion of the code, but getting a debug build of grpc and abseil working might be tricky for someone outside these projects (I wasn't able to get the abseil cmake build to keep the debug symbols): |
From @lidavidm's work in apache/arrow#36908:
In the meantime, I've started looking at enabling the C++ test suite here as well: #311 |
this doesn't generate abseil as debug but I'll take a quick look |
I built both
|
Thanks for digging into this!
So grpc-cpp=1.51.1=*_0 only bumped the version, but shortly after we rebuilt for a new re2 (build *_1) and more importantly: the newest abseil at the time (build *_2). It's possible that grpc was using abseil in a way that wasn't compatible with that newer version, however, they definitely followed suit as of grpc 1.53 (which doesn't look like it needed source changes). Which grpc & abseil version did you use for your debug builds? |
in this case grpc-cpp-feedstock But this issue has persisted since 1.51 as per the title. I'd dig in more but grpc takes forever to build. |
step-by-step repro for posterity (note:
grpc:
diff --git a/recipe/build-cpp.sh b/recipe/build-cpp.sh
index 51b2f3f..7a137d9 100755
--- a/recipe/build-cpp.sh
+++ b/recipe/build-cpp.sh
@@ -60,6 +60,7 @@ cmake ${CMAKE_ARGS} .. \
       -GNinja \
       -DBUILD_SHARED_LIBS=ON \
       -DCMAKE_BUILD_TYPE=Release \
+      -DCMAKE_CXX_FLAGS_RELEASE="${CMAKE_CXX_FLAGS_RELEASE:-} -O1 -g -DNDEBUG" \
       -DCMAKE_CXX_FLAGS="$CXXFLAGS" \
       -DCMAKE_PREFIX_PATH=$PREFIX \
       -DCMAKE_INSTALL_PREFIX=$PREFIX \
abseil:
diff --git a/recipe/build-abseil.sh b/recipe/build-abseil.sh
index a1533e0..3fdc43f 100644
--- a/recipe/build-abseil.sh
+++ b/recipe/build-abseil.sh
@@ -24,6 +24,7 @@ fi
 cmake -G Ninja \
     ${CMAKE_ARGS} \
     -DCMAKE_BUILD_TYPE=Release \
+    -DCMAKE_CXX_FLAGS_RELEASE="${CMAKE_CXX_FLAGS_RELEASE:-} -O1 -g -DNDEBUG" \
     -DCMAKE_CXX_STANDARD=17 \
     -DCMAKE_INSTALL_LIBDIR=lib \
     -DCMAKE_PREFIX_PATH=${PREFIX} \
|
I tried to add a test for the OP's example in #312, but either I cannot get the server process to work correctly, or our CI setup doesn't allow accessing the default port of 127.0.0.1. On all platforms, it runs into a variant of
Interestingly, despite the failing startup, the segfault seems to happen on osx. |
you don't need a server, since the segfault happens due to the connection state being updated in callbacks for the client (eg IDLE -> CONNECTING) |
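For anyone trying to reproduce this without a server, a minimal sketch of a client that only exercises the connectivity callbacks might look like the following. It assumes grpcio is installed (it returns an empty list otherwise). The target address, the method path (borrowed from the upstream helloworld example) and the helper name `watch_connectivity` are illustrative assumptions, not taken from the OP's script.

```python
import threading

try:
    import grpc
except ImportError:  # grpcio missing: the sketch degrades to a no-op
    grpc = None


def watch_connectivity(target="127.0.0.1:50051", timeout=2.0):
    """Record channel state changes while attempting one unary call.

    No server is expected to listen on `target` (an assumed address).
    """
    states = []
    if grpc is None:
        return states

    lock = threading.Lock()

    def on_state_change(state):
        # Invoked by grpc from a background thread on every transition,
        # e.g. IDLE -> CONNECTING -> TRANSIENT_FAILURE.
        with lock:
            states.append(state)

    channel = grpc.insecure_channel(target)
    channel.subscribe(on_state_change, try_to_connect=True)
    try:
        # Raw-bytes unary call; the method path is hypothetical.
        call = channel.unary_unary("/helloworld.Greeter/SayHello")
        call(b"", timeout=timeout)
    except grpc.RpcError as err:
        # Expected without a server (UNAVAILABLE or DEADLINE_EXCEEDED).
        print("rpc failed as expected:", err.code())
    finally:
        channel.unsubscribe(on_state_change)
        channel.close()
    with lock:
        return list(states)


if __name__ == "__main__":
    print(watch_connectivity())
```

Since the crash is non-deterministic, a single clean run of something like this proves little; it would probably need to be looped to be useful as a segfault check.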
I had tried it without the server, and it gets the same error (on linux). If we are to integrate this into CI (best way to fix it and keep it fixed), we're going to need a form of the test that passes. |
I don't think the server is started? The file at https://github.com/conda-forge/grpc-cpp-feedstock/pull/312/files#diff-ddad1f9c7894b9921ef910104dd72b393dcb3d052923fdc85a5f0eaad8bdb450 differs significantly from https://github.com/grpc/grpc/blob/master/examples/python/helloworld/greeter_server.py. Also, while the crash is non-deterministic, the keepalive error is 100% deterministic, so you can just run it once and verify that no warnings/errors were emitted. |
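The "run it once and verify no warnings/errors were emitted" check could be scripted roughly as below. This is only a sketch using the standard library: `check_clean_run` is a hypothetical helper, and treating any stderr output as a failure is an assumption about how strict the CI test should be.

```python
import signal
import subprocess
import sys


def check_clean_run(cmd, timeout=60):
    """Run `cmd` once and return (ok, reason)."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal.
        if -proc.returncode == signal.SIGSEGV:
            return False, "segfault"
        return False, f"killed by signal {-proc.returncode}"
    if proc.returncode != 0:
        return False, f"exit code {proc.returncode}"
    if proc.stderr.strip():
        # Assumption: warnings such as the keepalive error land on stderr.
        return False, "stderr output: " + proc.stderr.strip()
    return True, "clean"


if __name__ == "__main__":
    # Stand-in command; replace with the real client invocation.
    print(check_clean_run([sys.executable, "-c", "print('ok')"]))
```

A single call covers the deterministic keepalive error; catching the segfault would presumably still need the command to be repeated a number of times.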
Good point, thanks. I had just stupidly copied stuff together from their docs. Updated the PR, let's see how it goes. |
@h-vetinari I saw you tried to downgrade abseil to test if it failed, have you tried new abseil + old (1.50) grpc? |
#313 for now does 1.50 as published. We can then try to bump abseil and see what happens. |
Small update from #313: |
Alright, the good news is that on both 1.50 and 1.56, the newest abseil (20230802) fixes the error. The bad news is that the migration for this is stuck in purgatory until we figure out how to move to a macOS >=10.13 world. |
thanks for the investigation @h-vetinari. I'd be suspicious if |
Do you think it has something to do with having compiled abseil 20230125 against 10.9? That possibility did not occur to me, I would have guessed it's an abseil bug, but perhaps you're right. In that case, we could fix all the existing grpc builds by just recompiling abseil against 10.13, which would be very nice... |
That was my initial thought before debugging (since that was the big change other than the version bump), but the optional code didn't have/fallback on libc++, so not sure if it's worth your time. |
Closing this as fixed by #315 |
Solution to issue cannot be found in the documentation.
Issue
On 1.51/2, I get the following making a unary call:
then it randomly can segfault.
Running under lldb shows there's some kind of error (linkage to libstdc++?) in abseil (20230125.0):
Installed packages
# Name            Version     Build               Channel
bzip2             1.0.8       h0d85af4_4          conda-forge
c-ares            1.18.1      h0d85af4_0          conda-forge
ca-certificates   2022.12.7   h033912b_0          conda-forge
grpcio            1.52.1      py311h814d153_0     conda-forge
libabseil         20230125.0  cxx17_hf0c8a7f_1    conda-forge
libcxx            14.0.6      hccf4f1f_0          conda-forge
libffi            3.4.2       h0d85af4_5          conda-forge
libgrpc           1.52.1      h493e69f_0          conda-forge
libprotobuf       3.21.12     hbc0c0cd_0          conda-forge
libsqlite         3.40.0      ha978bb4_0          conda-forge
libzlib           1.2.13      hfd90126_4          conda-forge
ncurses           6.3         h96cf925_1          conda-forge
openssl           3.0.8       hfd90126_0          conda-forge
pip               23.0.1      pyhd8ed1ab_0        conda-forge
protobuf          4.21.12     py311h814d153_0     conda-forge
python            3.11.0      he7542f4_1_cpython  conda-forge
python_abi        3.11        3_cp311             conda-forge
re2               2023.02.01  hf0c8a7f_0          conda-forge
readline          8.1.2       h3899abd_0          conda-forge
setuptools        67.3.2      pyhd8ed1ab_0        conda-forge
six               1.16.0      pyh6c4a22f_0        conda-forge
tk                8.6.12      h5dbffcc_0          conda-forge
tzdata            2022g       h191b570_0          conda-forge
wheel             0.38.4      pyhd8ed1ab_0        conda-forge
xz                5.2.6       h775f41a_0          conda-forge
zlib              1.2.13      hfd90126_4          conda-forge
Environment info