Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NDK crash in libdartjni.so (GetApplicationContext+60) when uploading to Play Store #1150

Closed
lukehutch opened this issue May 16, 2024 · 27 comments

Comments

@lukehutch
Copy link

I am getting an NDK crash in libdartjni.so (GetApplicationContext+60) when I upload my APK to the Play Store.

image

Any ideas?

@HosseinYousefi
Copy link
Member

  • What version of package:jni are you using?
  • Will this happen If you override the dependency to use the latest version of package:jni?
  • Can you create a minimal reproducible repo that has this problem?

@HosseinYousefi HosseinYousefi added the needs-info Additional information needed from the issue author label May 16, 2024
@lukehutch
Copy link
Author

  • What version of package:jni are you using?

jni 0.7.3 -- required by cronet_http 1.2.0 (which is the latest version currently).

  • Will this happen If you override the dependency to use the latest version of package:jni?

Do I do that in my project via dependency_override, or do I need to fork cronet_http to override the dependency there?

Do you know if cronet_http works with jni 0.9.2? I'm worried about causing runtime crashes by messing with dependency versions of NDK projects...

  • Can you create a minimal reproducible repo that has this problem?

I would have to create a whole new app consisting of just the cronet client, do all the Play Store setup for the app, and upload it for testing, to try to recreate the crash with the smallest amount of code possible (and I really can't afford the time to do something like that for a few weeks, because I'm about to launch my app).

It doesn't crash on my local device or emulator, but I don't know what would be different about the Play Store...

@lukehutch
Copy link
Author

Here is how I call into cronet, however... this would be close to my minimal repro case:

void main() async {
  // Set up better http clients on Android and iOS
  if (Platform.isAndroid) {
    final engine =
        CronetEngine.build(cacheMode: CacheMode.memory, cacheMaxSize: 1000000);
    runWithClient(_run, () => CronetClient.fromCronetEngine(engine));
  } else if (Platform.isIOS || Platform.isMacOS) {
    final config = URLSessionConfiguration.ephemeralSessionConfiguration()
      ..cache = URLCache.withCapacity(memoryCapacity: 1000000);
    runWithClient(_run, () => CupertinoClient.fromSessionConfiguration(config));
  } else {
    // Run with the default IOClient
    _run();
  }
}

void _run() async {
  // Must come first
  WidgetsFlutterBinding.ensureInitialized();

  // Call runApp from here...
}

@HosseinYousefi
Copy link
Member

Thanks for the quick reply.

Do I do that in my project via dependency_override, or do I need to fork cronet_http to override the dependency there?

Do you know if cronet_http works with jni 0.9.2? I'm worried about causing runtime crashes by messing with dependency versions of NDK projects...

You can use this dependency:

cronet_http:
    git:
      url: https://github.com/dart-lang/http.git
      ref: jni-0.9.2
      path: pkgs/cronet_http

I regenerated the bindings as well, so it will work just fine.

It doesn't crash on my local device or emulator, but I don't know what would be different about the Play Store...

I'll try to upload a cronet example app to play store to see if I get the same error.

@HosseinYousefi HosseinYousefi removed the needs-info Additional information needed from the issue author label May 16, 2024
@lukehutch
Copy link
Author

Thanks, awesome, that build saved me a lot of work! I'll replace this dep and see if it fixes the crash.

@lukehutch
Copy link
Author

While I have your attention -- is all of JNI deprecated in Android, or does jni just use something that is deprecated?:

Note: /home/luke/.pub-cache/hosted/pub.dev/jni-0.9.2/android/src/main/java/com/github/dart_lang/jni/JniPlugin.java uses or overrides a deprecated API.

@HosseinYousefi
Copy link
Member

Thanks, awesome, that build saved me a lot of work! I'll replace this dep and see if it fixes the crash.

The PR is here: dart-lang/http#1198
Once cronet_http: 1.2.1 gets released, you won't have to use a git dependency.

While I have your attention -- is all of JNI deprecated in Android, or does jni just use something that is deprecated?:

Haha, I just had this conversation here: dart-lang/http#1198 (comment)

It's not about JNI, it's something related to the way Flutter plugins work. I thought I have suppressed the warning, so users don't get confused by this message. However it seems that simply importing the deprecated class triggers the warning, so I should use the fully qualified name of the package instead of importing it.

You can safely ignore it, and I'll hide the warning in the next release of package:jni.

@HosseinYousefi
Copy link
Member

Any updates on this? cronet_http: 1.2.1 is also released.

@lukehutch
Copy link
Author

Hi, I have been stuck on iOS build issues, so I haven't uploaded my Android app again to the Play store (and even back when I was uploading periodically, I only saw this specific crash once), but it's possible that one of the recent crash fixes in jni is the reason for the crash I experienced too, so it may be fixed. You can go ahead and close this if you want, and I'll comment on it again if I see the crash again.

You presumably know a lot about JNI already, but I figured I'd send you this link to a JNI project I created a while back. I found I had to very carefully check every incoming parameter (including ensuring that the user is not trying to call a static method in a non-static way or vice versa, etc.), and the thrown status after every JNI call, to avoid crashes. It's a very finicky API. There may be some safety mechanisms that you can see in this code that would be useful for your jni package too, to protect against future possibilities of crashes.

https://github.com/toolfactory/narcissus/blob/main/src/main/c/narcissus.c

@HosseinYousefi
Copy link
Member

Thanks for sharing. We do similar things, but instead of using C macros, it's a Dart script that generates a "safer" C API that wraps JNI.

Closing this for now, let me know if your problem persists.

@lukehutch
Copy link
Author

lukehutch commented May 30, 2024

Hi @HosseinYousefi -- with latest jni, I am now getting the same exact crash on Android:

flutter/flutter#149229

Can you please re-open this issue?

It seems that jniEnv is not initialized when GetApplicationContext is called:

jobject GetApplicationContext() {
attach_thread();
return (*jniEnv)->NewGlobalRef(jniEnv, jni_context.appContext);
}

This also seems to be a race condition, because it is only triggered in certain builds or certain execution environments.

So this bascially means that GetApplicationContext is being called before Java_com_github_dart_1lang_jni_JniPlugin_initializeJni.

@lukehutch
Copy link
Author

@HosseinYousefi Any ideas how to go about debugging what is going on here?

@HosseinYousefi
Copy link
Member

We recently had an issue regarding 32-bit devices, can you check if the devices with the error happen to be 32-bit?

I have a PR to fix cronet, the new version will be published soon: dart-lang/http#1225

@lukehutch
Copy link
Author

lukehutch commented Jun 5, 2024

@HosseinYousefi No, it's a Pixel 7 Pro, so it's a 64-bit device.

I looked through how the plugin is registered, and I can't see how GetApplicationContext could possibly be called before initializeJni is called. So potentially jniEnv is being set back to null somehow? And I think that can only happen with memory corruption, since there is no code in the plugin that can set jniEnv to null.

@HosseinYousefi
Copy link
Member

@HosseinYousefi No, it's a Pixel 7 Pro, so it's a 64-bit device.

I looked through how the plugin is registered, and I can't see how GetApplicationContext could possibly be called before initializeJni is called. So potentially jniEnv is being set back to null somehow?

jniEnv is thread local. So it will be null on new threads. However we always run attach_thread() which uses the cached global jvm to fill it again, so it's odd to see that it's null.

Where do you actually see the error being about jniEnv being null? Can you share the full log? You can use the debug build, so the logs are not obfuscated.


One theory: we have not been calling detach_thread, maybe that causes problems? My fix for this is already landed here: #1163. You can override the jni dependency to the main branch of this repo to see if that fixes it.

@lukehutch
Copy link
Author

Unfortunately I have only seen this on release and profile builds, and I have not been able to symbolize the stacktraces. But it's crashing in GetApplicationContext() with a null reference, and there's only one way that can happen.

It is sporadic, and right now I can't replicate it, but it keeps resurfacing.

@lukehutch
Copy link
Author

lukehutch commented Jun 6, 2024

One theory: we have not been calling detach_thread, maybe that causes problems? My fix for this is already landed here: #1163. You can override the jni dependency to the main branch of this repo to see if that fixes it.

Your fix for #557 doesn't zero-out jniEnv on deinit, but even if it did, that would not solve this but potentially introduce a race condition (if another thread is trying to call GetApplicationContext as the JNI-attached thread is shutting down).

However the note in #557 may have identified the problem:

To me it seems we are lazily attaching to the current thread and caching the result in a thread_local JNIEnv*.

So potentially GetApplicationContext is being called before the lazy initializer is being called? What triggers the lazy initializer? Can GetApplicationContext (and other methods) be made to trigger initializeJni if necessary?

@lukehutch
Copy link
Author

lukehutch commented Jun 7, 2024

I just checked the reproducibility, and this crash has occurred 100% of the time in the testing phase of five different build uploads to Google Play.

Locally, I used to be able to reproduce this in Profile builds, but that somehow spontaneously fixed itself without any code changes, after clearing build files and re-building.

This is a total showstopper for my app, because I can't even get through Play Store app review without the app crashing on load. I may have to launch without cronet.

@lukehutch
Copy link
Author

jniEnv is thread local. So it will be null on new threads. However we always run attach_thread() which uses the cached global jvm to fill it again, so it's odd to see that it's null.

If you're just making a thread-local copy of a single global jniEnv instance every time you instantiate a new thread, then why is this even a thread-local reference? Just use the global entrance, it's safer, and it would solve the race condition issue, at least in this particular case.

@HosseinYousefi
Copy link
Member

HosseinYousefi commented Jun 7, 2024

Your fix for #557 doesn't zero-out jniEnv on deinit

We use pthread_key_create(&tlsKey, detach_thread), deinit is not called when threads get detached and detach_thread makes jniEnv null.

that would not solve this but potentially introduce a race condition (if another thread is trying to call GetApplicationContext as the JNI-attached thread is shutting down).

Another thread would have a different jniEnv as it's thread-local.

If you're just making a thread-local copy of a single global jniEnv

No, the jniEnv is different on each thread. We're calling AttachCurrentThread to get a pointer to it on each spawned thread.

So potentially GetApplicationContext is being called before the lazy initializer is being called? What triggers the lazy initializer? Can GetApplicationContext (and other methods) be made to trigger initializeJni if necessary?

attach_thread() is the lazy initializer, if jniEnv is null, it calls AttachCurrentThread to populate it. This function is called on the first line of all JNI related methods including GetApplicationContext!

static inline void attach_thread() {
  if (jniEnv == NULL) {
    (*jni->jvm)->AttachCurrentThread(jni->jvm, __ENVP_CAST & jniEnv, NULL);
  }
}

This is a total showstopper for my app, because I can't even get through Play Store app review without the app crashing on load. I may have to launch without cronet.

As this is a problem with cronet, can you please open an issue there as well? You can simply link to this one. Unfortunately without a reproducible example, or logs I can't start the debugging process on my end.

@lukehutch
Copy link
Author

@HosseinYousefi I created the Cronet bug and test project as requested: dart-lang/http#1241

This test project doesn't crash on my machine, but the release build does lock up.

@HosseinYousefi
Copy link
Member

@HosseinYousefi I created the Cronet bug and test project as requested: dart-lang/http#1241

This test project doesn't crash on my machine, but the release build does lock up.

I can't reproduce the locking up of release build.

@lukehutch
Copy link
Author

@HosseinYousefi Is there anything else I can do to debug this?

@HosseinYousefi
Copy link
Member

@HosseinYousefi Is there anything else I can do to debug this?

Well, as you cannot reproduce it on your machine, and I don't seem to be able to reproduce it on mine, there is nothing we can do to fix it this way.

You say that the play store's checks fail. Can you share a minimal app that fails the play store checks? This way at least I can create a play store account and iterate to see what makes it fail.

You are the only person reporting this bug, so I highly doubt the example is going to be as simple as just initializing cronet.

@lukehutch
Copy link
Author

Well, as you cannot reproduce it on your machine

I am able to reproduce the lock-up issue on my machine with the minimal repro code that I shared. That code just does not crash in the same way as I originally reported -- but it should not lock up, so this is still worth looking into. Any tips for how to debug what is happening on my machine with that example?

You are the only person reporting this bug, so I highly doubt the example is going to be as simple as just initializing cronet.

The crash happens before Flutter starts, and the only thing that my app does after initializing cronet is to start Flutter. I looked to see if any other plugins use JNI, and there aren't any. Possibly this is a memory corruption bug.

@HosseinYousefi
Copy link
Member

I am able to reproduce the lock-up issue on my machine with the minimal repro code that I shared. That code just does not crash in the same way as I originally reported -- but it should not lock up, so this is still worth looking into. Any tips for how to debug what is happening on my machine with that example?

These are probably separate issues, I'm talking about the original crash. The lock-up issue might be a cronet specific one.

In any case these are some steps you can take:

What is the output of your flutter doctor?
Does this only happen with cronet_http as a dependency?
What about having only jni as a dependency and doing something simple, like print('hello'.toJString());, does this fail too?
Have you tried depending on the master branch of cronet? Make sure you do flutter pub upgrade and flutter clean before rebuilding.

@HosseinYousefi
Copy link
Member

I'm now pretty sure that the problem is not jni-related. Closing this issue. Let's continue any further discussions in the http thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

2 participants