JObject.implement deadlocks #1908

Open
knopp opened this issue Jan 16, 2025 · 4 comments

@knopp
Contributor

knopp commented Jan 16, 2025

Java code

import android.os.Handler;
import android.os.Looper;

public class Deadlock {
    public interface Delegate {
        void perform();
    }

    public static void deadlock(Delegate delegate) {
        Handler handler = new Handler(Looper.getMainLooper());
        handler.post(delegate::perform);
    }
}

Dart code

  final delegate = Deadlock$Delegate.implement($Deadlock$Delegate(perform: () {
    print('Hello');
  }));
  Deadlock.deadlock(delegate);

Java Stack trace

_invoke:-1, PortProxyBuilder (com.github.dart_lang.jni), PortProxyBuilder.java
invoke:143, PortProxyBuilder (com.github.dart_lang.jni), PortProxyBuilder.java
2 hidden frames
run:0, Deadlock$$ExternalSyntheticLambda0 (com.superlist.super_native_dialogs), D8$$SyntheticClass
8 hidden frames

Native Stack trace

[libc.so] syscall 0x000000710b8d63cc
[libdartjni.so] wait_for dartjni.h:119
[libdartjni.so] Java_com_github_dart_1lang_jni_PortProxyBuilder__1invoke dartjni.c:459

The problem is that Java_com_github_dart_1lang_jni_PortProxyBuilder__1invoke checks Dart_CurrentIsolate_DL to determine whether the call is coming from another thread, and if that returns null it sends a message on the port and waits for the reply. However, in Flutter on Android the platform thread is the isolate thread, so the wait blocks the very thread that would have to process the message, i.e. the main thread. Note that Dart_CurrentIsolate_DL returns null here because, by the time the callback posted to the main looper runs, the isolate has been exited.

The solution that would work in the context of Flutter is to remember the thread ID alongside the isolate and, if the thread ID matches, call Dart_EnterIsolate_DL and Dart_ExitIsolate_DL around the trampoline.

Now, while this works for Flutter, I'm not sure the solution is generic enough, since it assumes the isolate is "pinned" to a specific thread.

cc @HosseinYousefi

@knopp
Contributor Author

knopp commented Jan 16, 2025

Note that I'm having the same issue using ffi with NativeCallable.isolateLocal, but because the counterpart is C code, calling Dart_EnterIsolate_DL and Dart_ExitIsolate_DL manually is not as inconvenient as having to do that in Java.

@HosseinYousefi
Member

Now, while this works for Flutter, I'm not sure the solution is generic enough, since it assumes the isolate is "pinned" to a specific thread.

Yes, that's why we didn't do this before. Maybe we can only do this when we detect that we're on the main isolate of a Flutter application where the thread is indeed pinned.

cc @dcharkes @liamappelbe @mkustermann for ideas.

@liamappelbe
Contributor

Note that I'm having the same issue using ffi with NativeCallable.isolateLocal, but because the counterpart is C code, calling Dart_EnterIsolate_DL and Dart_ExitIsolate_DL manually is not as inconvenient as having to do that in Java.

Can you elaborate on this? Are you seeing a deadlock with NativeCallable.isolateLocal itself, or when waiting for a response message? I wouldn't expect NativeCallable.isolateLocal to ever deadlock.

Yes, that's why we didn't do this before. Maybe we can only do this when we detect that we're on the main isolate of a Flutter application where the thread is indeed pinned.

IIUC, jnigen does blocking callbacks similarly to ffigen, and there are 2 code paths. When the callback is coming from a random thread, it sends a message to the target isolate and waits for a reply. When the callback is coming from the same thread as the target isolate, the callback is invoked synchronously. And it sounds like the issue here is that the check that decides which code path to take is a bit unreliable on Flutter.

jnigen is using Dart_CurrentIsolate_DL to do this check, and ffigen is using the current thread ID. Both have issues, since as you say, we don't pin isolates to a particular thread. Maybe the best we can do atm is to check both?

  • When the callback is created, save the current isolate ID and the current thread ID
  • When the callback is invoked
    • If the isolate ID matches, or the current isolate is null but the thread ID matches, use the synchronous code path (enter the target isolate first if the current isolate is null)
    • Otherwise use the message sending code path
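The combined check described in the list above could be sketched roughly as follows. This is only an illustration of the decision logic, not jnigen's or ffigen's actual code: every name here (callback_origin, callback_path, choose_path) is hypothetical, and the isolate is modelled as an opaque pointer rather than a real Dart_Isolate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these exist in jni or dart_api_dl.h. */

typedef enum {
  PATH_SYNC,            /* already inside the target isolate: call directly */
  PATH_ENTER_THEN_SYNC, /* no current isolate, same thread: enter, call, exit */
  PATH_SEND_MESSAGE     /* post to the isolate's port and wait for a reply */
} callback_path;

/* Recorded when the callback is created. */
typedef struct {
  const void *isolate; /* stand-in for the saved isolate handle */
  uintptr_t thread_id; /* stand-in for the saved thread ID */
} callback_origin;

/* Decides which code path to take when the callback is invoked. */
callback_path choose_path(const callback_origin *origin,
                          const void *current_isolate,
                          uintptr_t current_thread_id) {
  if (current_isolate != NULL && current_isolate == origin->isolate) {
    return PATH_SYNC;
  }
  if (current_isolate == NULL && current_thread_id == origin->thread_id) {
    return PATH_ENTER_THEN_SYNC;
  }
  return PATH_SEND_MESSAGE;
}
```

In the Handler.post case from the original report, the current isolate would be null while the thread ID matches, so this sketch would pick PATH_ENTER_THEN_SYNC instead of blocking on the port. It still relies on the assumption that the isolate stays on the thread where the callback was created.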

Another option would be to discard the message sending code path entirely, and just enter the target isolate and invoke the callback synchronously. In fact, this is one of the NativeCallable proposals that hasn't been implemented yet:

  • When the callback is created, save the current isolate ID
  • When the callback is invoked
    • If the isolate ID matches, call the callback synchronously
    • If the current isolate is null, enter the target isolate, call the callback, then exit the target isolate
    • If the current isolate is non-null and doesn't match, save the ID and exit the isolate, enter the target isolate, call the callback, exit the target isolate, then re-enter the original isolate
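The enter/exit sequence in this second option can be sketched with a small simulation. The sim_* functions below are hypothetical stand-ins that only track a "current isolate" pointer; they are not the Dart_EnterIsolate_DL / Dart_ExitIsolate_DL calls themselves, and invoke_in_isolate and record_current are likewise made-up names for illustration.

```c
#include <stddef.h>

/* Simulated per-thread "current isolate"; an assumption, not the VM's state. */
static const void *g_current = NULL;

const void *sim_current_isolate(void) { return g_current; }
void sim_enter_isolate(const void *isolate) { g_current = isolate; }
void sim_exit_isolate(void) { g_current = NULL; }

typedef void (*callback_fn)(void *user_data);

/* Always runs cb inside `target`, restoring whatever isolate was current. */
void invoke_in_isolate(const void *target, callback_fn cb, void *user_data) {
  const void *previous = sim_current_isolate();
  if (previous == target) {
    cb(user_data); /* isolate matches: call synchronously */
    return;
  }
  if (previous != NULL) {
    sim_exit_isolate(); /* leave the unrelated isolate first */
  }
  sim_enter_isolate(target);
  cb(user_data);
  sim_exit_isolate();
  if (previous != NULL) {
    sim_enter_isolate(previous); /* re-enter the original isolate */
  }
}

/* Example callback: records which isolate was current while it ran. */
void record_current(void *user_data) {
  *(const void **)user_data = sim_current_isolate();
}
```

Calling invoke_in_isolate(target, record_current, &seen) from a thread with no current isolate leaves seen equal to target and the current isolate null again afterwards. The sketch ignores what a real implementation would have to handle, such as the target isolate being busy on another thread.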

@knopp
Contributor Author

knopp commented Jan 17, 2025

Apologies for the confusion; perhaps I shouldn't have mixed these under the same issue. The NativeCallable.isolateLocal situation is different. It does not deadlock, it just fails when Dart_CurrentIsolate_DL returns NULL. For example, consider the following on iOS, where the platform and UI threads are merged.

// this works because dart_ffi_callback is called while isolate is active
void dart_ffi_callback(void (*isolate_local_trampoline)(void)) {   
   isolate_local_trampoline();
}

// this doesn't work, even though the trampoline is invoked on the same thread,
// because the trampoline is invoked while pumping the dispatch queue and the
// isolate is no longer active.
void dart_ffi_callback(void (*isolate_local_trampoline)(void)) {   
   dispatch_async(dispatch_get_main_queue(), ^{
     // same thread, fails.
     isolate_local_trampoline();
   });
}

// This works again. This could be done automatically by the trampoline if we saved the thread ID
// with the callback metadata, but it might be the wrong thing to do if we don't know that the isolate
// is always running on a particular thread (as it is with the Flutter UI thread).
void dart_ffi_callback(void (*isolate_local_trampoline)(void)) {   
   Dart_Isolate isolate = Dart_CurrentIsolate_DL();
   dispatch_async(dispatch_get_main_queue(), ^{
     Dart_EnterIsolate_DL(isolate);
     isolate_local_trampoline();
     Dart_ExitIsolate_DL(isolate);
   });
}

As far as I can tell, unlike jnigen, Dart FFI trampolines never block? NativeCallable.isolateLocal simply fails if the isolate thread-local is not set, and NativeCallable.listener only posts to a port and does not attempt to propagate the return value, so it never blocks.
