node-api: use c-based api for libnode embedding #54660

Draft
wants to merge 23 commits into
base: main

Member

@vmoroz vmoroz commented Aug 30, 2024

Note: this is an active work in progress and there is still a lot of code churn. You are welcome to comment on the code and share your thoughts, but please be aware that the code is not final yet.

This is a temporary spin off from the PR #43542.
This separate PR is created to simplify merging and rebasing with the latest code while we discuss the new API design.
When the code is ready it should be merged back to PR #43542.

The goal of the original PR is to enable a C API and Node-API for embedding scenarios.
The C API allows using the shared libnode from runtimes that do not interop with C++, such as WASM, C#, and Java.
This PR works towards the same goal with some changes to the original code.

This is the related issue #23265.

The API design principles

  • Follow the best practices of the Node-API design and provide a way to interop with it.
  • Prefix the new API constructs with node_embedding_.
  • Design the API for ABI safety and to be future-proof against new requirements.
    • Follow the Builder pattern for the API design.
    • The typical use is to create an object, configure it, initialize it based on the configuration, use it, and then delete it. The configuration changes are prohibited after the object is initialized.
    • What if the initialization sequence must be customized? It means that we add a new configuration function and insert a customization hook into the initialization sequence. Thus, we can evolve the API by adding new configuration functions, and occasionally deprecating the old functions.
    • All behavior changes must be associated with a new API version number.
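
A minimal sketch of the create, configure, initialize, use, delete lifecycle described above. All `demo_` names and status codes are hypothetical and are not the functions from this PR; the only point illustrated is that configuration calls are rejected once the object is initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status codes, used only in this sketch. */
typedef enum {
  demo_ok = 0,
  demo_invalid_arg = 1,
  demo_invalid_state = 2,
  demo_out_of_memory = 3
} demo_status;

/* Opaque handle: callers never depend on the struct layout. */
typedef struct demo_platform__* demo_platform;

struct demo_platform__ {
  uint32_t flags;
  bool initialized;
};

demo_status demo_create_platform(demo_platform* result) {
  if (result == NULL) return demo_invalid_arg;
  demo_platform platform = calloc(1, sizeof(*platform));
  if (platform == NULL) return demo_out_of_memory;
  *result = platform;
  return demo_ok;
}

/* Configuration is allowed only before initialization. */
demo_status demo_platform_set_flags(demo_platform platform, uint32_t flags) {
  if (platform == NULL) return demo_invalid_arg;
  if (platform->initialized) return demo_invalid_state;
  platform->flags = flags;
  return demo_ok;
}

demo_status demo_platform_initialize(demo_platform platform) {
  if (platform == NULL) return demo_invalid_arg;
  if (platform->initialized) return demo_invalid_state;
  platform->initialized = true;
  return demo_ok;
}

demo_status demo_delete_platform(demo_platform platform) {
  if (platform == NULL) return demo_invalid_arg;
  free(platform);
  return demo_ok;
}

/* Usage: create, configure, initialize, then delete. */
void demo_platform_usage(void) {
  demo_platform platform = NULL;
  assert(demo_create_platform(&platform) == demo_ok);
  assert(demo_platform_set_flags(platform, 1u << 12) == demo_ok);
  assert(demo_platform_initialize(platform) == demo_ok);
  /* Late configuration is rejected instead of silently applied. */
  assert(demo_platform_set_flags(platform, 0) == demo_invalid_state);
  assert(demo_delete_platform(platform) == demo_ok);
}
```

Because the handle is opaque, new `set_` functions can be added later without changing the layout that callers compile against.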

The API usage

  • To use the C embedding API, we must create, configure, and initialize the global node_embedding_platform. It initializes Node.js and the V8 JS engine once per process and parses the CLI arguments.
  • Then, we create, configure, and initialize one or more node_embedding_runtimes. A runtime is responsible for running JavaScript code.
  • The runtime CLI arguments are initialized by default with the args and exec_args from the result of the platform initialization. They can be overridden while configuring the runtime.
  • A runtime can run in its own thread, several runtimes can share the same thread, or the same runtime can be run from multiple threads.
  • The runtime event loop APIs provide control over the runtime execution. These functions can be called many times because they do not destroy the runtime in the end.
  • The runtime lets the embedder specify the Node-API version and retrieve the associated napi_env instance. Any Node-API code that uses the napi_env must run in the runtime scope controlled by the node_embedding_runtime_open_scope and node_embedding_runtime_close_scope functions.
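
The rule that a runtime inherits its CLI arguments from the platform unless they are overridden can be sketched as follows. The `demo_` types are hypothetical stand-ins, and only `args` is modeled (the `exec_args` would follow the same pattern).

```c
#include <assert.h>
#include <string.h>

typedef struct {
  int argc;
  const char* const* argv;
} demo_args;

typedef struct {
  demo_args args; /* parsed once, at platform initialization */
} demo_platform;

typedef struct {
  demo_args args;
} demo_runtime_config;

/* By default the runtime configuration copies the platform's parsed args. */
void demo_runtime_config_init(demo_runtime_config* config,
                              const demo_platform* platform) {
  config->args = platform->args;
}

/* An explicit call overrides the inherited value for this runtime only. */
void demo_runtime_config_set_args(demo_runtime_config* config,
                                  int argc,
                                  const char* const* argv) {
  config->args.argc = argc;
  config->args.argv = argv;
}

void demo_runtime_args_usage(void) {
  static const char* const platform_argv[] = {"embedder", "main.js"};
  static const char* const worker_argv[] = {"embedder", "worker.js"};
  demo_platform platform = {{2, platform_argv}};

  demo_runtime_config inherited;
  demo_runtime_config_init(&inherited, &platform);
  assert(inherited.args.argc == 2);
  assert(strcmp(inherited.args.argv[1], "main.js") == 0);

  demo_runtime_config overridden;
  demo_runtime_config_init(&overridden, &platform);
  demo_runtime_config_set_args(&overridden, 2, worker_argv);
  assert(strcmp(overridden.args.argv[1], "worker.js") == 0);
  /* Overriding one runtime does not affect the platform defaults. */
  assert(strcmp(platform.args.argv[1], "main.js") == 0);
}
```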

The API overview

Based on the use scenarios, the API can be split up into the following groups.

Error handling API

  • node_embedding_on_error sets the global error handling hook.

Global platform API

  • node_embedding_set_api_version
  • node_embedding_run_main
  • node_embedding_create_platform
  • node_embedding_delete_platform
  • node_embedding_platform_set_flags
  • node_embedding_platform_get_parsed_args

Runtime API

  • node_embedding_run_runtime
  • node_embedding_create_runtime
  • node_embedding_delete_runtime
  • node_embedding_runtime_set_flags
  • node_embedding_runtime_set_args
  • node_embedding_runtime_on_preload
  • node_embedding_runtime_on_start_execution
  • node_embedding_runtime_add_module
  • add API to handle unhandled exceptions

Runtime API to run event loops

  • node_embedding_runtime_set_task_runner
  • node_embedding_run_event_loop
  • node_embedding_complete_event_loop
  • node_embedding_terminate_event_loop
  • add API for emitting beforeExit event
  • add API for emitting exit event

Runtime API to interop with Node-API

  • node_embedding_run_node_api
  • node_embedding_open_node_api_scope
  • node_embedding_close_node_api_scope

Documentation

  • The new C embedding API is added to the existing embedding.md file after the C++ embedding API description.
  • The index.md is changed to indicate that the embedding.md has docs for C++ and C APIs.
  • TODO: complete the examples section.

Tests

  • The new C embedding API tests pass the same scenarios as the C++ embedding API tests.
  • The embedtest executable can be run in several modes controlled by the first CLI argument. It effectively contains several main functions for different test scenarios.
  • The JS test code is changed to provide the test mode argument based on the scenario.
  • Added several new test scenarios:
    • run several Node.js runtimes each in its own thread;
    • run several Node.js runtimes all in the same thread;
    • run Node.js runtime from different threads.
    • test that preload callback is called for the main and worker threads.

The PR status

The code is not 100% complete yet. There are still a few TODO items, but I would like to start a discussion with the Node-API team about the new API.

  • Address outstanding TODOs
    • Allow running Node.js uv_loop from UI loop. Follow the Electron
      implementation. - Complete implementation for non-Windows.
    • Can we use some kind of waiter concept instead of the
      observer thread?
    • Generate the main script based on the runtime settings.
    • Set the global Inspector for the main runtime.
    • Start workers from C++.
    • Worker to inherit parent Inspector.
    • Cancel pending event loop tasks on runtime deletion.
    • Can we initialize platform again if it returns early?
    • Test passing the V8 thread pool size.
    • Add a way to terminate the runtime.
    • Allow the app to provide a custom thread pool.
    • Consider adding a v-table for the API functions to simplify
      binding with other languages.
    • We must not exit the process on node::Environment errors.
    • Be explicit about the recoverable errors.
    • Store IsolateScope in TLS.
  • Review the API design
  • Write docs

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/node-api

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Aug 30, 2024
@vmoroz vmoroz marked this pull request as draft August 30, 2024 14:58
@legendecas legendecas added the node-api Issues and PRs related to the Node-API. label Aug 30, 2024
// Skip printing output for --help, --version, --v8-options.
node_api_platform_no_print_help_or_version_output = 1 << 12,
// Initialize the process for predictable snapshot generation.
node_api_platform_generate_predictable_snapshot = 1 << 14,
Member

I think we should have an option which is something like

node_api_platform_nodejs_binary_default

which gives you the same configuration that is present for the node.js binary

typedef struct node_api_env_options__* node_api_env_options;

typedef enum {
node_api_platform_no_flags = 0,
Member

since a bunch of them seem to disable specific flags should there be an all_flags, or are they all on by default and then there are no/disable flags only?

Member Author

@vmoroz vmoroz Sep 3, 2024

I have changed the approach since our last Node-API meeting. These flags map one-to-one to the flags defined in node.h. The default is the no_flags configuration. Then, embedders can disable some default Node.js features.
We can add default_flags as an alias for no_flags.
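
A small sketch of that flag model: zero means "all default features on", and each bit turns one default feature off. The enum name and the `demo_` prefix are hypothetical, the `default_flags` alias is only the idea floated above, and the two bit values are the ones visible in the diff context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
  demo_platform_no_flags = 0,
  /* Proposed alias: same value, clearer intent at the call site. */
  demo_platform_default_flags = demo_platform_no_flags,
  /* Skip printing output for --help, --version, --v8-options. */
  demo_platform_no_print_help_or_version_output = 1 << 12,
  /* Initialize the process for predictable snapshot generation. */
  demo_platform_generate_predictable_snapshot = 1 << 14
} demo_platform_flags;

/* A feature stays enabled as long as its "no_" bit is not set. */
bool demo_prints_help_output(uint32_t flags) {
  return (flags & demo_platform_no_print_help_or_version_output) == 0;
}

void demo_flags_usage(void) {
  uint32_t flags = demo_platform_default_flags;
  assert(demo_prints_help_output(flags));

  /* Embedders opt out of a default feature by setting its bit. */
  flags |= demo_platform_no_print_help_or_version_output;
  assert(!demo_prints_help_output(flags));

  /* Unrelated bits do not change the answer. */
  assert(demo_prints_help_output(demo_platform_generate_predictable_snapshot));
}
```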

src/node_api_embedding.cc Outdated
return napi_ok;
}

napi_status NAPI_CDECL
Member

If an engine does not support snapshots, can it just do nothing in the snapshot functions?

Member Author

I guess so. Maybe we can change it so that the snapshot can be just JS text. JSI uses the term "prepared JavaScript" for the same purpose. The only question is whether we want this API to be Node-specific or runtime/engine independent. E.g. I use this API with the jsr_ prefix across the V8 and Hermes JS engines (it is also based on the Node-API): https://github.com/microsoft/v8-jsi/blob/master/src/node-api/js_runtime_api.h

Member

Since you are already using it, being cross-runtime might make sense; we just need to make sure it's easy for a platform to not support it and still have the same code run.

return std::move(env_setup_);
}

napi_status OpenScope() {
Member

I assume that a scope is something different than a handle_scope - https://nodejs.org/api/n-api.html#napi_handle_scope, just wondering if there might be confusion between the concepts?

Member Author

@vmoroz vmoroz Sep 3, 2024

Yes, it is different. When we are inside of a module we already have some current v8::Isolate and v8::Context. We do not have them when we are outside and operating with the environment. So, we must establish them to use any V8/Node API. In the standalone v8-jsi project I used a function jsr_run_task that opens/closes the scope internally. (edit: I see that the v8-jsi also has the open/close scope. It is convenient to use when we do not want to create a lot of lambdas.)

doc/api/embedding.md Outdated
doc/api/n-api.md Outdated
src/node_api_embedding.h Outdated
return napi_ok;
}

napi_status NAPI_CDECL node_api_open_env_scope(napi_env env) {
Member

@mhdawson mhdawson Sep 3, 2024

I'm wondering if all functions which are only to be called as part of embedding, versus in an add-on implementation, should have some extra bit in the name. For example, this method could be node_api_embed_open_env_scope.

Member Author

This open/close scope API gives us a lot of flexibility, but it is difficult to use and like you said it is quite confusing.
I am currently considering replacing it with a function that receives a lambda (C function + void* state), so that the napi_env is available only to that lambda. Other APIs will change from using napi_env to something like node_embedding_env or node_embedded_env.
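
A minimal sketch of the "C function + void state" shape, assuming a hypothetical `demo_runtime_run_node_api` function and a `demo_env` stand-in for napi_env. A real implementation would enter the v8::Isolate and v8::Context where this sketch only bumps a counter.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_runtime {
  int scope_depth; /* greater than zero while a scope is open */
} demo_runtime;

/* Stand-in for napi_env: only meaningful while the scope is open. */
typedef struct demo_env {
  demo_runtime* runtime;
} demo_env;

typedef void (*demo_node_api_callback)(void* cb_data, const demo_env* env);

/* Opens the scope, runs the callback, and always closes the scope again. */
int demo_runtime_run_node_api(demo_runtime* runtime,
                              demo_node_api_callback callback,
                              void* cb_data) {
  if (runtime == NULL || callback == NULL) return -1;
  demo_env env = {runtime};
  runtime->scope_depth++; /* open scope */
  callback(cb_data, &env);
  runtime->scope_depth--; /* close scope */
  return 0;
}

/* Example callback: records the scope depth it observed. */
static void record_depth(void* cb_data, const demo_env* env) {
  *(int*)cb_data = env->runtime->scope_depth;
}

void demo_run_node_api_usage(void) {
  demo_runtime runtime = {0};
  int depth_seen = -1;
  assert(demo_runtime_run_node_api(&runtime, record_depth, &depth_seen) == 0);
  assert(depth_seen == 1);          /* the env was used inside the scope */
  assert(runtime.scope_depth == 0); /* the scope is closed afterwards */
}
```

Compared with separate open/close calls, the caller cannot forget to close the scope, at the cost of moving the work into a callback.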

return napi_ok;
}

napi_status NAPI_CDECL node_api_await_promise(napi_env env,
Member

At first I thought this might be an extension to the promise support we already have - https://nodejs.org/api/n-api.html#promises

This is a good example where I think we need "embed" or something else in the name, as otherwise people might get confused and think it could be called from an addon.

Member

or maybe the prefix should be node_embedding_api_XXX

Member Author

What if we shorten it to the node_embedding_?

Member

I'm good with node_embedding_

Member Author

All APIs are changed to use the node_embedding_ prefix.

@vmoroz vmoroz added the embedding Issues and PRs related to embedding Node.js in another project. label Sep 11, 2024
@vmoroz
Member Author

vmoroz commented Sep 13, 2024

This PR was discussed today 9/13/2024 in the Node-API meeting.
This is the summary as I recall it. @mhdawson , @legendecas , @KevinEady , @gabrielschulhof , feel free to augment this comment in case I missed or misunderstood something.

  • The global error handling callback.
    • The initial suggestion from the team was to use the "get-the-last-error" approach as in the Node-API.
    • The counter argument was that while the last error approach works great in the single threaded case it may not work in the multi-thread environment.
    • Since the detailed error info is mostly used for logging, implementing it in the single place is much simpler.
    • The default C embedding API error handler prints the error message to stderr and exits the process. It is intended to handle "non-recoverable" errors such as a wrong argument value passed to the API, or wrong CLI arguments.
    • The related question was what to do if a V8 Isolate runs out of memory. It needs to be investigated, but I guess the answer is that it will be handled by Node.js as it is handled today. The C embedding API does not currently participate in the process. If Node.js typically recovers from that condition, then it must continue doing it.
  • Does the new C-based embedding API have a goal to do the same as the C++ embedding API?
    • The answer is "yes" and "no", or better to say "it depends".
    • While we want to have the same functionality, there is no goal to wrap up all existing C++ embedding APIs.
    • The new C embedding API is going to grow based on the scenarios, and we hope that the Builder pattern lets us evolve the API without ABI-breaking changes.
    • The C embedding API is going to be implemented on top of the existing C++ embedding API.
  • The API growth based on the Builder pattern aims to inject various callbacks into different parts of the initialization process when needed. E.g. if Electron needs to do some extra work between the CLI args parsing and the V8 platform initialization, then we can add a callback that is called between these steps.
    • The concern is that such hooks may bloat the C embedding API. Would it be better to use the V8 API instead such as rusty_v8?
    • The answer is that hopefully we are not going to have too many hooks.
    • Providing C wrappers around the whole V8 API seems to be outside the scope of this PR. One of the goals is to see if we can implement the API in a way that might be useful for other JS runtimes and engines, though that is not strictly necessary.
    • Another approach is to see if the whole initialization process can be represented as a pipeline connecting various tasks, and then the embedder can configure the sequence of the tasks in the pipeline.
  • Why create a new C-based embedder API if the C++ embedder API provides much more freedom?
    • The main goal is to provide access to the shared libnode from languages that do not support C++ interop, e.g. C#.
  • Will the new API make it more difficult to support and change the C++ embedding API?
    • In many cases the C API is just a thin wrapper on top of the C++ API. Hopefully it will not introduce too many issues.
  • It is worth focusing on specific use cases.
    • It is a good point and it should help us to introduce only a bare minimum API to start with. Then, we can grow it based on the new scenarios.
    • We discussed whether we should start with single-threaded cases.
    • For one of my use cases it is not enough: we want to use libnode from ASP.NET where we must run multiple threads.
    • Should we have one primary Runtime and others are just the worker threads?
      • The node::Environment was introduced in Node.js to implement worker threads in Node.js.
      • Unlike the worker thread created from JS, the embedder has a control over the thread where the node::Environment is executed.
      • It may make sense to have a single "root" node::Environment and make the others dependent upon it. That must address the issue with the Inspector, which currently can only be attached to a single node::Environment or its child worker threads.
  • Should we support Node.js experimental features such as snapshots and ES6 modules?
    • Since the C embedding API is also an experimental feature, I do not see big drawbacks against it as long as the C API experimental status will be aligned with the features experimental status.
  • We have discussed the node_embedding_runtime_add_module function.
    • The function allows adding native modules that are implemented in the same executable that embeds libnode.
    • The implementation simply wraps the existing linked modules implementation available in the C++ embedder API.
    • We should consider renaming it to reduce confusion.
  • Why do we need to invoke the Node-API code inside of a callback for the node_embedding_runtime_invoke_node_api?
    • Unlike the use of Node-API inside native modules, embedders must explicitly establish the V8 Isolate context, etc., and then handle the Node-API and JS errors. This function is responsible for taking care of these tasks.
    • The callback for node_embedding_runtime_on_preload and node_embedding_runtime_add_module functions use the same Node-API CallIntoModule internal function.
    • As an alternative we can return back the node_embedding_runtime_open_scope and node_embedding_runtime_close_scope functions.
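
The global error handling callback from the first item of this summary could look roughly like the sketch below. The `demo_` names and the handler signature are hypothetical; the default handler follows the behavior described above (print to stderr and exit). The sketch assumes the hook is installed once at startup, before any other thread uses the API, so the global is not synchronized.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*demo_error_handler)(void* data,
                                   const char* message,
                                   int exit_code);

/* Default behavior: report the message on stderr and exit the process. */
static void demo_default_handler(void* data,
                                 const char* message,
                                 int exit_code) {
  (void)data;
  fprintf(stderr, "%s\n", message);
  exit(exit_code);
}

static demo_error_handler g_handler = demo_default_handler;
static void* g_handler_data = NULL;

/* Installs a process-wide hook; passing NULL restores the default. */
void demo_on_error(demo_error_handler handler, void* data) {
  g_handler = handler != NULL ? handler : demo_default_handler;
  g_handler_data = handler != NULL ? data : NULL;
}

/* API functions call this in one place instead of storing a "last error". */
int demo_report_error(const char* message, int exit_code) {
  g_handler(g_handler_data, message, exit_code);
  return exit_code;
}

/* Example handler for logging: counts errors instead of exiting. */
static void count_errors(void* data, const char* message, int exit_code) {
  (void)message;
  (void)exit_code;
  ++*(int*)data;
}

void demo_on_error_usage(void) {
  int count = 0;
  demo_on_error(count_errors, &count);
  assert(demo_report_error("invalid argument", 1) == 1);
  assert(count == 1);
  demo_on_error(NULL, NULL); /* restore the default handler */
}
```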

Member

@legendecas legendecas left a comment

Does the new C-based embedding API have a goal to do the same as the C++ embedding API?

  • The answer is "yes" and "no", or better to say "it depends".
  • While we want to have the same functionality, there is no goal to wrap up all existing C++ embedding APIs.
  • The new C embedding API is going to grow based on the scenarios, and we hope that the Builder pattern lets us evolve the API without ABI-breaking changes.
  • The C embedding API is going to be implemented on top of the existing C++ embedding API.

I think this question could be better addressed with an approach for embedders to opt into the "bleeding-edge" C++ API, like mentioned in #43542 (comment). An embedder can highly customize the behavior of V8/Node.js, e.g. Inspectors. If such advanced needs arise in an embedder that already adopted the C embedding API, I believe it would not be trivial for them to migrate to C++ based APIs. Allowing conversion between C/C++ API types would reduce the gaps for embedders using the two variant interfaces.


// Initializes the Node.js platform.
NAPI_EXTERN node_embedding_exit_code NAPI_CDECL
node_embedding_platform_initialize(node_embedding_platform platform,
Member

@legendecas legendecas Sep 20, 2024

Flags and args are unsafe to change after initialization. I think node_embedding_platform_set_flags and node_embedding_platform_set_args should be merged into node_embedding_platform_initialize.

To allow builder style API for ABI compatibility, we can add an opaque initialization parameter bag and pass it here:

typedef struct node_embedding_platform_init_opt__* node_embedding_platform_init_opt;

NAPI_EXTERN node_embedding_exit_code NAPI_CDECL
node_embedding_platform_initialize(node_embedding_platform platform,
                                   node_embedding_platform_init_opt opt,
                                   bool* early_return);

Member Author

Good point! I was also thinking along the same lines. As a result the "initialization" and "is_initialized" methods are removed and the "create" methods have a new configuration callback where all the builder set methods are called.
Based on your feedback I am going to go one step further and introduce opaque "options" types so that all these "set_" and "on_" methods are only applicable to these "options" types.
This way it would be easy to understand when it is safe to call them.
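
A rough sketch of that direction, assuming hypothetical `demo_` names: the "create" function takes a configuration callback, and the setters accept only the options type, which exists just for the duration of that callback. In a real header both structs would be opaque; they are defined here so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct demo_platform_options {
  uint32_t flags;
} demo_platform_options;

typedef struct demo_platform {
  uint32_t flags;
} demo_platform;

typedef void (*demo_configure_platform_cb)(void* cb_data,
                                           demo_platform_options* options);

/* Setters take the options type, never the initialized platform. */
void demo_platform_options_set_flags(demo_platform_options* options,
                                     uint32_t flags) {
  options->flags = flags;
}

/* Runs the configuration callback, then freezes the result into the platform. */
demo_platform* demo_create_platform(demo_configure_platform_cb configure,
                                    void* cb_data) {
  demo_platform_options options = {0};
  if (configure != NULL) configure(cb_data, &options);
  demo_platform* platform = malloc(sizeof(*platform));
  if (platform == NULL) return NULL;
  platform->flags = options.flags;
  return platform;
}

uint32_t demo_platform_get_flags(const demo_platform* platform) {
  return platform->flags;
}

void demo_delete_platform(demo_platform* platform) {
  free(platform);
}

/* Example configuration callback: applies the flags passed as cb_data. */
static void configure_from_data(void* cb_data, demo_platform_options* options) {
  demo_platform_options_set_flags(options, *(const uint32_t*)cb_data);
}

void demo_options_usage(void) {
  uint32_t wanted = 1u << 12;
  demo_platform* platform = demo_create_platform(configure_from_data, &wanted);
  assert(platform != NULL);
  assert(demo_platform_get_flags(platform) == wanted);
  demo_delete_platform(platform);
}
```

With this shape there is no setter that accepts an initialized platform, so "configure after initialize" is ruled out by the types rather than by a runtime state check.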


// Runs the Node.js runtime event loop.
NAPI_EXTERN node_embedding_exit_code NAPI_CDECL
node_embedding_runtime_run_event_loop(
Member

Even though libuv doesn't provide ABI guarantees, I think we should expose uv in embedder C API instead of wrapping it, allowing more flexibility of this C API since embedders have more control over libnode.

Member Author

While I agree with you that it can provide more flexibility, my concern is that non-C/C++ users would have to bind to a much broader set of APIs, which, like you said, may not be ABI safe.
Also, while we offer functions that seem to do the same as uv_run, in practice they may be doing some more Node.js work such as draining the v8::Isolate work items.

Currently node_api.h exposes the uv_loop_t associated with the napi_env. Technically, users can get it today, though we have been discussing deprecating it.

I also would like to explore a scenario where the UV loop is not used for processing the task queue, but we rather offer something like the V8 foreground task runner where the embedder API is responsible for pumping the task queue.

Thus, my proposal is to wait for a scenario where exposing the raw uv_loop_t is required before we add it.
Adding a new API is easy; deprecating one is more difficult.
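
To make the "embedder pumps the task queue" idea more concrete, here is a single-threaded sketch with hypothetical `demo_` names. It is not how libnode or V8 schedule work; a real task runner would be posted to from other threads and would need locking plus a way to wake the embedder's loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_CAPACITY 16

typedef void (*demo_task_fn)(void* data);

typedef struct {
  demo_task_fn run;
  void* data;
} demo_task;

/* Fixed-size ring buffer of pending tasks. */
typedef struct {
  demo_task tasks[DEMO_QUEUE_CAPACITY];
  size_t head;
  size_t count;
} demo_task_queue;

/* The runtime side posts work; returns false when the queue is full. */
bool demo_post_task(demo_task_queue* queue, demo_task_fn run, void* data) {
  if (queue->count == DEMO_QUEUE_CAPACITY) return false;
  size_t tail = (queue->head + queue->count) % DEMO_QUEUE_CAPACITY;
  queue->tasks[tail].run = run;
  queue->tasks[tail].data = data;
  queue->count++;
  return true;
}

/* The embedder calls this from its own loop, e.g. a UI loop iteration. */
size_t demo_pump_tasks(demo_task_queue* queue) {
  size_t executed = 0;
  while (queue->count > 0) {
    demo_task task = queue->tasks[queue->head];
    queue->head = (queue->head + 1) % DEMO_QUEUE_CAPACITY;
    queue->count--;
    task.run(task.data); /* a task may post further tasks */
    executed++;
  }
  return executed;
}

static void increment(void* data) {
  ++*(int*)data;
}

void demo_task_runner_usage(void) {
  demo_task_queue queue = {0};
  int counter = 0;
  assert(demo_post_task(&queue, increment, &counter));
  assert(demo_post_task(&queue, increment, &counter));
  assert(counter == 0); /* nothing runs until the embedder pumps */
  assert(demo_pump_tasks(&queue) == 2);
  assert(counter == 2);
  assert(demo_pump_tasks(&queue) == 0);
}
```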

@vmoroz
Member Author

vmoroz commented Sep 20, 2024

We have discussed the API today 09/20/2024 with @mhdawson. The key takeaways:

  • It is not clear how to use the new node_embedding_on_wake_up_event_loop. Its goal is to enable running UV loop tasks in the app's UI event loop. After the discussion and after replying to @legendecas's feedback, I started to consider replacing it with a V8-like "foreground task runner" concept, which is currently used for the V8 ABI-safe API based on Node-API.
  • It would be great to provide the key scenarios that this API aims to address.
  • An API function that is supposed to aggregate other functions must use them for its implementation rather than calling existing Node.js aggregating implementations. E.g. node_embedding_complete_event_loop must use node_embedding_run_event_loop and other currently missing functions to raise the Node.js beforeExit and exit events. This way we can validate that we expose the right APIs and that developers can use the low-level functions without hitting a wall.
  • We also discussed an idea to replace the "callback+data" pairs with small structs. Hopefully it can make the API easier to use from C++ and other languages, and reduce the number of parameters in some cases.
  • The API is still churning. It is probably worth getting another review pass in a couple of weeks.
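
As an illustration of the "small structs instead of callback+data pairs" idea from the list above, here is a sketch with a hypothetical `demo_functor` type. The `release` member is an assumption added for the example (so that the struct can own its data); it was not part of the discussion.

```c
#include <assert.h>
#include <stdlib.h>

/* One value carries the function, its state, and (optionally) its cleanup. */
typedef struct {
  void* data;
  int (*invoke)(void* data, int value);
  void (*release)(void* data); /* assumption: optional owner-side cleanup */
} demo_functor;

int demo_functor_invoke(const demo_functor* functor, int value) {
  assert(functor != NULL && functor->invoke != NULL);
  return functor->invoke(functor->data, value);
}

void demo_functor_release(demo_functor* functor) {
  if (functor->release != NULL) functor->release(functor->data);
  functor->data = NULL;
  functor->invoke = NULL;
  functor->release = NULL;
}

static int add_to_total(void* data, int value) {
  int* total = data;
  *total += value;
  return *total;
}

static void free_total(void* data) {
  free(data);
}

/* Builds a functor that keeps a running total in heap-allocated state. */
demo_functor demo_make_accumulator(void) {
  int* total = calloc(1, sizeof(*total));
  if (total == NULL) abort();
  demo_functor functor = {total, add_to_total, free_total};
  return functor;
}

void demo_functor_usage(void) {
  demo_functor accumulator = demo_make_accumulator();
  assert(demo_functor_invoke(&accumulator, 2) == 2);
  assert(demo_functor_invoke(&accumulator, 3) == 5);
  demo_functor_release(&accumulator);
  assert(accumulator.data == NULL);
}
```

An API function then takes a single `demo_functor` parameter instead of two or three loosely related ones, which is also simpler to wrap from a C++ lambda or a delegate in another language.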

@vmoroz
Member Author

vmoroz commented Sep 23, 2024

Does the new C-based embedding API have a goal to do the same as the C++ embedding API?

I think this question could be better addressed with an approach for embedders to opt into the "bleeding-edge" C++ API, like mentioned in #43542 (comment). An embedder can highly customize the behavior of V8/Node.js, e.g. Inspectors. If such advanced needs arise in an embedder that already adopted the C embedding API, I believe it would not be trivial for them to migrate to C++ based APIs. Allowing conversion between C/C++ API types would reduce the gaps for embedders using the two variant interfaces.

I just do not see how it can be done in practice.
The C API is targeting languages that cannot do the C++ interop. E.g. C#, Python, or a C++ compiler that does not understand the libnode C++ mangled/decorated names.
If they cannot interop with C++, then converting between C and C++ cannot help.
On the other hand, if the embedder code can work with the C++ API, then I do not see a point in using the C API.

The only real "escape hatch" is to add the missing functionality to the C API and compile the libnode privately until the PR is accepted by Node.js. Thus, the C API is designed to be extensible from the beginning.

Labels
c++ Issues and PRs that require attention from people who are familiar with C++. embedding Issues and PRs related to embedding Node.js in another project. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. node-api Issues and PRs related to the Node-API.
Projects
Status: In Progress
4 participants