Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Ocaml support #166

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

arthurmelton
Copy link

This is attempting to add ocaml support. Currently it only supports loading a model and infering to just follow the quickstart. The quickstart for ocaml currently looks like the following:

let () =
  let model = Result.get_ok @@ Cartonml.load "https://carton.pub/google-research/bert-base-uncased" in
  let out = Cartonml.infer model [| ("input", Cartonml.String [| "Today is a good [MASK]." |]) |] in
  (*
  [|("tokens", Cartonml.String [|"day"|]);
    ("scores", Cartonml.Float [|14.5513076782226562|])|]
  *)

To build this you will have to be in carton/source/carton-bindings-ocaml/ and call the build.sh script. This will build and move all the files where they need to be. If you have opam installed then you can do opam install ocamlml/.

Currently though calling infer fails with this unwrap getting it upset carton-runner-interface/src/client.rs:172:41. It looks to be something with mpsc closing. I don't fully understand where it is being closed/cloned but the only place I see is LaurentMazare/ocaml-rust/src/custom.rs#L108, but this does not seam to make any sense.

Anyways all of this is to just say that I am working on Ocaml support and hopefully will get this to work. I believe once I can share the Carton struct that it will be pretty easy to do the rest.

Copy link
Owner

@VivekPanyam VivekPanyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for taking the time to do this!

Currently though calling infer fails with this unwrap getting it upset carton-runner-interface/src/client.rs:172:41. It looks to be something with mpsc closing.

It looks like things are still in progress, but if it continues to be an issue and you'd like help figuring out what's going on, feel free to post the backtrace.

Anyways all of this is to just say that I am working on Ocaml support and hopefully will get this to work. I believe once I can share the Carton struct that it will be pretty easy to do the rest.

Great!

I know this is still a draft, but I added a few high level questions below

Comment on lines +9 to +21
pub enum Tensor {
Float(Vec<f32>),
Double(Vec<f64>),
I8(Vec<i64>),
I16(Vec<i64>),
I32(Vec<i64>),
I64(Vec<i64>),
U8(Vec<i64>),
U16(Vec<i64>),
U32(Vec<i64>),
U64(Vec<i64>),
String(Vec<String>),
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to structure the OCaml interface so that it's possible for us to share numeric tensor data between Rust and OCaml (i.e. without having to make copies)?

I realize the fact that carton::types::Tensor is generic makes usage with tools like cxx or ocaml_rust tricky.

Conveniently, I should have a PR up tonight that changes Tensor to no longer be generic (for other reasons). I'll link it here when it's up. Hopefully that helps clean up some ffi code as well.

I'm aware that we currently also copy numeric tensor data in the Python bindings (only when going from Rust to Python), but the interface with Python is designed in a way where we can remove those copies without changing the interface. Can we do something similar for OCaml?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the PR: #167

Copy link
Author

@arthurmelton arthurmelton Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually help a lot implementing things.

The reason I am copying and such is that ocaml only supports Float, Double, int32, and int64, so we cant share by reference as ocaml could not understand some of the smaller ints. There are libraries to add support for other types but I believe if we wanted to make it have all the types in rust we would have have to fork ocaml-rust.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also update my enum to have a i32 for the lower values if there is no way to not copy the data.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. As I mentioned in my other comment, it's been a long time since I've written OCaml so I don't have a whole lot of context. Is there a popular ndarray library for OCaml that we could use in the interface?

For example:

  • Python has Numpy,
  • Rust has the ndarray crate
  • JavaScript has a few usable ones on top of TypedArrays
  • C lets us pass pointers to data (along with a deleter function to free the data when it's no longer needed)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me see if I understand what these libraries accomplish. Is the goal to have a chunk of memory with a fixed length, and the "basic" types of a language (the ones specified in the tensor enum)?

I believe in Ocaml the Base.Array is a vector (I believe what its called when its not a linked list) so all the data will be in the same block of memory. To add support for the different number types we can use stdint (will give us support for the rest of the ints needed) then all the data would be the same as a rust vector.

If I am misunderstanding the reason for the need of these types, then there does look to be np for Ocaml that give support for matrixes and multi dementinal arrays.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of these libraries is to make interacting with multidimensional arrays (i.e. ndarrays) easier.

For a numeric multidimensional array, technically all you need is a base data pointer along with the shape and strides of the array. That's not particularly convenient to use directly however, so several ndarray libraries implement helpful functionality on top of the raw data.

It looks like OCaml has BigArray built in which appears to support a lot of features we need. Importantly, it also lets us pass Tensors between OCaml and Rust without having to make a copy of the underlying data. I'm not sure if you can use this easily using ocaml-rust though.

override_runner_name: Option<String>,
override_required_framework_version: Option<String>,
) -> Result<CustomConst<Carton>, Custom<CartonError>> {
Runtime::new().unwrap().block_on(async {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for us to expose async functionality to OCaml from Rust?

I've been exploring the same thing for C and C++ and am writing a doc that should be in an upcoming PR with either the C or C++ bindings.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no Ocaml god, and am mainly just interested in learning Ocaml. Most of the async in ocaml that I have seen is by using a specific library and preprocessor (Lwt) as Ocaml does not nativity understand what async is. There would have to be some way to return a Ocaml type but also call functions to the async library from rust. I don't know if that is possible, and if it is I would not even know how to start doing that.

Once you get the C/C++ bindings I will check how you explore async, and will see if I can do something similar.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that's widely used? I haven't written OCaml in a very long time so I don't have much context.

Here are the C bindings: #169 (the PR is larger than I hoped; sorry :/)

The doc I mentioned is not included in that PR although the included README.md provides an overview of the async approach for C.

My C++ PR will talk about design tradeoffs a little, but the short version is that I'm likely going to have async Rust functions return std::futures in C++ and also provide something like CartonAsyncNotifier from the C interface (because polling several std::futures is inefficient).

If there isn't native async/coroutine/green threads/event loop functionality in OCaml, maybe we can do something similar to what I'm planning to do for C++?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never interacted with the janestreet async library, so I am guessing its not super widely used. It looks very similar to Lwt though. With Ocaml I have not done much of writing async functions and have only really used them, so I will have to do some research on how you can make your own async functions with Lwt to see if it is possible to add support through ffi.

As for the initial of your c ffi, it looks like you are making your own version of async. I will have to see if I can implement the Ocaml async with a bigger library as having the "normal" feel to it would be a lot better. If it is not possible then I will probably just have to do what you did.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as having the "normal" feel to it would be a lot better.

Yes, that's generally a priority for the bindings. It should feel like it "belongs" in the language.

The callback + opaque user-provided argument approach is fairly common in C, but we shouldn't directly translate that to other languages unless there isn't a better alternative.

Copy link
Owner

@VivekPanyam VivekPanyam Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I decided not to use cxx for the C++ bindings.

It doesn't natively support some features that are important for making the C++ API feel normal (namely template instantiation). For some other parts of the API, it would either require extra copies or require us to write wrapper C++ code on top of the generated cxx code to avoid these copies.

So if we need to write wrapper C++ code and/or create (fairly tedious) workarounds to get a normal feeling API, we might as well implement idiomatic C++ bindings on top of the C API.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ PR is up at #174. It's also unfortunately quite large, but the included README.md might be worth taking a look at

@VivekPanyam VivekPanyam added new language Adding support for a new programming language and removed new language Adding support for a new programming language labels Oct 4, 2023
@VivekPanyam VivekPanyam mentioned this pull request Oct 4, 2023
@arthurmelton
Copy link
Author

Currently what I believe I have gotten it to, is this block_on is breaking it on source/carton-bindings-ocaml/src/lib.rs L19. I believe what is happening is once this block_on finishes and while it is trying to evaluate the async function it is dropping the original item and doing so somehow breaks the mpsc channel. I believe what is going to have to happen is either this gets connected to a ocaml async library or some custom await function inside the library, but that would not be ideal. Because the rust future size is not known at compile time I dont think it is possible to transfer the rust future to and from ocaml.

@VivekPanyam
Copy link
Owner

VivekPanyam commented Oct 8, 2023

Currently what I believe I have gotten it to, is this block_on is breaking it

Ah I see what's going on. You're creating a new tokio runtime on each call. Try something like this instead (from the C bindings):

/// A utility to lazily start a tokio runtime
pub(crate) fn runtime() -> &'static Runtime {
static CELL: OnceLock<Runtime> = OnceLock::new();
CELL.get_or_init(|| Runtime::new().unwrap())
}

@arthurmelton
Copy link
Author

Ah I see what's going on. You're creating a new tokio runtime on each call. Try something like this instead (from the C bindings):

Well, that fixed it. You have no idea how long I have been banging my head on this problem. I will finally work on the 2 things you pointed out / suggested.

@VivekPanyam
Copy link
Owner

Well, that fixed it. You have no idea how long I have been banging my head on this problem.

Yeah I know what that's like :) Glad I could help!

@VivekPanyam
Copy link
Owner

Hey @arthurmelton! Just wanted to check if you're blocked on something that I might be able to help with.

(Totally fine if things are just busy and you haven't gotten a chance to spend more time on this :) )

@arthurmelton
Copy link
Author

Yea kinda busy, I think during this week I should have some time to work on things.

@arthurmelton
Copy link
Author

arthurmelton commented Oct 21, 2023

Hey I just wanted to say that much of my free time has gone away unexpectedly. I though I would have time this week but there was not much. I will continue working on this when I have more time but if someone wants my very minimal genarray code, this is what I have so far. got_so_far.diff.txt (because github does not support .diff files for some reason)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants