-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Ocaml support #166
base: main
Are you sure you want to change the base?
[WIP] Ocaml support #166
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, thanks for taking the time to do this!
Currently though calling
infer
fails with this unwrap getting it upsetcarton-runner-interface/src/client.rs:172:41
. It looks to be something with mpsc closing.
It looks like things are still in progress, but if it continues to be an issue and you'd like help figuring out what's going on, feel free to post the backtrace.
Anyways all of this is to just say that I am working on Ocaml support and hopefully will get this to work. I believe once I can share the Carton struct that it will be pretty easy to do the rest.
Great!
I know this is still a draft, but I added a few high level questions below
pub enum Tensor { | ||
Float(Vec<f32>), | ||
Double(Vec<f64>), | ||
I8(Vec<i64>), | ||
I16(Vec<i64>), | ||
I32(Vec<i64>), | ||
I64(Vec<i64>), | ||
U8(Vec<i64>), | ||
U16(Vec<i64>), | ||
U32(Vec<i64>), | ||
U64(Vec<i64>), | ||
String(Vec<String>), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to structure the OCaml interface so that it's possible for us to share numeric tensor data between Rust and OCaml (i.e. without having to make copies)?
I realize the fact that carton::types::Tensor
is generic makes usage with tools like cxx
or ocaml_rust
tricky.
Conveniently, I should have a PR up tonight that changes Tensor
to no longer be generic (for other reasons). I'll link it here when it's up. Hopefully that helps clean up some ffi code as well.
I'm aware that we currently also copy numeric tensor data in the Python bindings (only when going from Rust to Python), but the interface with Python is designed in a way where we can remove those copies without changing the interface. Can we do something similar for OCaml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the PR: #167
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually help a lot implementing things.
The reason I am copying and such is that ocaml only supports Float
, Double
, int32
, and int64
, so we cant share by reference as ocaml could not understand some of the smaller ints. There are libraries to add support for other types but I believe if we wanted to make it have all the types in rust we would have have to fork ocaml-rust
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also update my enum to have a i32 for the lower values if there is no way to not copy the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. As I mentioned in my other comment, it's been a long time since I've written OCaml so I don't have a whole lot of context. Is there a popular ndarray library for OCaml that we could use in the interface?
For example:
- Python has Numpy,
- Rust has the
ndarray
crate - JavaScript has a few usable ones on top of
TypedArray
s - C lets us pass pointers to data (along with a deleter function to free the data when it's no longer needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see if I understand what these libraries accomplish. Is the goal to have a chunk of memory with a fixed length, and the "basic" types of a language (the ones specified in the tensor enum)?
I believe in Ocaml the Base.Array
is a vector (I believe what its called when its not a linked list) so all the data will be in the same block of memory. To add support for the different number types we can use stdint (will give us support for the rest of the ints needed) then all the data would be the same as a rust vector.
If I am misunderstanding the reason for the need of these types, then there does look to be np for Ocaml that give support for matrixes and multi dementinal arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of these libraries is to make interacting with multidimensional arrays (i.e. ndarrays) easier.
For a numeric multidimensional array, technically all you need is a base data pointer along with the shape and strides of the array. That's not particularly convenient to use directly however, so several ndarray libraries implement helpful functionality on top of the raw data.
It looks like OCaml has BigArray built in which appears to support a lot of features we need. Importantly, it also lets us pass Tensors between OCaml and Rust without having to make a copy of the underlying data. I'm not sure if you can use this easily using ocaml-rust
though.
override_runner_name: Option<String>, | ||
override_required_framework_version: Option<String>, | ||
) -> Result<CustomConst<Carton>, Custom<CartonError>> { | ||
Runtime::new().unwrap().block_on(async { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way for us to expose async functionality to OCaml from Rust?
I've been exploring the same thing for C and C++ and am writing a doc that should be in an upcoming PR with either the C or C++ bindings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am no Ocaml god, and am mainly just interested in learning Ocaml. Most of the async in ocaml that I have seen is by using a specific library and preprocessor (Lwt
) as Ocaml does not nativity understand what async is. There would have to be some way to return a Ocaml type but also call functions to the async library from rust. I don't know if that is possible, and if it is I would not even know how to start doing that.
Once you get the C/C++ bindings I will check how you explore async, and will see if I can do something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that's widely used? I haven't written OCaml in a very long time so I don't have much context.
Here are the C bindings: #169 (the PR is larger than I hoped; sorry :/)
The doc I mentioned is not included in that PR although the included README.md provides an overview of the async approach for C.
My C++ PR will talk about design tradeoffs a little, but the short version is that I'm likely going to have async Rust functions return std::future
s in C++ and also provide something like CartonAsyncNotifier
from the C interface (because polling several std::future
s is inefficient).
If there isn't native async/coroutine/green threads/event loop functionality in OCaml, maybe we can do something similar to what I'm planning to do for C++?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never interacted with the janestreet async library, so I am guessing its not super widely used. It looks very similar to Lwt though. With Ocaml I have not done much of writing async functions and have only really used them, so I will have to do some research on how you can make your own async functions with Lwt to see if it is possible to add support through ffi.
As for the initial of your c ffi, it looks like you are making your own version of async. I will have to see if I can implement the Ocaml async with a bigger library as having the "normal" feel to it would be a lot better. If it is not possible then I will probably just have to do what you did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as having the "normal" feel to it would be a lot better.
Yes, that's generally a priority for the bindings. It should feel like it "belongs" in the language.
The callback + opaque user-provided argument approach is fairly common in C, but we shouldn't directly translate that to other languages unless there isn't a better alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, I decided not to use cxx
for the C++ bindings.
It doesn't natively support some features that are important for making the C++ API feel normal (namely template instantiation). For some other parts of the API, it would either require extra copies or require us to write wrapper C++ code on top of the generated cxx
code to avoid these copies.
So if we need to write wrapper C++ code and/or create (fairly tedious) workarounds to get a normal feeling API, we might as well implement idiomatic C++ bindings on top of the C API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++ PR is up at #174. It's also unfortunately quite large, but the included README.md
might be worth taking a look at
Currently what I believe I have gotten it to, is this |
Ah I see what's going on. You're creating a new tokio runtime on each call. Try something like this instead (from the C bindings): carton/source/carton-bindings-c/src/utils.rs Lines 19 to 23 in a15b3c3
|
Well, that fixed it. You have no idea how long I have been banging my head on this problem. I will finally work on the 2 things you pointed out / suggested. |
Yeah I know what that's like :) Glad I could help! |
Hey @arthurmelton! Just wanted to check if you're blocked on something that I might be able to help with. (Totally fine if things are just busy and you haven't gotten a chance to spend more time on this :) ) |
Yea kinda busy, I think during this week I should have some time to work on things. |
Hey I just wanted to say that much of my free time has gone away unexpectedly. I though I would have time this week but there was not much. I will continue working on this when I have more time but if someone wants my very minimal genarray code, this is what I have so far. got_so_far.diff.txt (because github does not support |
This is attempting to add ocaml support. Currently it only supports loading a model and infering to just follow the quickstart. The quickstart for ocaml currently looks like the following:
To build this you will have to be in
carton/source/carton-bindings-ocaml/
and call thebuild.sh
script. This will build and move all the files where they need to be. If you haveopam
installed then you can doopam install ocamlml/
.Currently though calling
infer
fails with this unwrap getting it upsetcarton-runner-interface/src/client.rs:172:41
. It looks to be something withmpsc
closing. I don't fully understand where it is being closed/cloned but the only place I see is LaurentMazare/ocaml-rust/src/custom.rs#L108, but this does not seam to make any sense.Anyways all of this is to just say that I am working on Ocaml support and hopefully will get this to work. I believe once I can share the
Carton
struct that it will be pretty easy to do the rest.