Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Source Generation #33

Open
RichiCoder1 opened this issue Nov 18, 2023 · 14 comments
Open

Use Source Generation #33

RichiCoder1 opened this issue Nov 18, 2023 · 14 comments
Assignees

Comments

@RichiCoder1
Copy link

RichiCoder1 commented Nov 18, 2023

Ideally it'd be excellent to see this PDK support writing more idiomatic .NET Code via source generation.

For example, turning the Getting Started example from:

using System;
using System.Runtime.InteropServices;
using System.Text.Json;
using Extism;

namespace MyPlugin;
public class Functions
{
    public static void Main()
    {
        // Note: a `Main` method is required for the app to compile
    }

    [UnmanagedCallersOnly(EntryPoint = "greet")]
    public static int Greet()
    {
        var name = Pdk.GetInputString();
        var greeting = $"Hello, {name}!";
        Pdk.SetOutput(greeting);

        return 0;
    }
}

to

using Extism;

namespace MyPlugin;

public partial class Functions
{
	[WasmExport(Name = "greet")]
    public static string Greet(string name)
    {
		return $"Hello, {name}";
    }
}

with something like this being generated under the covers:

namespace MyPlugin;
public partial class Functions
{
    public static void Main() { }

    [UnmanagedCallersOnly(EntryPoint = "greet")]
    public static int <>__Greet0()
    {
        var name = global::Extism::Pdk.GetInputString();
		var output = Functions::Greet(name);
        global::Extism::Pdk.SetOutput(output);

        return 0;
    }
}
@NickDarvey
Copy link
Contributor

I think this could be added with new package (like a Extism.Pdk.CSharp or Extism.Pdk.SourceGenerators) so people who want the control or to use it from F# can still use the core package.

@mhmd-azeez
Copy link
Collaborator

I agree with @NickDarvey that anything we do should take F# into account too. @RichiCoder1 do you only have strings in mind or do you also mean complex objects (structs, classes, and records)?

Also, would you like to contribute to the source generator? I'd be happy to help in designing the API and implementing

@RichiCoder1
Copy link
Author

@RichiCoder1 do you only have strings in mind or do you also mean complex objects (structs, classes, and records)?

An MVP would probably be just primitives (which I guess is just int/float/string atm), but mostly because I'm not sure what/how involved more sophisticated data would be. Reading over the Extism docs, would probably want to orient the default marshalling experience so it's consistent on both ends?

Also, would you like to contribute to the source generator? I'd be happy to help in designing the API and implementing

Sadly I don't have a lot of cycles I could commit, but happy to help evaluate and possibly assist with!

@mhmd-azeez mhmd-azeez self-assigned this Jan 10, 2024
@mhmd-azeez
Copy link
Collaborator

mhmd-azeez commented Jan 11, 2024

I did some experimentation to see what's possible. Using Source Generation, a pattern like this is possible:

Use Case 1: String in, String out

public class Functions
{
    [WasmExport("greet")]
    public static string Greet(string input)
    {
        // TODO: implement greet
    }
}

Use Case 2: Complex objects

public class Functions
{
    [JsonSerialization(typeof(JsonContext))]
    [WasmExport("greet")]
    public static GreetResponse Greet(GreetRequest request)
    {
        // TODO: implement greet
    }
}

[JsonSerializable(typeof(GreetRequest))]
[JsonSerializable(typeof(GreetResponse))]
partial class JsonContext : JsonSerializerContext
{

}

public class GreetRequest
{

}

public class GreetResponse
{

}

The source generator will generate a method like this:

[UnmanagedCallersOnly(EntryPoint = "greet")]
public static void Greet()
{
    var typeInfo = global::SampleLib.JsonContext.Default.GreetRequest;
    var json = global::Extism.Pdk.GetInput();
    var serializer = new global::Extism.JsonExtismSerializer();
    var input = serializer.Deserialize<SampleLib.GreetRequest>(json, typeInfo);
    
    var result = SampleLib.Functions.Greet(input);
    
    var typeInfo2 = global::SampleLib.JsonContext.Default.GreetResponse;
    var json2 = serializer.Serialize(result, typeInfo2);
    global::Extism.Pdk.SetOutput(json2);
}

Notes:

  1. Specifying a JsonSerializerContext is required, otherwise trimming would break the app. Without trimming the wasm file is too big (21mb)
  2. I was hoping to make the pattern extensible (so that users can use their own serialization formats: message pack, protobuf, etc). But, the fact that reflection is not supported when trimming makes this impossible. So the source generator has to be aware of all supported formats, although we can start with json.

Overall, I feel like it's too much complexity for not much value. I think having a couple of convenience methods would give us most of the value, while also being less brittle:

[UnmanagedCallersOnly(EntryPoint = "greet")]
public static int Greet()
{
    var request = Pdk.GetInputJson<GreetRequest>(JsonContext.Default.GreetRequest);

    // TODO: implement greet

    Pdk.SetOutputJson(response, JsonContext.Default.GreetRequest);
}

Note: Extism functions can't take any explicit parameters, instead they just get a byte array as an input.

The new convenience methods are Pdk.GetInputJson<> and Pdk.SetOutputJson in this case.

Let me know what you think, I am also open for suggestions for solving the issues with the Source Generator approach.

If anyone is interested, the experiment is here: https://github.com/extism/dotnet-pdk/tree/spike/source-generation

@RichiCoder1
Copy link
Author

RichiCoder1 commented Feb 3, 2024

Howdy! Apologies for the late replay, just circling back on this.

I honestly think the Source Generate'd version looks fantastic! I do understand what you mean about it being more complicated to maintain though.

Also love the addition of the Pdk Input/Output Json helpers in the meantime.

I was hoping to make the pattern extensible (so that users can use their own serialization formats: message pack, protobuf, etc). But, the fact that reflection is not supported when trimming makes this impossible. So the source generator has to be aware of all supported formats, although we can start with json.

A solution for this would likely be a combination of ducktyping/interface and then a local or assembly attribute. So something like:

[assembly: WasmSerializer<ExtismMessagePackSerializer>]
public class Functions
{
    [WasmExport("greet")]
    public static GreetResponse Greet(GreetRequest request)
    {
        // TODO: implement greet
    }
}

would end up being resolved into

[UnmanagedCallersOnly(EntryPoint = "greet")]
public static void Greet()
{
    var input = global::Extism.Pdk.GetInput();
    var input = global::ThirdParty.ExtismMessagePackSerializer.Deserialize<SampleLib.GreetRequest>(input);
    
    var result = SampleLib.Functions.Greet(input);
    
    var output = global::ThirdParty.ExtismMessagePackSerializer.Serialize<SampleLib.GreetRequest>(result);
    global::Extism.Pdk.SetOutput(output);
}

The third party (or first party) serializer in this case would just use duck typing, but you could also provide an interface like so if you want people to have something to implement and check against:

public interface IExtismSerializer
{
    static abstract T Deserialize<T>(byte[] input);
    static abstract byte[] Serialize<T>(T input);
}

An example for message pack would be:

// SampleProj/dto.cs

[MessagePackObject]
public class GreetRequest
{

}
[MessagePackObject]
public class GreetResponse
{

}
// ThirdParty

namespace ThirdParty;

public class ExtismMessagePackSerializer : IExtismSerializer
{
	public static byte[] Serialize<T>(T input)
    {
		return MessagePackSerializer.Serialize(input);
    }

    public static T Deserialize<T>(bytes[] input)
    {
		return MessagePackSerializer.Deserialize<T>(input);
	}
}

(in this case, you could just MessagePackSerializer directly too, but that would not be that case for all serializers I reviewed)

An if you need to specify options or settings, you'd end up just dropping back to bytes[].

In that case, it would even be neat to have a "default" signature like so:

public class Functions
{
    [WasmExport("raw")]
    public static byte[] Raw(bytes[] data)
    {
        // TODO: implement own seralization and de-serialization logic
    }
}

Which basically does the Pdk'ing for you. It's a small thing, but a little bit of magic can go a long way IMHO.

@mhmd-azeez
Copy link
Collaborator

@RichiCoder1 thanks for the detailed write up, the only issue here is that you still need Reflection for this method to work (and reflection is not supported when using AOT compilation with wasm). The official MessagePack serializer uses Reflection, unless you use their Source Generator. Which causes the same problem as the System.Text.Json Serializer.

I am not sure if having to write this:

[JsonSerializable(typeof(GreetInput))]
public partial class SourceGenerationContext : JsonSerializerContext {}
public record GreetInput(int a, int b);

[UnmanagedCallersOnly]
public static int greet()
{
     var input = Pdk.GetInputJson(SourceGenerationContext.Default.GreetInput);
     // TODO: implement greet
}

Is much different from having to write this:

[JsonSerializable(typeof(GreetInput))] // We need this to get the JsonTypeInfo for the input/output type
public partial class SourceGenerationContext : JsonSerializerContext {}
public record GreetInput(int a, int b);

[JsonSerialization(typeof(SourceGenerationContext))]
[WasmExport("greet")]
public static GreetResponse Greet(GreetRequest request)
{
    // TODO: implement greet
}

My issue with the source generator approach is:

  1. It's too brittle. If you forget one of the attributes, suddenly your wasm is not exporting the function, or there is weird compilation errors, I am worried that instead of simplifying things, it just makes things more complicated and harder to understand for newcomers.
  2. In the end, it's not that much shorter. We can write convenience methods for other serialization protocols (perhaps in their own nuget packages) similar to Json.

@RichiCoder1
Copy link
Author

RichiCoder1 commented Feb 4, 2024

the only issue here is that you still need Reflection for this method to work

That's fair, and that's going to be true of any serialization method just due to WASM Source gen, no? Ultimately that comes down to documentation and also is one of the reasons I'd say you shouldn't necessarily stick multiple formats in core.

In the end, it's not that much shorter.

The difference imho is that it's trivial extra lines for one or two functions, but that starts to add for every function after that much faster for the "add another method" scheme. As soon as you're implementing a lot of members or a complicated interfaces, it starts looking a lot more intensive.

It's too brittle. If you forget one of the attributes, suddenly your wasm is not exporting the function, or there is weird compilation errors, I am worried that instead of simplifying things, it just makes things more complicated and harder to understand for newcomers.

Part of this can be mitigated via Analyzers, but also part of this is you'll get the exact same errors if you do it manually as if you do it via source gen. And a benefit of source gen is what's getting gen'd is pretty transparent in the end.

I'm also thinking of cases too where a vendor might want to provide, say, a PDK on top of Extism's PDK to C# developers to encourage them to write plugins. The default Extism experience is familiar for doing cross-lang stuff, but very alien for C# developers. Exposing source gen primitives allows downstream devs to expose a more "native" looking experience with much fewer caveats. Though I suppose a good argument against that also is the vendor could simply cover these warts themselves (hide all the interop in an abstraction layer).

That being said, source gens aren't simple to maintain so I can see that argument that it's too much for a small theoretical win.

@mhmd-azeez
Copy link
Collaborator

mhmd-azeez commented Feb 4, 2024

That's fair, and that's going to be true of any serialization method just due to WASM Source gen, no

Exactly, that's why you can't automate the serialization part. The way you have shown will break when you run dotnet publish if you enable PublishTrimmed. And PublishTrimmed is crucial because it decreases the size of the binary from 20mb down to about 7mb (which still pretty big, not much better).

The default Extism experience is familiar for doing cross-lang stuff, but very alien for C# developers

I agree, I'd love to be able to make it nicer

Overall, I am inclined towards waiting for more user feedback, hopefully we can get to a good solution down the line

@furesoft
Copy link
Contributor

I have some ideas how the reflection problem could be solved when using other serializers. I will try it this weekend and make a PR if it works as intended.

@Simonl9l
Copy link

I'd suggest that per "brittle-ness", the generation only covers half the possible solution space, if there are expectation that certain things exist, this is covered more in the custom code analyzer

I'd also suggest that the PDK on a PDK situation is also very likely. We're directly looking at that scenario, as expecting a plugin developer for a specific solution to have to add a bunch of boiler plate to define say the host functions etc. in support of such a solution would be developer friction. Having the .net PDK leverage generators and analyzers is then reflective of that same situation., with us being exitsm developers, but not having to expose extism to our plugin developers explciity - more "powered by".

@nilslice
Copy link
Member

Hi @Simonl9l - it sounds like perhaps you could benefit from something new we're working on. Especially given that PDK code generation and reduction of boilerplate is a first-class part of its DX.

Would you be open to setting up a call, where we could chat about this as well as more on your use case to see how we can best assist?

cc/ @bhelx

@bhelx
Copy link
Contributor

bhelx commented Jun 14, 2024

Seems like it's been a while since we had activity on this thread but there are still some things to be solved so I'm going to keep this open. Please chime in if anything has changed!

@Simonl9l
Copy link

@bhelx & @nilslice back from being deep in our own project, and looping back to Extism. If you are still interested connecting please advise.

@nilslice
Copy link
Member

Would be happy to! Feel free to send me an email, [email protected], or reach out via discord: https://extism.org/discord

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants