CLI Codegen design #606
Comments
This python server business is interesting. We are building a little list of data services - like metadata, docgen, and now codegen. We've talked about hosting some of these in Lightning but I'm not entirely sure they're a Lightning concern. An obvious difficulty is that we would want a node or elixir server to serve metadata and docgen, but the AI community would likely want a python server to service any codegen calls. One benefit of hosting it on http of course is that we can map different endpoints to different servers. I might add a command to the CLI which is a bit more dev focused and which calls out to the data server. So we'll have
@josephjclark what is blocking this issue? Is this related to OpenFn/adaptors#19?
@christad92 no - this issue is for AI-driven template generation. It's not blocked at the moment, I'll be picking it up next week. The other ticket is a simple static template generation utility. Useful but not particularly interesting.
Notes on the CLI: There will be two CLI commands - a specific The The apollo command should do some useful stuff, like:
Most of this work is done. Instead of a codegen command, I'm so far only supporting a generic
This issue contains a living design for openfn codegen (or whatever we call it), the CLI command which today will generate an adaptor template, but tomorrow may generate other stuff.
Inputs
The command will need to take the following inputs:
Outputs
The existing AI service will generate the following files:
We need to extend it to generate a full adaptor template:
What if you point to an existing adaptor? Can we merge the new code with the existing Adaptor.js?
API Spec
Here's what the API might look like:
spec.json would contain all the inputs. Presumably they can all be overridden by CLI args too.
Right now this is:
But I would probably like to change "instruction" to "endpoints", and just have a list.
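As a rough sketch of the "spec.json plus CLI overrides" idea: the snippet below merges the two, with CLI values winning. The key names are assumptions for illustration only ("endpoints" is the rename floated above, "adaptor" is hypothetical); the real spec format is not settled here.

```python
import json


def load_inputs(spec_text: str, cli_overrides: dict) -> dict:
    """Merge the contents of spec.json with CLI overrides; CLI values win."""
    spec = json.loads(spec_text)
    # Ignore options the user did not pass (None), so they do not
    # clobber values that came from spec.json.
    passed = {k: v for k, v in cli_overrides.items() if v is not None}
    return {**spec, **passed}


# Hypothetical spec contents; the keys are assumptions, not the real format.
spec_text = '{"adaptor": "my-adaptor", "endpoints": ["/patients"]}'
inputs = load_inputs(spec_text, {"adaptor": "other-adaptor", "endpoints": None})
print(inputs)
```

A shallow merge like this is probably enough while the spec stays flat; nested inputs would need a deeper merge.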
Changing vs Generating
It's easy to generate a fresh adaptor because we're outputting fresh source code. No toes to step on.
But if this is going to work on an existing adaptor, we have to generate code inside existing code.
Either we do that by injecting generated code into an existing file, perhaps using static analysis tools, or we get the AI to do it (which feels like a risk to me).
I suppose we start by saying: here is the existing adaptor.js file. Please generate or amend this function. If that proves reliable then great, we can keep it. Bear in mind that everything is backed by git, so it's not like the codegen can make permanent breaking changes - also changes are easy to see via diff.
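To illustrate the "changes are easy to see via diff" point, here is a minimal sketch that compares an existing adaptor file against an amended version before anything is written to disk. The function name and sample file contents are hypothetical; only Python's standard difflib is used.

```python
import difflib


def review_diff(existing: str, amended: str, filename: str = "Adaptor.js") -> str:
    """Return a unified diff between the existing and amended source."""
    lines = difflib.unified_diff(
        existing.splitlines(keepends=True),
        amended.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(lines)


# Hypothetical file contents standing in for a real adaptor.
existing = "export function get() {}\n"
amended = existing + "export function post() {}\n"
diff = review_diff(existing, amended)
print(diff)
```

Showing this diff and asking for confirmation would be one cheap guard if the AI-amended output turns out to be unreliable.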
Model Abstraction
Should we expose the model to the user?
I think we're building a highly specific service here. I think we should choose a model that works well for the task, maybe fine-tune it (I think we had mixed results on that?) and optimise the prompt for that model.
Maybe we even find that we want to use multiple models in the pipeline.
So I kind of think we should abstract the model, maybe expose it as an option if it helps, but otherwise don't put it in the user's hands. This also implies that we own the model's API keys (if required). Maybe users need an API key to use the service generally?
Python Server
The current AI spike is implemented as a python service behind a python http server.
We could re-implement the "frontend" of the service in TypeScript. But ML devs don't want to use TypeScript, so this isn't a very appealing option.
So we probably need to keep the python server. That means we need to host a central server which exposes a bunch of endpoints. Each endpoint should be like api/<task>/<version>, i.e., api/adaptor/1. Input payloads could be quite big if they have to include spec.json files or a prompt, so the endpoint should accept a JSON payload.
The CLI could take a url to the server to call. By default it would call data.openfn.org or something, but you can pass localhost:1234 to use a local dev server.
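A minimal sketch of the endpoint scheme, assuming the api/<task>/<version> layout described above. The function names, the https default, and the choice to treat the version as an integer are assumptions; data.openfn.org is only the placeholder host from the text.

```python
from urllib.parse import urlsplit

# Placeholder default from the discussion above; the real host is undecided.
DEFAULT_SERVER = "https://data.openfn.org"


def endpoint_url(task: str, version: int, server: str = DEFAULT_SERVER) -> str:
    """Build the URL the CLI would post its JSON payload to."""
    # Allow bare host:port values such as localhost:1234 for local dev servers.
    if "://" not in server:
        server = "http://" + server
    return f"{server.rstrip('/')}/api/{task}/{version}"


def parse_endpoint(url_or_path: str) -> tuple:
    """Server side: split api/<task>/<version> into (task, version)."""
    parts = urlsplit(url_or_path).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "api" or not parts[2].isdigit():
        raise ValueError(f"expected api/<task>/<version>, got {url_or_path!r}")
    return parts[1], int(parts[2])


url = endpoint_url("adaptor", 1, "localhost:1234")
print(url)
print(parse_endpoint(url))
```

Keeping the task and version in the path means different tasks could later be routed to different servers without changing the CLI.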