About a standard request API for the package #111
-
So on an initial (cursory) review, the biggest bottleneck might be the prompt style, which is different from just the API request format. Let me give you an example. I am now training an R-tuned version of starchat (https://huggingface.co/HuggingFaceH4/starchat-beta), which itself would be a good default Hugging Face model to suggest to users, by the way. The prompt needs to be formatted as follows: `prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"`, and this is just one of many prompt formats that models behind the HF API might expect; a competing code model that is smaller and faster (Replit: https://huggingface.co/teknium/Replit-v2-CodeInstruct-3B) has an entirely different prompt format. So we need a way for the user to specify the prompt format, or ideally for devs to tell the package what prompt format to use. We want to prevent having to hardcode all this per model within a specific API...
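For illustration, handling this could be as simple as letting the caller (or a model-specific default) supply a template string. A minimal sketch, with a placeholder function name that is not part of the package:

```r
# Hypothetical helper: apply a model-specific prompt template to the user's
# query. "{query}" is the slot that gets replaced; nothing here is settled API.
format_prompt <- function(query, template) {
  sub("{query}", query, template, fixed = TRUE)
}

# starchat-beta expects this dialogue-style template...
starchat_template <- "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
format_prompt("Write an R function that reverses a string.", starchat_template)

# ...while another model could register a completely different template string.
```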
-
Thank you for your well-thought-out proposal! I appreciate the effort you've put into detailing your thoughts and structuring your ideas. It's clear that you've given this a lot of thought.
Here are a few things that might be worth considering:
-
This may deserve its own discussion thread, but what about local models? I don't know how to do that yet without taking on a …
-
I have a (very much incomplete) start on the S3 structure over here: #109
Building on what you already have @calderonsamuel, I propose to use these class names, based on recommendations in the S3 chapter of Advanced R:
Perform Request:
Process Response:
Overall, I like where we're headed.
-
All right, it took a bit longer than I expected, but here it is. The flowcharts were made with mermaid; you can expand them to full width with the <-> button in the top-right section of the image. Keep in mind that this is meant to start a discussion, and no function or class name is definitive. Also, for some reason GitHub added indentation in some code chunks, so you might notice that.

This is a high-level representation of how I think we should handle the process. It should cover every API that exists now for every LLM task that uses a user text prompt. Keep in mind that this doesn't include the internal logic of the chat app, only how we get to the response.
We can expand a little to include the expected output per step. Note that the output of the whole process is a standard response (standard in the sense that it has a consistent structure). Once we get a standard response, we can assume we can have a standard process for handling it inside the chat app or inside any other service/addin.
At this point, we could start considering that standardization could be enforced through an S3 class. This means there should be a standard (and hopefully obvious) way of handling any necessary input for the request. For that, we need to establish a common list of possible inputs.
If we zoom in on each process, we get the following:
Building the request
To build the request skeleton, we need:
If we translate this to an R list, it would look like this for an OpenAI request:
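Roughly, something like this (field names are only illustrative, nothing definitive):

```r
# Illustrative request skeleton for the OpenAI chat completions API.
# The required fields are things any LLM API needs; `extra` holds
# API-specific parameters.
openai_skeleton <- list(
  url     = "https://api.openai.com/v1/chat/completions",
  token   = Sys.getenv("OPENAI_API_KEY"),
  prompt  = "Write an R function that reverses a string.",
  history = list(
    list(role = "system", content = "You are a helpful R coding assistant.")
  ),
  extra   = list(
    model       = "gpt-3.5-turbo",
    temperature = 0.7,
    stream      = FALSE
  )
)
```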
For a HuggingFace inference API request it would have the same structure:
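Again only as a sketch, with the same illustrative field names:

```r
# Same skeleton shape, pointed at a Hugging Face Inference API endpoint.
# Only the `extra` element changes, carrying HF-specific generation parameters.
hf_skeleton <- list(
  url     = "https://api-inference.huggingface.co/models/HuggingFaceH4/starchat-beta",
  token   = Sys.getenv("HF_API_TOKEN"),
  prompt  = "Write an R function that reverses a string.",
  history = list(),
  extra   = list(
    max_new_tokens = 256,
    temperature    = 0.7
  )
)
```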
As you can see, the required parameters for the standard requests are things we can assume any LLM API would require. The `extra` element is open to whatever specific parameters each API accepts. An S3 class that wraps this would basically just be a constructor for the structure and names. For any specific API we just need to create a subclass that checks the types of the `extra` parameters. It could also define sensible defaults for its skeleton.
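To make that concrete, a rough sketch of what the constructor and one subclass could look like (the `new_llm_request_skeleton()` helper, its arguments, and the defaults are all placeholders):

```r
# Hypothetical base constructor: only captures the common structure and names.
new_llm_request_skeleton <- function(url, token, prompt, history = list(),
                                     extra = list(), class = character()) {
  stopifnot(is.character(url), is.character(token), is.character(prompt))
  structure(
    list(url = url, token = token, prompt = prompt,
         history = history, extra = extra),
    class = c(class, "llm_request_skeleton")
  )
}

# Hypothetical OpenAI subclass: validates its own `extra` parameters and
# provides sensible defaults for the skeleton.
llm_skeleton_openai <- function(prompt,
                                history = list(),
                                token = Sys.getenv("OPENAI_API_KEY"),
                                extra = list(model = "gpt-3.5-turbo",
                                             temperature = 0.7,
                                             stream = FALSE)) {
  stopifnot(is.character(extra$model), is.numeric(extra$temperature))
  new_llm_request_skeleton(
    url     = "https://api.openai.com/v1/chat/completions",
    token   = token,
    prompt  = prompt,
    history = history,
    extra   = extra,
    class   = "llm_skeleton_openai"
  )
}
```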
Performing the request
Here is where the heavier work happens. Given a standard request skeleton, we need to:
For now, we need to do both in a single function (because we use `{curl}` for streaming and `{httr2}` for non-streaming, we can't just pipe the perform process). With this in mind, we could define a generic `llm_request_perform()`.
Then, for the OpenAI API we could perform the request as in the code chunk below. When the request requires streaming and the `stream_handler` is not provided, it will fail. We return both the skeleton and the response, to be able to standardize the response later.
It is worth noting that the stream handler would probably have to be specifically tailored to a subclass of `llm_request_skeleton` (e.g. `llm_skeleton_openai`). Also keep in mind that the return value of `llm_request_perform()` will have its `response` element in a non-standard format. I haven't written the necessary code yet, but I think it would be worth creating an S3 class for these responses too.
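A sketch of that perform step, under the assumptions above and keeping only the non-streaming `{httr2}` path (the streaming `{curl}` path is just indicated by a comment; none of this is definitive):

```r
library(httr2)

# Generic: dispatch on the skeleton subclass.
llm_request_perform <- function(skeleton, stream_handler = NULL, ...) {
  UseMethod("llm_request_perform")
}

# OpenAI method. Fails early when streaming is requested without a handler.
llm_request_perform.llm_skeleton_openai <- function(skeleton,
                                                    stream_handler = NULL, ...) {
  streaming <- isTRUE(skeleton$extra$stream)
  if (streaming && is.null(stream_handler)) {
    stop("A `stream_handler` must be provided when `stream = TRUE`.")
  }

  body <- c(
    list(messages = c(
      skeleton$history,
      list(list(role = "user", content = skeleton$prompt))
    )),
    skeleton$extra
  )

  if (streaming) {
    # Streaming would go through {curl} with `stream_handler` as the callback;
    # omitted in this sketch.
    response <- NULL
  } else {
    response <- request(skeleton$url) |>
      req_auth_bearer_token(skeleton$token) |>
      req_body_json(body) |>
      req_perform() |>
      resp_body_json()
  }

  # Return both pieces so the response can be standardized later.
  list(skeleton = skeleton, response = response)
}
```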
Standardizing the response
Now we just need to grab the response and standardize it. That mostly means giving it the same structure as a standard request skeleton, so we can reuse it for new requests or for any other service. Of course, it would be better to have an S3 generic to standardize the response (or to just get the last response).
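As a sketch only (the generic's name and the `llm_result_openai` class are placeholders for whatever we end up returning from the perform step):

```r
# Hypothetical generic that turns an API-specific response back into the
# standard skeleton shape, with the new exchange appended to `history`.
llm_response_standardize <- function(result, ...) {
  UseMethod("llm_response_standardize")
}

llm_response_standardize.llm_result_openai <- function(result, ...) {
  skeleton <- result$skeleton
  answer   <- result$response$choices[[1]]$message$content
  skeleton$history <- c(
    skeleton$history,
    list(
      list(role = "user",      content = skeleton$prompt),
      list(role = "assistant", content = answer)
    )
  )
  # The updated skeleton can be reused for a follow-up request or any service.
  skeleton
}
```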
Of course, all this is just a high-level overview, open to discussion and potentially buggy. I would love to read everyone's feedback.