Feature Request: OpenAI-compatible API #64

Open
taowang1993 opened this issue Oct 14, 2024 · 9 comments

Comments

@taowang1993

taowang1993 commented Oct 14, 2024

Hi, Podcastfy Team.

I would like to make a feature request for easier deployment and integration with other systems.

  • Docker Support: A Dockerfile would make it much easier for users to build a Docker image and deploy it to the cloud. A prebuilt official image on Docker Hub would be even better.

  • OpenAI-compatible API: It would be very easy to integrate Podcastfy into other systems if Podcastfy provides an API that is compatible with the OpenAI /audio/speech API format.

For example:
In Dify (an agent-building platform like LangChain), I can currently generate a script and convert it into audio with TTS models by clicking the play button.

I would like to integrate Podcastfy with Dify so that I can generate podcasts.

image

This will open up many opportunities.

For example, I can build an AI teacher that helps people learn new languages.

Podcastfy can be used to help students improve English listening comprehension.

As you can see in the screenshot, Dify can call any OpenAI-compatible APIs.

Thank you.

@souzatharsis
Owner

Re: OpenAI-compatible API

Excellent Feature Request; this is a must for wide adoption.
Could you please clarify whether your use case needs us to implement the OpenAI TTS interface or the Chat interface?

I would appreciate it if you could share the specific API spec you are referring to; we should be able to push it pretty quickly.

Thanks for sharing your detailed use case with us.

@taowang1993
Author

OpenAI TTS interface or Chat interface

I think Podcastfy can go with the TTS API format:
https://api.openai.com/v1/audio/speech

python

from pathlib import Path
import openai

speech_file_path = Path(__file__).parent / "speech.mp3"
response = openai.audio.speech.create(
  model="tts-1",
  voice="alloy",
  input="The quick brown fox jumped over the lazy dog."
)
response.stream_to_file(speech_file_path)

curl

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3

node

import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

const speechFile = path.resolve("./speech.mp3");

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Today is a wonderful day to build something people love!",
  });
  console.log(speechFile);
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();

OpenAI TTS API Reference:
https://platform.openai.com/docs/api-reference/audio/createSpeech

@ralyodio

we should support ollama for starters.

@souzatharsis
Owner

@ralyodio podcastfy now supports running local llms via llamafiles https://github.com/souzatharsis/podcastfy/blob/main/usage/local_llm.md

What would be the value add of adding ollama given that?

We can move the ollama discussion to a separate issue if there's value in it, and keep this issue focused on the OpenAI interface request.

Curious about your experience.

@ralyodio

a lot of people already have ollama running and it exposes a rest api so apps (like this one) can easily integrate with it.

@souzatharsis pinned this issue on Oct 18, 2024
@brumar
Collaborator

brumar commented Oct 18, 2024

@taowang1993 the input in your example would be the raw content, right? Not the transcript?
As I think exposing a REST API is out of scope for the moment (but @souzatharsis can prove me wrong), the idea is that if we have a Python API that follows the same signature as openai.audio.speech.create, then it would be trivial to expose it as a REST API, right?

I feel this API would be a bit awkward, as it focuses only on a selection of the arguments that podcastfy normally uses. It's technically feasible (as an instance method of a new class or even as a closure) and it would mesh very well with projects that want to expose this as a REST API, but it won't be too useful for integrating podcastfy into larger Python projects, and it would be something we have to maintain over time.
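To make the "instance method of a new class" idea concrete, here is a minimal sketch, not an existing Podcastfy interface. It assumes a hypothetical callable `generate(text) -> path` that turns raw content into an MP3 file and returns its path; the names `Speech`, `SpeechResponse`, and `PodcastGenerator` are made up for illustration. Only the keyword names `model`, `voice`, `input`, and `response_format` come from the OpenAI call quoted earlier in this thread.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Assumption: any callable that turns raw content into an audio file and
# returns the file path. In practice it would wrap Podcastfy's own pipeline.
PodcastGenerator = Callable[[str], str]


@dataclass
class SpeechResponse:
    """Small stand-in for the object returned by openai.audio.speech.create."""

    content: bytes

    def stream_to_file(self, path) -> None:
        # Mirrors the method name used in the OpenAI Python example above.
        Path(path).write_bytes(self.content)


class Speech:
    """Offers create(model=..., voice=..., input=...) on top of a generator."""

    def __init__(self, generate: PodcastGenerator):
        self._generate = generate

    def create(self, *, model: str, voice: str, input: str,
               response_format: str = "mp3") -> SpeechResponse:
        if response_format != "mp3":
            raise ValueError("this sketch only returns mp3")
        # model and voice are accepted for signature compatibility only;
        # how they would map onto Podcastfy settings is left open here.
        audio_path = self._generate(input)
        return SpeechResponse(content=Path(audio_path).read_bytes())
```

A caller would then write `Speech(my_generate).create(model="tts-1", voice="alloy", input=raw_text).stream_to_file("podcast.mp3")`, where `my_generate` is whatever wrapper around Podcastfy the project provides.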

I think the documentation could present a recipe for creating a FastAPI endpoint that more or less respects the OpenAI openapi.json, for example, with a real code snippet leveraging the current abstractions. That would be even more helpful for the kind of need you have, while not adding a new interface to maintain in the codebase. That's my 2 cents anyway :)
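As a rough illustration of what such a recipe could look like, the sketch below serves POST /v1/audio/speech and returns MP3 bytes. It deliberately uses Python's standard-library `http.server` instead of FastAPI so that it runs without extra dependencies; a FastAPI version would follow the same shape. The `generate(text) -> path` callable and the names `make_handler` and `build_server` are assumptions for illustration, not part of Podcastfy.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import Callable


def make_handler(generate: Callable[[str], str]):
    """Build a handler class bound to a hypothetical generate(text) -> mp3 path."""

    class SpeechHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/v1/audio/speech":
                self.send_error(404, "unknown route")
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                body = json.loads(self.rfile.read(length))
                text = body["input"]
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "expected a JSON body with an input field")
                return
            # "model" and "voice" are sent by OpenAI-style clients but are
            # ignored in this sketch.
            audio = Path(generate(text)).read_bytes()
            self.send_response(200)
            self.send_header("Content-Type", "audio/mpeg")
            self.send_header("Content-Length", str(len(audio)))
            self.end_headers()
            self.wfile.write(audio)

        def log_message(self, *args):
            pass  # silence per-request logging in this sketch

    return SpeechHandler


def build_server(generate: Callable[[str], str], host: str = "127.0.0.1",
                 port: int = 8000) -> ThreadingHTTPServer:
    """Create (but do not start) a server exposing the speech route."""
    return ThreadingHTTPServer((host, port), make_handler(generate))
```

Calling `build_server(my_generate).serve_forever()` would start it, after which the curl example from earlier in this thread should work with the host swapped for `http://127.0.0.1:8000`. Authentication, streaming, and the remaining OpenAI fields are left out on purpose.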

@taowang1993
Author

the input in your example would be the raw content, right? Not the transcript?

In the context of podcastfy, the "input" would be the raw text (or document) that will be fed to podcastfy to convert into podcasts.

In the context of openai tts, the "input" is the transcript that users want to convert into speech.

The reason I propose an "openai-compatible" approach is that many AI systems already have the OpenAI client SDK built in.

If podcastfy exposes an openai-compatible api, then it would be very easy to integrate into other systems, leading to wider adoption.

If the OpenAI API format is not very suitable for podcastfy, then any other API format will also work.
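To illustrate why compatibility helps, the sketch below builds the same /audio/speech request for two base URLs using only the standard library. The local address `http://localhost:8000/v1` is a hypothetical Podcastfy server, and `build_speech_request` is an illustrative helper, not an existing function. With the official OpenAI Python SDK, the equivalent change would be passing a different `base_url` when constructing the client.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, api_key: str = "unused",
                         model: str = "tts-1",
                         voice: str = "alloy") -> urllib.request.Request:
    """Build (without sending) an OpenAI-style POST to <base_url>/audio/speech."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same request shape; only the base URL differs.
openai_req = build_speech_request("https://api.openai.com/v1", "Hello")
# Assumption: a Podcastfy server listening locally on port 8000.
podcastfy_req = build_speech_request("http://localhost:8000/v1",
                                     "Raw article text to turn into a podcast")
```

In the Podcastfy case, `input` carries the raw content rather than a finished transcript, as described above.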

@brumar
Collaborator

brumar commented Oct 18, 2024

Thanks for the explanations. That's convincing!

@souzatharsis changed the title from "Feature Request: Support Docker and OpenAI-compatible API" to "Feature Request: OpenAI-compatible API" on Oct 26, 2024
@souzatharsis
Owner

Docker image has been created:

https://github.com/souzatharsis/podcastfy/blob/main/usage/docker.md

I've updated this issue to focus solely on enabling an OpenAI-style API.

@polar-sh (bot) added the Fund label on Nov 4, 2024