Feature Request: OpenAI-compatible API #64

taowang1993 · 2024-10-14T23:52:58Z

Hi, Podcastfy Team.

I would like to request two features for easier deployment and integration with other systems.

Docker Support: A Dockerfile would make it much easier for users to build a Docker image and deploy it to the cloud. A prebuilt official image on Docker Hub would be even better.
OpenAI-compatible API: Podcastfy would be much easier to integrate into other systems if it provided an API compatible with the OpenAI /audio/speech format.

For example:
In Dify (an agent-building platform like LangChain), I can currently generate a script and convert it into audio with TTS models by clicking the play button.

I would like to integrate Podcastfy with Dify so that I can generate podcasts.

This will open up many opportunities.

For example, I can build an AI teacher that helps people learn new languages.

Podcastfy can be used to help students improve English listening comprehension.

As you can see in the screenshot, Dify can call any OpenAI-compatible API.

Thank you.


souzatharsis · 2024-10-15T00:05:50Z

Re: OpenAI-compatible API

Excellent Feature Request; this is a must for wide adoption.
Could you please clarify whether your use case needs us to implement the OpenAI TTS interface or the Chat interface?

We would appreciate it if you could share the specific API spec you are referring to; we should be able to push it pretty quickly.

Thanks for sharing your detailed use case with us.

taowang1993 · 2024-10-15T02:58:13Z

OpenAI TTS interface or Chat interface

I think Podcastfy can go with the TTS API format:
https://api.openai.com/v1/audio/speech

python

from pathlib import Path
import openai

speech_file_path = Path(__file__).parent / "speech.mp3"
response = openai.audio.speech.create(
  model="tts-1",
  voice="alloy",
  input="The quick brown fox jumped over the lazy dog."
)
response.stream_to_file(speech_file_path)

curl

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3

node

import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

const speechFile = path.resolve("./speech.mp3");

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Today is a wonderful day to build something people love!",
  });
  console.log(speechFile);
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();

OpenAI TTS API Reference:
https://platform.openai.com/docs/api-reference/audio/createSpeech

ralyodio · 2024-10-16T11:46:52Z

We should support Ollama for starters.

souzatharsis · 2024-10-16T12:12:17Z

@ralyodio Podcastfy now supports running local LLMs via llamafiles: https://github.com/souzatharsis/podcastfy/blob/main/usage/local_llm.md

Given that, what value would adding Ollama provide?

If there's value in it, we can move the Ollama discussion to a separate issue and keep this one focused on the OpenAI interface request.

Curious about your experience.

ralyodio · 2024-10-16T12:26:00Z

A lot of people already have Ollama running, and it exposes a REST API, so apps (like this one) can easily integrate with it.

brumar · 2024-10-18T14:20:01Z

@taowang1993 the input in your example would be the raw content, right? Not the transcript?
As I think exposing a REST API is out of scope for the moment (but @souzatharsis can prove me wrong), the idea is that if we have a Python API that follows the same signature as openai.audio.speech.create, then it would be trivial to expose it as a REST API, right?

I feel this API would be a bit awkward, as it focuses on only a subset of the arguments that Podcastfy normally uses. It's technically feasible (as an instance method of a new class or even as a closure), and it would mesh very well with projects that want to expose it as a REST API, but it wouldn't be very useful for integrating Podcastfy into larger Python projects, and it would be something we have to maintain over time.

I think the documentation could instead present a recipe for creating a FastAPI endpoint that more or less respects the OpenAI openapi.json, with a real code snippet leveraging the current abstractions. That would be even more helpful for the kind of need you have, while not adding a new interface to maintain in the codebase. That's my 2cts anyway :)
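
To make the recipe idea concrete, here is a minimal, framework-agnostic sketch of the request handling such an endpoint would need. It uses only the standard library so that a FastAPI route (or any other web framework) could wrap it. The `generate` callable is a hypothetical stand-in for whatever Podcastfy call turns raw content into audio bytes; it is not an existing Podcastfy function, and the set of supported formats here is an arbitrary assumption.

```python
import json
from typing import Callable, Dict, Tuple

# Fields of the OpenAI /v1/audio/speech request body that this sketch requires.
REQUIRED_FIELDS = ("model", "input", "voice")

# Subset of OpenAI `response_format` values mapped to MIME types (assumed subset).
CONTENT_TYPES = {"mp3": "audio/mpeg", "wav": "audio/wav"}

Response = Tuple[int, Dict[str, str], bytes]


def _error(status: int, message: str) -> Response:
    """Build a JSON error response shaped like {"error": {"message": ...}}."""
    data = json.dumps({"error": {"message": message}}).encode("utf-8")
    return status, {"Content-Type": "application/json"}, data


def handle_speech_request(body: bytes, generate: Callable[[str, str], bytes]) -> Response:
    """Validate an OpenAI-style speech request and delegate to `generate`.

    `generate(text, voice)` is hypothetical: it stands in for the Podcastfy
    code that would turn raw content into audio bytes.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return _error(400, "request body is not valid JSON")
    if not isinstance(payload, dict):
        return _error(400, "request body must be a JSON object")

    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return _error(400, "missing fields: " + ", ".join(missing))

    fmt = payload.get("response_format", "mp3")
    if fmt not in CONTENT_TYPES:
        return _error(400, f"unsupported response_format: {fmt}")

    audio = generate(payload["input"], payload["voice"])
    return 200, {"Content-Type": CONTENT_TYPES[fmt]}, audio
```

A web route would pass the raw request body to `handle_speech_request` and copy the returned status, headers, and bytes into its own response object, which keeps the OpenAI-shaped surface outside the core library.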

taowang1993 · 2024-10-18T18:45:08Z

the input in your example would be the raw content, right? Not the transcript?

In the context of Podcastfy, the "input" would be the raw text (or document) fed to Podcastfy to convert into a podcast.

In the context of OpenAI TTS, the "input" is the transcript that users want to convert into speech.

The reason I propose an "OpenAI-compatible" approach is that many AI systems already have the OpenAI client SDK built in.

If Podcastfy exposes an OpenAI-compatible API, it would be very easy to integrate into other systems, leading to wider adoption.

If the OpenAI API format is not a good fit for Podcastfy, any other API format would also work.
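
As a rough illustration of what "OpenAI-compatible" would mean for a caller, the sketch below builds the same request as the curl example above, but aimed at a different host. The base URL, the `"podcastfy"` model name, and the placeholder API key are all assumptions made for illustration; no such server exists in Podcastfy at the time of this thread.

```python
import json
import urllib.request

# Assumed address of a locally running, OpenAI-compatible Podcastfy server.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text: str, base_url: str = BASE_URL,
                         model: str = "podcastfy", voice: str = "alloy",
                         api_key: str = "unused") -> urllib.request.Request:
    """Build (but do not send) a POST to {base_url}/audio/speech.

    `model="podcastfy"` and `api_key="unused"` are hypothetical placeholders.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_audio(request: urllib.request.Request) -> bytes:
    """Send the request and return the raw audio bytes from the response."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Only the host changes relative to the OpenAI example. With the official Python SDK, the equivalent is constructing the client with a `base_url` argument, which is why tools that already embed that SDK could integrate without new code.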

brumar · 2024-10-18T19:33:27Z

Thanks for the explanations. That's convincing!

souzatharsis · 2024-10-27T18:20:32Z

Docker image has been created:

https://github.com/souzatharsis/podcastfy/blob/main/usage/docker.md

I've updated this issue to focus solely on enabling an OpenAI-type API.

souzatharsis added the HIGH Priority label Oct 15, 2024

souzatharsis pinned this issue Oct 18, 2024

souzatharsis changed the title ~~Feature Request: Support Docker and OpenAI-compatible API~~ Feature Request: OpenAI-compatible API Oct 26, 2024

polar-sh bot added the Fund label Nov 4, 2024
