diff --git a/CHANGELOG.md b/CHANGELOG.md
index f341cc6..396c340 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,6 @@
 # Changelog

-## [0.3.1] - 2024-11-07
+## [0.3.3] - 2024-11-08

 ### Breaking Changes
 - Loading images from 'path' has been removed for security reasons. Please specify images by passing an 'url'.
@@ -15,6 +15,9 @@
 - Start TESTIMONIALS.md
 - Add apps using Podcastfy to README.md

+### Fixed
+- #165 Fixed audio generation issue on Windows OS: Normalize path separators for cross-platform compatibility
+
 ## [0.2.3] - 2024-10-15

 ### Added
diff --git a/README.md b/README.md
index 17f4f3c..3b4a58f 100644
--- a/README.md
+++ b/README.md
@@ -72,9 +72,12 @@ This sample collection is also [available at audio.com](https://audio.com/thatup
 ## Updates 🚀

 ### v0.3.0+ release
+- Generate podcasts from input topic using real-time internet search
 - Integrate with 100+ LLM models (OpenAI, Anthropic, Google etc) for transcript generation
 - Integrate with Google's Multispeaker TTS model for high-quality audio generation

+See [CHANGELOG](CHANGELOG.md) for more details.
+
 ## Quickstart 💻

 ### Prerequisites
@@ -108,8 +111,6 @@ python -m podcastfy.client --url --url

 - [CLI](usage/cli.md)

-- [Docker Image](usage/docker.md)
-
 - [How to](usage/how-to.md)

 Experience Podcastfy with our [HuggingFace](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo) 🤗 Spaces app. (Note: This UI app is less extensively tested than the Python package.)
diff --git a/podcastfy/__init__.py b/podcastfy/__init__.py
index ddde97c..0fcdce8 100644
--- a/podcastfy/__init__.py
+++ b/podcastfy/__init__.py
@@ -1,2 +1,2 @@
 # This file can be left empty for now
-__version__ = "0.3.1"  # or whatever version you're on
+__version__ = "0.3.3"  # or whatever version you're on
diff --git a/podcastfy/text_to_speech.py b/podcastfy/text_to_speech.py
index ee01fe7..347baf1 100644
--- a/podcastfy/text_to_speech.py
+++ b/podcastfy/text_to_speech.py
@@ -134,7 +134,7 @@ def _generate_audio_segments(self, text: str, temp_dir: str) -> List[str]:
         for speaker_type, content in [("question", question), ("answer", answer)]:
             temp_file = os.path.join(
                 temp_dir, f"{idx}_{speaker_type}.{self.audio_format}"
-            )
+            ).replace('\\', '/')  # Normalize path separators for cross-platform compatibility

             voice = provider_config.get("default_voices", {}).get(speaker_type)
             model = provider_config.get("model")
diff --git a/pyproject.toml b/pyproject.toml
index fdb2566..70b5db5 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "podcastfy"
-version = "0.3.1"
+version = "0.3.3"
 description = "An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI"
 authors = ["Tharsis T. P. Souza"]
 license = "Apache-2.0"
diff --git a/usage/conversation_custom.md b/usage/conversation_custom.md
index 39f1b6c..4bfa7b7 100644
--- a/usage/conversation_custom.md
+++ b/usage/conversation_custom.md
@@ -187,7 +187,7 @@ creativity: 0.7
 - The `word_count` is a target, and the AI may generate more or less than the specified word count. Low word counts are more likely to generate high-level discussions, while high word counts are more likely to generate detailed discussions.
 - The `output_language` defines both the language of the transcript and the language of the audio. Here's some relevant information:
   - Bottom-line: non-English transcripts are good enough but non-English audio is work-in-progress.
-  - Transcripts are generated using Google's Gemini 1.5 Pro, which supports 100+ languages by default.
+  - Transcripts are generated using Google's Gemini 1.5 Pro by default, which supports 100+ languages. Other user-defined models may or may not support non-English languages.
  - Audio is generated using `openai` (default), `elevenlabs`, `gemini`, or `edge` TTS models.
    - The `gemini` (Google) TTS model is English only.
    - The `openai` TTS model supports multiple languages automatically; however, non-English voices still present sub-par quality in my experience.
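
The path-separator change in `podcastfy/text_to_speech.py` above can be illustrated in isolation. The sketch below is not the project's API; `build_segment_path` is a hypothetical helper that mirrors the patched expression: `os.path.join` emits backslashes on Windows, which the diff normalizes to forward slashes so downstream consumers see one separator style on every OS.

```python
import os


def build_segment_path(temp_dir: str, idx: int, speaker_type: str, audio_format: str) -> str:
    """Build an audio-segment temp-file path with forward slashes on every OS.

    Illustrative helper (not part of podcastfy's public API): os.path.join()
    uses '\\' on Windows, so the result is normalized to '/' as in the diff.
    """
    path = os.path.join(temp_dir, f"{idx}_{speaker_type}.{audio_format}")
    return path.replace("\\", "/")  # normalize separators for cross-platform compatibility


# On Windows, os.path.join would yield 'tmp\\0_question.mp3';
# after normalization the result is 'tmp/0_question.mp3' everywhere.
print(build_segment_path("tmp", 0, "question", "mp3"))
```

An alternative with the same effect is `pathlib.PurePath(...).as_posix()`; the diff opts for the simpler string replace on the joined path.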