Support Web Speech API #661
base: v2-dev
@zoollcar is attempting to deploy a commit to the Enrico Pro Team on Vercel. A member of the Team first needs to authorize it.
Thanks for the feature. This patch is now in a state where I can review it and potentially merge it.
Update: thanks for resubmitting the PR. This is definitely higher-quality code that takes the rest of the application (e.g. other modules) into account. I'm testing it on mobile and it has hung a couple of times (I believe it's a stability error caused by a changing React reference). It's possibly something I can fix, but it will take me some time to investigate and develop.
On the UX side, there are some rough edges: on my Android phone the High quality list doesn't do much; no matter what one chooses, the experience doesn't change, and the same happens with the 4 available voices. So that's something I can look into to improve the UX. Why is this key? Because every feature in Big AGI gets the same scrutiny and UX perfection.
Thanks again. I'll follow up when I have time to check this out, review it, and change what needs to be changed. Let me know in the meantime if anything can be improved on your side.
Overall there are some details (and a crash) to be ironed out. It's going in the right direction and increasing in quality.
I wonder if the whole module should get a cleanup, meaning bringing both elevenlabs and webspeech under the same umbrella, e.g. /modules/tts/* or similar.
The module would likely benefit from an abstraction (e.g. an interface ISpeechSynthesis or similar), with the 2 engines implementing the same interface. This way every call becomes more abstract and the caller doesn't need to know "if elevenlabs do this, otherwise do that".
In general it's good; please take a look at the comments (only some are necessary).
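To make the suggested abstraction concrete, here is a minimal sketch of what a shared interface could look like. Only the name ISpeechSynthesis comes from the comment above; the method names, the RecordingEngine stand-in, the registry type, and the helper functions are assumptions made for illustration and are not taken from the PR.

```typescript
// Hypothetical sketch: every name except ISpeechSynthesis is illustrative.
type TTSEngineId = 'elevenlabs' | 'webspeech';

interface ISpeechSynthesis {
  readonly id: TTSEngineId;
  /** Whether the engine can be used right now (API key set, browser support, ...). */
  isAvailable(): boolean;
  /** Speak the given text. */
  speak(text: string): Promise<void>;
  /** Stop any ongoing speech. */
  cancel(): void;
}

// Stand-in engine that only records calls. Real engines would wrap the
// ElevenLabs client or the browser's speech synthesis instead.
class RecordingEngine implements ISpeechSynthesis {
  readonly id: TTSEngineId;
  readonly spoken: string[] = [];

  constructor(id: TTSEngineId) {
    this.id = id;
  }

  isAvailable(): boolean {
    return true;
  }

  async speak(text: string): Promise<void> {
    this.spoken.push(text);
  }

  cancel(): void {
    this.spoken.length = 0;
  }
}

type EngineRegistry = Partial<Record<TTSEngineId, ISpeechSynthesis>>;

// Look up an engine by ID; unknown or unregistered IDs are an error.
function resolveEngine(registry: EngineRegistry, id: TTSEngineId): ISpeechSynthesis {
  const engine = registry[id];
  if (!engine) throw new Error('TTSEngine is not found');
  return engine;
}

// Caller code stays engine-agnostic: no "if elevenlabs do this, otherwise do that".
async function speakWith(registry: EngineRegistry, id: TTSEngineId, text: string): Promise<void> {
  const engine = resolveEngine(registry, id);
  if (engine.isAvailable()) await engine.speak(text);
}
```

With something along these lines, adding a third engine would mean registering one more implementation rather than touching every call site.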
throw new Error('TTSEngine is not found');
}

export function useCapability(): CapabilitySpeechSynthesis {
The crash has been traced to this hook (the one that gave the black screen in the screenshot). It seems that switching providers triggers a React out-of-order issue; I believe it only happens when switching TTS providers.
It's possible that, to fix this properly, we will have to overhaul the TTSEngine reactivity (hooks).
setPersonaTextInterim(text);

// Maintain and say the current sentence
I love this.
src/apps/chat/store-app-chat.ts
Outdated
@@ -51,6 +52,12 @@ interface AppChatStore {
micTimeoutMs: number;
setMicTimeoutMs: (micTimeoutMs: number) => void;

TTSEngine: string;
For now this could be TTSEngine: 'elevenlabs' | 'webspeech', to force TypeScript to do its job.
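As a rough illustration of the suggestion above, the field could be typed with a string-literal union derived from a single list of IDs. The names TTS_ENGINE_IDS, TTSEngineId, isTTSEngineId, and TTSSettings are hypothetical; only the two ID values come from the comment.

```typescript
// Hypothetical sketch: a single source of truth for the engine IDs.
const TTS_ENGINE_IDS = ['elevenlabs', 'webspeech'] as const;
type TTSEngineId = typeof TTS_ENGINE_IDS[number]; // 'elevenlabs' | 'webspeech'

// Runtime guard, useful for values coming from persisted storage.
function isTTSEngineId(value: unknown): value is TTSEngineId {
  return typeof value === 'string' && (TTS_ENGINE_IDS as readonly string[]).includes(value);
}

// The store slice then rejects arbitrary strings at compile time.
interface TTSSettings {
  TTSEngine: TTSEngineId;
  setTTSEngine: (TTSEngine: TTSEngineId) => void;
}
```

With this typing, a call such as setTTSEngine('WebSpeech') would fail to compile, which is the kind of check a plain string field cannot provide.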
src/apps/chat/store-app-chat.ts
Outdated
@@ -114,6 +121,12 @@ const useAppChatStore = create<AppChatStore>()(persist(
micTimeoutMs: 2000,
setMicTimeoutMs: (micTimeoutMs: number) => _set({ micTimeoutMs }),

TTSEngine: TTSEngineList[0],
If TTSEngine: 'elevenlabs' | 'webspeech', then this becomes one of the two values (probably 'webspeech' by default). The conversion to a nice string can then be done in the settings UI, and in the code we only match against those IDs.
As an alternative this could be left undefined, and the UI will decide what to use every time, unless the user makes a choice. undefined will default to 'webspeech'.
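A small sketch of the alternative described above, where the stored value may be undefined and is resolved to 'webspeech' at the point of use. The display labels and the function names are assumptions for illustration only.

```typescript
type TTSEngineId = 'elevenlabs' | 'webspeech';

// Fallback used whenever the user has not made an explicit choice.
const DEFAULT_TTS_ENGINE: TTSEngineId = 'webspeech';

// Hypothetical display labels; the code elsewhere only ever matches on IDs.
const TTS_ENGINE_LABELS: Record<TTSEngineId, string> = {
  elevenlabs: 'ElevenLabs',
  webspeech: 'Web Speech API',
};

// Resolve the stored (possibly undefined) preference to a concrete engine ID.
function resolveTTSEngine(stored: TTSEngineId | undefined): TTSEngineId {
  return stored ?? DEFAULT_TTS_ENGINE;
}

// Conversion to a human-readable string happens only in the settings UI.
function ttsEngineLabel(stored: TTSEngineId | undefined): string {
  return TTS_ENGINE_LABELS[resolveTTSEngine(stored)];
}
```

Keeping undefined in the store means a later change of the default would apply to every user who never picked an engine explicitly.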
src/apps/chat/store-app-chat.ts
Outdated
TTSEngine: TTSEngineList[0],
setTTSEngine: (TTSEngine: string) => _set({ TTSEngine }),

ASREngine: ASREngineList[0],
Same here: we could keep an undefined here and hardcode 'webspeech' as the ID, so we can fall back to that as autodetect.
React.useEffect(() => {
if (languageCode) {
const fetchFunction = async () => {
let res = await fetch(`https://raw.githubusercontent.com/HadrienGardeur/web-speech-recommended-voices/refs/heads/main/json/${languageCode}.json`);
Well done here.
};
fetchFunction().catch((err) => {
console.log('Error getting voice list: ', err);
addSnackbar({ key: 'browser-speech-synthesis', message: 'Error getting voice list', type: 'issue' });
I got this message with some of the languages in the list. Strange, because I thought the list would contain only valid languages.
It's an upstream error: some languages listed in the JSON file do not have a corresponding file, so the request returns a 404. I'll delete all invalid languages.
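Independently of pruning the language list, the fetch could tell a missing upstream file apart from other failures. The following is only a sketch under assumptions: the ResponseLike and FetchLike types, the result shape, and the function names are hypothetical, and only the URL pattern is taken from the diff above.

```typescript
// Assumption: a minimal fetch-like signature, so the sketch does not depend on DOM typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

type VoiceListResult =
  | { kind: 'ok'; data: unknown }
  | { kind: 'missing' }
  | { kind: 'error'; status: number };

const VOICES_BASE_URL =
  'https://raw.githubusercontent.com/HadrienGardeur/web-speech-recommended-voices/refs/heads/main/json';

function voiceListUrl(languageCode: string): string {
  return `${VOICES_BASE_URL}/${languageCode}.json`;
}

// A 404 means the language has no upstream file; anything else non-OK is a real error.
function classifyStatus(ok: boolean, status: number): 'ok' | 'missing' | 'error' {
  if (ok) return 'ok';
  return status === 404 ? 'missing' : 'error';
}

async function fetchVoiceList(languageCode: string, fetchFn: FetchLike): Promise<VoiceListResult> {
  const res = await fetchFn(voiceListUrl(languageCode));
  const kind = classifyStatus(res.ok, res.status);
  if (kind === 'ok') return { kind: 'ok', data: await res.json() };
  if (kind === 'missing') return { kind: 'missing' };
  return { kind: 'error', status: res.status };
}
```

In the browser, the global fetch can be passed as fetchFn. The caller could then show the snackbar only for the 'error' case and fall back quietly when a language is 'missing'.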
import { persist } from 'zustand/middleware';
import { useShallow } from 'zustand/react/shallow';

export type BrowsePageTransform = 'html' | 'text' | 'markdown';
This probably doesn't belong here. We already have a browser store (for the browsing capability), but it's different.
export type BrowsePageTransform = 'html' | 'text' | 'markdown';

interface BrowseState {
I'm wondering if the TTSEngine settings could also live here, to keep everything together.
import { useBrowseVoiceId } from './store-module-browser';
import { speakText, cancel } from './browser.speechSynthesis.client';

function VoicesDropdown(props: {
I see this is a duplication of the ElevenLabs one, probably needed because of different logic.
I'll make an abstraction (under …). The refactored version will be pushed in the coming days.
Hi @zoollcar - just FYI - I won't have the time to merge this before the official V2 launch. I can't disclose dates, but I'll be very busy for a while. If you have a clean patch that doesn't require any work from my side, I'll see what I can do - in the meantime enjoy the fact that you're the only person with a custom big-AGI that supports multiple TTS/ASR engines.
New feature:
TODO:
Some previous questions:
The module has been moved to modules/browser/speech-synthesis.
{name} is now replaced with the name of the voice.
en.json is copied to a local file, Languages.json; the voice list is fetched when a language is selected.