Generating speech locally in the web browser #352
Wow, I see they got it working; they have a demo here. The file sizes shown in the dropdown are not correct, and the UI has lots of options to try out models, perhaps more than needed, but it works!! In the "WASM friendly" fork, a new command-line argument `--input` was added. It is used to parse JSON directly from the command line: a new JSON object input is initialised instead of reading JSON from stdin, the parts of the code that parse JSON line by line are commented out, but the parts that deal with the found attributes remain. I think that to integrate it cleanly, a command-line argument to input JSON without stdin is a good idea, and to avoid repeating code, some of the common logic would probably need to be extracted out. @jozefchutka and @synesthesiam, if you could weigh in on that, it'd be appreciated. Anyway, awesome work! @eschmidbauer I have to wonder, how did you find it?
It would be great to have piper compile smoothly into wasm. The last time I tried, it took many manual steps to do so. Merging with wide-video@a8e4c87 is just the tip of the iceberg.
I would like to share the news with you guys that you can run all of the models from piper with WebAssembly using sherpa-onnx. We have created a Hugging Face space so that you can try it. The space uses one of the models from piper. We also have a YouTube video to show you how to do that. Everything is open-sourced. If you want to know how WebAssembly is supported for piper, please see the corresponding pull request. There is one more thing to be improved:
FYI: In addition to running piper models with WebAssembly using sherpa-onnx, you can also run them on Android, iOS, Raspberry Pi, Linux, Windows, macOS, etc., with sherpa-onnx. All models from piper are supported by sherpa-onnx, and you can find the converted models at https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
You can find the files for the above Hugging Face space in its repository. You can see that the wasm module file is only 11.5 MB.
@csukuangfj This is great, thanks so much!
@csukuangfj Superb job!
Sorry, I don't know whether it is possible. I am very new to WebAssembly (I have only been learning it for 3 days).
Piper has been integrated into Read Aloud, and released as a separate extension as well. The source code is here. Please help out with some of the open issues if you can.
Following @ken107's work, I have updated https://piper.wide.video/. Instead of the whole of piper being compiled into wasm, it is now a 2-step process:
This already provides a 4-8x performance improvement when running on CPU. Here is the simplest implementation: https://piper.wide.video/poc.html
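For readers who want to see what the inference half of such a pipeline looks like, here is a minimal sketch, assuming the two steps are (1) phonemizing text to ids with a piper-phonemize wasm build and (2) running the piper ONNX voice with onnxruntime-web. The `ort` parameter is the onnxruntime-web module; piper's exported VITS models take `input`, `input_lengths`, and `scales` tensors and return an `output` tensor of float32 PCM samples. This is not the actual poc.html code, just an illustration:

```javascript
// Step 2 of the pipeline: run a piper ONNX voice on phoneme ids using
// onnxruntime-web. `ort` is the onnxruntime-web module, passed in so the
// function stays environment-agnostic.
async function synthesize(ort, modelUrl, phonemeIds) {
  const session = await ort.InferenceSession.create(modelUrl);
  const ids = BigInt64Array.from(phonemeIds.map(BigInt));
  const feeds = {
    input: new ort.Tensor("int64", ids, [1, phonemeIds.length]),
    input_lengths: new ort.Tensor(
      "int64", BigInt64Array.from([BigInt(phonemeIds.length)]), [1]),
    // noise_scale, length_scale, noise_w; typical defaults from piper voices
    scales: new ort.Tensor("float32", Float32Array.from([0.667, 1.0, 0.8]), [3]),
  };
  const { output } = await session.run(feeds);
  return output.data; // Float32Array of raw PCM samples
}
```

The returned samples can then be wrapped in a WAV header or fed to an `AudioContext` for playback.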
Sharing my Paste-n-Build solution based on @jozefchutka's research.

```shell
#!/bin/bash
BUILD_DIR=$(pwd)/build-piper
rm -rf "$BUILD_DIR" && mkdir "$BUILD_DIR"
TMP=$BUILD_DIR/.tmp
[ ! -d "$TMP" ] && mkdir "$TMP"
DOCKERFILE=$TMP/piper_wasm_compile.Dockerfile
cat <<EOF > $DOCKERFILE
FROM debian:stable-slim
RUN apt-get update && \
    apt-get install --yes --no-install-recommends \
    build-essential \
    cmake \
    ca-certificates \
    curl \
    pkg-config \
    git \
    autogen \
    automake \
    autoconf \
    libtool \
    python3 && ln -sf python3 /usr/bin/python
RUN git clone --depth 1 https://github.com/emscripten-core/emsdk.git /modules/emsdk
WORKDIR /modules/emsdk
RUN ./emsdk install 3.1.41 && \
    ./emsdk activate 3.1.41 && \
    rm -rf downloads
WORKDIR /wasm
ENTRYPOINT ["/bin/bash", "-c", "EMSDK_QUIET=1 source /modules/emsdk/emsdk_env.sh && \"\$@\"", "-s"]
CMD ["/bin/bash"]
EOF
docker buildx build -t piper-wasm-compiler -q -f $DOCKERFILE .
cat <<EOF | docker run --rm -i -v $TMP:/wasm piper-wasm-compiler /bin/bash
[ ! -d espeak-ng ] && git clone --depth 1 https://github.com/rhasspy/espeak-ng.git
cd /wasm/espeak-ng
./autogen.sh
./configure
make
cd /wasm
[ ! -d piper-phonemize ] && git clone --depth 1 https://github.com/wide-video/piper-phonemize.git
cd piper-phonemize && git pull
emmake cmake -Bbuild -DCMAKE_INSTALL_PREFIX=install -DCMAKE_TOOLCHAIN_FILE=\$EMSDK/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake -DBUILD_TESTING=OFF -G "Unix Makefiles" -DCMAKE_CXX_FLAGS="-O3 -s INVOKE_RUN=0 -s MODULARIZE=1 -s EXPORT_NAME='createPiperPhonemize' -s EXPORTED_FUNCTIONS='[_main]' -s EXPORTED_RUNTIME_METHODS='[callMain, FS]' --preload-file /wasm/espeak-ng/espeak-ng-data@/espeak-ng-data"
emmake cmake --build build --config Release # fails on "Compile intonations / Permission denied", continue with next steps
sed -i 's+\$(MAKE) \$(MAKESILENT) -f CMakeFiles/data.dir/build.make CMakeFiles/data.dir/build+#\0+g' /wasm/piper-phonemize/build/e/src/espeak_ng_external-build/CMakeFiles/Makefile2
sed -i 's/using namespace std/\/\/\0/g' /wasm/piper-phonemize/build/e/src/espeak_ng_external/src/speechPlayer/src/speechWaveGenerator.cpp
emmake cmake --build build --config Release
EOF
cp $TMP/piper-phonemize/build/piper_phonemize.* $BUILD_DIR
rm -rf $TMP
```

This script will automatically build piper-phonemize for wasm and copy the `piper_phonemize.*` artifacts into `build-piper`. Under the hood this script will:

- build a slim Debian Docker image with the build toolchain and emsdk 3.1.41,
- clone and build rhasspy/espeak-ng to produce the `espeak-ng-data` files,
- clone wide-video/piper-phonemize and compile it with Emscripten, preloading the espeak data,
- work around the "Compile intonations / Permission denied" failure and a `using namespace std` clash in `speechWaveGenerator.cpp`,
- copy the resulting `piper_phonemize.js`/`.wasm`/`.data` files into `build-piper` and remove the temporary directory.
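A hedged sketch of how the files this script emits might be driven from JavaScript: `createPiperPhonemize` is the `EXPORT_NAME` set in the build flags above, `callMain` is among the `EXPORTED_RUNTIME_METHODS`, the `print`/`printErr` hooks are standard Emscripten `Module` options, and `--input` is the JSON flag from the wide-video fork discussed earlier in this thread. Any further flags (espeak data path, voice selection) are omitted here rather than guessed:

```javascript
// Run the wasm phonemizer once and collect its stdout lines.
// `createPiperPhonemize` is the MODULARIZE factory emitted by Emscripten,
// which resolves to the instantiated module.
async function phonemize(createPiperPhonemize, text) {
  const stdout = [];
  const module = await createPiperPhonemize({
    print: (line) => stdout.push(line),     // phonemize results arrive on stdout
    printErr: (line) => console.error(line), // diagnostics go to the console
  });
  // "--input" passes JSON on the command line instead of stdin (per the fork).
  module.callMain(["--input", JSON.stringify({ text })]);
  return stdout;
}
```

The collected lines would then be parsed as JSON and handed to the inference step.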
@iSuslov can you provide a simple POC to test your work? I'm more of a backend person, but I need to implement this on the web with as few dependencies as possible (HTML + JS + wasm if possible, with no additional frameworks like Node.js), and I'm a little lost about where to start. Also @jozefchutka, if you could share the source code for your POC, it would be a good starting point for understanding these artifacts. Best regards!
@puppetm4st3r Doc: https://k2-fsa.github.io/sherpa/onnx/tts/wasm/index.html Hugging Face space demo for wasm + tts: https://k2-fsa.github.io/sherpa/onnx/tts/wasm/index.html (Hint: you can copy the files from the Hugging Face space directly into your own project.)
Thanks! I'm following the doc, but when I try to build the assets for a Spanish model I get this stack trace.
I installed cmake with apt-get, and then with pip; both gave me that stack trace. Do you know what could have happened? The model selected was:
How much RAM does your computer have? Could you try
64 GB, at least 90% free. Will try!
Does it work now?
Yes! Thanks!
Hey @puppetm4st3r, I see your issue is resolved, but in case my script seems confusing I would like to clarify:
The script will download and compile everything it needs, producing a wasm build in the same folder. Docker must be preinstalled.
Thanks! Now I have another issue: when I compile with your script, @iSuslov, it works like a charm in a desktop web browser, but it did not work on iOS, failing with an OOM error. When I tried the other solution from @csukuangfj it worked on iPhone, but I can't get it running for Spanish models with @csukuangfj's method. I'm stuck :(
Could you describe in detail why you cannot run it?
When I tried your advice it ultimately didn't work; it was a false positive, my mistake: the cache wouldn't refresh and I was still testing the solution from @iSuslov. It still gives me the stack trace that I attached here. But if I clone your sample code with the English model, it works (with no build process, just the sample code with the wasm binaries). I tried to compile inside a clean Docker container and outside Docker on my machine; neither worked. The script from @iSuslov works, but when I tried it on iOS it crashed with OOM; your sample from the HF space works on iOS without problems.
It would be great if you could post the error logs. Otherwise, we don't know what you mean when you say
@puppetm4st3r just out of curiosity, when you say you are testing on iOS, do you mean Safari on iPhone? I've never faced any OOM issues with wasm. Maybe there is an issue in how the script is loaded.
I tried Safari and Chrome on an iPhone, but I realized it is not the wasm. For some very strange reason, if I access from my device using my private network address 192.168.x. everything works fine. However, I just discovered that when accessing the same device through the router by public IP, it fails with an OOM error, which makes no sense. I will remotely debug the iPhone and bring you the logs and evidence, to leave the case documented in case it is of use to someone. I hope I can solve it, now that I know it is apparently an infrastructure problem.
@csukuangfj I will post the logs later (they are very long); maybe I will upload them to Drive or something.
@iSuslov additionally, I have tested on Android and it works fine; the problem is with iOS when exposing the service through the cloud, so I think it is a problem with the infra. But I still can't build with the guide from @csukuangfj (I still have to attach the logs of the build process).
For those who are looking for a ready-to-use solution, I have compiled all the knowledge shared in this thread into this library: https://github.com/diffusion-studio/vits-web. Thanks to everyone here for the awesome solutions and code snippets!
Re "in the web browser": this is tricky because we have to find some way to load these voice files each time a voice is used, on each origin where it is used. There is Native Messaging, with which we can run, control, and communicate with native applications from the browser. This native-messaging-espeak-ng is one variation of what I've been doing with eSpeak-NG for years now, mainly because I wanted to support SSML input (see SSMLParser), which I don't see mentioned here at all. What this (using Native Messaging) means is that we don't have to compile anything to WASM. We can use piper as-is, send input to piper, and send the output to the browser.
An option for using it via Native Messaging. I have added the necessary pieces. Tested on Chromium Version 128.0.6586.0 (Developer Build) (64-bit) and Firefox Nightly 130.0a1. Chromium works; Firefox does not load the module. In pertinent part:

- Download the
- Download a couple
- Create a symbolic link to
- Install
- Modify
- or set
- Create
- Restart
- Terminate and restart
- Open DevTools, test in
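For reference, a Native Messaging host is registered with the browser through a small JSON manifest placed in the browser's `NativeMessagingHosts` directory. The `name`, `description`, and `path` below are placeholders, not the actual values from the setup above; `allowed_origins` is the Chromium form, while Firefox uses `allowed_extensions` with an extension ID instead:

```json
{
  "name": "native_messaging_piper",
  "description": "Run piper locally and stream output back to the browser",
  "path": "/path/to/host.sh",
  "type": "stdio",
  "allowed_origins": ["chrome-extension://<extension-id>/"]
}
```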
Has anyone here managed to get GPU inference working in the browser? It seems like this could provide massive speedups and would be especially useful for long-form content like video narrations or audiobook generation. From what I can see, the current packages for piper in the browser are as follows, but neither supports GPU inference.
I wanted to raise this here as a central spot so that work is not duplicated. Would transformers.js be usable, since piper is an ONNX model? Or do we need something else? I am looking to create a web-based audiobook generation program similar to my CLI project QuickPiperAudiobook. Feel free to reach out if anyone is working on similar things or wants to hack on things together.
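On the onnxruntime-web side, requesting GPU inference is mostly a matter of asking for the `webgpu` execution provider when creating the session, falling back to the `wasm` (CPU) backend where WebGPU is unavailable. Whether WebGPU actually speeds up piper's VITS models is untested here; this is only a sketch, with `ort` being the onnxruntime-web module:

```javascript
// Try execution providers in order of preference; "webgpu" and "wasm" are
// onnxruntime-web's provider ids. Session creation throws when a provider
// is unsupported, so we fall through to the next candidate list.
async function createSession(ort, modelUrl) {
  for (const executionProviders of [["webgpu"], ["wasm"]]) {
    try {
      return await ort.InferenceSession.create(modelUrl, { executionProviders });
    } catch (err) {
      // e.g. no WebGPU adapter in this browser; try the next provider list
    }
  }
  throw new Error("no usable onnxruntime-web execution provider");
}
```

Whether this helps in practice depends on the model: small VITS voices may be dominated by data transfer rather than compute, so benchmarking both providers is worthwhile.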
The vits-web version is slow: you have to load the WebAssembly module and the voices, and some voices are 60 MB. Here's a fork of vits-web that you can test online for yourself: https://guest271314.github.io/vits-web/. I created a Native Messaging host to control the execution of the
@C-Loftus Ideally we compile
Thanks for your work and context on that, @guest271314! The Native Messaging work is very cool. I was hoping to have it run entirely in the browser with the GPU, with no need to install anything on the host. At least for my use case, I am fine loading the voice every time it is used (I don't need real-time speed).
Isn't this WASM compilation already done at https://github.com/diffusionstudio/piper-wasm? Don't we just need an integration with transformers.js or wonnx? I am not as familiar with some of the lower-level browser APIs, so sorry if I am missing a connection between them and WebGPU that you are trying to point out.
Then you should be able to use the fork and/or the main vits-web code.
The example runs in the browser.
If you look at the source code of the GitHub Pages example, the Emscripten-generated code is JavaScript: https://github.com/guest271314/vits-web/blob/patch-1/docs/index.js#L1-L2
Ideally we just use the global API. Something like this:
Native Messaging works for me. I don't have an issue executing code on my own machine from the browser.
It would be awesome if Piper's excellent TTS could generate audio locally in the browser, e.g. on an old phone, but the dependency on ONNX and the eSpeak variant makes this tricky.
Streaming audio to and from a server is often fine but generating the audio locally could avoid needing to setup server infrastructure, and once cached could be faster, more private and work offline, without caring about network dead spots. It could be great for browser extensions too.
There is an eSpeak-ng "espeakng.js" demo here: https://www.readbeyond.it/espeakng/
With source here: https://github.com/espeak-ng/espeak-ng/tree/master/emscripten
Obviously it's not quite as magical as Piper, but I think it's exciting. I can happily hack stuff together with Python and Docker, but I'm out of my depth compiling things for different architectures, so after having a look I'm backing off for now. But I thought I'd share what I learned, in case others with the relevant skills are also interested:
Both eSpeak-ng and ONNX Runtime Web are compiled in different ways, but it turns out that they both run in browsers via Emscripten.
For whatever it's worth, someone else has another way of building a subset here: https://github.com/ianmarmour/espeak-ng.js/tree/main
There are ONNX web runtimes too.
ONNX Runtime Web shares its parent project's really massive Python build helper script, but there is a quite helpful FAQ indicating that it has static builds, demonstrated with build info too:
https://onnxruntime.ai/docs/build/web.html
https://www.npmjs.com/package/onnxruntime-web
Footnote:
I did have a look at container2wasm for this too, but I couldn't quickly figure out how input and output of files would work.
I also looked at how Copy.sh's browser x86 emulator, v86, can run Arch with a working Docker implementation! With v86 there are examples of doing file input and output, but getting everything working for 32-bit x86 seemed too complicated to me, and might be a bit much compared to compiling with Emscripten properly, even if it would potentially be usable for far more than cheekily running arbitrary things in the browser.
P.S: awesome work @synesthesiam !