
DeepSpeech websocket server


This repository contains a simple service that receives audio data from clients, and serves the results of Mozilla DeepSpeech inference over a websocket. The server code in this project is a modified version of this GitHub project.

Because speech-to-text (STT) transcription is typically a long-running task, using websockets for client-server communication provides several benefits:

  1. It avoids timeouts at several points along the path, for example at the client, the server, a load balancer, and/or a proxy.
  2. It avoids the need for the client to poll the server for results, as well as the complexity that a polling-based architecture typically introduces.

Configuration

Server configuration is specified in the application.conf file.
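
The configuration keys are defined by this project, so refer to the application.conf shipped in the repository for the authoritative list. Purely as an illustration, a minimal file might look something like the following (all key names, filenames, and values here are hypothetical):

# Hypothetical sketch -- check the application.conf in this repository for the real keys
deepspeech {
  model = "deepspeech-0.9.3-models.pbmm"     # acoustic model file (placeholder filename)
  scorer = "deepspeech-0.9.3-models.scorer"  # external scorer file (placeholder filename)
}

http {
  host = "0.0.0.0"
  port = 8080
}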

Usage

Starting the server

Make sure your model and scorer files are present in the same directory as the application.conf file. Then execute:

python -m deepspeech_server.app

Sending requests to server

The client-server request-response process looks like the following:

  1. Client opens websocket W to server
  2. Client sends binary audio data via W
  3. Server responds with the transcribed text via W once the transcription process is complete. The server's response is in JSON format.
  4. Server closes W

The time t taken by the transcription process depends on several factors, such as the duration of the audio and how busy the service is. Under normal circumstances, t is roughly equal to the duration of the provided audio.

Because this service uses websockets, it is currently not possible to interact with it using HTTP clients that do not support websockets, such as curl. The following example uses the Python websocket-client package.

import websocket

ws = websocket.WebSocket()
ws.connect("ws://localhost:8080/api/v1/stt")

with open("audiofile.wav", mode='rb') as file:  # 'rb' is important -> read as binary
    audio = file.read()
    ws.send_binary(audio)
    result = ws.recv()  # blocks until the server returns the transcription
    print(result)       # JSON text transcription received from the server

Example output:

{"text": "experience proves this", "time": 2.4083645999999987}

Deployment

Kubernetes

The helm directory contains an example Helm deployment that configures an Nginx ingress to expose the DeepSpeech service. The websocket timeout on the ingress is set to 1 hour.
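
With the NGINX ingress controller, a timeout like this is typically expressed through the proxy read/send timeout annotations. The sketch below only illustrates that mechanism; the resource names and routing are hypothetical, and the actual manifest lives in the helm directory:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: deepspeech-stt   # hypothetical name; see the helm directory for the real manifest
  annotations:
    # keep the websocket connection open for up to one hour (values are in seconds)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  rules:
    - http:
        paths:
          - path: /api/v1/stt
            pathType: Prefix
            backend:
              service:
                name: deepspeech   # hypothetical Service name
                port:
                  number: 8080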

Contributing

Bug reports and merge requests are welcome.

Running pylint analysis

pylint deepspeech_server

Running tests

To run tests without coverage, execute:

python -m pytest tests/test_app.py

To run tests with coverage, print the coverage summary to the terminal, and write XML coverage and JUnit reports, execute:

python -m pytest -p pytest_cov --cov=deepspeech_server --cov-report=xml --cov-report=term \
		  --junitxml=pytest-report.xml tests/test_app.py
