
DeepSpeech websocket server


This repository contains a simple service that receives audio data from clients, and serves the results of Mozilla DeepSpeech inference over a websocket. The server code in this project is a modified version of this GitHub project.

Because speech-to-text (STT) transcription is typically a long-running task, using websockets for client-server communication provides several benefits:

  1. It avoids timeouts at several points along the path, for example at the client, the server, a load balancer, and/or a proxy.
  2. It avoids the need for the client to poll the server for results, as well as the complexity that a polling-based architecture typically introduces.

Configuration

Server configuration is specified in the application.conf file.
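
The configuration keys are defined by this project, so refer to the application.conf shipped in the repository for the authoritative list. Purely as an illustration, a minimal file might look something like the following (all key names, filenames, and values here are hypothetical):

# Hypothetical sketch -- check the application.conf in this repository for the real keys
deepspeech {
  model = "deepspeech-0.9.3-models.pbmm"     # acoustic model file (placeholder filename)
  scorer = "deepspeech-0.9.3-models.scorer"  # external scorer file (placeholder filename)
}

http {
  host = "0.0.0.0"
  port = 8080
}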

Usage

Starting the server

Make sure your model and scorer files are present in the same directory as the application.conf file. Then execute:

python -m deepspeech_server.app

Sending requests to server

The client-server request-response process looks like the following:

  1. Client opens websocket W to server
  2. Client sends binary audio data via W
  3. Server responds with the transcribed text via W once the transcription process is complete. The server's response is in JSON format.
  4. Server closes W

The time t taken by the transcription process depends on several factors, such as the duration of the audio and how busy the service is. Under normal circumstances, t is roughly equal to the duration of the provided audio.

Because this service uses websockets, it is currently not possible to interact with it using HTTP clients that do not support websockets, such as curl. The following example uses the Python websocket-client package.

import websocket

ws = websocket.WebSocket()
ws.connect("ws://localhost:8080/api/v1/stt")

with open("audiofile.wav", mode='rb') as file:  # 'rb' is important -> read as binary
    audio = file.read()
    ws.send_binary(audio)
    result = ws.recv()  # blocks until the server returns the transcription
    print(result)       # JSON text transcription received from the server

Example output:

{"text": "experience proves this", "time": 2.4083645999999987}

Deployment

Kubernetes

The helm directory contains an example Helm deployment that configures an Nginx ingress to expose the DeepSpeech service. The websocket timeout on the ingress is set to 1 hour.
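
With the NGINX ingress controller, a timeout like this is typically expressed through the proxy read/send timeout annotations. The sketch below only illustrates that mechanism; the resource names and routing are hypothetical, and the actual manifest lives in the helm directory:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: deepspeech-stt   # hypothetical name; see the helm directory for the real manifest
  annotations:
    # keep the websocket connection open for up to one hour (values are in seconds)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  rules:
    - http:
        paths:
          - path: /api/v1/stt
            pathType: Prefix
            backend:
              service:
                name: deepspeech   # hypothetical Service name
                port:
                  number: 8080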

Contributing

Bug reports and merge requests are welcome.

Running pylint analysis

pylint deepspeech_server

Running tests

To run tests without coverage, execute:

python -m pytest tests/test_app.py

To run tests with coverage, print the coverage summary to the terminal, and write XML coverage and JUnit reports, execute:

python -m pytest -p pytest_cov --cov=deepspeech_server --cov-report=xml --cov-report=term \
		  --junitxml=pytest-report.xml tests/test_app.py
