voskjshttp.js: demo speech-to-text HTTP server
- voskjshttp as RHASSPY speech-to-text remote HTTP Server
- SocketIO server pseudo-code
voskjshttp.js is a very simple HTTP API server, able to process concurrent/multi-user transcript requests using a specific language model.
A dedicated thread is spawned for each transcript processing request, so latency performance will be optimal if your host has multiple cores.
Currently the server supports a single endpoint, /transcript, accessible through two HTTP methods:
- HTTP GET /transcript
- HTTP POST /transcript
Server settings:
cd examples && node voskjshttp.js
or, if you installed this package globally:
voskjshttp
Simple demo HTTP JSON server, loading a Vosk engine model to transcript speeches.
package @solyarisoftware/voskjs version 1.1.3, Vosk-api version 0.3.30
The server has two endpoints:
HTTP GET /transcript
The request query string arguments contain parameters,
including a WAV file name already accessible by the server.
HTTP POST /transcript
The request query string arguments contain parameters,
the request body contains the WAV file name to be submitted to the server.
Usage:
voskjshttp --model=<model directory path> \
[--port=<server port number. Default: 3000>] \
[--path=<server endpoint path. Default: /transcript>] \
[--no-threads]
[--debug[=<vosk log level>]]
Server settings examples:
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086 --debug=2
# stdout includes the server internal debug logs and Vosk debug logs (log level 2)
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086 --debug
# stdout includes the server internal debug logs without Vosk debug logs (log level -1)
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086
# stdout includes minimal info, just request and response messages
voskjshttp --model=../models/vosk-model-small-en-us-0.15
# stdout includes minimal info, default port number is 3000
Client request examples:
1. GET /transcript - query string includes just the speech file argument
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
http://localhost:3000/transcript
2. GET /transcript - query string includes arguments: speech, model
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
http://localhost:3000/transcript
3. GET /transcript - query string includes arguments: id, speech, model
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode id="1620060067830" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
http://localhost:3000/transcript
4. GET /transcript - includes arguments: id, speech, model, grammar
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode id="1620060067830" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
--data-urlencode grammar='["experience proves this"]' \
http://localhost:3000/transcript
5. POST /transcript - body includes the speech file
curl -s \
-X POST \
-H "Accept: application/json" \
-H "Content-Type: audio/wav" \
--data-binary "@../audio/2830-3980-0043.wav" \
"http://localhost:3000/transcript?id=1620060067830&model=vosk-model-en-us-aspire-0.2"
Server run example:
voskjshttp --model=../models/vosk-model-small-en-us-0.15
Client call example:
curl \
-s \
-H "Accept: application/json" \
-G \
--data-urlencode id="283039800043" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-small-en-us-0.15" \
http://localhost:3000/transcript \
| python3 -m json.tool
The JSON returned by the transcript endpoint:
{
"id": "283039800043",
"latency": 575,
"vosk": {
"result": [
{
"conf": 1,
"end": 1.02,
"start": 0.36,
"word": "experience"
},
{
"conf": 1,
"end": 1.35,
"start": 1.02,
"word": "proves"
},
{
"conf": 1,
"end": 1.74,
"start": 1.35,
"word": "this"
}
],
"text": "experience proves this"
}
}
Server side stdout:
1621335095393 Model path: ../models/vosk-model-small-en-us-0.15
1621335095395 Model name: vosk-model-small-en-us-0.15
1621335095395 HTTP server port: 3000
1621335095395 internal debug log: false
1621335095395 Vosk log level: -1
1621335095395 wait loading Vosk model: vosk-model-small-en-us-0.15 (be patient)
1621335095710 Vosk model loaded in 314 msecs
1621335095712 server voskjshttp.js running at http://localhost:3000
1621335095712 endpoint http://localhost:3000/transcript
1621335095712 press Ctrl-C to shutdown
1621335095713 ready to listen incoming requests
1621335101648 request 283039800043 ../audio/2830-3980-0043.wav vosk-model-small-en-us-0.15 undefined
1621335102223 response 283039800043 {"id":"283039800043","latency":574,"result":[{"conf":1,"end":1.02,"start":0.36,"word":"experience"},{"conf":1,"end":1.35,"start":1.02,"word":"proves"},{"conf":1,"end":1.74,"start":1.35,"word":"this"}],"text":"experience proves this"}
^[^C1621335336951 SIGINT received
1621335337010 Shutdown done
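Since a dedicated thread serves each transcript request, multiple clients can query the server concurrently. The snippet below is only an illustrative sketch, not part of the package: it fires three parallel GET /transcript requests against the server started above (port 3000, model vosk-model-small-en-us-0.15, demo WAV file), using Node's built-in http module, and prints the latency reported by each response.

// concurrency sketch: three parallel GET /transcript requests
const http = require('http')

// hypothetical helper: one GET /transcript request, resolved with the parsed JSON response
function transcriptRequest(speech) {
  const query = new URLSearchParams({ speech, model: 'vosk-model-small-en-us-0.15' }).toString()
  return new Promise((resolve, reject) => {
    http.get(
      `http://localhost:3000/transcript?${query}`,
      { headers: { Accept: 'application/json' } },
      res => {
        let body = ''
        res.on('data', chunk => { body += chunk })
        res.on('end', () => resolve(JSON.parse(body)))
      }
    ).on('error', reject)
  })
}

const speech = '../audio/2830-3980-0043.wav'

// fire three concurrent requests and print the latency of each response
Promise.all([transcriptRequest(speech), transcriptRequest(speech), transcriptRequest(speech)])
  .then(responses => responses.forEach(r => console.log('latency (msecs):', r.latency)))
  .catch(console.error)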
The /transcript request accepts the following query string arguments:
- speech
  The query string argument is mandatory. It specifies the speech WAV file for the server speech-to-text transcription.
- model
  The argument is optional. If specified, the server verifies that it matches the name of the server-side loaded model. If the argument is not specified, the server doesn't make any check and just uses the loaded model. In this case the client call is simply:
curl \
  -H "Accept: application/json" \
  -G \
  --data-urlencode speech="../audio/2830-3980-0043.wav" \
  http://localhost:3000/transcript
The corresponding HTTP server log is:
node voskjshttp --model=../models/vosk-model-small-en-us-0.15
1620312429756 Model path: ../models/vosk-model-small-en-us-0.15
1620312429758 Model name: vosk-model-small-en-us-0.15
1620312429758 HTTP server port: 3000
1620312429758 internal debug log: false
1620312429758 Vosk log level: -1
1620312429758 wait loading Vosk model: vosk-model-small-en-us-0.15 (be patient)
1620312430058 Vosk model loaded in 300 msecs
1620312430060 server voskjshttp.js running at http://localhost:3000
1620312430060 endpoint http://localhost:3000/transcript
1620312430060 press Ctrl-C to shutdown
1620312430060 ready to listen incoming requests
1620312435318 request {"id":1620312435283,"speech":"../audio/2830-3980-0043.wav","model":"vosk-model-small-en-us-0.15","grammar":["experience proves this","why should one hold on the way","your power is sufficient i said"]}
1620312435941 response 1620312435283 {"request":{"id":1620312435283,"speech":"../audio/2830-3980-0043.wav","model":"vosk-model-small-en-us-0.15","grammar":["experience proves this","why should one hold on the way","your power is sufficient i said"]},"id":1620312435283,"latency":623,"result":[{"conf":1,"end":1.02,"start":0.36,"word":"experience"},{"conf":1,"end":1.35,"start":1.02,"word":"proves"},{"conf":1,"end":1.74,"start":1.35,"word":"this"}],"text":"experience proves this"}
The HTTP response returns a JSON data structure containing:
- speech
  the name of the speech file in the request
- model
  the name of the model (the language) in the request
- id
  a "UUID", the Unix epoch timestamp that identifies the incoming request; it can be used for debugging
- latency
  the elapsed time, in milliseconds, required to process the request
- result
  the data structure returned by the Vosk transcript function
The tests/ directory contains some utility bash scripts (client*.sh) to test the server endpoint with GET and POST methods.
voskjshttp as RHASSPY speech-to-text remote HTTP Server

RHASSPY is an open source, fully offline set of voice assistant services.
RHASSPY can optionally use a remote HTTP server to transform speech (WAV) to text. This is typically used in a client/server setup, where Rhasspy does speech/intent recognition on a home server with decent CPU/RAM available.
You can run voskjshttp as a RHASSPY speech-to-text remote HTTP server, following these specifications:
- https://rhasspy.readthedocs.io/en/latest/speech-to-text/#remote-http-server
- https://rhasspy.readthedocs.io/en/latest/usage/#http-api
- https://rhasspy.readthedocs.io/en/latest/reference/#http-api
Install the server

Install on your home server, as described here:
- Vosk
- voskjs: npm install -g @solyarisoftware/voskjs
- a Vosk language model of your choice
Run the server

Warning: currently, because of a bug in the Node-C++ interface of the Vosk-API lib, multithreading causes a crash (see issue #3). Two temporary alternative workarounds are proposed:
- Vosk multithreading enabled
  Use a Node version prior to v14. See: alphacep/vosk-api#516 (comment)

  voskjshttp \
    --model=models/vosk-model-small-en-us-0.15 \
    --path=/api/speech-to-text \
    --port=12101
- Vosk multithreading disabled
  Use any Node version after v13, but disable multithreading in voskjshttp with the command line flag --no-threads.
  This option may seem nonsensical, because this way the server serves just one request at a time (each request saturates a CPU core for hundreds of milliseconds, also blocking the Node main thread). Nevertheless, the lack of multithreading could be acceptable to serve a few satellites (clients) in a small (home) environment.

  voskjshttp \
    --model=models/vosk-model-small-en-us-0.15 \
    --path=/api/speech-to-text \
    --port=12101 \
    --no-threads
Curl client tests
Two bash scripts are available in the tests/ directory:
- clientRHASSPYtext.sh
  gets a text/plain response from the server:

  clientRHASSPYtext.sh
  experience proves this
- clientRHASSPYjson.sh
  gets an application/json response from the server:

  clientRHASSPYjson.sh
  {
    "id": 1622012841793,
    "latency": 570,
    "vosk": {
      "result": [
        {
          "conf": 1,
          "end": 1.02,
          "start": 0.36,
          "word": "experience"
        },
        {
          "conf": 1,
          "end": 1.35,
          "start": 1.02,
          "word": "proves"
        },
        {
          "conf": 1,
          "end": 1.74,
          "start": 1.35,
          "word": "this"
        }
      ],
      "text": "experience proves this"
    }
  }
SocketIO server pseudo-code

An HTTP server is not the only way to go! Consider for example a client-server architecture using socket.io, the websocket-based real-time bidirectional event-based communication library.
Here below is a simplified server-side pseudo-code that shows how to use the voskjs transcript functions:
const fs = require('fs')
const https = require('https')

const { loadModel, transcriptFromBuffer, toPCM } = require('@solyarisoftware/voskjs')
const app = require('express')()

// get SSL certificate
const credentials = {
  key: fs.readFileSync(KEY_FILENAME, 'utf8'),
  cert: fs.readFileSync(CERT_FILENAME, 'utf8')
}

// create the https server
const server = https.createServer(credentials, app)

// create the socketio channel
const io = require('socket.io')(server)

// load the Vosk model once, at server startup
const model = loadModel(MODEL_DIRECTORY)

// a websocket connection arrived
io.on('connection', (socket) => {

  // the client sent an audio buffer
  socket.on('audioMessage', async msg => {

    // save audio buffer into a local file, giving a unique name
    // (filenameUUID and msgToAudioFile are application-specific helpers)
    const audioFileCompressed = filenameUUID()
    await msgToAudioFile(audioFileCompressed, msg)

    // convert the received audio into a PCM buffer
    const buffer = toPCM(audioFileCompressed)

    // voskjs speech to text
    const voskResult = await transcriptFromBuffer(buffer, model)
  })
})
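A client-side counterpart is not part of the package. A minimal sketch, assuming the socket.io-client library, the 'audioMessage' event name of the pseudo-code above and a hypothetical recorded audio file, could look like this:

// client-side pseudo-code sketch (socket.io-client assumed)
const fs = require('fs')
const { io } = require('socket.io-client')

// connect to the https/socket.io server shown above
const socket = io('https://localhost')

socket.on('connect', () => {

  // read a compressed/encoded audio file recorded on the client (hypothetical file name)
  const audioBuffer = fs.readFileSync('speech.webm')

  // send the audio buffer to the server for transcription
  socket.emit('audioMessage', audioBuffer)

  // the server sketch above does not send the transcript back;
  // in a real application it would emit the voskResult to the client
  // (e.g. socket.emit('transcriptResult', voskResult)) and the client would listen for that event here
})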