voskjshttp.js: demo speech-to-text HTTP server
- voskjshttp as RHASSPY speech-to-text remote HTTP Server
- SocketIO server pseudo-code
voskjshttp.js is a very simple HTTP API server, able to process concurrent/multi-user transcript requests using a specific language model.
A dedicated thread is spawned for each transcript processing request, so latency performance will be optimal if your host has multiple cores.
Currently the server supports a single endpoint, /transcript, accessible through two HTTP methods:
- HTTP GET /transcript
- HTTP POST /transcript
Server settings:
cd examples && node voskjshttp.js
or, if you installed this package globally:
voskjshttp
Simple demo HTTP JSON server, loading a Vosk engine model to transcript speeches.
package @solyarisoftware/voskjs version 1.1.3, Vosk-api version 0.3.30
The server has two endpoints:
HTTP GET /transcript
The request query string arguments contain parameters,
including a WAV file name already accessible by the server.
HTTP POST /transcript
The request query string arguments contain parameters,
the request body contains the WAV file name to be submitted to the server.
Usage:
voskjshttp --model=<model directory path> \
[--port=<server port number. Default: 3000>] \
[--path=<server endpoint path. Default: /transcript>] \
[--no-threads]
[--debug[=<vosk log level>]]
Server settings examples:
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086 --debug=2
# stdout includes the server internal debug logs and Vosk debug logs (log level 2)
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086 --debug
# stdout includes the server internal debug logs without Vosk debug logs (log level -1)
voskjshttp --model=../models/vosk-model-en-us-aspire-0.2 --port=8086
# stdout includes minimal info, just request and response messages
voskjshttp --model=../models/vosk-model-small-en-us-0.15
# stdout includes minimal info, default port number is 3000
Client request examples:
1. GET /transcript - query string includes just the speech file argument
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
http://localhost:3000/transcript
2. GET /transcript - query string includes arguments: speech, model
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
http://localhost:3000/transcript
3. GET /transcript - query string includes arguments: id, speech, model
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode id="1620060067830" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
http://localhost:3000/transcript
4. GET /transcript - includes arguments: id, speech, model, grammar
curl -s \
-X GET \
-H "Accept: application/json" \
-G \
--data-urlencode id="1620060067830" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-en-us-aspire-0.2" \
--data-urlencode grammar='["experience proves this"]' \
http://localhost:3000/transcript
5. POST /transcript - body includes the speech file
curl -s \
-X POST \
-H "Accept: application/json" \
-H "Content-Type: audio/wav" \
--data-binary "@../audio/2830-3980-0043.wav" \
"http://localhost:3000/transcript?id=1620060067830&model=vosk-model-en-us-aspire-0.2"
Server run example:
voskjshttp --model=../models/vosk-model-small-en-us-0.15
Client call example:
curl \
-s \
-H "Accept: application/json" \
-G \
--data-urlencode id="283039800043" \
--data-urlencode speech="../audio/2830-3980-0043.wav" \
--data-urlencode model="vosk-model-small-en-us-0.15" \
http://localhost:3000/transcript \
| python3 -m json.tool
The JSON returned by the transcript endpoint:
{
"id": "283039800043",
"latency": 575,
"vosk": {
"result": [
{
"conf": 1,
"end": 1.02,
"start": 0.36,
"word": "experience"
},
{
"conf": 1,
"end": 1.35,
"start": 1.02,
"word": "proves"
},
{
"conf": 1,
"end": 1.74,
"start": 1.35,
"word": "this"
}
],
"text": "experience proves this"
}
}
Server side stdout:
1621335095393 Model path: ../models/vosk-model-small-en-us-0.15
1621335095395 Model name: vosk-model-small-en-us-0.15
1621335095395 HTTP server port: 3000
1621335095395 internal debug log: false
1621335095395 Vosk log level: -1
1621335095395 wait loading Vosk model: vosk-model-small-en-us-0.15 (be patient)
1621335095710 Vosk model loaded in 314 msecs
1621335095712 server voskjshttp.js running at http://localhost:3000
1621335095712 endpoint http://localhost:3000/transcript
1621335095712 press Ctrl-C to shutdown
1621335095713 ready to listen incoming requests
1621335101648 request 283039800043 ../audio/2830-3980-0043.wav vosk-model-small-en-us-0.15 undefined
1621335102223 response 283039800043 {"id":"283039800043","latency":574,"result":[{"conf":1,"end":1.02,"start":0.36,"word":"experience"},{"conf":1,"end":1.35,"start":1.02,"word":"proves"},{"conf":1,"end":1.74,"start":1.35,"word":"this"}],"text":"experience proves this"}
^[^C1621335336951 SIGINT received
1621335337010 Shutdown done
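Since a dedicated thread serves each transcript request, multiple clients can query the server concurrently. The snippet below is only an illustrative sketch, not part of the package: it fires three parallel GET /transcript requests against the server started above (port 3000, model vosk-model-small-en-us-0.15, demo WAV file), using Node's built-in http module, and prints the latency reported by each response.

// concurrency sketch: three parallel GET /transcript requests
const http = require('http')

// hypothetical helper: one GET /transcript request, resolved with the parsed JSON response
function transcriptRequest(speech) {
  const query = new URLSearchParams({ speech, model: 'vosk-model-small-en-us-0.15' }).toString()
  return new Promise((resolve, reject) => {
    http.get(
      `http://localhost:3000/transcript?${query}`,
      { headers: { Accept: 'application/json' } },
      res => {
        let body = ''
        res.on('data', chunk => { body += chunk })
        res.on('end', () => resolve(JSON.parse(body)))
      }
    ).on('error', reject)
  })
}

const speech = '../audio/2830-3980-0043.wav'

// fire three concurrent requests and print the latency of each response
Promise.all([transcriptRequest(speech), transcriptRequest(speech), transcriptRequest(speech)])
  .then(responses => responses.forEach(r => console.log('latency (msecs):', r.latency)))
  .catch(console.error)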
The /transcript request accepts the following query string arguments:
- speech
  The query string argument is mandatory. It specifies the speech WAV file for the server speech-to-text transcription.
- model
  The argument is optional. If specified, the server verifies that it matches the name of the server-side loaded model. If the argument is not specified, the server doesn't make any check and just uses the loaded model. In this case the client call is simply:
curl \
  -H "Accept: application/json" \
  -G \
  --data-urlencode speech="../audio/2830-3980-0043.wav" \
  http://localhost:3000/transcript
The corresponding HTTP server log is:
node voskjshttp --model=../models/vosk-model-small-en-us-0.15
1620312429756 Model path: ../models/vosk-model-small-en-us-0.15
1620312429758 Model name: vosk-model-small-en-us-0.15
1620312429758 HTTP server port: 3000
1620312429758 internal debug log: false
1620312429758 Vosk log level: -1
1620312429758 wait loading Vosk model: vosk-model-small-en-us-0.15 (be patient)
1620312430058 Vosk model loaded in 300 msecs
1620312430060 server voskjshttp.js running at http://localhost:3000
1620312430060 endpoint http://localhost:3000/transcript
1620312430060 press Ctrl-C to shutdown
1620312430060 ready to listen incoming requests
1620312435318 request {"id":1620312435283,"speech":"../audio/2830-3980-0043.wav","model":"vosk-model-small-en-us-0.15","grammar":["experience proves this","why should one hold on the way","your power is sufficient i said"]}
1620312435941 response 1620312435283 {"request":{"id":1620312435283,"speech":"../audio/2830-3980-0043.wav","model":"vosk-model-small-en-us-0.15","grammar":["experience proves this","why should one hold on the way","your power is sufficient i said"]},"id":1620312435283,"latency":623,"result":[{"conf":1,"end":1.02,"start":0.36,"word":"experience"},{"conf":1,"end":1.35,"start":1.02,"word":"proves"},{"conf":1,"end":1.74,"start":1.35,"word":"this"}],"text":"experience proves this"}
The HTTP response returns a JSON data structure containing:
- speech
  the name of the speech file in the request
- model
  the name of the model (the language) in the request
- id
  a "UUID", the Unix epoch timestamp that identifies the incoming request; it can be used for debugging
- latency
  the elapsed time, in milliseconds, required to process the request
- result
  the data structure returned by the Vosk transcript function
The tests/ directory contains some utility bash scripts (client*.sh) to test the server endpoint with GET and POST methods.
voskjshttp as RHASSPY speech-to-text remote HTTP Server

RHASSPY is an open source, fully offline set of voice assistant services.
RHASSPY can optionally use a remote HTTP server to transform speech (WAV) to text. This is typically used in a client/server setup, where Rhasspy does speech/intent recognition on a home server with decent CPU/RAM available.
You can run voskjshttp as a RHASSPY speech-to-text remote HTTP server, following these specifications:
- https://rhasspy.readthedocs.io/en/latest/speech-to-text/#remote-http-server
- https://rhasspy.readthedocs.io/en/latest/usage/#http-api
- https://rhasspy.readthedocs.io/en/latest/reference/#http-api
Install the server

Install on your home server, as described here:
- Vosk
- voskjs: npm install -g @solyarisoftware/voskjs
- a Vosk language model of your choice
Run the server

Warning: currently, because of a bug in the Node-C++ interface of the Vosk-API lib, multithreading causes a crash (see issue #3). Two temporary alternative workarounds are proposed:
- Vosk multithreading enabled
  Use a Node version prior to v14. See: alphacep/vosk-api#516 (comment)

  voskjshttp \
    --model=models/vosk-model-small-en-us-0.15 \
    --path=/api/speech-to-text \
    --port=12101
- Vosk multithreading disabled
  Use any Node version after v13, but disable multithreading in voskjshttp with the command line flag --no-threads.
  This option may seem nonsensical, because this way the server serves just one request at a time (each request saturates a CPU core for hundreds of milliseconds, also blocking the Node main thread). Nevertheless, the lack of multithreading could be acceptable to serve a few satellites (clients) in a small (home) environment.

  voskjshttp \
    --model=models/vosk-model-small-en-us-0.15 \
    --path=/api/speech-to-text \
    --port=12101 \
    --no-threads
Curl client tests
Two bash scripts are available in the tests/ directory:
- clientRHASSPYtext.sh
  gets a text/plain response from the server:

  clientRHASSPYtext.sh
  experience proves this
- clientRHASSPYjson.sh
  gets an application/json response from the server:

  clientRHASSPYjson.sh
  {
    "id": 1622012841793,
    "latency": 570,
    "vosk": {
      "result": [
        {
          "conf": 1,
          "end": 1.02,
          "start": 0.36,
          "word": "experience"
        },
        {
          "conf": 1,
          "end": 1.35,
          "start": 1.02,
          "word": "proves"
        },
        {
          "conf": 1,
          "end": 1.74,
          "start": 1.35,
          "word": "this"
        }
      ],
      "text": "experience proves this"
    }
  }
SocketIO server pseudo-code

An HTTP server is not the only way to go! Consider for example a client-server architecture using socket.io, the websocket-based real-time bidirectional event-based communication library.
Here below is a simplified server-side pseudo-code that shows how to use the voskjs transcript functions:
const fs = require('fs')
const https = require('https')

const { loadModel, transcriptFromBuffer, toPCM } = require('@solyarisoftware/voskjs')
const app = require('express')()

// get SSL certificate
const credentials = {
  key: fs.readFileSync(KEY_FILENAME, 'utf8'),
  cert: fs.readFileSync(CERT_FILENAME, 'utf8')
}

// create the https server
const server = https.createServer(credentials, app)

// create the socketio channel
const io = require('socket.io')(server)

// load the Vosk model once, at server startup
const model = loadModel(MODEL_DIRECTORY)

// a websocket connection arrived
io.on('connection', (socket) => {

  // the client sent an audio buffer
  socket.on('audioMessage', async msg => {

    // save audio buffer into a local file, giving a unique name
    // (filenameUUID and msgToAudioFile are application-specific helpers)
    const audioFileCompressed = filenameUUID()
    await msgToAudioFile(audioFileCompressed, msg)

    // convert the received audio into a PCM buffer
    const buffer = toPCM(audioFileCompressed)

    // voskjs speech to text
    const voskResult = await transcriptFromBuffer(buffer, model)
  })
})
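A client-side counterpart is not part of the package. A minimal sketch, assuming the socket.io-client library, the 'audioMessage' event name of the pseudo-code above and a hypothetical recorded audio file, could look like this:

// client-side pseudo-code sketch (socket.io-client assumed)
const fs = require('fs')
const { io } = require('socket.io-client')

// connect to the https/socket.io server shown above
const socket = io('https://localhost')

socket.on('connect', () => {

  // read a compressed/encoded audio file recorded on the client (hypothetical file name)
  const audioBuffer = fs.readFileSync('speech.webm')

  // send the audio buffer to the server for transcription
  socket.emit('audioMessage', audioBuffer)

  // the server sketch above does not send the transcript back;
  // in a real application it would emit the voskResult to the client
  // (e.g. socket.emit('transcriptResult', voskResult)) and the client would listen for that event here
})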