This hook supports cross browser speech to text if the crossBrowser: true
prop is passed.
By default, the SpeechRecognition API is used which is currently only supported by Chrome browsers.
The SpeechRecognition API does not require any additional setup or API keys, everything works out of the box.
- Written in TypeScript with full type support
- 0 dependencies
- Simple APIs
- Cross-browser optional support
- Small bundle size 11.9kB (4.7kB gzipped)
npm i react-hook-speech-to-text
yarn add react-hook-speech-to-text
https://trusting-perlman-d246f0.netlify.app/
As of 02/02/20 cross-browser speech to text tested and working on Chrome, firefox, Safari for Mac, iOS 13 and 14 Safari. Have not tested on Android but should work on chromium based android browsers.
Unfortunately does not work on firefox or chrome on iOS due to apple only supporting getUserMedia
on safari (thanks apple 💩)
If cross-browser support is needed, the crossBrowser: true
prop must be passed. A Google Cloud Speech-to-Text API key is needed.
This hook makes use of a customized version of recorder.js
for recording audio, down-sampling the audio sampleRate
to <= 48000hz, and converting that audio to WAV format.
The hook then converts the WAV audio blob returned from recorder.js
and converts it into a base64
string using the FileReader
API. This is all needed in order to POST audio data to the Google Cloud Speech-to-Text REST API and get transcribed text returned all on the front-end.
Also used is hark.js
for detecting start and stopped speech events for browsers that don't support the SpeechRecognition
API. If the SpeechRecognition
API is available, SpeechRecognition
API handles start and stop speech events automatically.
import React from 'react';
import useSpeechToText from 'react-hook-speech-to-text';
export default function AnyComponent() {
const {
error,
interimResult,
isRecording,
results,
startSpeechToText,
stopSpeechToText,
} = useSpeechToText({
continuous: true,
useLegacyResults: false
});
if (error) return <p>Web Speech API is not available in this browser 🤷</p>;
return (
<div>
<h1>Recording: {isRecording.toString()}</h1>
<button onClick={isRecording ? stopSpeechToText : startSpeechToText}>
{isRecording ? 'Stop Recording' : 'Start Recording'}
</button>
<ul>
{results.map((result) => (
<li key={result.timestamp}>{result.transcript}</li>
))}
{interimResult && <li>{interimResult}</li>}
</ul>
</div>
);
}
Same code as above with crossBrowser: true
and googleApiKey
props passed
const {
error,
isRecording,
results,
startSpeechToText,
stopSpeechToText,
} = useSpeechToText({
continuous: true,
crossBrowser: true,
googleApiKey: YOUR_GOOGLE_CLOUD_API_KEY_HERE,
useLegacyResults: false
});
Arguments passed to useSpeechToText
invocation
Sets amount in milliseconds to stop recording microphone input after last detected speech event.
Note: timeout
is only applied when crossBrowser: true
arg is passed and using browser that does not support SpeechRecognition
API, else Chrome's Speech Recognition automatically stops recording after a certain amount of time.
- Type:
number
- Required:
false
- Default:
10000
Sets whether or not to keep recording microphone input after speech results are returned
- Type:
boolean
- Required:
false
- Default:
undefined
Callback function invoked on speech detection event. Only invoked on browsers that do not support SpeechRecognition
API
- Type:
() => any
- Required:
false
- Default:
undefined
Callback function invoked on speech stopped event. Only invoked on browsers that do not support SpeechRecognition
API
- Type:
() => any
- Required:
false
- Default:
undefined
Boolean to enable speech to text functionality on browsers that do not support the SpeechRecognition
API. Firefox, Safari, iOS Safari etc.
Note: googleApiKey
is required if using crossBrowser
mode.
- Type:
boolean
- Required:
false
- Default:
false
API key used for Google Cloud Speech to text API for cross browser speech to text functionality.
- Type:
string
- Required:
true
ifcrossBrowser
oruseOnlyGoogleCloud
- Default:
undefined
Thanks to https://github.com/iwgx for the suggestion #2
Optional config object to have more control over the google cloud speech settings. These options will only be passed if using crossBrowser: true
on browsers not supporting SpeechRecognition API, or on all browsers if passing useOnlyGoogleCloud: true
. All options can be found here https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig
- Type:
GoogleCloudRecognitionConfig
- Required:
false
- Default:
undefined
ex:
const { results } = useSpeechToText({
crossBrowser: true,
googleApiKey: process.env.REACT_APP_API_KEY,
googleCloudRecognitionConfig: {
languageCode: 'en-US'
}
});
Optional object of properties to have more control over Google Chrome's SpeechRecognition API. These properties only apply on browsers that support Google Chrome's SpeechRecognition API. All properties can be found here https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition#properties
- Type:
SpeechRecognitionProperties
- Required:
false
- Default:
{ interimResults: true }
ex:
const { results, interimResult } = useSpeechToText({
speechRecognitionProperties: {
lang: 'en-US',
interimResults: true // Allows for displaying real-time speech results
}
});
Boolean to force use of google cloud engine speech-to-text even on browsers that support built in SpeechRecognition API. By default, this package will try to use SpeechRecognition where available since it is free for Chrome and will provide faster results than the cross-browser google cloud engine speech-to-text method. Overwrite this functionality if you want to only use google cloud for more consistent functionality.
Note: if setting to true, you must pass a valid google cloud speech to text API key or no results will be returned.
- Type:
boolean
- Required:
false
- Default:
false
Boolean to return legacy array of string results or the new array of object results. Recommended to pass useLegacyResults: false
as the legacy array of strings results will be removed in a future version.
- Type:
boolean
- Required:
false
- Default:
false
Values returns by the useSpeechToText()
hook
ex: const { results } = useSpeechToText()
Transcribed text from speech on successful speech-to-text transcription.
- Type:
string[] | ResultType[]
- Default:
[]
type ResultType = {
speechBlob?: Blob;
timestamp: number;
transcript: string;
};
Important: When passing useLegacyResults: false
, results will return the new ResultType[]
array of objects. It is recommended to opt into the new results ASAP as the legacy results will be completely removed in a future version.
Note: speechBlob
will only be returned when using google cloud speech to text API as Chrome's SpeechRecognition API does not provide access to the captured microphone data
React setState function to manually set the results state array
Thanks to https://github.com/marharyta for the suggestion #12
- Type:
React.Dispatch<React.SetStateAction<ResultType[]>>
Real-time speech result only for SpeechRecognition
web API if opting-in using speechRecognitionProperties: { interimResults: true }
config prop. Will update using the partial results returned from the web API and gets set back to undefined
on final speech result.
- Type:
string
|undefined
- Default:
undefined
Function to start microphone input recording. Will prompt user with microphone access permission if not previously granted.
- Type:
() => Promise<void>
Function to stop microphone input recording.
- Type:
() => void
Boolean to detect if an active recording is occurring.
- Type:
boolean
- Default:
false
Error string if feature is not supported on current browser
- Type:
string
- Default:
''
npm run start
Code will be output to the /dist
folder using rollup bundler
npm run build
Feel free to open an issue on the repo with any bugs or feature requests, or fork and create a PR with fixes, features or improvements.