Add voice command integration with whisper #66

JamesHWade · 2023-03-23T00:29:05Z

Use whisper to incorporate voice commands.

h/t @jwijffels
See: https://twitter.com/jwijffels1/status/1638650905636618241?s=20

MichelNivard · 2023-03-23T15:21:45Z

We could make this a potentially powerful tool for people who have disabilities! If we let people voice type code, and it's transcribed, we can have GPT evaluate it and suggest edits (assuming there are going to be transcription imperfections). So Whisper hears “assign a random sample from a normal with mean 2 too variable C”, then GPT writes the code and it's pasted, and if it's evaluated not to be valid R code, 2 or 3 alternatives are offered?
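For the "evaluated not to be valid R code" step, here is a minimal sketch, assuming a syntax check is enough as a first filter. Base R's `parse()` raises an error on text that does not parse, so a small wrapper can flag generated code that needs alternatives. The helper name `is_valid_r` is hypothetical, and it only checks syntax, not whether the code does what was spoken.

```r
# Hypothetical helper: TRUE if `code` parses as R, FALSE otherwise.
# This checks syntax only; nothing is evaluated.
is_valid_r <- function(code) {
  tryCatch({
    parse(text = code, keep.source = FALSE)
    TRUE
  }, error = function(e) FALSE)
}

# Code GPT might return for the spoken instruction above
is_valid_r("C <- rnorm(1, mean = 2)")

# A broken variant (missing closing parenthesis) fails to parse
is_valid_r("C <- rnorm(1, mean = 2")
```

When the check returns `FALSE`, the app could ask GPT for a few alternative completions instead of pasting the code.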

jwijffels · 2023-03-23T20:17:58Z

Do you know of an R package that can record audio?

MichelNivard · 2023-03-24T10:27:06Z

I'd say a pipeline that's loosely:

```r
# Record audio; see https://search.r-project.org/CRAN/refmans/audio/html/record.html
audio <- record()

# whisper_api_call() is a function we write, based on this API call:
# POST https://api.openai.com/v1/audio/transcriptions
# explained here: https://platform.openai.com/docs/api-reference/audio/create
transcript <- whisper_api_call(audio)

chat_response <- gpt_chat(..., instruction = transcript)
```
My question is: do we want a tiny Shiny app to open up with a button to record? Or buttons for edit, create, and chat? And then the response is either in chat or at the cursor? Do we want to design a prompt so that if you click the button "whisper to CRAN" it tries to change your natural language into code?
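A rough sketch of what the `whisper_api_call()` step could look like with httr2, assuming the audio has already been saved to a file. The helper names, the `OPENAI_API_KEY` environment variable, and the split into a build step and a send step are assumptions for illustration, not an existing gptstudio API.

```r
library(httr2)

# Hypothetical helper: build (but do not send) the transcription request.
# The audio file is uploaded as a multipart form field named "file".
build_whisper_request <- function(audio_path,
                                  api_key = Sys.getenv("OPENAI_API_KEY")) {
  request("https://api.openai.com/v1/audio/transcriptions") |>
    req_auth_bearer_token(api_key) |>
    req_body_multipart(
      file = curl::form_file(audio_path),
      model = "whisper-1"
    )
}

# Hypothetical helper: send the request and return the transcript text.
whisper_api_call <- function(audio_path, ...) {
  resp <- req_perform(build_whisper_request(audio_path, ...))
  resp_body_json(resp)$text
}
```

Keeping the request builder separate from the call that performs it makes it possible to inspect the request without spending API credits.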

jwijffels · 2023-03-24T11:18:24Z

You can indeed make an API call and send the audio with an HTTP request to an OpenAI endpoint.
In that case, there is no need to use https://github.com/bnosac/audio.whisper as that R package does the transcription offline with the whisper model.
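For comparison, the offline route could look roughly like the sketch below. It assumes audio.whisper is installed from the GitHub repository above and, as far as I understand the package README, that the input is a 16-bit WAV file sampled at 16 kHz. The wrapper name `transcribe_offline` and the default `"tiny"` model are assumptions for illustration.

```r
# Hypothetical wrapper around audio.whisper for offline transcription.
# The first call downloads the chosen Whisper model, so it needs network access once.
transcribe_offline <- function(wav_path, model_name = "tiny") {
  if (!requireNamespace("audio.whisper", quietly = TRUE)) {
    stop("Install audio.whisper from https://github.com/bnosac/audio.whisper first.")
  }
  model <- audio.whisper::whisper(model_name)
  result <- predict(model, newdata = wav_path, language = "en")
  # result$data holds one row per transcribed segment; join them into one string
  paste(result$data$text, collapse = " ")
}
```

The trade-off is a local model download and local compute in exchange for not needing an API key or sending audio to a remote service.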

JamesHWade · 2023-03-27T15:40:38Z

I played around with this a good bit over the weekend. It's tough to get it working consistently. Mileage varied a lot across browsers and even across versions of RStudio. The issue is mostly with capturing the recording. The Whisper API itself is straightforward.

jwijffels · 2023-03-27T21:02:44Z

Maybe a JavaScript library exists that does cross-platform audio recording in the browser, and an R interface could be built on top of it with htmltools?
Making an htmltools interface to https://www.npmjs.com/package/recordrtc will probably do the trick for recording the audio.

calderonsamuel · 2023-06-22T20:14:58Z

This looks promising: https://github.com/coolbutuseless/carelesswhisper

JamesHWade added the "feature" label (a feature request or enhancement) on Mar 23, 2023

JamesHWade added the "stuck 🚧" label (issue has been around for a while with no obvious solution) and removed the "feature" label on Jun 3, 2023

calderonsamuel mentioned this issue on Jan 9, 2024: Upkeep for gptstudio (2024) #164 (closed, 9 tasks)
