Add voice command integration with whisper #66

Open
JamesHWade opened this issue Mar 23, 2023 · 7 comments
Labels
stuck 🚧 Issue has been around for a while with no obvious solution

Comments

@JamesHWade
Collaborator

Use whisper to incorporate voice commands.

h/t @jwijffels
See: https://twitter.com/jwijffels1/status/1638650905636618241?s=20

@JamesHWade JamesHWade added the feature a feature request or enhancement label Mar 23, 2023
@MichelNivard
Owner

We could make this a potentially powerful tool for people who have disabilities! If we let people voice-type code and it's transcribed, we can have GPT evaluate the transcript and suggest edits (assuming there are going to be transcription imperfections). So Whisper hears “assign a random sample from a normal with mean 2 too variable C”, then GPT writes the code and it's pasted, and if the result is evaluated not to be valid R code, two or three alternatives are offered?
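A minimal sketch of the "is it valid R code?" step, assuming the check is purely syntactic. It uses base R's `parse()` and does not evaluate anything, so it catches malformed code but not code that parses and then fails at run time. The helper names `is_valid_r()` and `filter_valid_candidates()` are hypothetical and not part of any existing package.

```r
# Return TRUE when `code` parses as syntactically valid R, FALSE otherwise.
# Nothing is evaluated; only the parser is run.
is_valid_r <- function(code) {
  tryCatch(
    {
      parse(text = code, keep.source = FALSE)
      TRUE
    },
    error = function(e) FALSE
  )
}

# Keep only the candidate snippets that parse (hypothetical helper).
filter_valid_candidates <- function(candidates) {
  keep <- vapply(candidates, is_valid_r, logical(1), USE.NAMES = FALSE)
  candidates[keep]
}

# Illustrative candidates such as a chat model might return.
candidates <- c(
  "c <- rnorm(1, mean = 2)",
  "c <- rnorm(1, mean = 2",      # unbalanced parenthesis, will not parse
  "c <- rnorm(n = 1, mean = 2)"
)
valid <- filter_valid_candidates(candidates)
```

If the first suggestion fails the check, the remaining entries in `valid` could be offered as the alternatives.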

@jwijffels

Do you know of an R package that can record audio?

@MichelNivard
Owner

I'd say a pipeline that's loosely:

`
# record(): https://search.r-project.org/CRAN/refmans/audio/html/record.html
audio <- record()

# whisper_api_call() is something we write based on this API call:
# POST https://api.openai.com/v1/audio/transcriptio
# explained here: https://platform.openai.com/docs/api-reference/audio/create
transcript <- whisper_api_call(audio)

chat_response <- gpt_chat(..., instruction = transcript)
`
My question is: do we want a tiny Shiny app to open up with a button to record? Or buttons for edit, create, and chat? And then the response goes either in the chat or at the cursor? Do we want to design a prompt so that, if you click the button "whisper to CRAN", it tries to turn your natural language into code?

@jwijffels

You can indeed make an API call and send the audio in an HTTP request to an OpenAI endpoint.
In that case, there is no need to use https://github.com/bnosac/audio.whisper, as that R package does the transcription offline with the Whisper model.

@JamesHWade
Collaborator Author

I played around with this a good bit over the weekend. It's tough to get it working consistently. Mileage varied quite a bit across browsers and even across versions of RStudio. The issue is mostly related to collecting the recording. The Whisper API itself is straightforward.
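For reference, a rough sketch of what the API side could look like with `httr2`, assuming an already recorded audio file on disk and an API key in the `OPENAI_API_KEY` environment variable. The function names `build_whisper_request()` and `transcribe()` are hypothetical; the endpoint and the `file` and `model` form fields follow the OpenAI audio transcription reference linked earlier in the thread, and this is not necessarily how the package would implement it.

```r
library(httr2)

# Build (but do not send) a multipart POST request for a transcription.
# `audio_path` must point to an existing audio file, e.g. a .wav or .mp3.
build_whisper_request <- function(audio_path,
                                  api_key = Sys.getenv("OPENAI_API_KEY")) {
  request("https://api.openai.com/v1/audio/transcriptions") |>
    req_auth_bearer_token(api_key) |>
    req_body_multipart(
      file = curl::form_file(audio_path),
      model = "whisper-1"
    )
}

# Send the request and return the transcribed text.
# Assumes the default JSON response, which carries a `text` field.
transcribe <- function(audio_path, ...) {
  resp <- req_perform(build_whisper_request(audio_path, ...))
  resp_body_json(resp)$text
}
```

Keeping request construction separate from `req_perform()` makes it possible to inspect the request without network access or a valid key.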

@jwijffels

jwijffels commented Mar 27, 2023

Maybe a JavaScript library exists that does cross-platform audio recording in the browser, and an R interface could be built on top of it with htmltools?
Making an htmltools interface to https://www.npmjs.com/package/recordrtc will probably do the trick for recording the audio.

@JamesHWade JamesHWade added stuck 🚧 Issue has been around for a while with no obvious solution and removed feature a feature request or enhancement labels Jun 3, 2023
@calderonsamuel
Collaborator

This looks promising https://github.com/coolbutuseless/carelesswhisper

4 participants