Allow users to interact with Taskwarrior using (deep)speech. #101

Open · 19 tasks

a-t-0 opened this issue Apr 5, 2022 · 0 comments

a-t-0 (Contributor) commented Apr 5, 2022

  • Set up a local server running Mozilla's DeepSpeech model (docs), trained on Common Voice data for the language preferred by the Taskwarrior user (a minimal server sketch follows this list).

  • Generate a training set (a list of arbitrary sentences the user can read out loud) to fine-tune your own DeepSpeech model on additional data and tailor it to your voice (a training-CSV sketch follows this list).

    • Allow the user to retrain easily through a friendly interface.
  • Set up hardcoded voice commands for basic actions, such as: add a task, set a due date, set a priority, set a project, show the tasks of project .. on screen, and give a summary of the tasks of project x (a command-dispatch sketch follows this list).

    • Create a training set to retrain the DeepSpeech model on these hardcoded commands.
    • Allow the user to retrain easily through a friendly interface.
  • Create a neural network/AI that interprets natural speech commands (and asks for clarification on low-confidence inputs; see the pipeline sketch after this list). Example: "Add a task that is due next Tuesday at 15:00 for the swimming project with content: email all coaches and priority medium."

  • Ask the user whether they want to help improve the Taskwarrior natural language processing model by providing spoken task descriptions/Taskwarrior commands along with the actual typed/digital/exact commands. Either allow:

    • Users to provide an audio fragment and a typed Taskwarrior command.
    • Users to provide an audio fragment and an automatically registered Taskwarrior command. (Allow users to specify a single, easily understandable signal that tells the agent the command was interpreted wrongly, e.g. saying "WRONG". That way the label can be set to: not this Taskwarrior command.)
    • Users to share no audio, and only the natural language processing interpretation of the voice command.
  • As soon as time permits, apply differential privacy so that the natural language processing model does not train on actual Taskwarrior commands but on privatised/encrypted representations of them. This preserves user privacy while keeping comparable performance (at the cost of a (significant) loss of model understandability).

  • Support modularity of speech recognition models (see the pipeline sketch after this list).

  • Support modularity of natural language interpretation models.

  • Allow a single integrated model that goes directly from audio to Taskwarrior commands.

  • Set up the DeepSpeech server over Tor so that you can connect to it from your phone anywhere in the world, regardless of your local networking situation (an onion-service configuration sketch follows this list).

  • Include a slimmed-down DeepSpeech model that runs on the AI accelerator chips in your phone to speed up response time, if the speech/inference delay is otherwise too large to feel natural.

  • Set up a Tor connection on Android that transmits speech signals to your own DeepSpeech server.

  • Allow the user to transmit over clearnet instead of Tor to speed up response times if desired: https://metrics.torproject.org/onionperf-latencies.html
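A minimal sketch of the local speech-to-text server from the first item, assuming the deepspeech Python package (0.9.x API) plus Flask; the model/scorer file names, the /stt route and port 5000 are placeholders, and the client is expected to POST 16 kHz, 16-bit mono WAV audio.

```python
# Minimal local speech-to-text endpoint (sketch, not the project's code).
import io
import wave

import numpy as np
from deepspeech import Model
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the acoustic model once at start-up; the external scorer is optional
# but improves accuracy for the chosen language. File names are placeholders.
model = Model("deepspeech-0.9.3-models.pbmm")
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")


@app.route("/stt", methods=["POST"])
def stt():
    # Expect a 16 kHz, 16-bit, mono WAV file under the "audio" form field.
    payload = io.BytesIO(request.files["audio"].read())
    with wave.open(payload, "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    return jsonify({"transcript": model.stt(audio)})


if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```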

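For the training-set item, one simple layout is one sentence per clip: record clips/<id>.wav while reading the prompt stored in clips/<id>.txt, then build a CSV in the wav_filename/wav_filesize/transcript format consumed by the DeepSpeech training scripts. The clips/ directory and file naming below are assumptions.

```python
# Sketch: build a DeepSpeech fine-tuning CSV from recorded clips.
import csv
from pathlib import Path

CLIP_DIR = Path("clips")        # placeholder directory of recordings
OUTPUT_CSV = Path("train.csv")  # placeholder output file


def build_training_csv() -> None:
    with OUTPUT_CSV.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav in sorted(CLIP_DIR.glob("*.wav")):
            # Each recording is paired with the sentence the user read aloud.
            transcript = wav.with_suffix(".txt").read_text().strip().lower()
            writer.writerow([str(wav), wav.stat().st_size, transcript])


if __name__ == "__main__":
    build_training_csv()
```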
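For the hardcoded-commands item, a sketch of how recognised phrases could be dispatched to the task CLI; the phrase patterns and the run_task helper are hypothetical, while the add, project:, due: and priority: syntax is standard Taskwarrior.

```python
# Sketch: map a few hardcoded voice phrases onto Taskwarrior CLI calls.
import re
import subprocess


def run_task(*args: str) -> str:
    """Invoke the Taskwarrior CLI and return its output."""
    return subprocess.run(["task", *args], capture_output=True, text=True).stdout


def dispatch(transcript: str) -> str:
    text = transcript.lower().strip()
    if match := re.match(r"add a task (.+)", text):
        return run_task("add", match.group(1))
    if match := re.match(r"show tasks of project (\w+)", text):
        return run_task(f"project:{match.group(1)}", "list")
    if match := re.match(r"give (?:a )?summary of tasks of project (\w+)", text):
        return run_task(f"project:{match.group(1)}", "summary")
    return "Sorry, I did not recognise that command."


if __name__ == "__main__":
    print(dispatch("show tasks of project swimming"))
```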

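To make the two modularity items concrete, and to show the ask-for-clarification behaviour from the interpretation item, both models could sit behind small interfaces so that DeepSpeech (or a single integrated audio-to-command model) can be swapped without touching the rest of the pipeline. All class and method names here are hypothetical.

```python
# Sketch: pluggable speech recognition and command interpretation back-ends.
from abc import ABC, abstractmethod
from typing import Tuple

import numpy as np


class SpeechRecogniser(ABC):
    @abstractmethod
    def transcribe(self, audio: np.ndarray) -> str:
        """Turn 16 kHz mono PCM audio into text."""


class CommandInterpreter(ABC):
    @abstractmethod
    def interpret(self, transcript: str) -> Tuple[str, float]:
        """Return a Taskwarrior command plus a confidence in [0, 1]."""


class Pipeline:
    def __init__(self, recogniser: SpeechRecogniser,
                 interpreter: CommandInterpreter, min_confidence: float = 0.8):
        self.recogniser = recogniser
        self.interpreter = interpreter
        self.min_confidence = min_confidence

    def handle(self, audio: np.ndarray) -> str:
        transcript = self.recogniser.transcribe(audio)
        command, confidence = self.interpreter.interpret(transcript)
        if confidence < self.min_confidence:
            # Ask the user for clarification instead of running the command.
            return f"Did you mean: {command}?"
        return command
```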
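For the Tor items, the local server could be published as an onion service with a torrc entry along these lines; the service directory is a placeholder and 127.0.0.1:5000 matches the Flask sketch above. The phone then connects to the resulting .onion address through Orbot or a similar Tor client.

```
# /etc/tor/torrc: expose the local speech-to-text server as an onion service.
HiddenServiceDir /var/lib/tor/deepspeech_service/
HiddenServicePort 80 127.0.0.1:5000
```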