Your Live2D desktop assistant powered by an LLM! Available for both Windows and macOS, it senses your screen, retrieves clipboard content, and responds to voice commands with a unique voice. Features include voice wake-up, singing, and full computer control for seamless interaction with your favorite character.


LLM-Live2D-Desktop-Assitant

Notice

I'm currently working with the upstream repository (Open-LLM-Vtuber) on its reconstruction. Once the foundational reconstruction is complete, this repository (the Electron version) will be updated accordingly.

This repository may no longer receive updates, as I am migrating the Electron features to the upstream repository. You can use the desktop mode in the upstream repository directly.

🤗Introduction

Forked from Open-LLM-VTuber with the following modifications and new features:

  • Integrated with Electron to act as a desktop companion; desktop mode supports both Windows and macOS.
  • Added screen sensing and clipboard content retrieval.
  • Wrote an Elaina persona prompt.
  • Set Elaina (LSS) as the default Live2D model and created several expressions and poses.
  • Used GPTSoVITS as the TTS model to clone Elaina's timbre.
  • Improved speak_by_sentence_chain so that subsequent streamed sentences are synthesized concurrently while the current sentence is being spoken.
  • Added a voice wake-up feature: Elaina enters sleep mode after 10 s of inactivity following each conversation chain, and can be reactivated with the wake word "Elaina".
  • Added singing functionality using Retrieval-based-Voice-Conversion.
  • Added computer-use functionality via the Claude API.
  • Supported packaging the frontend as an exe (Windows) or dmg (macOS).
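The pipelined sentence-chain idea can be sketched as follows. This is a minimal illustration with placeholder `synthesize`/`play` callables, not the project's actual `speak_by_sentence_chain` implementation: while one sentence's audio is playing, the next sentence is already being synthesized in a worker thread.

```python
from concurrent.futures import ThreadPoolExecutor

def speak_sentence_chain(sentences, synthesize, play):
    """Play sentence N while sentence N+1 is already being synthesized."""
    if not sentences:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(synthesize, sentences[0])
        for i, _ in enumerate(sentences):
            audio = future.result()  # wait for the current sentence's audio
            if i + 1 < len(sentences):
                # prefetch: synthesis of the next sentence overlaps playback
                future = pool.submit(synthesize, sentences[i + 1])
            play(audio)
```

With a slow TTS backend this hides most of the synthesis latency after the first sentence, since playback and synthesis proceed in parallel.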

👀Demo

The demo videos don't reflect the latest version.

The API keys leaked in these videos no longer work.

character_switch_demo.mp4
computer_control_demo.MP4
text_io_demo.MP4
tts_and_sing_demo.MP4
vision_demo.MP4
wake_up_demo.MP4

⚠️Statement

To use this project, it is recommended to have at least basic Python programming skills.

Please refer carefully to the original project's Wiki.

For usage details and customization, consult the documentation of the components you need, and be prepared to read or modify this project's code.

Due to copyright issues, some models used in this project will not be public.

🛠️Usage

Requires Python >= 3.11.

GPTSoVITS (if needed)
DeepLX (if needed)
  • Launch a DeepLX server if you want Elaina to speak Japanese (the model usually responds in the same language as the system prompt / user input): run docker run -itd -p 1188:1188 ghcr.io/owo-network/deeplx:latest.
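Once the container is up, the backend can translate replies through DeepLX's /translate endpoint. The sketch below assumes DeepLX's documented request shape (`text`, `source_lang`, `target_lang`) and a `data` field in the response, and the default port 1188 from the docker command above; verify against your DeepLX version.

```python
import json
from urllib import request

def build_deeplx_payload(text, target_lang="JA", source_lang="auto"):
    """Build the JSON body for DeepLX's /translate endpoint."""
    return {"text": text, "source_lang": source_lang, "target_lang": target_lang}

def translate(text, endpoint="http://127.0.0.1:1188/translate"):
    """POST the payload to a locally running DeepLX container."""
    body = json.dumps(build_deeplx_payload(text)).encode("utf-8")
    req = request.Request(endpoint, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]  # the translated text
```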
Environment Configuration
  • git clone https://github.com/ylxmf2005/YourElaina
  • pip install -r requirements.txt
  • Modify conf.yaml according to your needs.

For more details, please read this Wiki.

Wake-up (if needed)
  • Obtain your Picovoice access key.
  • Set the accessKey in static/desktop/vad.js to your own access key.
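The sleep/wake behavior (sleep after 10 s of inactivity, reactivation by the wake word) can be sketched as a small state machine. This is illustrative only; the project's actual wake-word handling lives in the Picovoice-backed JavaScript in static/desktop/vad.js.

```python
import time

SLEEP_AFTER_S = 10.0  # inactivity window, matching the 10 s described above

class WakeState:
    """Track whether the assistant is awake, based on activity timestamps."""

    def __init__(self, now=None):
        self.awake = True
        self.last_activity = time.monotonic() if now is None else now

    def on_activity(self, now):
        """Any conversation activity keeps (or makes) the assistant awake."""
        self.awake = True
        self.last_activity = now

    def on_wake_word(self, now):
        """The wake word ("Elaina") reactivates a sleeping assistant."""
        self.on_activity(now)

    def tick(self, now):
        """Fall asleep once the inactivity window has elapsed."""
        if self.awake and now - self.last_activity >= SLEEP_AFTER_S:
            self.awake = False
        return self.awake
```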
Clipboard retrieval & Screen sensing (if needed)

This works best alongside a snipping tool such as Snipaste. See def get_prompt_and_image in module/conversation_manager.py for details.

For screen sensing, set your vision LLM in conf.yaml.
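As a rough illustration of how a screenshot could be handed to a vision model, here is a payload builder in the OpenAI-style multimodal message format. The message shape is an assumption for illustration; check conf.yaml and get_prompt_and_image in module/conversation_manager.py for the format this project actually uses.

```python
import base64

def build_vision_message(prompt, image_bytes, mime="image/png"):
    """Wrap a text prompt and a screenshot into one multimodal user message."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{image_b64}"}},
        ],
    }
```

On the capture side, a library such as Pillow (ImageGrab.grab for the screen, ImageGrab.grabclipboard for clipboard images) can supply the raw image bytes.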

Computer-use (if needed)

This feature currently runs on the backend machine and will be migrated to Electron in the future.

Experimental, macOS only. Set your CLAUDE_API_KEY in conf.yaml.

Windows support is planned.

Desktop-mode (Dev, recommended)
  • npm install
  • npm start
Desktop-mode (Build, to get exe on Windows, dmg on macOS)
  • npm install
  • npm run build — the executable (frontend) will be generated in dist/.
    • On Windows, make sure the terminal running npm run build has administrative privileges.
  • Run python server.py to start the backend service (for flexibility and environment-management reasons, packaging the backend is not supported yet, but may be in the future).
  • Launch the executable.

Tip: To deploy the frontend and backend on different devices, change window.ws = new WebSocket("ws://127.0.0.1:1017/client-ws"); in static/desktop/websocket.js to your server's address and port (configurable in conf.yaml).

Web-mode
  • python server.py --web

📋To Do List

  • Sync with the upstream repository (ongoing).
  • Move computer-use functions to Electron.
  • Add timbre recognition.
  • Use smarter algorithms to detect when the user has stopped speaking.
  • Enhance the UI with an input field and chat history.
  • Add more expressions and poses, such as random idle poses.
  • Allow the LLM to access the Internet.

👏Acknowledgement
