LLM-Live2D-Desktop-Assitant

Notice

I'm currently collaborating on the reconstruction of the upstream repository (Open-LLM-Vtuber). Once the foundational reconstruction is complete, this repository (the Electron version) will be updated accordingly.

I may no longer update this repository, as I am transferring the Electron features to the upstream repository. You can use the desktop mode there directly.

🤗Introduction

Forked from Open-LLM-VTuber, with the following modifications and new features:

  • Integrated Electron so the assistant can run as a desktop companion; desktop mode supports both Windows and macOS.
  • Added screen sensing and clipboard content retrieval.
  • Wrote an Elaina persona prompt.
  • Set Elaina (LSS) as the default Live2D model and created some expressions and poses.
  • Used GPTSoVITS as the TTS model to clone Elaina's timbre.
  • Improved `speak_by_sentence_chain` so that TTS for subsequent streamed sentences runs concurrently while the current sentence is being spoken.
  • Added a voice wake-up feature: Elaina enters sleep mode after a period (10 s) of inactivity following each conversation chain and can be reactivated with the wake word "Elaina".
  • Added singing functionality using Retrieval-based-Voice-Conversion.
  • Added a computer-use feature using the Claude API.
  • Supported packaging the frontend as an exe or dmg.
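The sentence-chain pipelining described above can be sketched roughly as follows. This is a minimal illustration of the idea, not the project's actual `speak_by_sentence_chain`; `synthesize` and `play` are hypothetical async callables supplied by the caller.

```python
import asyncio

async def speak_sentence_chain(sentences, synthesize, play):
    """Play sentence i while sentence i+1 is already being synthesized."""
    if not sentences:
        return
    # Start TTS for the first sentence immediately.
    next_task = asyncio.create_task(synthesize(sentences[0]))
    for i, _ in enumerate(sentences):
        audio = await next_task
        # Kick off TTS for the next sentence before playback of this one starts,
        # so synthesis and playback overlap.
        if i + 1 < len(sentences):
            next_task = asyncio.create_task(synthesize(sentences[i + 1]))
        await play(audio)
```

The overlap matters because TTS latency is usually comparable to playback time; pipelining hides most of it after the first sentence.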

👀Demo

The demo videos don't reflect the latest version.

The API keys leaked in these videos no longer work.

character_switch_demo.mp4
computer_control_demo.MP4
text_io_demo.MP4
tts_and_sing_demo.MP4
vision_demo.MP4
wake_up_demo.MP4

⚠️Statement

To use this project, it is recommended to have at least basic Python programming skills.

Please refer carefully to the original project's Wiki.

For usage details and customization, you may need to consult the documentation of the relevant components (if you use them) and read or modify this project's code.

Due to copyright issues, some models used in this project will not be made public.

🛠️Usage

Requires Python >= 3.11.

GPTSoVITS (if needed)
DeepLX (if needed)
  • Launch a DeepLX server if you want Elaina to speak Japanese (the model's responses usually use the same language as the system prompt / the user's input). You can run `docker run -itd -p 1188:1188 ghcr.io/owo-network/deeplx:latest`.
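For reference, DeepLX exposes a `/translate` endpoint that accepts a JSON body with `text`, `source_lang`, and `target_lang` fields. A minimal sketch of building such a request follows; actually sending it requires the container from the `docker run` command above, and `"auto"` as the source language is an assumption you may need to adjust.

```python
import json
import urllib.request

# Port taken from the docker command above.
DEEPLX_URL = "http://127.0.0.1:1188/translate"

def build_translate_request(text, target_lang="JA", source_lang="auto"):
    """Build an urllib Request for DeepLX's /translate endpoint."""
    payload = {"text": text, "source_lang": source_lang, "target_lang": target_lang}
    return urllib.request.Request(
        DEEPLX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually translate (requires the DeepLX container to be running):
# with urllib.request.urlopen(build_translate_request("Hello")) as resp:
#     print(json.loads(resp.read())["data"])
```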
Environment Configuration
  • `git clone https://github.com/ylxmf2005/YourElaina`
  • `pip install -r requirements.txt`
  • Modify `conf.yaml` according to your needs.
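After editing `conf.yaml`, a quick parse check can save debugging time later. A minimal sketch using PyYAML (assumed to be available in the backend environment; the key names shown in the test are hypothetical, not the project's actual config schema):

```python
import yaml  # PyYAML

def load_conf(path="conf.yaml"):
    """Load and parse the YAML config, failing loudly on syntax errors."""
    with open(path, "r", encoding="utf-8") as f:
        conf = yaml.safe_load(f)
    if not isinstance(conf, dict):
        raise ValueError(f"{path} did not parse to a mapping")
    return conf
```

Running `python -c "import yaml; yaml.safe_load(open('conf.yaml'))"` achieves the same effect as a one-liner.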

For more details, please read this Wiki.

Wake-up (if needed)
  • Obtain your Picovoice access key.
  • Set the accessKey in static/desktop/vad.js to your own access key.
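If you prefer not to edit the file by hand, a small helper can patch the key in. This is a hypothetical sketch that assumes `vad.js` contains an assignment shaped like `accessKey: "..."` or `accessKey = '...'`; check the actual file before relying on it.

```python
import re

def set_access_key(source, new_key):
    """Replace the value of an accessKey assignment in JS source text.

    Assumes a pattern like `accessKey: "..."` or `accessKey = '...'`.
    """
    pattern = r'(accessKey\s*[:=]\s*)(["\'])[^"\']*\2'
    return re.sub(
        pattern,
        lambda m: m.group(1) + m.group(2) + new_key + m.group(2),
        source,
    )
```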
Clipboard retrieval & Screen sensing (if needed)

It is better used together with a snipping tool like Snipaste. See `def get_prompt_and_image` in `module/conversation_manager.py` for details.

For screen sensing, please set your `vllm` in `conf.yaml`.
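When a screenshot lands on the clipboard, vision-capable models generally expect the image base64-encoded. A minimal sketch of that encoding step (a hypothetical helper for illustration, not the project's `get_prompt_and_image`):

```python
import base64

def image_bytes_to_data_url(png_bytes):
    """Encode raw PNG bytes as a data URL suitable for a vision-model message."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{b64}"
```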

Computer-use (if needed)

This feature currently runs in the Python backend and will be migrated to Electron in the future.

Experimental, and macOS only for now; Windows support is planned. Set your `CLAUDE_API_KEY` in `conf.yaml`.

Desktop-mode (Dev, recommended)
  • `npm install`
  • `npm start`
Desktop-mode (Build, to get an exe on Windows or a dmg on macOS)
  • `npm install`
  • `npm run build`; the executable (frontend) will be generated in `dist/`.
    • On Windows, make sure the terminal running `npm run build` has administrative privileges.
  • Run `python server.py` to start the backend service. (For flexibility and environment-management reasons, packaging the backend is not supported, but may be in the future.)
  • Open the executable file.

Tip: To deploy the frontend and backend on different devices, modify `window.ws = new WebSocket("ws://127.0.0.1:1017/client-ws");` in `static/desktop/websocket.js` to point to your server's address and port (which can be set in `conf.yaml`).
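When changing that address, it helps to double-check the URL format the frontend expects. A trivial sketch (a hypothetical helper mirroring the hard-coded default above):

```python
def client_ws_url(host="127.0.0.1", port=1017):
    """Build the client WebSocket URL in the format used by static/desktop/websocket.js."""
    return f"ws://{host}:{port}/client-ws"
```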

Web-mode
  • python server.py --web

📋To Do List

  • Sync with the upstream repository (ongoing).
  • Move the computer-use functions to Electron.
  • Add timbre recognition.
  • Use smarter algorithms to detect when the user has stopped speaking.
  • Enhance the UI with an input field and chat history.
  • Add more expressions and poses, such as random idle poses.
  • Allow the LLM to access the Internet.

👏Acknowledgement