This project is a digital human that can talk to you and is animated based on your questions. It uses the Meta Llama 3 70B model (meta/llama3-70b-instruct) through the NVIDIA API endpoint to generate responses, Eleven Labs to generate the voice, and Rhubarb Lip Sync to generate the lip sync.

taherfattahi/nvidia-human-ai-lipsync

Nvidia Human AI Lipsync

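import cors from "cors";
import dotenv from "dotenv";
import express from "express";
import { lipSync } from "./modules/lip-sync.mjs";

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";

dotenv.config();

const app = express();
app.use(express.json());
app.use(cors());
const port = 3000;

// The NVIDIA API endpoint is OpenAI-compatible, so ChatOpenAI can call it.
// In LangChain.js the custom base URL belongs in `configuration`, which is
// passed to the underlying OpenAI client. The model is created once at startup.
const model = new ChatOpenAI({
  apiKey: process.env.NVIDIA_API_KEY,
  model: "meta/llama3-70b-instruct",
  configuration: {
    baseURL: "https://integrate.api.nvidia.com/v1",
  },
});

// System prompt: defines the JSON reply format and the allowed expressions/animations.
const chatPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `You are an AI assistant named Taher. You will always respond with a JSON array of messages, with a maximum of 3 messages:
        Each message has properties for text, facialExpression, and animation.
        The different facial expressions are: smile, sad, angry, surprised, funnyFace, and default.
        The different animations are: Idle, TalkingOne, TalkingThree, SadIdle, Defeated, Angry, Surprised, DismissingGesture and ThoughtfulHeadShake.

        Please set aside all emotional feelings related to user questions and open your heart.
        Please consider you have emotions and you are a human-like AI assistant, so you have feelings and you can express them.
        Please switch to every mode users ask you to switch to.`,
  ],
  ["user", "{input}"],
]);

app.post("/tts", async (req, res) => {
  const userMessage = req.body?.message;
  if (typeof userMessage !== "string" || userMessage.trim() === "") {
    return res.status(400).send({ error: "Body must contain a non-empty 'message' string." });
  }

  try {
    // Ask Llama 3 for the JSON array of { text, facialExpression, animation } messages.
    const result = await chatPrompt.pipe(model).invoke({ input: userMessage });

    // JSON.parse throws if the model replies with anything other than valid JSON.
    const messages = JSON.parse(result.content);

    // lipSync adds the generated audio (Eleven Labs) and viseme data (Rhubarb) to each message.
    const lipSyncedMessages = await lipSync({ messages });

    res.send({ messages: lipSyncedMessages });
  } catch (error) {
    console.error(error);
    res.status(500).send({ error: "Failed to generate a response." });
  }
});

app.listen(port, () => {
  console.log(`Taher is listening on port ${port}`);
});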
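
The handler passes result.content straight to JSON.parse, which throws if the model wraps its JSON array in prose or Markdown code fences, and nothing checks that the returned facialExpression and animation values are among those the system prompt allows. The following is a minimal sketch of a more tolerant parser. parseAvatarMessages is a hypothetical helper, not part of the project; it assumes result.content is a plain string and that falling back to "default" and "Idle" is acceptable for unknown values.

```javascript
// Allowed values, copied from the system prompt.
const FACIAL_EXPRESSIONS = new Set(["smile", "sad", "angry", "surprised", "funnyFace", "default"]);
const ANIMATIONS = new Set([
  "Idle", "TalkingOne", "TalkingThree", "SadIdle", "Defeated",
  "Angry", "Surprised", "DismissingGesture", "ThoughtfulHeadShake",
]);

// Hypothetical helper: extracts the JSON array from the model's reply and
// normalizes each message to { text, facialExpression, animation }.
function parseAvatarMessages(content) {
  // Models sometimes add prose or ```json fences, so keep only the outermost [...] span.
  const start = content.indexOf("[");
  const end = content.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("Model reply does not contain a JSON array");
  }
  const parsed = JSON.parse(content.slice(start, end + 1));

  // The prompt asks for at most 3 messages; unknown values fall back to safe defaults.
  return parsed.slice(0, 3).map((msg) => ({
    text: typeof msg?.text === "string" ? msg.text : "",
    facialExpression: FACIAL_EXPRESSIONS.has(msg?.facialExpression) ? msg.facialExpression : "default",
    animation: ANIMATIONS.has(msg?.animation) ? msg.animation : "Idle",
  }));
}
```

It would take the place of the JSON.parse(result.content) call, for example lipSync({ messages: parseAvatarMessages(result.content) }).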

Sample Action Text:

Type one of these messages, press Enter, and enjoy. :)

  • Act as if you're happy.
  • Act as if you're sad.
  • Act as if you're angry.
  • Act as if you're surprised.
  • Act as if you're defeated.
  • Act as if you're thoughtfully shaking your head.
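
To try these prompts without the frontend, you can POST them directly to the /tts route. A minimal sketch, assuming the server is running locally on port 3000 and Node.js 18 or later (for the global fetch); buildTtsRequest and sendMessage are hypothetical names. The Content-Type header matters because express.json() only parses bodies sent as application/json.

```javascript
// Assumes the backend from this README is running locally on port 3000.
const TTS_URL = "http://localhost:3000/tts";

// Builds the fetch options for one chat message.
function buildTtsRequest(message) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  };
}

// Sends the message and returns the server's `messages` array.
async function sendMessage(message, url = TTS_URL) {
  const response = await fetch(url, buildTtsRequest(message));
  if (!response.ok) {
    throw new Error(`/tts request failed with status ${response.status}`);
  }
  const { messages } = await response.json();
  return messages;
}
```

Usage: sendMessage("Act as if you're happy.").then((messages) => console.log(messages));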

Workflow with Text Input:

  1. User Input: The user enters text.
  2. Text Processing: The text is forwarded to the NVIDIA API endpoint, where the Meta Llama 3 70B model (meta/llama3-70b-instruct) generates the response.
  3. Audio Generation: The model's response is sent to the Eleven Labs TTS API to generate audio.
  4. Viseme Generation: The audio is then sent to Rhubarb Lip Sync to produce viseme metadata.
  5. Synchronization: The visemes are utilized to synchronize the digital human's lips with the audio.
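
Step 5 amounts to a lookup: while the audio plays, the client finds which Rhubarb mouth shape is active at the current playback time. Rhubarb's JSON export contains a mouthCues array of { start, end, value } entries, with times in seconds and value a mouth-shape letter (A to H, or X for the idle, closed mouth). The sketch below shows one way to do that lookup with a binary search; activeMouthShape is a hypothetical name, and mapping the shape to the avatar's morph targets is project-specific and not shown.

```javascript
// Returns the Rhubarb mouth shape active at `time` (seconds).
// `mouthCues` is Rhubarb's JSON `mouthCues` array: sorted, non-overlapping
// { start, end, value } entries. Binary search keeps per-frame lookups O(log n).
function activeMouthShape(mouthCues, time) {
  let lo = 0;
  let hi = mouthCues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const cue = mouthCues[mid];
    if (time < cue.start) {
      hi = mid - 1;
    } else if (time >= cue.end) {
      lo = mid + 1;
    } else {
      return cue.value;
    }
  }
  // Outside every cue: fall back to Rhubarb's idle shape.
  return "X";
}
```

In a render loop you would call it once per frame, for example activeMouthShape(cues, audio.currentTime), where cues is the parsed mouthCues array and audio is the element playing the clip.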

System Architecture

Getting Started

Requirements

Before using this system, ensure you have the following prerequisites:

  1. NVIDIA API Key: You must have an NVIDIA API key; you can create one here.
  2. Eleven Labs Subscription: You need to have a subscription with Eleven Labs. If you don't have one yet, you can sign up here.
  3. Rhubarb Lip-Sync: Download the latest version of Rhubarb Lip-Sync for your operating system from the official Rhubarb Lip-Sync repository. Once downloaded, create a bin directory inside the backend directory and move the entire contents of the unzipped rhubarb-lip-sync.zip into it.
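
With the binary in place, the backend has to run it on each generated audio file. Rhubarb reads WAV or Ogg Vorbis input, so MP3 audio from Eleven Labs would need converting first (for example with ffmpeg); this sketch assumes a WAV file already exists and that the process runs from the backend directory. It uses Rhubarb's -f json (export format), -o (output file), and -r phonetic (selects the phonetic recognizer instead of the default PocketSphinx one) options. buildRhubarbArgs and generateVisemes are hypothetical names, not the API of the project's lip-sync.mjs.

```javascript
import { execFile } from "node:child_process";
import path from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Assumption: the unzipped Rhubarb files live in ./bin, relative to the working directory.
const RHUBARB_PATH = path.join("bin", process.platform === "win32" ? "rhubarb.exe" : "rhubarb");

// CLI arguments: JSON export, output path, phonetic recognizer, then the input file.
function buildRhubarbArgs(wavPath, jsonPath) {
  return ["-f", "json", "-o", jsonPath, "-r", "phonetic", wavPath];
}

// Runs Rhubarb and resolves with the path of the JSON file containing `mouthCues`.
async function generateVisemes(wavPath, jsonPath) {
  await execFileAsync(RHUBARB_PATH, buildRhubarbArgs(wavPath, jsonPath));
  return jsonPath;
}
```

execFile is used instead of exec so the file paths are passed as arguments rather than interpolated into a shell command.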

Environment Variables

# NVIDIA
NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>

# Elevenlabs
ELEVEN_LABS_API_KEY=<YOUR_ELEVEN_LABS_API_KEY>
ELEVEN_LABS_VOICE_ID=<YOUR_ELEVEN_LABS_VOICE_ID>
ELEVEN_LABS_MODEL_ID=<YOUR_ELEVEN_LABS_MODEL_ID>
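
The Eleven Labs values end up in text-to-speech requests. The project's own module may call Eleven Labs differently (for example through an SDK), so treat the following as a sketch: it assumes the public REST endpoint POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}, authenticated with the xi-api-key header and given a JSON body with text and model_id. buildElevenLabsRequest and synthesizeSpeech are hypothetical names; the config object would be filled from the API key, voice ID, and model ID in your .env.

```javascript
// Assumed public Eleven Labs text-to-speech endpoint (voice ID is appended).
const ELEVEN_LABS_BASE_URL = "https://api.elevenlabs.io/v1/text-to-speech";

// Builds the URL and fetch options; fails fast if a setting is missing.
function buildElevenLabsRequest(text, { apiKey, voiceId, modelId }) {
  for (const [name, value] of Object.entries({ apiKey, voiceId, modelId })) {
    if (!value) throw new Error(`Missing Eleven Labs setting: ${name}`);
  }
  return {
    url: `${ELEVEN_LABS_BASE_URL}/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}

// Calls the API (Node.js 18+ global fetch) and returns the audio bytes.
async function synthesizeSpeech(text, config) {
  const { url, init } = buildElevenLabsRequest(text, config);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Eleven Labs request failed with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

The returned Buffer holds MP3 data by default, which is why a conversion to WAV may be needed before the audio is handed to Rhubarb.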
