Llama 3 ChatQA 1.5 in 8B and 70B #292

Open
wants to merge 1 commit into
base: main
philipkiely-baseten
Member

This PR contains reference implementations for Llama 3 ChatQA at both the 8B and 70B sizes.

For maximum compatibility, these models use the basic interface, which assumes you're passing full documents or have already handled any retrieval steps yourself. There is another interface style that runs a built-in chunking model before applying context, but I'm assuming the straightforward implementation is better for testing and for integrating into existing systems.

Querying the LLM uses the following data structure:

{
  "messages": [
    {"role": "user", "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"}
  ],
  "context": "NVIDIA (NASDAQ: NVDA) today reported revenue for the fourth quarter ended January 28, 2024, of $22.1 billion, up 22% from the previous quarter and up 265% from a year ago.\nFor the quarter, GAAP earnings per diluted share was $4.93, up 33% from the previous quarter and up 765% from a year ago. Non-GAAP earnings per diluted share was $5.16, up 28% from the previous quarter and up 486% from a year ago.\nQ4 Fiscal 2024 Summary\nGAAP\n| $ in millions, except earnings per share | Q4 FY24 | Q3 FY24 | Q4 FY23 | Q/Q | Y/Y |\n| Revenue | $22,103 | $18,120 | $6,051 | Up 22% | Up 265% |\n| Gross margin | 76.0% | 74.0% | 63.3% | Up 2.0 pts | Up 12.7 pts |\n| Operating expenses | $3,176 | $2,983 | $2,576 | Up 6% | Up 23% |\n| Operating income | $13,615 | $10,417 | $1,257 | Up 31% | Up 983% |\n| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |\n| Diluted earnings per share | $4.93 | $3.71 | $0.57 | Up 33% | Up 765% |"
}
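As a rough sketch of how a client might build and send this structure: the helper names `build_payload` and `query_model` are hypothetical, and the endpoint URL and the `Api-Key` authorization header are assumptions about the deployment, not something this PR defines. Only the `messages` and `context` fields come from the example above, and the context here is shortened to the one table row the question needs.

```python
import json
import urllib.request


def build_payload(question: str, context: str) -> dict:
    """Build a request body with the "messages" and "context" fields shown above."""
    return {
        "messages": [{"role": "user", "content": question}],
        "context": context,
    }


def query_model(url: str, api_key: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload as JSON and return the raw response body as text.

    The URL and the auth header format are assumptions; check them against
    your own deployment before relying on this.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Build the payload locally; no network call happens here.
payload = build_payload(
    "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?",
    "| Net income | $12,285 | $9,243 | $1,414 | Up 33% | Up 769% |",
)
print(json.dumps(payload, indent=2))
```

Calling `query_model(url, api_key, payload)` with your own deployment URL and key would then send the request.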

Expected response:

<|begin_of_text|>NVIDIA reported a net income of $12,285 million for Q4 fiscal 2024, while the net income for Q4 fiscal 2023 was $1,414 million. The percentage change is calculated using the formula ((12285 - 1414) / 1414 * 100), which results in a 769% increase.<|end_of_text|>
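The arithmetic in the expected response can be checked by hand, and the `<|begin_of_text|>` and `<|end_of_text|>` markers can be stripped before showing the answer to a user. A minimal sketch, where `strip_special_tokens` and `percent_change` are illustrative helpers and not part of this PR:

```python
def strip_special_tokens(text: str) -> str:
    """Remove the begin/end markers seen in the expected response."""
    for token in ("<|begin_of_text|>", "<|end_of_text|>"):
        text = text.replace(token, "")
    return text.strip()


def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100


# Net income in $ millions, taken from the context table: Q4 FY23 -> Q4 FY24.
change = percent_change(1414, 12285)
print(round(change))  # 769

raw = "<|begin_of_text|>The percentage change is a 769% increase.<|end_of_text|>"
print(strip_special_tokens(raw))
```

The computed value matches both the "Up 769%" entry in the table and the figure in the model's answer.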

Performance notes:

These models are running on A100 GPUs. You can switch the hardware to H100 for better performance if desired. This is not an optimized implementation with vLLM or TensorRT-LLM, so higher TPS and throughput are likely possible in a production implementation, but inference speed is quite usable as-is.

3 participants