Conduct Benchmark Test of Candidate LLMs #20

grayJiaaoLi · 2024-05-07T11:26:55Z

User story

Taking the model training and the collected data into account,
choose relevant benchmarks for CNCF-focused tasks, for example:
- Natural Language Questions (ARC or HellaSwag)
- Explanatory Tasks (MMLU)
Log the results for each LLM.
The benchmarks can be based on the EleutherAI Language Model Evaluation Harness.
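
As a rough starting point, the selection and logging steps could be sketched as below. This is only a sketch under assumptions: the task names (`arc_challenge`, `hellaswag`, `mmlu`) and the `lm_eval` CLI flags follow the harness's v0.4-style naming and should be checked against the installed version, the model ID `org/model-a` is a placeholder, and the helper names (`build_command`, `flatten_results`, `to_csv`) are hypothetical. The code only builds the command and formats results; it does not run the harness itself.

```python
import csv
import io
from typing import Any

# Assumed task names as registered in lm-evaluation-harness (v0.4-style naming).
TASKS = ["arc_challenge", "hellaswag", "mmlu"]


def build_command(model_id: str, output_dir: str) -> list[str]:
    """Build an `lm_eval` CLI invocation for one Hugging Face model (not executed here)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(TASKS),
        "--output_path", output_dir,
    ]


def flatten_results(model_id: str, results: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a harness-style results dict into one row per (task, numeric metric).

    Assumes the top-level "results" key maps task name -> metric dict;
    non-numeric entries (e.g. aliases) are skipped.
    """
    rows = []
    for task, metrics in sorted(results.get("results", {}).items()):
        for metric, value in sorted(metrics.items()):
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                rows.append(
                    {"model": model_id, "task": task, "metric": metric, "value": value}
                )
    return rows


def to_csv(rows: list[dict[str, Any]]) -> str:
    """Serialize the rows as CSV text so every LLM is logged in the same table."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["model", "task", "metric", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Running the command returned by `build_command` once per candidate model (for example via `subprocess.run`) and appending the flattened rows to one shared CSV would keep the results of all LLMs comparable in a single log.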

grayJiaaoLi added the User Story label May 7, 2024

grayJiaaoLi changed the title ~~Conduct Benchmark Testing of Candidate LLMs~~ Conduct Benchmark Test of Candidate LLMs May 7, 2024

grayJiaaoLi changed the title ~~Conduct Benchmark Test of Candidate LLMs~~ Implement Benchmark Test of Candidate LLMs May 7, 2024

grayJiaaoLi changed the title ~~Implement Benchmark Test of Candidate LLMs~~ Conduct Benchmark Test of Candidate LLMs May 7, 2024

grayJiaaoLi added the SP 05 label May 12, 2024