We found that running Vicuna and Llama 2 on an A100 versus a V100 produces different results, while other models such as Falcon do not show this issue. The results are below.
The experiments were run on Google Colab Pro+.
We use four LLM benchmarks to evaluate the models:
- HellaSwag: acc
- TruthfulQA_mc: mc1, mc2
- ARC_challenge: acc
- MMLU (HendrycksTest): average of the acc scores across all test subjects
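The MMLU score above is the unweighted mean of the per-subject test accuracies. A minimal sketch of that aggregation, assuming per-subject accuracies are available as a dict (the subject names and values here are illustrative, not real results):

```python
# Hypothetical per-subject MMLU accuracies (illustrative values only).
mmlu_results = {
    "hendrycksTest-abstract_algebra": 0.30,
    "hendrycksTest-anatomy": 0.45,
    "hendrycksTest-astronomy": 0.50,
}

def mmlu_average(results: dict) -> float:
    """Unweighted mean of per-subject accuracies."""
    return sum(results.values()) / len(results)

print(round(mmlu_average(mmlu_results), 4))  # 0.4167
```

Note this is a simple average over subjects, not weighted by the number of questions per subject.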