add FAQ

01-ai · Feb 19, 2024 · 7ef282a
1 parent 02aee54
Showing 1 changed file with 15 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -80,8 +80,8 @@ Once finished, the results will be displayed. You may find more details under th
 ## Related Work
 
 - [nuprl/MultiPL-E](https://github.com/nuprl/MultiPL-E/blob/main/prompts/humaneval-jl-transform.jsonl) contains Julia version prompts transformed from the original Python version [HumanEval](https://github.com/openai/human-eval). However, based on my limited Julia programming experience, the prompts are not very accurate or conventional.
-- [Julia-LLM-Leaderboard](https://github.com/svilupp/Julia-LLM-Leaderboard), which focused on practicality and simplicity.
-- [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html) 
+- [Julia-LLM-Leaderboard](https://github.com/svilupp/Julia-LLM-Leaderboard), which focuses on practicality and simplicity.
+- [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html)
 
 ## Future Work
 
@@ -91,6 +91,19 @@ Once finished, the results will be displayed. You may find more details under th
 
 We're hiring! If you're interested in working on code LLM at [01.ai](https://01.ai/), please contact [[email protected]](mailto:[email protected]).
 
+
+## FAQ
+
+- [What are the differences compared to the original Python version?](https://github.com/01-ai/HumanEval.jl/discussions/1)
+- [What are the limitations of this project?](https://github.com/01-ai/HumanEval.jl/discussions/2)
+- [How do LLMs perform compared to humans?](https://github.com/01-ai/HumanEval.jl/discussions/3)
+- [How difficult is each problem?](https://github.com/01-ai/HumanEval.jl/discussions/4)
+- [Is GPT4 good enough?](https://github.com/01-ai/HumanEval.jl/discussions/5)
+- [How to make this evaluation higher quality?](https://github.com/01-ai/HumanEval.jl/discussions/6)
+- [How should we measure hallucinations?](https://github.com/01-ai/HumanEval.jl/discussions/7)
+- [Are there any other metrics we should care about beyond pass@k?](https://github.com/01-ai/HumanEval.jl/discussions/8)
+- [Why does Yi-34B-Chat perform so poorly?](https://github.com/01-ai/HumanEval.jl/discussions/9)
+
 ## Acknowledgement
 
 - This project relies heavily on many features provided by [ReTestItems.jl](https://github.com/JuliaTesting/ReTestItems.jl). Many thanks to [Nick Robinson](https://github.com/nickrobinson251) for his help during development.