add FAQ
findmyway committed Feb 19, 2024
1 parent 02aee54 commit 7ef282a
Showing 1 changed file with 15 additions and 2 deletions.
17 changes: 15 additions & 2 deletions README.md
@@ -80,8 +80,8 @@ Once finished, the results will be displayed. You may find more details under th
## Related Work

- [nuprl/MultiPL-E](https://github.com/nuprl/MultiPL-E/blob/main/prompts/humaneval-jl-transform.jsonl) contains Julia prompts transformed from the original Python version of [HumanEval](https://github.com/openai/human-eval). However, based on my limited Julia programming experience, the prompts are not particularly accurate or idiomatic.
-- [Julia-LLM-Leaderboard](https://github.com/svilupp/Julia-LLM-Leaderboard), which focused on practicality and simplicity.
-- [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html)
+- [Julia-LLM-Leaderboard](https://github.com/svilupp/Julia-LLM-Leaderboard), which focuses on practicality and simplicity.
+- [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html)

## Future Work

@@ -91,6 +91,19 @@ Once finished, the results will be displayed. You may find more details under th

We're hiring! If you're interested in working on code LLMs at [01.ai](https://01.ai/), please contact [[email protected]](mailto:[email protected]).


## FAQ

- [What are the differences compared to the original Python version?](https://github.com/01-ai/HumanEval.jl/discussions/1)
- [What are the limitations of this project?](https://github.com/01-ai/HumanEval.jl/discussions/2)
- [How do LLMs perform compared to humans?](https://github.com/01-ai/HumanEval.jl/discussions/3)
- [How difficult is each problem?](https://github.com/01-ai/HumanEval.jl/discussions/4)
- [Is GPT4 good enough?](https://github.com/01-ai/HumanEval.jl/discussions/5)
- [How to make this evaluation higher quality?](https://github.com/01-ai/HumanEval.jl/discussions/6)
- [How should we measure hallucinations?](https://github.com/01-ai/HumanEval.jl/discussions/7)
- [Are there any other metrics we should care about beyond pass@k?](https://github.com/01-ai/HumanEval.jl/discussions/8) (see the pass@k sketch after this list)
- [Why does Yi-34B-Chat perform so poorly?](https://github.com/01-ai/HumanEval.jl/discussions/9)

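For reference, the pass@k metric mentioned above is commonly computed with the unbiased estimator from the original HumanEval paper (Chen et al., 2021). Below is a minimal Julia sketch of that estimator; the function name `pass_at_k` is illustrative only and not necessarily part of this repository's API.

```julia
# Unbiased pass@k estimator: pass@k = E[1 - C(n - c, k) / C(n, k)],
# where n is the number of samples generated per problem and c is the
# number of those samples that pass all unit tests.
# Note: `pass_at_k` is a hypothetical helper name, not this repository's API.
function pass_at_k(n::Integer, c::Integer, k::Integer)
    n - c < k && return 1.0   # fewer failing samples than k: every k-subset contains a pass
    p = 1.0                   # accumulates C(n - c, k) / C(n, k)
    for i in (n - c + 1):n
        p *= 1.0 - k / i
    end
    return 1.0 - p
end

# Example: 200 samples per problem with 37 passing gives pass@1 = 37 / 200.
pass_at_k(200, 37, 1)  # 0.185
```

Averaging this per-problem estimate over all problems yields the reported pass@k score.
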
## Acknowledgement

- This project relies heavily on many features provided by [ReTestItems.jl](https://github.com/JuliaTesting/ReTestItems.jl). Many thanks to [Nick Robinson](https://github.com/nickrobinson251) for his help during development.
