How do LLMs perform compared to humans? #3

findmyway (Maintainer) · 2024-02-19T13:55:35Z

This is how I implemented the benchmark solutions:

1. Extract the solutions from the Python version.
2. Rewrite the Python code in Julia.
3. Rewrite the test cases based on my own understanding, where possible.

Once finished, I evaluated all of my first attempts at once. The average pass rate was slightly above 0.5. So I would say that many LLMs already do better than I did.
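For anyone who wants to compute a comparable number for their own attempts, here is a minimal sketch. It assumes that "pass rate" means the fraction of test cases passed per problem, averaged over all problems; the post does not spell out the exact definition. The problem names and outcomes below are hypothetical placeholders, not the actual benchmark results, and this is not the benchmark's own evaluation code.

```python
from statistics import mean


def pass_rate(results):
    """Fraction of test cases passed for one problem (0.0 if there are no tests)."""
    return sum(results) / len(results) if results else 0.0


def average_pass_rate(per_problem):
    """Mean of the per-problem pass rates (assumed definition, see the note above)."""
    return mean(pass_rate(results) for results in per_problem.values())


# Hypothetical first-attempt outcomes: True means the test case passed.
first_attempts = {
    "problem_a": [True, True, True, True],
    "problem_b": [True, False, False, False],   # e.g. missed corner cases
    "problem_c": [False, False, False, False],  # e.g. a syntax error fails every test
    "problem_d": [True, True, True, False],
}

print(average_pass_rate(first_attempts))
```

Averaging per problem rather than pooling all test cases keeps a problem with many tests from dominating the score; whether the benchmark does the same is an assumption here.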

You might be surprised that my pass rate is so low. Honestly, I was quite surprised too, especially given that I've been programming in Julia for years and had peeked at the Python solutions. I found that the failures were mainly due to corner cases, incorrect syntax, and misunderstanding the problems. In fact, the Python solutions were sometimes quite misleading.
