AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators

jasonchou9877@gmail.com; {nickaliu,wigginzhou,faxonlian}@tencent.com
Hunyuan Team, Tencent
*Equal Contributions Corresponding Authors
AutoCodeBench-v2 Leaderboard
HumanEval Overfitting