BigCodeBench Evaluator
Evaluate code generation models on the BigCodeBench benchmark