---
title: Arabic Function Calling Leaderboard
emoji: 🏆
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - arabic
  - function-calling
  - leaderboard
  - llm-evaluation
---

# 🏆 Arabic Function Calling Leaderboard

*Arabic name:* لوحة تقييم استدعاء الدوال بالعربية ("Arabic function-calling evaluation board")

## Overview

The **Arabic Function Calling Leaderboard (AFCL)** evaluates Large Language Models on their ability to:

1. Understand Arabic queries in Modern Standard Arabic (MSA) and regional dialects
2. Select appropriate functions from available options
3. Extract correct arguments from Arabic text
4. Handle parallel and complex function calls
5. Detect when no function should be called
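
To make these abilities concrete, here is a minimal sketch of how a predicted set of function calls could be compared against a reference answer. The helper names (`normalize_call`, `calls_match`), the `name`/`arguments` call shape, and the strict exact-match rule are assumptions for illustration; the leaderboard's actual matching logic is not described in this README.

```python
import json
from collections import Counter


def normalize_call(call: dict) -> str:
    """Serialize a call to a canonical string so calls can be compared and counted."""
    return json.dumps(
        {"name": call["name"], "arguments": call.get("arguments", {})},
        sort_keys=True,        # argument order must not affect the comparison
        ensure_ascii=False,    # keep Arabic argument values readable
    )


def calls_match(predicted: list[dict], expected: list[dict]) -> bool:
    """Return True when both lists hold the same calls, ignoring their order.

    An empty `expected` list models the case where no function should be called.
    """
    return Counter(map(normalize_call, predicted)) == Counter(map(normalize_call, expected))


# Hypothetical example: "What is the weather in Cairo?" asked in Arabic.
expected = [{"name": "get_weather", "arguments": {"city": "القاهرة"}}]
predicted = [{"name": "get_weather", "arguments": {"city": "القاهرة"}}]
is_correct = calls_match(predicted, expected)
```

Comparing multisets of canonical strings treats parallel calls as order-insensitive while still penalizing a missing or extra call. A real scorer might relax exact matching, for example by accepting several spellings of the same Arabic place name.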

## Models Evaluated

- **Arabic-Native**: Jais, ALLaM, SILMA, AceGPT
- **Multilingual**: Qwen, Llama, Gemma, Mistral, Phi, BLOOMZ, Aya

## Dataset

📊 **Dataset**: [HeshamHaroon/Arabic_Function_Calling](https://huggingface.co/datasets/HeshamHaroon/Arabic_Function_Calling)

- **1,470 samples in total** across 10 categories, including:
  - Simple, Multiple, Parallel, and Parallel Multiple function calls
  - Irrelevance Detection
  - Dialect Handling (Egyptian, Gulf, Levantine)
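
A quick way to check the per-category breakdown is to tally the rows by category. The sketch below assumes each row is a dict with a `category` field; that field name and the sample rows are hypothetical, so inspect the dataset's real columns before relying on it.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_category(rows: Iterable[Mapping], field: str = "category") -> Counter:
    """Count how many rows fall into each category.

    `field` is an assumed column name, not one confirmed by the dataset card.
    """
    return Counter(row[field] for row in rows)


# Hypothetical rows standing in for real dataset records.
sample_rows = [
    {"category": "simple"},
    {"category": "parallel"},
    {"category": "simple"},
    {"category": "irrelevance"},
]
counts = count_by_category(sample_rows)
```

With the `datasets` library installed, the rows could come from `datasets.load_dataset("HeshamHaroon/Arabic_Function_Calling")`; the split names and column layout would need to be checked first.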

## Evaluation

When the Space starts, the leaderboard automatically evaluates each model through the Hugging Face Inference API.
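
The overall shape of such an evaluation run can be sketched as a loop that queries a model for each sample and reports accuracy per category. This is an illustration under stated assumptions, not the Space's actual `app.py`: the sample fields (`category`, `query`, `functions`, `expected`), the JSON output format, and the `generate` callable are all hypothetical. In a real run, `generate` would wrap a call to the Inference API; here it is a stub so the example works offline.

```python
import json
from collections import defaultdict
from typing import Callable


def parse_calls(raw: str) -> list[dict]:
    """Parse model output as a JSON list of calls.

    Output that is not valid JSON is treated as "no function called",
    which is a simplifying assumption of this sketch.
    """
    try:
        parsed = json.loads(raw)
    except ValueError:
        return []
    if isinstance(parsed, dict):
        parsed = [parsed]
    if not isinstance(parsed, list):
        return []
    return [c for c in parsed if isinstance(c, dict) and "name" in c]


def canonical(calls: list[dict]) -> list[str]:
    """Order-insensitive canonical form of a list of calls."""
    return sorted(
        json.dumps({"name": c["name"], "arguments": c.get("arguments", {})},
                   sort_keys=True, ensure_ascii=False)
        for c in calls
    )


def evaluate(samples: list[dict], generate: Callable[[str, list], str]) -> dict[str, float]:
    """Return exact-match accuracy per category."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for sample in samples:
        predicted = parse_calls(generate(sample["query"], sample["functions"]))
        total[sample["category"]] += 1
        if canonical(predicted) == canonical(sample["expected"]):
            correct[sample["category"]] += 1
    return {category: correct[category] / total[category] for category in total}


# Hypothetical samples and a stub model with canned answers.
samples = [
    {"category": "simple", "query": "ما حالة الطقس في القاهرة؟", "functions": [],
     "expected": [{"name": "get_weather", "arguments": {"city": "القاهرة"}}]},
    {"category": "irrelevance", "query": "احكِ لي نكتة", "functions": [],
     "expected": []},
    {"category": "simple", "query": "كم الساعة الآن؟", "functions": [],
     "expected": [{"name": "get_time", "arguments": {}}]},
]
canned = {
    "ما حالة الطقس في القاهرة؟": '[{"name": "get_weather", "arguments": {"city": "القاهرة"}}]',
    "احكِ لي نكتة": "لا توجد دالة مناسبة",
}


def fake_generate(query: str, functions: list) -> str:
    """Stand-in for a model call; returns an empty call list for unknown queries."""
    return canned.get(query, "[]")


scores = evaluate(samples, fake_generate)
```

Passing the model as a callable keeps the scoring logic independent of any particular inference backend, so the same loop can be exercised with a stub or with a live endpoint.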

## Citation

```bibtex
@misc{afcl2024,
    title={Arabic Function Calling Leaderboard},
    author={Hesham Haroon},
    year={2024},
    url={https://huggingface.co/spaces/HeshamHaroon/Arabic-Function-Calling-Leaderboard}
}
```