title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLMs to have some fun with Codenames!
tags:
- mcp-in-action-track-creative
- mcp-in-action-track-consumer
- Google
- Gemini
- Anthropic
- OpenAI
- HuggingFace
- ElevenLabs
🧠 Agentic Codenames Arena
Watch, or join, LLMs battling it out in Codenames.
New to Codenames? No problem. Go to the How to Play section below or check out the example in the How to Play tab in the app to get started.
✅ Hackathon Requirements:
Demo: Video on YouTube
Social media post: My post on LinkedIn
My HuggingFace Username: lucadipalma1998
🧩 What This App Does
Agentic Codenames Arena is an interactive dashboard where teams of LLMs compete in the game of Codenames.
Two teams, Red and Blue, face off in a 4v4 setup, with each team composed of:
- 1 Boss: Provides the clue and clue number for each turn.
- 1 Captain: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
- 2 Players: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.
The internal communication and coordination architecture is built using LangGraph, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.
Below is the LangGraph diagram illustrating how the different roles communicate during each turn:
You can either sit back and watch fully autonomous LLM teams play, or step in as a human Boss to lead your AI teammates with your own clues.
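The per-turn flow between the three roles (Boss gives a clue, Players propose words, Captain makes the final selection) can be sketched in plain Python. This is only an illustration of the hand-offs: it does not use the LangGraph API, and `Team`, `play_turn`, and `merge_proposals` are hypothetical names that do not come from the app's code.

```python
from dataclasses import dataclass
from typing import Callable

Clue = tuple[str, int]  # (clue word, clue number)


@dataclass
class Team:
    # Each role is modelled as a plain callable; in the app these would be LLM agents.
    boss: Callable[[list[str]], Clue]
    players: list[Callable[[Clue, list[str]], list[str]]]
    captain: Callable[[Clue, list[list[str]]], list[str]]


def merge_proposals(clue: Clue, proposals: list[list[str]]) -> list[str]:
    """Stub Captain: keep every proposed word once, in first-seen order."""
    merged: list[str] = []
    for proposal in proposals:
        for word in proposal:
            if word not in merged:
                merged.append(word)
    return merged


def play_turn(team: Team, board_words: list[str]) -> tuple[Clue, list[str]]:
    """Run one turn: Boss -> Players -> Captain."""
    clue = team.boss(board_words)
    proposals = [player(clue, board_words) for player in team.players]
    return clue, team.captain(clue, proposals)
```

Swapping a stub callable for a function that prompts a human would correspond to the human Boss case, under the same assumptions.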
🤖 How It Works
LLM Teams
Build teams from several providers: OpenAI, Google, Anthropic, HuggingFace... Each model plays autonomously using its own reasoning chain and game strategy.
Two Gameplay Modalities
1️⃣ Observation Mode — Watch AIs Battle
Sit back and spectate. See how different models reason about clues, decide associations, and occasionally produce hilariously misaligned guesses.
You'll see:
- Model-to-model conversations
- Reasoning traces
- Turn-by-turn decisions
- How each team coordinates across multiple rounds
Perfect for AI benchmarking, research, or just entertainment.
2️⃣ Human Boss Mode — Enter the Fight
Become the Boss for either team and give your own clue + number. Your AI teammates will interpret your hint and take their guesses.
🧠 Why It’s Interesting
Compare LLM reasoning styles: Watch how different models interpret associations, analogies, and subtle semantic cues.
Analyze team dynamics: Some models coordinate beautifully. Others… not so much. Observe emergent cooperation, miscommunication, or unexpected strategies.
Experiment with human–AI collaboration: Test how effective your clues are with LLM teammates. Try pushing the limits with creative, cryptic, or minimalist hints.
🕹️ Main Features
- Build teams by selecting providers, or choose `random` to generate a mixed-model team.
- Switch between AI vs AI and Human vs AI modes
- Detailed per-turn logs for all model decisions
- Transparent reasoning chains
- Interactive UI for watching matches play out
- Match history & analytics dashboard
📊 Stats & Analytics
All games played in the Arena are stored in a database. The Stats section of the app includes:
- Model win/loss rates across all recorded matches
- Performance comparisons between model families (OpenAI vs Google vs …)
- Historical match logs for replay & analysis
- Leaderboards highlighting the best-performing models
This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
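A per-model win rate like the one listed above could be computed along these lines. The match record layout (`red`, `blue`, `winner` keys) is a hypothetical schema, not the app's actual database format.

```python
from collections import Counter


def win_rates(matches: list[dict]) -> dict[str, float]:
    """Return the fraction of matches won by each model.

    Each match is assumed to look like:
    {"red": ["model-a", ...], "blue": ["model-b", ...], "winner": "red"}
    """
    played: Counter[str] = Counter()
    won: Counter[str] = Counter()
    for match in matches:
        for team in ("red", "blue"):
            # A model counts once per team, even if it fills several roles.
            for model in set(match[team]):
                played[model] += 1
                if match["winner"] == team:
                    won[model] += 1
    return {model: won[model] / played[model] for model in played}
```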
❓ How to Play
📝 Summary
Codenames is a word-association game where two teams compete to guess all their secret words before the opponents do. Each team has a Boss who can see a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin. The Boss gives one-word clues paired with a number, hinting at how many words on the board relate to that clue. Their teammates, who cannot see any colors, must discuss, interpret the clue, and decide which words the Boss is pointing toward. Choosing their own words brings them closer to victory, while accidentally selecting an opponent’s word, a neutral word, or the assassin can derail their progress or end the game instantly. The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.
💡 Let's see an example
What Bosses see (above) VS what other players see (below)
👥 Team Roles
Each team has four members with distinct responsibilities:
- 1 Boss 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
- 1 Captain 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
- 2 Players 💭: Collaborate with the Captain, propose interpretations and associations.
🎮 How a Turn Works
1️⃣ Boss Gives a Clue
The Red Boss (seeing the board) might say:
"Atmosphere: 2"
This clue suggests 2 red words are related to atmosphere. Looking at the board, the Boss is thinking of:
- AIR (part of the atmosphere)
- SPACE (beyond the atmosphere)
⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.
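The one-word, one-number rule can be checked with a small parser. This is a minimal sketch that assumes the `"Word: N"` text format shown in the example; `parse_clue` is a hypothetical helper, not the app's own validation.

```python
def parse_clue(text: str) -> tuple[str, int]:
    """Parse a clue such as 'Atmosphere: 2' into (word, number)."""
    word, sep, number = text.partition(":")
    word, number = word.strip(), number.strip()
    if not sep or len(word.split()) != 1:
        raise ValueError("clue must be exactly one word followed by ': <number>'")
    if not number.isdecimal():
        raise ValueError("clue number must be a non-negative integer")
    return word, int(number)
```

For instance, `parse_clue("Atmosphere: 2")` yields the word and the number as a pair, while a two-word clue such as `"Outer space: 2"` raises `ValueError`.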
2️⃣ Team Discussion
The Captain and Players discuss without seeing the colors:
- Player 1: “AIR feels like the safest bet — it's literally the atmosphere.”
- Player 2: “SPACE could connect because it's outside the atmosphere.”
3️⃣ Captain Makes Final Selection
The Captain decides which words to touch, in order:
- AIR ✅ (Red — Correct!)
- SPACE ✅ (Red — Correct!)
The team can stop after any correct guess or continue up to the number given (+1 bonus from previous turns if applicable).
⚠️ Mistakes to Avoid
- Guessing STAFF (black, the killer word) ends the game immediately. The guessing team loses!
- Guessing WALL (blue — opponent’s word) ends the turn and gives that word to the Blue team.
- Guessing SATURN (beige — neutral) simply ends the turn.
🏆 Winning the Game
The game ends when:
- ✅ A team finds all their colored words → That team wins!
- ❌ A team touches the killer word (STAFF) → That team loses immediately!
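The guess outcomes and end conditions above can be combined into one small resolution loop. This is a simplified sketch under stated assumptions: the board is the partial five-word example, the function names are hypothetical, and the case where touching an opponent's word completes the opponent's set is not handled.

```python
BOARD = {"AIR": "red", "SPACE": "red", "WALL": "blue", "SATURN": "neutral", "STAFF": "killer"}


def resolve_guess(board: dict[str, str], team: str, word: str) -> str:
    """Classify a single guess for the guessing team."""
    color = board[word]
    if color == "killer":
        return "lose"      # the killer word ends the game immediately
    if color == team:
        return "correct"   # the team may keep guessing
    return "end_turn"      # opponent or neutral word ends the turn


def play_guesses(board: dict[str, str], team: str, guesses: list[str], revealed: set[str]) -> str:
    """Apply guesses in order; return 'win', 'lose', or 'turn_over'."""
    for word in guesses:
        revealed.add(word)
        result = resolve_guess(board, team, word)
        if result == "lose":
            return "lose"
        if result == "end_turn":
            return "turn_over"
        # Win as soon as every word of the team's color has been revealed.
        if all(w in revealed for w, color in board.items() if color == team):
            return "win"
    return "turn_over"
```

Because `revealed` is mutated in place, the same set can be carried across turns to track the state of the board.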
💡 Strategy Tips
For the Boss:
- Try to link multiple words with creative clues
- Avoid clues that may lead to the killer or opponent’s words
- Consider associations your team might make
For Captain & Players:
- Discuss all possible interpretations
- Consider risky words
- Don’t be afraid to stop early to avoid the killer word
- The Captain has final say but should consider all suggestions
🤝 Sponsors
Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.

