
metadata

title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLMs to have some fun with Codenames!
tags:
  - mcp-in-action-track-creative
  - mcp-in-action-track-consumer
  - Google
  - Gemini
  - Anthropic
  - OpenAI
  - HuggingFace
  - ElevenLabs

🧠 Agentic Codenames Arena

Watch, or join, LLMs battling it out in Codenames.

New to Codenames? No problem. Go to the How to Play section below or check out the example in the How to Play tab in the app to get started.

✅ Hackathon Requirements:

Demo: Video on YouTube

Social media post: My post on LinkedIn

My HuggingFace Username: lucadipalma1998

🧩 What This App Does

Agentic Codenames Arena is an interactive dashboard where teams of LLMs compete in the game of Codenames.
Two teams, Red and Blue, face off in a 4v4 setup, with each team composed of:

1 Boss: Provides the clue and clue number for each turn.
1 Captain: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
2 Players: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.

The internal communication and coordination architecture is built using LangGraph, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.
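The app builds this flow with LangGraph. To show the shape of a single turn without depending on that library, here is a framework-free sketch. It assumes a simple linear Boss → Players → Captain flow over one shared state object, with stub functions standing in for the LLM calls. `TurnState`, `boss`, `make_player`, `captain`, and `run_turn` are hypothetical names, not the app's code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnState:
    """Shared state handed from node to node during one team turn (hypothetical)."""
    clue: str = ""
    number: int = 0
    suggestions: list[str] = field(default_factory=list)
    final_guesses: list[str] = field(default_factory=list)


Node = Callable[[TurnState], TurnState]


def boss(state: TurnState) -> TurnState:
    # Stub for the Boss LLM call: emits a clue and a number.
    state.clue, state.number = "atmosphere", 2
    return state


def make_player(name: str, pick: str) -> Node:
    def player(state: TurnState) -> TurnState:
        # Stub for a Player LLM call: proposes one word.
        state.suggestions.append(f"{name}: {pick}")
        return state
    return player


def captain(state: TurnState) -> TurnState:
    # Stub for the Captain LLM call: keeps distinct proposed words, in order, up to `number`.
    picks = [s.split(": ", 1)[1] for s in state.suggestions]
    state.final_guesses = list(dict.fromkeys(picks))[: state.number]
    return state


def run_turn(nodes: list[Node]) -> TurnState:
    state = TurnState()
    for node in nodes:  # linear graph: each node reads and updates the shared state
        state = node(state)
    return state


turn = run_turn([boss, make_player("Player 1", "AIR"), make_player("Player 2", "SPACE"), captain])
print(turn.final_guesses)  # ['AIR', 'SPACE']
```

In a graph framework, each function would become a node and the list order would become edges. The linear list is only the simplest possible wiring.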

Below is the LangGraph diagram illustrating how the different roles communicate during each turn:

You can either sit back and watch fully autonomous LLM teams play, or step in as a human Boss to lead your AI teammates with your own clues.

🤖 How It Works

LLM Teams

Build teams from several providers: OpenAI, Google, Anthropic, HuggingFace... Each model plays autonomously using its own reasoning chain and game strategy.
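Because each seat can be filled from a different provider, a team is essentially a mapping from role to provider. The following is a minimal sketch, assuming four fixed roles and treating "random" as "pick any provider from the pool". `PROVIDERS`, `Team`, and `build_team` are illustrative names, and the app's real provider and model list is not shown here.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical provider pool; the app's actual list may differ.
PROVIDERS = ["OpenAI", "Google", "Anthropic", "HuggingFace"]
ROLES = ["Boss", "Captain", "Player 1", "Player 2"]


@dataclass
class Team:
    color: str
    members: dict[str, str]  # role -> provider


def build_team(color: str,
               choices: Optional[dict[str, str]] = None,
               rng: Optional[random.Random] = None) -> Team:
    """Use explicit provider choices where given; fill every other role at random."""
    rng = rng or random.Random()
    choices = choices or {}
    members = {}
    for role in ROLES:
        provider = choices.get(role, "random")
        members[role] = rng.choice(PROVIDERS) if provider == "random" else provider
    return Team(color, members)


red = build_team("Red", {"Boss": "Anthropic", "Captain": "OpenAI"}, rng=random.Random(0))
blue = build_team("Blue", rng=random.Random(1))  # fully random, mixed-model team
print(red.members["Boss"], red.members["Captain"])  # Anthropic OpenAI
```

Passing a seeded `random.Random` makes a mixed team reproducible, which can help when comparing matches.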

Two Gameplay Modalities

1️⃣ Observation Mode — Watch AIs Battle

Sit back and spectate. See how different models reason about clues, decide associations, and occasionally produce hilariously misaligned guesses.

You'll see:

Model-to-model conversations
Reasoning traces
Turn-by-turn decisions
How each team coordinates across multiple rounds

Perfect for AI benchmarking, research, or just entertainment.

2️⃣ Human Boss Mode — Enter the Fight

Become the Boss for either team and give your own clue and number. Your AI teammates will interpret your hint and make their guesses.

🧠 Why It’s Interesting

Compare LLM reasoning styles: Watch how different models interpret associations, analogies, and subtle semantic cues.
Analyze team dynamics: Some models coordinate beautifully. Others… not so much. Observe emergent cooperation, miscommunication, or unexpected strategies.
Experiment with human–AI collaboration: Test how effective your clues are with LLM teammates. Try pushing the limits with creative, cryptic, or minimalist hints.

🕹️ Main Features

Build teams by selecting providers, or choose "random" to generate a mixed-model team
Switch between AI vs AI and Human vs AI modes
Detailed per-turn logs for all model decisions
Transparent reasoning chains
Interactive UI for watching matches play out
Match history & analytics dashboard

📊 Stats & Analytics

All games played in the Arena are stored in a database. The Stats section of the app includes:

Model win/loss rates across all recorded matches
Performance comparisons between model families (OpenAI vs Google vs …)
Historical match logs for replay & analysis
Leaderboards highlighting the best-performing models

This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
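Win/loss rates and a leaderboard can be derived from stored match records. This is a minimal sketch, assuming each record lists the models on each side and the winning color. The record shape and the model names (`model-a`, etc.) are made up for illustration. They are not real results, and the app's actual database schema is not shown here.

```python
from collections import defaultdict

# Made-up match records, for illustration only.
matches = [
    {"red": ["model-a", "model-b"], "blue": ["model-c"], "winner": "red"},
    {"red": ["model-c"], "blue": ["model-a"], "winner": "blue"},
    {"red": ["model-b"], "blue": ["model-c"], "winner": "blue"},
]


def leaderboard(records: list[dict]) -> list[tuple[str, int, int, float]]:
    """Return (model, wins, games, win_rate), best win rate first, then most games."""
    wins: dict[str, int] = defaultdict(int)
    games: dict[str, int] = defaultdict(int)
    for match in records:
        for side in ("red", "blue"):
            for model in set(match[side]):  # count a model once per side per game
                games[model] += 1
                if match["winner"] == side:
                    wins[model] += 1
    rows = [(m, wins[m], games[m], wins[m] / games[m]) for m in games]
    return sorted(rows, key=lambda r: (r[3], r[2]), reverse=True)


for row in leaderboard(matches):
    print(row)
```

Counting per model (rather than per team) fits mixed-model teams, since every model on the winning side gets credit for the win.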

❓ How to Play

📝 Summary

Codenames is a word-association game where two teams compete to guess all of their secret words before the opponents do. Each team has a Boss who can see a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin (the killer word). The Boss gives one-word clues paired with a number that says how many words on the board relate to that clue. Their teammates, who cannot see any colors, must discuss the clue, interpret it, and decide which words the Boss is pointing toward. Touching one of their own words brings them closer to victory, while accidentally touching an opponent’s word (which counts for the other team) or a neutral word ends their turn, and touching the assassin ends the game instantly. The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.
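The difference between the Boss's board and everyone else's can be modeled with a single hidden color map. The following is a minimal sketch, assuming a dictionary from word to card type. `Card`, `boss_view`, and `player_view` are hypothetical names, and the five-word board reuses this guide's example words (a real game uses a larger grid).

```python
from enum import Enum


class Card(Enum):
    RED = "red"
    BLUE = "blue"
    NEUTRAL = "neutral"
    KILLER = "killer"


# Tiny example board built from the words used in this guide.
BOARD = {
    "AIR": Card.RED,
    "SPACE": Card.RED,
    "WALL": Card.BLUE,
    "SATURN": Card.NEUTRAL,
    "STAFF": Card.KILLER,
}


def boss_view(board: dict[str, Card]) -> dict[str, str]:
    """The Boss sees every word together with its hidden color."""
    return {word: card.value for word, card in board.items()}


def player_view(board: dict[str, Card], revealed: set[str]) -> dict[str, str]:
    """Captain and Players only see colors for words that have already been touched."""
    return {word: (card.value if word in revealed else "?") for word, card in board.items()}


print(player_view(BOARD, revealed={"AIR"}))
# {'AIR': 'red', 'SPACE': '?', 'WALL': '?', 'SATURN': '?', 'STAFF': '?'}
```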

💡 Let's see an example

What the Boss sees (above) vs. what the other players see (below)

👥 Team Roles

Each team has four members with distinct responsibilities:

1 Boss 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
1 Captain 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
2 Players 💭: Collaborate with the Captain, propose interpretations and associations.

🎮 How a Turn Works

1️⃣ Boss Gives a Clue

The Red Boss (seeing the board) might say:

"Atmosphere: 2"

This clue suggests 2 red words are related to atmosphere. Looking at the board, the Boss is thinking of:

AIR (part of the atmosphere)
SPACE (beyond the atmosphere)

⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.
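A human Boss's clue could be checked against the "one word, one number" rule before it reaches the AI teammates. The following is a minimal sketch, assuming the `Word: N` format from the example above and treating a "word" as a run of letters only (so hyphenated or multi-word clues are rejected). `CLUE_RE` and `parse_clue` are hypothetical names, not the app's code.

```python
import re

# One word made of letters, a colon, then a number, e.g. "Atmosphere: 2".
# [^\W\d_] matches any letter (Unicode-aware), excluding digits and underscores.
CLUE_RE = re.compile(r"^\s*([^\W\d_]+)\s*:\s*(\d+)\s*$")


def parse_clue(text: str) -> tuple[str, int]:
    """Split a clue into (word, number), rejecting anything that is not ONE word and ONE number."""
    match = CLUE_RE.match(text)
    if match is None:
        raise ValueError(f"Clue must be ONE word and ONE number, got {text!r}")
    return match.group(1), int(match.group(2))


print(parse_clue("Atmosphere: 2"))  # ('Atmosphere', 2)
```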

2️⃣ Team Discussion

The Captain and Players discuss without seeing the colors:

Player 1: “AIR feels like the safest bet — it's literally the atmosphere.”
Player 2: “SPACE could connect because it's outside the atmosphere.”

3️⃣ Captain Makes Final Selection

The Captain decides which words to touch, in order:

AIR ✅ (Red — Correct!)
SPACE ✅ (Red — Correct!)
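In the app, the Captain is an LLM that weighs the discussion in natural language. As a purely illustrative, non-LLM stand-in, a simple vote tally shows the idea of turning several teammates' suggestions into one ordered list of touches. `captain_select` and its majority-vote rule are assumptions, not the app's logic.

```python
from collections import Counter


def captain_select(suggestions: dict[str, list[str]], limit: int) -> list[str]:
    """Rank words by how many teammates proposed them; ties keep first-mentioned order."""
    votes: Counter[str] = Counter()
    first_seen: dict[str, int] = {}
    for words in suggestions.values():
        for word in words:
            votes[word] += 1
            first_seen.setdefault(word, len(first_seen))
    ranked = sorted(votes, key=lambda w: (-votes[w], first_seen[w]))
    return ranked[:limit]


picks = captain_select(
    {"Captain": ["AIR", "SPACE"], "Player 1": ["AIR"], "Player 2": ["SPACE", "SATURN"]},
    limit=2,
)
print(picks)  # ['AIR', 'SPACE']
```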

The team can stop after any correct guess or keep going up to the number given (+1 bonus guess from previous turns, if applicable). Any wrong guess ends the turn.
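The guessing limit can be expressed as a short loop. The following is a minimal sketch, assuming the Captain passes an ordered list of words (a shorter list means stopping early) and a flag saying whether the +1 bonus applies this turn. `play_guesses` is a hypothetical name.

```python
def play_guesses(ordered: list[str], own_words: set[str], clue_number: int,
                 bonus: bool = False) -> list[str]:
    """Touch words in order until one misses or the allowed count is used up.

    Returns the words actually touched. `bonus` adds the optional +1 guess.
    """
    allowed = clue_number + (1 if bonus else 0)
    touched = []
    for word in ordered[:allowed]:
        touched.append(word)
        if word not in own_words:  # any word that is not the team's own ends the turn
            break
    return touched


print(play_guesses(["AIR", "SPACE", "WALL"], {"AIR", "SPACE"}, clue_number=2))  # ['AIR', 'SPACE']
```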

⚠️ Mistakes to Avoid

Guessing STAFF (black, the killer word) ends the game immediately: the guessing team loses!
Guessing WALL (blue, an opponent’s word) ends the turn and gives that word to the Blue team.
Guessing SATURN (beige, neutral) simply ends the turn.
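The three outcomes above map naturally to a small function. The following is a minimal sketch, assuming colors are stored as plain strings and scores as a per-team counter. `Outcome` and `reveal` are hypothetical names, and the board reuses this guide's example words.

```python
from enum import Enum


class Outcome(Enum):
    CONTINUE = "correct, may keep guessing"
    END_TURN = "turn ends"
    LOSE = "guessing team loses"


def reveal(word: str, colors: dict[str, str], team: str, scores: dict[str, int]) -> Outcome:
    """Apply the effect of `team` ("red" or "blue") touching `word`."""
    color = colors[word]
    if color == "killer":
        return Outcome.LOSE
    if color in ("red", "blue"):
        scores[color] += 1  # the word counts for whichever team owns it
        return Outcome.CONTINUE if color == team else Outcome.END_TURN
    return Outcome.END_TURN  # neutral word


colors = {"AIR": "red", "SPACE": "red", "WALL": "blue", "SATURN": "neutral", "STAFF": "killer"}
scores = {"red": 0, "blue": 0}
print(reveal("WALL", colors, "red", scores), scores)  # Outcome.END_TURN {'red': 0, 'blue': 1}
```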

🏆 Winning the Game

The game ends when:

✅ A team finds all their colored words → That team wins!
❌ A team touches the killer word (STAFF) → That team loses immediately!
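Both end conditions can be checked after every touch. The following is a minimal sketch, assuming the game tracks how many words each team has found, how many each team has in total, and which team (if any) touched the killer word. `winner` is a hypothetical name.

```python
from typing import Optional


def winner(found: dict[str, int], totals: dict[str, int],
           killer_touched_by: Optional[str] = None) -> Optional[str]:
    """Return the winning team ("red" or "blue"), or None if the game continues."""
    if killer_touched_by is not None:
        # Touching the killer word loses immediately, so the other team wins.
        return "blue" if killer_touched_by == "red" else "red"
    for team in ("red", "blue"):
        if found[team] >= totals[team]:  # all of this team's words uncovered
            return team
    return None


print(winner({"red": 2, "blue": 0}, {"red": 2, "blue": 1}))  # red
```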

💡 Strategy Tips

For the Boss:

Try to link multiple words with creative clues
Avoid clues that might lead to the killer word or to the opponent’s words
Consider associations your team might make

For Captain & Players:

Discuss all possible interpretations
Consider which words are risky (possibly the killer or an opponent’s word)
Don’t be afraid to stop early to avoid the killer word
The Captain has final say but should consider all suggestions

🤝 Sponsors

Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.