# Agent Reasoning Flow Guide

## Overview

RewardPilot uses a multi-stage reasoning process powered by Claude 3.5 Sonnet (planning) and Gemini 2.0 Flash (synthesis). This guide explains how the agent reasons through complex credit card optimization decisions.

## Why Multi-LLM Architecture?

| Stage | LLM | Reason |
|-------|-----|--------|
| **Planning** | Claude 3.5 Sonnet | Best at strategic thinking, tool use |
| **Synthesis** | Gemini 2.0 Flash | Fast context processing, cost-effective |
| **Verification** | GPT-4o | High accuracy for critical decisions |

**Cost Comparison:**
- Single GPT-4o: $0.15 per recommendation
- Multi-LLM: $0.03 per recommendation (5x cheaper)

---

## Four-Phase Reasoning Process

```
┌─────────────────────────────────────────────────────────┐
│                    USER TRANSACTION                     │
│            "Whole Foods, $127.50, Groceries"            │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                    PHASE 1: PLANNING                    │
│                   (Claude 3.5 Sonnet)                   │
│                                                         │
│  Input: Transaction context                             │
│  Output: Execution strategy                             │
│                                                         │
│  Questions:                                             │
│  1. What category is this? (Groceries)                  │
│  2. Which cards have grocery bonuses?                   │
│  3. Are there spending caps to check?                   │
│  4. Need to forecast future spending?                   │
│  5. Any special merchant restrictions?                  │
│                                                         │
│  Strategy:                                              │
│  - Call Smart Wallet MCP (get card recommendations)     │
│  - Call RAG MCP (check merchant acceptance)             │
│  - Call Forecast MCP (check cap status)                 │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                   PHASE 2: EXECUTION                    │
│               (Parallel MCP Server Calls)               │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Smart Wallet │  │ Rewards RAG  │  │   Forecast   │   │
│  │     MCP      │  │     MCP      │  │     MCP      │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                 │                 │           │
│         ▼                 ▼                 ▼           │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Results:                                         │   │
│  │ - Amex Gold: 4x = $5.10                          │   │
│  │ - Citi Custom Cash: 5% but cap hit               │   │
│  │ - Chase Freedom: Not in grocery quarter          │   │
│  │                                                  │   │
│  │ - Merchant: Amex accepted at Whole Foods         │   │
│  │                                                  │   │
│  │ - Forecast: Citi $500/$500 monthly cap used      │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                   PHASE 3: REASONING                    │
│                 (Gemini 2.0 Flash Exp)                  │
│                                                         │
│  Input: All MCP results + transaction context           │
│  Output: Synthesized explanation                        │
│                                                         │
│  Reasoning Chain:                                       │
│                                                         │
│  1. Compare Rewards:                                    │
│     - Amex Gold: 4x points = $5.10 cash value           │
│     - Citi Custom Cash: Would be 5% ($6.38) but         │
│       monthly cap already hit, so 1% = $1.28            │
│     - Winner: Amex Gold ($5.10 > $1.28)                 │
│                                                         │
│  2. Check Constraints:                                  │
│     - Amex accepted at Whole Foods? ✅ Yes              │
│     - Annual cap status? $2,450/$25,000 (safe)          │
│     - Foreign transaction fee? ✅ None                  │
│                                                         │
│  3. Future Optimization:                                │
│     - Forecast shows 3 more grocery trips this month    │
│     - Total: $127.50 × 3 = $382.50                      │
│     - Rewards: $382.50 × 4% = $15.30                    │
│     - Recommendation: Continue using Amex Gold          │
│                                                         │
│  4. Alternative Scenarios:                              │
│     - If Citi cap not hit: Use Citi ($6.38 > $5.10)     │
│     - If at Costco: Use a Visa (Amex not accepted)      │
│     - If annual cap near: Switch to Citi next month     │
│                                                         │
│  Confidence: 95% (high certainty)                       │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│              PHASE 4: RESPONSE FORMATTING               │
│                   (Structured Output)                   │
│                                                         │
│  {                                                      │
│    "recommended_card": {                                │
│      "card_id": "c_amex_gold",                          │
│      "card_name": "American Express Gold",              │
│      "issuer": "American Express"                       │
│    },                                                   │
│    "rewards": {                                         │
│      "points_earned": 510,                              │
│      "cash_value": 5.10,                                │
│      "earn_rate": "4x points"                           │
│    },                                                   │
│    "reasoning": "Amex Gold offers 4x points...",        │
│    "confidence": 0.95,                                  │
│    "alternatives": [...],                               │
│    "warnings": [...]                                    │
│  }                                                      │
└─────────────────────────────────────────────────────────┘
```

---

## Phase 1: Planning (Claude 3.5 Sonnet)

### Implementation

```python
import json
import os

from anthropic import AsyncAnthropic

# Async client so the API call does not block the event loop
anthropic = AsyncAnthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

async def create_execution_plan(transaction: dict) -> dict:
    """
    Claude analyzes transaction and creates execution strategy
    """
    
    prompt = f"""
    You are a credit card optimization expert. Analyze this transaction and create an execution plan.
    
    Transaction:
    - Merchant: {transaction['merchant']}
    - Category: {transaction['category']}
    - Amount: ${transaction['amount_usd']}
    - MCC Code: {transaction['mcc']}
    - User ID: {transaction['user_id']}
    
    Available MCP servers:
    1. smart_wallet - Analyzes user's cards and calculates rewards
    2. rewards_rag - Semantic search of card benefits and restrictions
    3. spend_forecast - Predicts spending and cap warnings
    
    Your task:
    1. Determine which MCP servers to call
    2. Prioritize the calls (some may depend on others)
    3. Identify key decision factors
    4. Set confidence threshold for recommendation
    
    Return a JSON plan with:
    {{
        "strategy": "optimization approach (e.g., 'max_rewards', 'cap_aware')",
        "mcp_calls": [
            {{
                "service": "smart_wallet",
                "priority": 1,
                "reason": "Need to know available cards and base rewards"
            }},
            {{
                "service": "rewards_rag",
                "priority": 2,
                "reason": "Check if merchant accepts top card"
            }},
            {{
                "service": "spend_forecast",
                "priority": 3,
                "reason": "Verify monthly cap status"
            }}
        ],
        "decision_factors": [
            "reward_rate",
            "merchant_acceptance",
            "spending_caps",
            "annual_fees"
        ],
        "confidence_threshold": 0.85,
        "complexity": "medium"
    }}
    """
    
    response = await anthropic.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        temperature=0.3,  # Lower temperature for consistent planning
        messages=[{
            "role": "user",
            "content": prompt
        }]
    )
    
    # Parse JSON response (raises json.JSONDecodeError if the model
    # returns anything other than bare JSON)
    plan = json.loads(response.content[0].text)
    return plan
```

### Example Plans

#### Simple Transaction

```json
{
  "strategy": "max_rewards",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Straightforward category bonus"
    }
  ],
  "decision_factors": ["reward_rate"],
  "confidence_threshold": 0.90,
  "complexity": "low"
}
```

#### Complex Transaction

```json
{
  "strategy": "cap_aware_optimization",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Get all card options"
    },
    {
      "service": "spend_forecast",
      "priority": 2,
      "reason": "Check if near monthly/annual caps"
    },
    {
      "service": "rewards_rag",
      "priority": 3,
      "reason": "Verify merchant acceptance for top 2 cards"
    }
  ],
  "decision_factors": [
    "reward_rate",
    "spending_caps",
    "merchant_acceptance",
    "future_spending"
  ],
  "confidence_threshold": 0.80,
  "complexity": "high"
}
```

---

## Phase 2: Execution (Parallel MCP Calls)

### Implementation

```python
import asyncio
from itertools import groupby

import httpx

# MCP_ENDPOINTS maps each service name to its base URL and is assumed
# to be defined in the application's configuration.

async def execute_mcp_calls(plan: dict, transaction: dict) -> dict:
    """
    Execute MCP calls based on plan
    """
    
    # Sort by priority
    sorted_calls = sorted(
        plan["mcp_calls"],
        key=lambda x: x["priority"]
    )
    
    results = {}
    
    # Execute in priority order; calls that share a priority run in parallel.
    # An empty plan yields no groups and returns an empty dict.
    for _, group in groupby(sorted_calls, key=lambda x: x["priority"]):
        group_results = await execute_priority_group(
            list(group),
            transaction
        )
        results.update(group_results)
    
    return results

async def execute_priority_group(calls: list, transaction: dict) -> dict:
    """Execute MCP calls of same priority in parallel"""
    
    handlers = {
        "smart_wallet": call_smart_wallet,
        "rewards_rag": call_rewards_rag,
        "spend_forecast": call_forecast,
    }
    
    # Skip unknown services so service names and results stay aligned
    services = [c["service"] for c in calls if c["service"] in handlers]
    tasks = [handlers[service](transaction) for service in services]
    
    results = await asyncio.gather(*tasks)
    return dict(zip(services, results))

async def call_smart_wallet(transaction: dict) -> dict:
    """Call Smart Wallet MCP"""
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(
            f"{MCP_ENDPOINTS['smart_wallet']}/analyze",
            json=transaction
        )
        response.raise_for_status()
        return response.json()

# Similar for other MCP servers...
```

---

## Phase 3: Reasoning (Gemini 2.0 Flash)

### Implementation

```python
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash-exp")

async def synthesize_reasoning(
    transaction: dict,
    mcp_results: dict,
    plan: dict
) -> str:
    """
    Gemini synthesizes all information into coherent explanation
    """
    
    prompt = f"""
    You are a credit card optimization expert. Synthesize the following information into a clear recommendation.
    
    Transaction:
    {json.dumps(transaction, indent=2)}
    
    MCP Results:
    {json.dumps(mcp_results, indent=2)}
    
    Decision Factors (in order of importance):
    {json.dumps(plan['decision_factors'], indent=2)}
    
    Your task:
    1. Compare all card options on the decision factors
    2. Identify the optimal card with clear reasoning
    3. Explain why alternatives are suboptimal
    4. Provide any warnings or caveats
    5. Suggest future optimizations
    
    Format your response as:
    
    ## Recommended Card
    [Card name and key benefit]
    
    ## Reasoning
    [Step-by-step logic]
    
    ## Comparison
    [Table comparing top 3 options]
    
    ## Warnings
    [Any caveats or cap warnings]
    
    ## Future Optimization
    [How to maximize rewards going forward]
    
    Be specific with numbers and percentages.
    """
    
    # Use the async variant so the call does not block the event loop
    response = await model.generate_content_async(
        prompt,
        generation_config={
            "temperature": 0.7,
            "max_output_tokens": 2048
        }
    )
    
    return response.text
```

### Example Reasoning Output

```markdown
## Recommended Card
**American Express Gold** - 4x points on U.S. supermarkets

## Reasoning

1. **Reward Rate Comparison:**
   - Amex Gold: 4x points = 510 points = $5.10 cash value (at 1 cent per point)
   - Citi Custom Cash: Would be 5% = $6.38, but monthly cap hit, so it earns 1% = $1.28
   - Chase Freedom Flex: 1x points = $1.28 (not grocery quarter)
   
   Winner: Amex Gold ($5.10 actual rewards)

2. **Merchant Acceptance:**
   - Whole Foods accepts American Express ✅
   - No foreign transaction fees ✅

3. **Spending Cap Status:**
   - Current: $2,450 / $25,000 annual cap (9.8% used)
   - This transaction: $127.50 (0.5% of cap)
   - Safe to use ✅

4. **Future Spending Forecast:**
   - Predicted 3 more grocery trips this month ($382.50 total)
   - Projected rewards: $15.30
   - Still well under annual cap

## Comparison

| Card | Earn Rate | Rewards | Cap Status | Accepted? |
|------|-----------|---------|------------|-----------|
| **Amex Gold** | 4x | **$5.10** | 9.8% used | ✅ Yes |
| Citi Custom Cash | 5% (1% after cap) | $1.28 | Cap hit | ✅ Yes |
| Chase Freedom Flex | 1x | $1.28 | N/A | ✅ Yes |

## Warnings

⚠️ **Citi Custom Cash Cap Hit**: You've reached the $500 monthly limit on Citi Custom Cash. It will reset on Feb 1st; until then the card earns its base 1% rate.

⚠️ **Annual Cap Tracking**: You're at $2,450/$25,000 on Amex Gold's supermarket bonus. At current pace, you'll hit the cap in November. Plan to switch to Citi Custom Cash after that.

## Future Optimization

1. **This Month**: Continue using Amex Gold for groceries (best rate)
2. **Next Month**: Switch to Citi Custom Cash for the first $500 (5% beats 4x once the cap resets)
3. **After $25k Cap**: Use Citi Custom Cash or Chase Freedom (if grocery quarter)
4. **Consider**: Blue Cash Preferred (6% at U.S. supermarkets, capped at $6,000/year) as a backup if spending exceeds $25k/year

**Estimated Annual Savings**: $523 by following this strategy vs. using a single card
```

---

## Phase 4: Response Formatting

### Implementation

```python
from typing import List

from pydantic import BaseModel

class RecommendedCard(BaseModel):
    card_id: str
    card_name: str
    issuer: str

class Rewards(BaseModel):
    points_earned: int
    cash_value: float
    earn_rate: str

class Alternative(BaseModel):
    card_name: str
    rewards: float
    reason: str

class FinalRecommendation(BaseModel):
    recommended_card: RecommendedCard
    rewards: Rewards
    reasoning: str
    confidence: float
    alternatives: List[Alternative]
    warnings: List[str]
    processing_time_ms: float

def format_recommendation(
    mcp_results: dict,
    reasoning: str,
    processing_time: float
) -> FinalRecommendation:
    """Format final response"""
    
    smart_wallet_result = mcp_results["smart_wallet"]
    best_card = smart_wallet_result["recommended_card"]
    
    # Extract alternatives
    alternatives = []
    for card in smart_wallet_result["all_cards_comparison"][1:4]:
        alternatives.append(Alternative(
            card_name=card["card_name"],
            rewards=card["rewards"],
            reason=card.get("note", "Lower rewards rate")
        ))
    
    # Extract warnings; Phase 2 keys results by MCP service name
    warnings = []
    if "spend_forecast" in mcp_results:
        warnings.extend(mcp_results["spend_forecast"].get("warnings", []))
    
    return FinalRecommendation(
        recommended_card=RecommendedCard(**best_card),
        rewards=Rewards(**smart_wallet_result["rewards"]),
        reasoning=reasoning,
        confidence=calculate_confidence(mcp_results),
        alternatives=alternatives,
        warnings=warnings,
        processing_time_ms=processing_time
    )
```

---

## Advanced Reasoning Patterns

### 1. Chain-of-Thought Reasoning

```python
# Template; fill it in with prompt.format(merchant=..., mcc=..., amount=...)
prompt = """
Let's think through this step-by-step:

Step 1: Identify the category
- Merchant: {merchant}
- MCC: {mcc}
- Likely category: ?

Step 2: List cards with bonuses in this category
- Card A: X% on category
- Card B: Y points per dollar
- Card C: Z% cashback

Step 3: Calculate actual rewards
- Card A: ${amount} × X% = $?
- Card B: ${amount} × Y points × $0.01 = $?
- Card C: ${amount} × Z% = $?

Step 4: Check constraints
- Is Card A accepted at merchant?
- Is Card B near spending cap?
- Does Card C have annual fee?

Step 5: Make recommendation
Based on steps 1-4, the best card is...
"""
```

### 2. Self-Consistency

```python
from collections import Counter

# extract_card() is assumed to parse the recommended card name out of a response

# Generate multiple reasoning paths
reasoning_paths = []
for i in range(5):
    response = model.generate_content(
        prompt,
        generation_config={"temperature": 0.8}
    )
    reasoning_paths.append(response.text)

# Vote on most common recommendation
recommendations = [extract_card(path) for path in reasoning_paths]
most_common = Counter(recommendations).most_common(1)[0][0]

# Use the reasoning path that led to most common answer
final_reasoning = next(
    path for path in reasoning_paths
    if extract_card(path) == most_common
)
```

### 3. Reflection & Verification

```python
# Runs inside an async function; generate_recommendation() and
# refine_recommendation() are assumed to be defined elsewhere.

# Initial recommendation
initial_rec = await generate_recommendation(transaction, mcp_results)

# Self-critique
critique_prompt = f"""
Review this credit card recommendation:

{initial_rec}

Are there any errors or oversights?
- Did we miss a better card?
- Are the math calculations correct?
- Did we consider all constraints?
- Is the reasoning sound?

Start your reply with NO_ISSUES if everything is correct.
If you find issues, start with ISSUES_FOUND and provide corrections.
"""

critique = await model.generate_content_async(critique_prompt)

# Refine if needed. Checking an explicit verdict avoids false positives
# from replies such as "No errors found".
if critique.text.strip().upper().startswith("ISSUES_FOUND"):
    final_rec = await refine_recommendation(initial_rec, critique.text)
else:
    final_rec = initial_rec
```

---

## Confidence Scoring

```python
def calculate_confidence(mcp_results: dict) -> float:
    """
    Calculate confidence score based on multiple factors
    """
    
    confidence = 1.0
    
    # Factor 1: Reward difference (higher difference = higher confidence)
    best_reward = mcp_results["smart_wallet"]["recommended_card"]["rewards"]
    comparison = mcp_results["smart_wallet"]["all_cards_comparison"]
    
    # Only compare when a runner-up exists and the best reward is non-zero
    if len(comparison) > 1 and best_reward > 0:
        second_best = comparison[1]["rewards"]
        reward_gap = (best_reward - second_best) / best_reward
        if reward_gap < 0.1:  # Less than 10% difference
            confidence *= 0.8
    
    # Factor 2: Merchant acceptance certainty
    if "rewards_rag" in mcp_results:
        rag_confidence = mcp_results["rewards_rag"]["sources"][0]["relevance_score"]
        confidence *= rag_confidence
    
    # Factor 3: Cap warnings (results are keyed by MCP service name)
    if "spend_forecast" in mcp_results:
        if mcp_results["spend_forecast"].get("warnings"):
            confidence *= 0.9
    
    # Factor 4: Data freshness
    # (Lower confidence for stale data)
    
    return round(confidence, 2)
```

---

## Error Handling & Fallbacks

```python
import logging
import time

logger = logging.getLogger(__name__)

async def recommend_with_fallback(transaction: dict):
    """Graceful degradation if MCP servers fail"""
    
    start = time.perf_counter()
    
    try:
        # Try full reasoning pipeline
        plan = await create_execution_plan(transaction)
        mcp_results = await execute_mcp_calls(plan, transaction)
        reasoning = await synthesize_reasoning(transaction, mcp_results, plan)
        processing_time_ms = (time.perf_counter() - start) * 1000
        return format_recommendation(mcp_results, reasoning, processing_time_ms)
    
    except Exception as e:
        logger.error(f"Full pipeline failed: {e}")
        
        try:
            # Fallback: Use only Smart Wallet MCP
            # (format_simple_recommendation is assumed to be defined elsewhere)
            result = await call_smart_wallet(transaction)
            return format_simple_recommendation(result)
        
        except Exception as e2:
            logger.error(f"Fallback failed: {e2}")
            
            # Last resort: Rule-based recommendation
            return rule_based_recommendation(transaction)

def rule_based_recommendation(transaction: dict):
    """Simple rule-based fallback"""
    
    rules = {
        "Groceries": "Amex Gold (4x points)",
        "Dining": "Amex Gold (4x points)",
        "Travel": "Chase Sapphire Reserve (3x points)",
        "Gas": "Costco Anywhere Visa (4% cashback)",
        "Default": "Citi Double Cash (2% on everything)"
    }
    
    category = transaction["category"]
    recommended = rules.get(category, rules["Default"])
    
    return {
        "recommended_card": recommended,
        "reasoning": f"Based on category rules for {category}",
        "confidence": 0.60,  # Lower confidence for rule-based
        "warnings": ["Recommendation based on simplified rules (MCP servers unavailable)"]
    }
```

---

## Testing & Evaluation

### Unit Tests

```python
import pytest

@pytest.mark.asyncio
async def test_planning_phase():
    """Test Claude's planning logic"""
    transaction = {
        "user_id": "test_user",  # required by the planning prompt
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411"
    }
    
    plan = await create_execution_plan(transaction)
    
    assert "strategy" in plan
    assert "mcp_calls" in plan
    assert len(plan["mcp_calls"]) > 0
    assert plan["confidence_threshold"] >= 0.5

@pytest.mark.asyncio
async def test_reasoning_phase():
    """Test Gemini's synthesis"""
    mcp_results = {
        "smart_wallet": {
            "recommended_card": {"card_name": "Amex Gold"},
            "rewards": {"cash_value": 5.10}
        }
    }
    # The synthesis prompt reads plan["decision_factors"], so it must be present
    plan = {"decision_factors": ["reward_rate"]}
    
    reasoning = await synthesize_reasoning({}, mcp_results, plan)
    
    assert "Amex Gold" in reasoning
    assert "$5.10" in reasoning
```

### Integration Tests

```python
@pytest.mark.asyncio
async def test_end_to_end_recommendation():
    """Test full recommendation pipeline"""
    transaction = {
        "user_id": "test_user",
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411"
    }
    
    result = await recommend_with_fallback(transaction)
    
    # The full pipeline returns a FinalRecommendation model, so use
    # attribute access rather than dict indexing
    assert result.recommended_card.card_name
    assert result.rewards.cash_value > 0
    assert result.confidence >= 0.5
    assert len(result.reasoning) > 100
```

---

**Related Documentation:**
- [MCP Server Implementation](./mcp_architecture.md)
- [Modal Deployment Guide](./modal_deployment.md)
- [LlamaIndex RAG Setup](./llamaindex_setup.md)

---
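
## Appendix: Checking the Example Arithmetic

The dollar figures used in the Phase 3 example ($5.10 for Amex Gold, $6.38 or $1.28 for Citi Custom Cash, $15.30 for the forecast) all follow from one capped-bonus calculation. The sketch below reproduces them under two assumptions that the guide implies but does not state outright: points are valued at 1 cent each, and a card falls back to a 1% base rate once its bonus cap is used up. The function name `reward_value` and its parameters are hypothetical and are not part of any RewardPilot MCP server.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def reward_value(amount, bonus_rate, cap_remaining, base_rate="0.01"):
    """Cash value of rewards for one purchase on a card with a bonus cap.

    Hypothetical helper: spend up to `cap_remaining` earns `bonus_rate`,
    anything beyond it earns `base_rate`. All arguments are decimal strings.
    """
    amount = Decimal(amount)
    # Portion of the purchase that still fits under the bonus cap
    bonus_part = min(amount, max(Decimal(cap_remaining), Decimal("0")))
    base_part = amount - bonus_part
    total = bonus_part * Decimal(bonus_rate) + base_part * Decimal(base_rate)
    # Round to whole cents, halves rounding up
    return total.quantize(CENT, rounding=ROUND_HALF_UP)

# Amex Gold: 4x points at 1 cent each, $22,550 of the annual cap left
amex = reward_value("127.50", "0.04", "22550")

# Citi Custom Cash with the monthly cap exhausted, then with it untouched
citi_capped = reward_value("127.50", "0.05", "0")
citi_fresh = reward_value("127.50", "0.05", "500")

# Forecast: three more trips of $127.50 on Amex Gold
forecast = reward_value("382.50", "0.04", "22550")

print(amex, citi_capped, citi_fresh, forecast)
```

`Decimal` is used instead of `float` so that half-cent amounts such as $6.375 round predictably; with binary floats the same product can land on either side of the half-cent boundary.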