Spaces:

MCP-1st-Birthday
/

rewardpilot-web-ui

Running

App Files Files Community

sammy786 commited on 10 days ago

Commit

dd916d8

verified ·

1 Parent(s): b7662d1

Create agent_reasoning.md

Browse files

Files changed (1) hide show

docs/agent_reasoning.md +787 -0

docs/agent_reasoning.md ADDED Viewed

	@@ -0,0 +1,787 @@

+```markdown
+# Agent Reasoning Flow Guide
+## Overview
+RewardPilot uses a multi-stage reasoning process powered by Claude 3.5 Sonnet (planning) and Gemini 2.0 Flash (synthesis). This guide explains how the agent thinks through complex credit card optimization decisions.
+## Why Multi-LLM Architecture?
+| Stage | LLM | Reason |
+|-------|-----|--------|
+| **Planning** | Claude 3.5 Sonnet | Best at strategic thinking, tool use |
+| **Synthesis** | Gemini 2.0 Flash | Fast context processing, cost-effective |
+| **Verification** | GPT-4o | High accuracy for critical decisions |
+**Cost Comparison:**
+- Single GPT-4o: $0.15 per recommendation
+- Multi-LLM: $0.03 per recommendation (5x cheaper)
+---
+## Four-Phase Reasoning Process
+```
+┌─────────────────────────────────────────────────────────┐
+│                 USER TRANSACTION                         │
+│   "Whole Foods, $127.50, Groceries"                     │
+└────────────────────┬────────────────────────────────────┘
+                     │
+                     ▼
+┌─────────────────────────────────────────────────────────┐
+│              PHASE 1: PLANNING                           │
+│           (Claude 3.5 Sonnet)                           │
+│                                                          │
+│  Input: Transaction context                             │
+│  Output: Execution strategy                             │
+│                                                          │
+│  Questions:                                             │
+│  1. What category is this? (Groceries)                 │
+│  2. Which cards have grocery bonuses?                  │
+│  3. Are there spending caps to check?                  │
+│  4. Need to forecast future spending?                  │
+│  5. Any special merchant restrictions?                 │
+│                                                          │
+│  Strategy:                                              │
+│  - Call Smart Wallet MCP (get card recommendations)    │
+│  - Call RAG MCP (check merchant acceptance)            │
+│  - Call Forecast MCP (check cap status)                │
+└────────────────────┬────────────────────────────────────┘
+                     │
+                     ▼
+┌─────────────────────────────────────────────────────────┐
+│            PHASE 2: EXECUTION                            │
+│         (Parallel MCP Server Calls)                     │
+│                                                          │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
+│  │ Smart Wallet │  │  Rewards RAG │  │   Forecast   │ │
+│  │     MCP      │  │     MCP      │  │     MCP      │ │
+│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘ │
+│         │                 │                 │          │
+│         ▼                 ▼                 ▼          │
+│  ┌──────────────────────────────────────────────────┐ │
+│  │ Results:                                         │ │
+│  │ - Amex Gold: 4x = $5.10                         │ │
+│  │ - Citi Custom: 5% but cap hit                   │ │
+│  │ - Chase Freedom: Not in grocery quarter         │ │
+│  │                                                  │ │
+│  │ - Merchant: Amex accepted at Whole Foods        │ │
+│  │                                                  │ │
+│  │ - Forecast: $450/$500 cap remaining this month  │ │
+│  └──────────────────────────────────────────────────┘ │
+└────────────────────┬────────────────────────────────────┘
+                     │
+                     ▼
+┌─────────────────────────────────────────────────────────┐
+│            PHASE 3: REASONING                            │
+│           (Gemini 2.0 Flash Exp)                        │
+│                                                          │
+│  Input: All MCP results + transaction context           │
+│  Output: Synthesized explanation                        │
+│                                                          │
+│  Reasoning Chain:                                       │
+│                                                          │
+│  1. Compare Rewards:                                    │
+│     - Amex Gold: 4x points = $5.10 cash value          │
+│     - Citi Custom Cash: Would be 5% ($6.38) but        │
+│       monthly cap already hit                           │
+│     - Winner: Amex Gold ($5.10 > $1.28)                │
+│                                                          │
+│  2. Check Constraints:                                  │
+│     - Amex accepted at Whole Foods? ✅ Yes             │
+│     - Annual cap status? $2,450/$25,000 (safe)         │
+│     - Foreign transaction fee? ✅ None                 │
+│                                                          │
+│  3. Future Optimization:                                │
+│     - Forecast shows 3 more grocery trips this month   │
+│     - Total: $127.50 × 3 = $382.50                     │
+│     - Rewards: $382.50 × 4% = $15.30                   │
+│     - Recommendation: Continue using Amex Gold         │
+│                                                          │
+│  4. Alternative Scenarios:                              │
+│     - If Citi cap not hit: Use Citi ($6.38 > $5.10)   │
+│     - If at Costco: Use Citi (Amex not accepted)      │
+│     - If annual cap near: Switch to Citi next month    │
+│                                                          │
+│  Confidence: 95% (high certainty)                       │
+└────────────────────┬────────────────────────────────────┘
+                     │
+                     ▼
+┌─────────────────────────────────────────────────────────┐
+│           PHASE 4: RESPONSE FORMATTING                   │
+│              (Structured Output)                         │
+│                                                          │
+│  {                                                       │
+│    "recommended_card": {                                │
+│      "card_id": "c_amex_gold",                          │
+│      "card_name": "American Express Gold",              │
+│      "issuer": "American Express"                       │
+│    },                                                    │
+│    "rewards": {                                          │
+│      "points_earned": 510,                              │
+│      "cash_value": 5.10,                                │
+│      "earn_rate": "4x points"                           │
+│    },                                                    │
+│    "reasoning": "Amex Gold offers 4x points...",        │
+│    "confidence": 0.95,                                   │
+│    "alternatives": [...],                               │
+│    "warnings": [...]                                     │
+│  }                                                       │
+└─────────────────────────────────────────────────────────┘
+```
+---
+## Phase 1: Planning (Claude 3.5 Sonnet)
+### Implementation
+```python
+from anthropic import Anthropic
+anthropic = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
+async def create_execution_plan(transaction: dict) -> dict:
+    """
+    Claude analyzes transaction and creates execution strategy
+    """
+    prompt = f"""
+You are a credit card optimization expert. Analyze this transaction and create an execution plan.
+Transaction:
+- Merchant: {transaction['merchant']}
+- Category: {transaction['category']}
+- Amount: ${transaction['amount_usd']}
+- MCC Code: {transaction['mcc']}
+- User ID: {transaction['user_id']}
+Available MCP servers:
+1. smart_wallet - Analyzes user's cards and calculates rewards
+2. rewards_rag - Semantic search of card benefits and restrictions
+3. spend_forecast - Predicts spending and cap warnings
+Your task:
+1. Determine which MCP servers to call
+2. Prioritize the calls (some may depend on others)
+3. Identify key decision factors
+4. Set confidence threshold for recommendation
+Return a JSON plan with:
+{{
+  "strategy": "optimization approach (e.g., 'max_rewards', 'cap_aware')",
+  "mcp_calls": [
+    {{
+      "service": "smart_wallet",
+      "priority": 1,
+      "reason": "Need to know available cards and base rewards"
+    }},
+    {{
+      "service": "rewards_rag",
+      "priority": 2,
+      "reason": "Check if merchant accepts top card"
+    }},
+    {{
+      "service": "spend_forecast",
+      "priority": 3,
+      "reason": "Verify monthly cap status"
+    }}
+  ],
+  "decision_factors": [
+    "reward_rate",
+    "merchant_acceptance",
+    "spending_caps",
+    "annual_fees"
+  ],
+  "confidence_threshold": 0.85,
+  "complexity": "medium"
+}}
+"""
+    response = anthropic.messages.create(
+        model="claude-3-5-sonnet-20241022",
+        max_tokens=2048,
+        temperature=0.3,  # Lower temperature for consistent planning
+        messages=[{
+            "role": "user",
+            "content": prompt
+        }]
+    )
+    # Parse JSON response
+    plan = json.loads(response.content[0].text)
+    return plan
+```
+### Example Plans
+#### Simple Transaction
+```json
+{
+  "strategy": "max_rewards",
+  "mcp_calls": [
+    {
+      "service": "smart_wallet",
+      "priority": 1,
+      "reason": "Straightforward category bonus"
+    }
+  ],
+  "decision_factors": ["reward_rate"],
+  "confidence_threshold": 0.90,
+  "complexity": "low"
+}
+```
+#### Complex Transaction
+```json
+{
+  "strategy": "cap_aware_optimization",
+  "mcp_calls": [
+    {
+      "service": "smart_wallet",
+      "priority": 1,
+      "reason": "Get all card options"
+    },
+    {
+      "service": "spend_forecast",
+      "priority": 2,
+      "reason": "Check if near monthly/annual caps"
+    },
+    {
+      "service": "rewards_rag",
+      "priority": 3,
+      "reason": "Verify merchant acceptance for top 2 cards"
+    }
+  ],
+  "decision_factors": [
+    "reward_rate",
+    "spending_caps",
+    "merchant_acceptance",
+    "future_spending"
+  ],
+  "confidence_threshold": 0.80,
+  "complexity": "high"
+}
+```
+---
+## Phase 2: Execution (Parallel MCP Calls)
+### Implementation
+```python
+import asyncio
+import httpx
+async def execute_mcp_calls(plan: dict, transaction: dict) -> dict:
+    """
+    Execute MCP calls based on plan
+    """
+    # Sort by priority
+    sorted_calls = sorted(
+        plan["mcp_calls"],
+        key=lambda x: x["priority"]
+    )
+    results = {}
+    # Execute in priority order (can parallelize same priority)
+    current_priority = sorted_calls[0]["priority"]
+    priority_group = []
+    for call in sorted_calls:
+        if call["priority"] == current_priority:
+            priority_group.append(call)
+        else:
+            # Execute current priority group in parallel
+            group_results = await execute_priority_group(
+                priority_group,
+                transaction
+            )
+            results.update(group_results)
+            # Move to next priority
+            current_priority = call["priority"]
+            priority_group = [call]
+    # Execute final group
+    if priority_group:
+        group_results = await execute_priority_group(
+            priority_group,
+            transaction
+        )
+        results.update(group_results)
+    return results
+async def execute_priority_group(calls: list, transaction: dict) -> dict:
+    """Execute MCP calls of same priority in parallel"""
+    tasks = []
+    for call in calls:
+        if call["service"] == "smart_wallet":
+            tasks.append(call_smart_wallet(transaction))
+        elif call["service"] == "rewards_rag":
+            tasks.append(call_rewards_rag(transaction))
+        elif call["service"] == "spend_forecast":
+            tasks.append(call_forecast(transaction))
+    results = await asyncio.gather(*tasks)
+    return dict(zip([c["service"] for c in calls], results))
+async def call_smart_wallet(transaction: dict) -> dict:
+    """Call Smart Wallet MCP"""
+    async with httpx.AsyncClient(timeout=30.0) as client:
+        response = await client.post(
+            f"{MCP_ENDPOINTS['smart_wallet']}/analyze",
+            json=transaction
+        )
+        response.raise_for_status()
+        return response.json()
+# Similar for other MCP servers...
+```
+---
+## Phase 3: Reasoning (Gemini 2.0 Flash)
+### Implementation
+```python
+import google.generativeai as genai
+genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
+model = genai.GenerativeModel("gemini-2.0-flash-exp")
+async def synthesize_reasoning(
+    transaction: dict,
+    mcp_results: dict,
+    plan: dict
+) -> str:
+    """
+    Gemini synthesizes all information into coherent explanation
+    """
+    prompt = f"""
+You are a credit card optimization expert. Synthesize the following information into a clear recommendation.
+Transaction:
+{json.dumps(transaction, indent=2)}
+MCP Results:
+{json.dumps(mcp_results, indent=2)}
+Decision Factors (in order of importance):
+{json.dumps(plan['decision_factors'], indent=2)}
+Your task:
+1. Compare all card options on the decision factors
+2. Identify the optimal card with clear reasoning
+3. Explain why alternatives are suboptimal
+4. Provide any warnings or caveats
+5. Suggest future optimizations
+Format your response as:
+## Recommended Card
+[Card name and key benefit]
+## Reasoning
+[Step-by-step logic]
+## Comparison
+[Table comparing top 3 options]
+## Warnings
+[Any caveats or cap warnings]
+## Future Optimization
+[How to maximize rewards going forward]
+Be specific with numbers and percentages.
+"""
+    response = model.generate_content(
+        prompt,
+        generation_config={
+            "temperature": 0.7,
+            "max_output_tokens": 2048
+        }
+    )
+    return response.text
+```
+### Example Reasoning Output
+```markdown
+## Recommended Card
+**American Express Gold** - 4x points on U.S. supermarkets
+## Reasoning
+1. **Reward Rate Comparison:**
+   - Amex Gold: 4x points = $5.10 cash value (1.3 cpp transfer)
+   - Citi Custom Cash: Would be 5% = $6.38, but monthly cap hit
+   - Chase Freedom Flex: 1x points = $1.28 (not grocery quarter)
+   Winner: Amex Gold ($5.10 actual rewards)
+2. **Merchant Acceptance:**
+   - Whole Foods accepts American Express ✅
+   - No foreign transaction fees ✅
+3. **Spending Cap Status:**
+   - Current: $2,450 / $25,000 annual cap (9.8% used)
+   - This transaction: $127.50 (0.5% of cap)
+   - Safe to use ✅
+4. **Future Spending Forecast:**
+   - Predicted 3 more grocery trips this month ($382.50 total)
+   - Projected rewards: $15.30
+   - Still well under annual cap
+## Comparison
+| Card | Earn Rate | Rewards | Cap Status | Accepted? |
+|------|-----------|---------|------------|-----------|
+| **Amex Gold** | 4x | **$5.10** | 9.8% used | ✅ Yes |
+| Citi Custom Cash | 5% | $1.28 | Cap hit | ✅ Yes |
+| Chase Freedom Flex | 1x | $1.28 | N/A | ✅ Yes |
+## Warnings
+⚠️ **Citi Custom Cash Cap Hit**: You've reached the $500 monthly limit on Citi Custom Cash. It will reset on Feb 1st. Consider using it for non-grocery purchases this month.
+⚠️ **Annual Cap Tracking**: You're at $2,450/$25,000 on Amex Gold's supermarket bonus. At current pace, you'll hit the cap in November. Plan to switch to Citi Custom Cash after that.
+## Future Optimization
+1. **This Month**: Continue using Amex Gold for groceries (best rate)
+2. **Next Month**: Switch to Citi Custom Cash (5% > 4x after cap resets)
+3. **After $25k Cap**: Use Citi Custom Cash or Chase Freedom (if grocery quarter)
+4. **Consider**: Blue Cash Preferred (6% groceries, no cap) if spending exceeds $25k/year
+**Estimated Annual Savings**: $523 by following this strategy vs. using single card
+```
+---
+## Phase 4: Response Formatting
+### Implementation
+```python
+from pydantic import BaseModel
+from typing import List, Optional
+class RecommendedCard(BaseModel):
+    card_id: str
+    card_name: str
+    issuer: str
+class Rewards(BaseModel):
+    points_earned: int
+    cash_value: float
+    earn_rate: str
+class Alternative(BaseModel):
+    card_name: str
+    rewards: float
+    reason: str
+class FinalRecommendation(BaseModel):
+    recommended_card: RecommendedCard
+    rewards: Rewards
+    reasoning: str
+    confidence: float
+    alternatives: List[Alternative]
+    warnings: List[str]
+    processing_time_ms: float
+def format_recommendation(
+    mcp_results: dict,
+    reasoning: str,
+    processing_time: float
+) -> FinalRecommendation:
+    """Format final response"""
+    smart_wallet_result = mcp_results["smart_wallet"]
+    best_card = smart_wallet_result["recommended_card"]
+    # Extract alternatives
+    alternatives = []
+    for card in smart_wallet_result["all_cards_comparison"][1:4]:
+        alternatives.append(Alternative(
+            card_name=card["card_name"],
+            rewards=card["rewards"],
+            reason=card.get("note", "Lower rewards rate")
+        ))
+    # Extract warnings
+    warnings = []
+    if "forecast" in mcp_results:
+        warnings.extend(mcp_results["forecast"].get("warnings", []))
+    return FinalRecommendation(
+        recommended_card=RecommendedCard(**best_card),
+        rewards=Rewards(**smart_wallet_result["rewards"]),
+        reasoning=reasoning,
+        confidence=calculate_confidence(mcp_results),
+        alternatives=alternatives,
+        warnings=warnings,
+        processing_time_ms=processing_time
+    )
+```
+---
+## Advanced Reasoning Patterns
+### 1. Chain-of-Thought Reasoning
+```python
+prompt = """
+Let's think through this step-by-step:
+Step 1: Identify the category
+- Merchant: {merchant}
+- MCC: {mcc}
+- Likely category: ?
+Step 2: List cards with bonuses in this category
+- Card A: X% on category
+- Card B: Y points per dollar
+- Card C: Z% cashback
+Step 3: Calculate actual rewards
+- Card A: ${amount} × X% = $?
+- Card B: ${amount} × Y points × $0.01 = $?
+- Card C: ${amount} × Z% = $?
+Step 4: Check constraints
+- Is Card A accepted at merchant?
+- Is Card B near spending cap?
+- Does Card C have annual fee?
+Step 5: Make recommendation
+Based on steps 1-4, the best card is...
+"""
+```
+### 2. Self-Consistency
+```python
+# Generate multiple reasoning paths
+reasoning_paths = []
+for i in range(5):
+    response = model.generate_content(prompt, temperature=0.8)
+    reasoning_paths.append(response.text)
+# Vote on most common recommendation
+from collections import Counter
+recommendations = [extract_card(path) for path in reasoning_paths]
+most_common = Counter(recommendations).most_common(1)[0][0]
+# Use the reasoning path that led to most common answer
+final_reasoning = next(
+    path for path in reasoning_paths
+    if extract_card(path) == most_common
+)
+```
+### 3. Reflection & Verification
+```python
+# Initial recommendation
+initial_rec = await generate_recommendation(transaction, mcp_results)
+# Self-critique
+critique_prompt = f"""
+Review this credit card recommendation:
+{initial_rec}
+Are there any errors or oversights?
+- Did we miss a better card?
+- Are the math calculations correct?
+- Did we consider all constraints?
+- Is the reasoning sound?
+If you find issues, provide corrections.
+"""
+critique = model.generate_content(critique_prompt)
+# Refine if needed
+if "error" in critique.text.lower() or "issue" in critique.text.lower():
+    final_rec = await refine_recommendation(initial_rec, critique.text)
+else:
+    final_rec = initial_rec
+```
+---
+## Confidence Scoring
+```python
+def calculate_confidence(mcp_results: dict) -> float:
+    """
+    Calculate confidence score based on multiple factors
+    """
+    confidence = 1.0
+    # Factor 1: Reward difference (higher difference = higher confidence)
+    best_reward = mcp_results["smart_wallet"]["recommended_card"]["rewards"]
+    second_best = mcp_results["smart_wallet"]["all_cards_comparison"][1]["rewards"]
+    reward_gap = (best_reward - second_best) / best_reward
+    if reward_gap < 0.1:  # Less than 10% difference
+        confidence *= 0.8
+    # Factor 2: Merchant acceptance certainty
+    if "rewards_rag" in mcp_results:
+        rag_confidence = mcp_results["rewards_rag"]["sources"][0]["relevance_score"]
+        confidence *= rag_confidence
+    # Factor 3: Cap warnings
+    if "forecast" in mcp_results:
+        if mcp_results["forecast"].get("warnings"):
+            confidence *= 0.9
+    # Factor 4: Data freshness
+    # (Lower confidence for stale data)
+    return round(confidence, 2)
+```
+---
+## Error Handling & Fallbacks
+```python
+async def recommend_with_fallback(transaction: dict):
+    """Graceful degradation if MCP servers fail"""
+    try:
+        # Try full reasoning pipeline
+        plan = await create_execution_plan(transaction)
+        mcp_results = await execute_mcp_calls(plan, transaction)
+        reasoning = await synthesize_reasoning(transaction, mcp_results, plan)
+        return format_recommendation(mcp_results, reasoning)
+    except Exception as e:
+        logger.error(f"Full pipeline failed: {e}")
+        try:
+            # Fallback: Use only Smart Wallet MCP
+            result = await call_smart_wallet(transaction)
+            return format_simple_recommendation(result)
+        except Exception as e2:
+            logger.error(f"Fallback failed: {e2}")
+            # Last resort: Rule-based recommendation
+            return rule_based_recommendation(transaction)
+def rule_based_recommendation(transaction: dict):
+    """Simple rule-based fallback"""
+    rules = {
+        "Groceries": "Amex Gold (4x points)",
+        "Dining": "Amex Gold (4x points)",
+        "Travel": "Chase Sapphire Reserve (3x points)",
+        "Gas": "Costco Anywhere Visa (4% cashback)",
+        "Default": "Citi Double Cash (2% on everything)"
+    }
+    category = transaction["category"]
+    recommended = rules.get(category, rules["Default"])
+    return {
+        "recommended_card": recommended,
+        "reasoning": f"Based on category rules for {category}",
+        "confidence": 0.60,  # Lower confidence for rule-based
+        "warnings": ["Recommendation based on simplified rules (MCP servers unavailable)"]
+    }
+```
+---
+## Testing & Evaluation
+### Unit Tests
+```python
+import pytest
+@pytest.mark.asyncio
+async def test_planning_phase():
+    """Test Claude's planning logic"""
+    transaction = {
+        "merchant": "Whole Foods",
+        "category": "Groceries",
+        "amount_usd": 127.50,
+        "mcc": "5411"
+    }
+    plan = await create_execution_plan(transaction)
+    assert "strategy" in plan
+    assert "mcp_calls" in plan
+    assert len(plan["mcp_calls"]) > 0
+    assert plan["confidence_threshold"] >= 0.5
+@pytest.mark.asyncio
+async def test_reasoning_phase():
+    """Test Gemini's synthesis"""
+    mcp_results = {
+        "smart_wallet": {
+            "recommended_card": {"card_name": "Amex Gold"},
+            "rewards": {"cash_value": 5.10}
+        }
+    }
+    reasoning = await synthesize_reasoning({}, mcp_results, {})
+    assert "Amex Gold" in reasoning
+    assert "$5.10" in reasoning
+```
+### Integration Tests
+```python
+@pytest.mark.asyncio
+async def test_end_to_end_recommendation():
+    """Test full recommendation pipeline"""
+    transaction = {
+        "user_id": "test_user",
+        "merchant": "Whole Foods",
+        "category": "Groceries",
+        "amount_usd": 127.50,
+        "mcc": "5411"
+    }
+    result = await recommend_with_fallback(transaction)
+    assert result["recommended_card"]["card_name"]
+    assert result["rewards"]["cash_value"] > 0
+    assert result["confidence"] >= 0.5
+    assert len(result["reasoning"]) > 100
+```
+---
+**Related Documentation:**
+- [MCP Server Implementation](./mcp_architecture.md)
+- [Modal Deployment Guide](./modal_deployment.md)
+- [LlamaIndex RAG Setup](./llamaindex_setup.md)
+```
+---