sammy786 committed dd916d8 (verified), parent b7662d1: "Create agent_reasoning.md"

Added file: docs/agent_reasoning.md (+787 -0)

# Agent Reasoning Flow Guide

## Overview

RewardPilot uses a multi-stage reasoning process: Claude 3.5 Sonnet plans which tools to call, and Gemini 2.0 Flash synthesizes the results into a recommendation. This guide explains how the agent reasons through credit card optimization decisions, from an incoming transaction to a structured recommendation.

## Why a Multi-LLM Architecture?

| Stage | LLM | Reason |
|-------|-----|--------|
| **Planning** | Claude 3.5 Sonnet | Strong strategic planning and tool use |
| **Synthesis** | Gemini 2.0 Flash | Fast processing of large contexts at low cost |
| **Verification** | GPT-4o | High accuracy for critical decisions |

**Estimated cost per recommendation:**
- Single-model GPT-4o pipeline: $0.15
- Multi-LLM pipeline: $0.03 (5x cheaper)

---

## Four-Phase Reasoning Process

```
┌─────────────────────────────────────────────────────────┐
│                    USER TRANSACTION                     │
│            "Whole Foods, $127.50, Groceries"            │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                    PHASE 1: PLANNING                    │
│                   (Claude 3.5 Sonnet)                   │
│                                                         │
│ Input: Transaction context                              │
│ Output: Execution strategy                              │
│                                                         │
│ Questions:                                              │
│ 1. What category is this? (Groceries)                   │
│ 2. Which cards have grocery bonuses?                    │
│ 3. Are there spending caps to check?                    │
│ 4. Need to forecast future spending?                    │
│ 5. Any special merchant restrictions?                   │
│                                                         │
│ Strategy:                                               │
│ - Call Smart Wallet MCP (get card recommendations)      │
│ - Call Rewards RAG MCP (check merchant acceptance)      │
│ - Call Forecast MCP (check cap status)                  │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                    PHASE 2: EXECUTION                   │
│          (MCP calls; parallel within priority)          │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Smart Wallet │  │ Rewards RAG  │  │   Forecast   │   │
│  │     MCP      │  │     MCP      │  │     MCP      │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                 │                 │           │
│         ▼                 ▼                 ▼           │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Results:                                         │   │
│  │ - Amex Gold: 4x = $5.10                          │   │
│  │ - Citi Custom Cash: 5%, but monthly cap hit      │   │
│  │ - Chase Freedom: not in grocery quarter          │   │
│  │                                                  │   │
│  │ - Merchant: Amex accepted at Whole Foods         │   │
│  │                                                  │   │
│  │ - Forecast: Citi cap $500/$500 used this month   │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                    PHASE 3: REASONING                   │
│                  (Gemini 2.0 Flash Exp)                 │
│                                                         │
│ Input: All MCP results + transaction context            │
│ Output: Synthesized explanation                         │
│                                                         │
│ Reasoning Chain:                                        │
│                                                         │
│ 1. Compare Rewards:                                     │
│    - Amex Gold: 4x points = $5.10 cash value            │
│    - Citi Custom Cash: Would be 5% ($6.38) but          │
│      monthly cap already hit (1% = $1.28)               │
│    - Winner: Amex Gold ($5.10 > $1.28)                  │
│                                                         │
│ 2. Check Constraints:                                   │
│    - Amex accepted at Whole Foods? Yes                  │
│    - Annual cap status? $2,450/$25,000 (safe)           │
│    - Foreign transaction fee? None                      │
│                                                         │
│ 3. Future Optimization:                                 │
│    - Forecast shows 3 more grocery trips this month     │
│    - Total: $127.50 × 3 = $382.50                       │
│    - Rewards: $382.50 × 4% = $15.30                     │
│    - Recommendation: Continue using Amex Gold           │
│                                                         │
│ 4. Alternative Scenarios:                               │
│    - If Citi cap not hit: Use Citi ($6.38 > $5.10)      │
│    - If at Costco: use a Visa card (Amex not accepted)  │
│    - If annual cap near: Switch to Citi next month      │
│                                                         │
│ Confidence: 95% (high certainty)                        │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│               PHASE 4: RESPONSE FORMATTING              │
│                   (Structured Output)                   │
│                                                         │
│ {                                                       │
│   "recommended_card": {                                 │
│     "card_id": "c_amex_gold",                           │
│     "card_name": "American Express Gold",               │
│     "issuer": "American Express"                        │
│   },                                                    │
│   "rewards": {                                          │
│     "points_earned": 510,                               │
│     "cash_value": 5.10,                                 │
│     "earn_rate": "4x points"                            │
│   },                                                    │
│   "reasoning": "Amex Gold offers 4x points...",         │
│   "confidence": 0.95,                                   │
│   "alternatives": [...],                                │
│   "warnings": [...]                                     │
│ }                                                       │
└─────────────────────────────────────────────────────────┘
```
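
The dollar figures in Phases 2 and 3 come from simple reward arithmetic. The sketch below reproduces them under stated assumptions: Amex points are valued at 1 cent each (the valuation that yields $5.10), Citi Custom Cash earns 5% on the first $500 of category spend per cycle and 1% after that, and amounts are rounded half-up to the cent. The function names are illustrative and are not the Smart Wallet MCP's actual API.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(value: Decimal) -> Decimal:
    """Round a dollar amount to whole cents, half up (6.375 -> 6.38)."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def points_card_value(amount: Decimal, multiplier: int,
                      cents_per_point: Decimal = Decimal("1")) -> Decimal:
    """Dollar value of amount x multiplier points, each worth cents_per_point cents."""
    points = amount * multiplier
    return to_cents(points * cents_per_point / 100)


def capped_cashback(amount: Decimal, bonus_rate: Decimal, cap: Decimal,
                    spent_in_cycle: Decimal,
                    base_rate: Decimal = Decimal("0.01")) -> Decimal:
    """Cash back when bonus_rate applies only to the first `cap` dollars per cycle."""
    bonus_room = max(cap - spent_in_cycle, Decimal("0"))  # bonus-eligible spend left
    bonus_part = min(amount, bonus_room)
    return to_cents(bonus_part * bonus_rate + (amount - bonus_part) * base_rate)


amount = Decimal("127.50")
print(points_card_value(amount, 4))  # Amex Gold, 510 points -> 5.10
print(capped_cashback(amount, Decimal("0.05"), Decimal("500"), Decimal("500")))  # cap hit -> 1.28
print(capped_cashback(amount, Decimal("0.05"), Decimal("500"), Decimal("0")))    # new cycle -> 6.38
```

`Decimal` is used instead of `float` because binary floats make half-cent rounding unreliable; the Python documentation's own example is `round(2.675, 2)` returning `2.67`.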

---

## Phase 1: Planning (Claude 3.5 Sonnet)

### Implementation

```python
import json
import os

from anthropic import AsyncAnthropic

# Async client, so the planning request does not block the event loop
anthropic_client = AsyncAnthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))


async def create_execution_plan(transaction: dict) -> dict:
    """
    Claude analyzes the transaction and returns an execution strategy (parsed JSON).
    """

    prompt = f"""
You are a credit card optimization expert. Analyze this transaction and create an execution plan.

Transaction:
- Merchant: {transaction['merchant']}
- Category: {transaction['category']}
- Amount: ${transaction['amount_usd']}
- MCC Code: {transaction['mcc']}
- User ID: {transaction['user_id']}

Available MCP servers:
1. smart_wallet - Analyzes user's cards and calculates rewards
2. rewards_rag - Semantic search of card benefits and restrictions
3. spend_forecast - Predicts spending and cap warnings

Your task:
1. Determine which MCP servers to call
2. Prioritize the calls (some may depend on others)
3. Identify key decision factors
4. Set confidence threshold for recommendation

Return only a JSON object, with no surrounding prose or code fences, in this shape:
{{
  "strategy": "optimization approach (e.g., 'max_rewards', 'cap_aware')",
  "mcp_calls": [
    {{
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Need to know available cards and base rewards"
    }},
    {{
      "service": "rewards_rag",
      "priority": 2,
      "reason": "Check if merchant accepts top card"
    }},
    {{
      "service": "spend_forecast",
      "priority": 3,
      "reason": "Verify monthly cap status"
    }}
  ],
  "decision_factors": [
    "reward_rate",
    "merchant_acceptance",
    "spending_caps",
    "annual_fees"
  ],
  "confidence_threshold": 0.85,
  "complexity": "medium"
}}
"""

    response = await anthropic_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        temperature=0.3,  # Lower temperature for consistent planning
        messages=[{"role": "user", "content": prompt}],
    )

    # The reply text is expected to be a single JSON object
    plan = json.loads(response.content[0].text)

    return plan
```
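
Even when asked for JSON only, a model may occasionally wrap its plan in prose or a code fence, which makes a bare `json.loads` fail. A minimal, hypothetical helper (`extract_json_object` is not part of the original code) that scans for the first decodable JSON object using the standard library's `JSONDecoder.raw_decode`:

```python
import json


def extract_json_object(text: str) -> dict:
    """Hypothetical helper: return the first JSON object embedded in an LLM reply.

    Tries each '{' in turn; raw_decode parses one complete JSON value starting
    there and ignores whatever follows it (closing fences, trailing prose).
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON at this brace; try the next one
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```

With this helper, the last step of `create_execution_plan` would become `plan = extract_json_object(response.content[0].text)`.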

### Example Plans

#### Simple Transaction
```json
{
  "strategy": "max_rewards",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Straightforward category bonus"
    }
  ],
  "decision_factors": ["reward_rate"],
  "confidence_threshold": 0.90,
  "complexity": "low"
}
```

#### Complex Transaction
```json
{
  "strategy": "cap_aware_optimization",
  "mcp_calls": [
    {
      "service": "smart_wallet",
      "priority": 1,
      "reason": "Get all card options"
    },
    {
      "service": "spend_forecast",
      "priority": 2,
      "reason": "Check if near monthly/annual caps"
    },
    {
      "service": "rewards_rag",
      "priority": 3,
      "reason": "Verify merchant acceptance for top 2 cards"
    }
  ],
  "decision_factors": [
    "reward_rate",
    "spending_caps",
    "merchant_acceptance",
    "future_spending"
  ],
  "confidence_threshold": 0.80,
  "complexity": "high"
}
```
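
Because the plan arrives as free-form JSON from a model, it may be worth validating it before Phase 2 runs. A minimal standard-library sketch, assuming the field names and service names used in the example plans above; `validate_plan`, `KNOWN_SERVICES`, and `REQUIRED_KEYS` are illustrative names, not part of RewardPilot:

```python
from typing import List

# Service names and required keys taken from the plan format in this guide
KNOWN_SERVICES = {"smart_wallet", "rewards_rag", "spend_forecast"}
REQUIRED_KEYS = {"strategy", "mcp_calls", "decision_factors", "confidence_threshold"}


def validate_plan(plan: dict) -> List[str]:
    """Return human-readable problems with a planner output; an empty list means valid."""
    problems = []
    missing = REQUIRED_KEYS - set(plan)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    calls = plan.get("mcp_calls") or []
    if not calls:
        problems.append("mcp_calls is empty")
    for call in calls:
        service = call.get("service")
        if service not in KNOWN_SERVICES:
            problems.append(f"unknown service: {service!r}")
        priority = call.get("priority")
        if not isinstance(priority, int) or priority < 1:
            problems.append(f"bad priority for {service!r}: {priority!r}")

    threshold = plan.get("confidence_threshold")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        problems.append(f"confidence_threshold out of range: {threshold}")
    return problems
```

A non-empty result could trigger a re-plan, or a drop to the rule-based path described under Error Handling & Fallbacks.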

---

## Phase 2: Execution (Prioritized, Parallel MCP Calls)

### Implementation

```python
import asyncio
from itertools import groupby

import httpx

# MCP_ENDPOINTS maps each service name to its base URL; it comes from the
# deployment configuration and is not shown here.


async def execute_mcp_calls(plan: dict, transaction: dict) -> dict:
    """
    Execute MCP calls in priority order; calls that share a priority run in parallel.
    """
    # Sort by priority so groupby sees each priority level as one contiguous run
    sorted_calls = sorted(plan["mcp_calls"], key=lambda c: c["priority"])

    results = {}
    for _priority, group in groupby(sorted_calls, key=lambda c: c["priority"]):
        # Each priority group finishes before the next one starts
        group_results = await execute_priority_group(list(group), transaction)
        results.update(group_results)

    return results


async def execute_priority_group(calls: list, transaction: dict) -> dict:
    """Execute MCP calls of the same priority in parallel."""
    handlers = {
        "smart_wallet": call_smart_wallet,
        "rewards_rag": call_rewards_rag,
        "spend_forecast": call_forecast,
    }

    # Skip unknown services so service names and results stay aligned in zip()
    services = [c["service"] for c in calls if c["service"] in handlers]
    results = await asyncio.gather(
        *(handlers[name](transaction) for name in services)
    )

    # Results are keyed by service name, e.g. "smart_wallet", "spend_forecast"
    return dict(zip(services, results))


async def call_smart_wallet(transaction: dict) -> dict:
    """Call the Smart Wallet MCP server."""
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.post(
            f"{MCP_ENDPOINTS['smart_wallet']}/analyze",
            json=transaction,
        )
        response.raise_for_status()
        return response.json()

# call_rewards_rag and call_forecast follow the same pattern...
```
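
The priority-grouping idea in `execute_mcp_calls` can be exercised without any MCP servers running. In this sketch the service names come from the plan format above, while `run_plan` and `fake_service` (with its delays) are hypothetical stand-ins for the real HTTP calls; it records when each call starts and ends so the scheduling is visible:

```python
import asyncio
from itertools import groupby
from typing import Awaitable, Callable, Dict, List, Tuple


async def run_plan(plan: dict,
                   handlers: Dict[str, Callable[[], Awaitable[dict]]]) -> Tuple[dict, List[str]]:
    """Run plan["mcp_calls"] by ascending priority; equal priorities run concurrently.

    Returns (results keyed by service, ordered "start:"/"end:" trace events).
    """
    events: List[str] = []

    async def traced(name: str) -> dict:
        events.append(f"start:{name}")
        result = await handlers[name]()
        events.append(f"end:{name}")
        return result

    results = {}
    ordered = sorted(plan["mcp_calls"], key=lambda c: c["priority"])
    for _, group in groupby(ordered, key=lambda c: c["priority"]):
        names = [c["service"] for c in group]
        values = await asyncio.gather(*(traced(n) for n in names))
        results.update(zip(names, values))
    return results, events


def fake_service(name: str, delay: float) -> Callable[[], Awaitable[dict]]:
    """Hypothetical stub standing in for an MCP HTTP call."""
    async def call() -> dict:
        await asyncio.sleep(delay)
        return {"service": name}
    return call


plan = {"mcp_calls": [
    {"service": "smart_wallet", "priority": 1},
    {"service": "rewards_rag", "priority": 2},
    {"service": "spend_forecast", "priority": 2},
]}
handlers = {n: fake_service(n, 0.01) for n in ("smart_wallet", "rewards_rag", "spend_forecast")}
results, events = asyncio.run(run_plan(plan, handlers))
print(events[:2])  # ['start:smart_wallet', 'end:smart_wallet']
```

`smart_wallet` finishes before anything at priority 2 starts, and both priority-2 calls start before either finishes; that ordering is what the plan's `priority` field is meant to express.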

---

## Phase 3: Reasoning (Gemini 2.0 Flash)

### Implementation

```python
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-2.0-flash-exp")


async def synthesize_reasoning(
    transaction: dict,
    mcp_results: dict,
    plan: dict,
) -> str:
    """
    Gemini synthesizes all gathered information into a coherent explanation.
    """

    prompt = f"""
You are a credit card optimization expert. Synthesize the following information into a clear recommendation.

Transaction:
{json.dumps(transaction, indent=2)}

MCP Results:
{json.dumps(mcp_results, indent=2)}

Decision Factors (in order of importance):
{json.dumps(plan.get('decision_factors', []), indent=2)}

Your task:
1. Compare all card options on the decision factors
2. Identify the optimal card with clear reasoning
3. Explain why alternatives are suboptimal
4. Provide any warnings or caveats
5. Suggest future optimizations

Format your response as:

## Recommended Card
[Card name and key benefit]

## Reasoning
[Step-by-step logic]

## Comparison
[Table comparing top 3 options]

## Warnings
[Any caveats or cap warnings]

## Future Optimization
[How to maximize rewards going forward]

Be specific with numbers and percentages.
"""

    # generate_content_async keeps the event loop free while Gemini responds
    response = await model.generate_content_async(
        prompt,
        generation_config={
            "temperature": 0.7,
            "max_output_tokens": 2048,
        },
    )

    return response.text
```
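
Because the prompt fixes the section headings (`## Recommended Card`, `## Reasoning`, `## Warnings`, and so on), the reply can be split into named sections before it is packaged in Phase 4. A minimal sketch, assuming the model follows the requested format; `split_sections` is a hypothetical helper:

```python
import re
from typing import Dict, List, Optional

HEADING = re.compile(r"^##\s+(.+?)\s*$")  # level-2 headings only ("###" does not match)


def split_sections(markdown: str) -> Dict[str, str]:
    """Split a reply into {heading: body} using its '## ' headings."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None
    body: List[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if current is not None:
                sections[current] = "\n".join(body).strip()
            current, body = match.group(1), []
        elif current is not None:
            body.append(line)  # text before the first heading is ignored
    if current is not None:
        sections[current] = "\n".join(body).strip()
    return sections
```

A missing key (for example, no `Warnings` section) is a cheap signal that the model drifted from the requested format.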

### Example Reasoning Output

```markdown
## Recommended Card
**American Express Gold** - 4x points at U.S. supermarkets

## Reasoning

1. **Reward Rate Comparison:**
   - Amex Gold: 4x points = 510 points = $5.10 (valued at 1 cent per point)
   - Citi Custom Cash: would be 5% = $6.38, but the monthly cap is hit, so it earns 1% = $1.28
   - Chase Freedom Flex: 1x points = $1.28 (not a grocery quarter)

   Winner: Amex Gold ($5.10 in actual rewards)

2. **Merchant Acceptance:**
   - Whole Foods accepts American Express ✅
   - No foreign transaction fees ✅

3. **Spending Cap Status:**
   - Current: $2,450 / $25,000 annual cap (9.8% used)
   - This transaction: $127.50 (0.5% of cap)
   - Safe to use ✅

4. **Future Spending Forecast:**
   - Predicted 3 more grocery trips this month ($382.50 total)
   - Projected rewards: $15.30
   - Still well under the annual cap

## Comparison

| Card | Earn Rate | Rewards | Cap Status | Accepted? |
|------|-----------|---------|------------|-----------|
| **Amex Gold** | 4x | **$5.10** | 9.8% used | ✅ Yes |
| Citi Custom Cash | 5% (1% after cap) | $1.28 | Cap hit | ✅ Yes |
| Chase Freedom Flex | 1x | $1.28 | N/A | ✅ Yes |

## Warnings

⚠️ **Citi Custom Cash Cap Hit**: You've reached the $500 monthly limit on Citi Custom Cash. It will reset on Feb 1st. Consider using it for non-grocery purchases this month.

⚠️ **Annual Cap Tracking**: You're at $2,450/$25,000 on Amex Gold's supermarket bonus. At the current pace, you'll hit the cap in November. Plan to switch to Citi Custom Cash after that.

## Future Optimization

1. **This Month**: Continue using Amex Gold for groceries (best rate)
2. **Next Month**: Use Citi Custom Cash for the first $500 of groceries after its cap resets (5% > 4x), then return to Amex Gold
3. **After $25k Cap**: Use Citi Custom Cash or Chase Freedom (if grocery quarter)
4. **Consider**: Blue Cash Preferred (6% at U.S. supermarkets on up to $6,000 per year, then 1%) if grocery spending exceeds $25k/year

**Estimated Annual Savings**: $523 by following this strategy vs. using a single card
```
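
The "you'll hit the cap in November" estimate is consistent with a simple linear projection. The sketch assumes spending continues at the same average monthly pace and that the $2,450 was spent in January (the Feb 1st reset in the warnings suggests a January date); the document does not say how the forecast is actually computed, and `projected_cap_month` is a hypothetical helper:

```python
import calendar
import math
from typing import Optional


def projected_cap_month(spent_ytd: float, months_elapsed: int,
                        annual_cap: float) -> Optional[int]:
    """Calendar month (1-12) in which cumulative spend reaches annual_cap at the
    current average monthly pace, or None if the cap is not reached this year."""
    if spent_ytd <= 0 or months_elapsed <= 0:
        return None
    monthly_pace = spent_ytd / months_elapsed
    # Smallest whole month m with m * monthly_pace >= annual_cap
    month = math.ceil(annual_cap / monthly_pace)
    return month if month <= 12 else None


month = projected_cap_month(spent_ytd=2450, months_elapsed=1, annual_cap=25000)
print(month, calendar.month_name[month])  # 11 November
```

At $2,450 per month, ten months reach $24,500 and the eleventh crosses $25,000, hence November.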

---

## Phase 4: Response Formatting

### Implementation

```python
from typing import List

from pydantic import BaseModel


class RecommendedCard(BaseModel):
    card_id: str
    card_name: str
    issuer: str


class Rewards(BaseModel):
    points_earned: int
    cash_value: float
    earn_rate: str


class Alternative(BaseModel):
    card_name: str
    rewards: float
    reason: str


class FinalRecommendation(BaseModel):
    recommended_card: RecommendedCard
    rewards: Rewards
    reasoning: str
    confidence: float
    alternatives: List[Alternative]
    warnings: List[str]
    processing_time_ms: float


def format_recommendation(
    mcp_results: dict,
    reasoning: str,
    processing_time: float,  # milliseconds
) -> FinalRecommendation:
    """Assemble the final structured response from MCP results and Gemini's reasoning."""

    smart_wallet_result = mcp_results["smart_wallet"]
    best_card = smart_wallet_result["recommended_card"]

    # Alternatives: the next three cards after the top-ranked one
    alternatives = [
        Alternative(
            card_name=card["card_name"],
            rewards=card["rewards"],
            reason=card.get("note", "Lower rewards rate"),
        )
        for card in smart_wallet_result["all_cards_comparison"][1:4]
    ]

    # Cap warnings come from the forecast MCP; Phase 2 keys results by
    # service name, so they live under "spend_forecast"
    warnings = list(mcp_results.get("spend_forecast", {}).get("warnings", []))

    return FinalRecommendation(
        recommended_card=RecommendedCard(**best_card),
        rewards=Rewards(**smart_wallet_result["rewards"]),
        reasoning=reasoning,
        confidence=calculate_confidence(mcp_results),
        alternatives=alternatives,
        warnings=warnings,
        processing_time_ms=processing_time,
    )
```

---

## Advanced Reasoning Patterns

### 1. Chain-of-Thought Reasoning

```python
# Template: fill the placeholders with prompt.format(merchant=..., mcc=..., amount=...).
# "${amount}" renders as a dollar sign followed by the amount.
prompt = """
Let's think through this step-by-step:

Step 1: Identify the category
- Merchant: {merchant}
- MCC: {mcc}
- Likely category: ?

Step 2: List cards with bonuses in this category
- Card A: X% on category
- Card B: Y points per dollar
- Card C: Z% cashback

Step 3: Calculate actual rewards
- Card A: ${amount} × X% = $?
- Card B: ${amount} × Y points × $0.01 = $?
- Card C: ${amount} × Z% = $?

Step 4: Check constraints
- Is Card A accepted at merchant?
- Is Card B near spending cap?
- Does Card C have annual fee?

Step 5: Make recommendation
Based on steps 1-4, the best card is...
"""
```

### 2. Self-Consistency

```python
from collections import Counter

# Assumes `prompt` and `model` from the sections above, plus a helper
# extract_card(text) -> str that pulls the recommended card name from a reply.

# Generate several independent reasoning paths at a higher temperature
reasoning_paths = []
for _ in range(5):
    response = model.generate_content(
        prompt,
        generation_config={"temperature": 0.8},
    )
    reasoning_paths.append(response.text)

# Majority vote on the recommended card
recommendations = [extract_card(path) for path in reasoning_paths]
most_common = Counter(recommendations).most_common(1)[0][0]

# Keep a reasoning path that reached the majority answer
final_reasoning = next(
    path for path in reasoning_paths
    if extract_card(path) == most_common
)
```
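
The voting step relies on `extract_card`, which the snippet above leaves undefined. One possible version, assuming replies follow the Phase 3 format with the card name in bold on the line after `## Recommended Card`; the regex, `extract_card`, and `majority_vote` here are illustrative, and the vote also reports how strongly the paths agreed:

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Assumed reply shape: "## Recommended Card" followed by a line starting with **Card Name**
CARD_LINE = re.compile(r"##\s*Recommended Card\s*\n\s*\*\*(.+?)\*\*")


def extract_card(reply: str) -> Optional[str]:
    """Return the bolded card name under '## Recommended Card', or None if absent."""
    match = CARD_LINE.search(reply)
    return match.group(1).strip() if match else None


def majority_vote(replies: List[str]) -> Tuple[str, float, str]:
    """Return (winning card, share of parseable replies that chose it, one reply that chose it)."""
    cards = [extract_card(reply) for reply in replies]
    votes = Counter(card for card in cards if card is not None)
    if not votes:
        raise ValueError("no reply contained a recommended card")
    winner, count = votes.most_common(1)[0]
    agreement = count / sum(votes.values())
    representative = next(r for r, c in zip(replies, cards) if c == winner)
    return winner, agreement, representative
```

Replies that do not parse are excluded from the vote instead of counting as a separate "None" answer. The agreement share could also feed confidence scoring, since a 3-of-5 split is weaker evidence than 5-of-5.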

### 3. Reflection & Verification

```python
# Runs inside an async function. generate_recommendation() and
# refine_recommendation() are application helpers not shown in this guide.

# Initial recommendation
initial_rec = await generate_recommendation(transaction, mcp_results)

# Self-critique. An explicit verdict line is easier to check than searching
# for words like "error", which also match "no errors found".
critique_prompt = f"""
Review this credit card recommendation:

{initial_rec}

Are there any errors or oversights?
- Did we miss a better card?
- Are the math calculations correct?
- Did we consider all constraints?
- Is the reasoning sound?

If you find issues, provide corrections.
End with exactly one line: VERDICT: OK or VERDICT: REVISE
"""

critique = await model.generate_content_async(critique_prompt)

# Refine only when the critic asks for a revision
if "VERDICT: REVISE" in critique.text:
    final_rec = await refine_recommendation(initial_rec, critique.text)
else:
    final_rec = initial_rec
```
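
Whatever keyword the refine decision looks for, a plain substring test can misfire: "no errors found" contains "error", and a critic that echoes its instructions may repeat both verdict words. A stricter option, assuming the critique prompt asks the model to end with a line reading `VERDICT: OK` or `VERDICT: REVISE`, is to read only the final non-empty line. `needs_revision` is a hypothetical helper, not part of the original code:

```python
def needs_revision(critique_text: str) -> bool:
    """True unless the critique's last non-empty line is exactly 'VERDICT: OK'.

    Missing or malformed verdicts are treated as REVISE, a conservative default
    chosen for this sketch.
    """
    lines = [line.strip() for line in critique_text.splitlines() if line.strip()]
    if not lines:
        return True
    return lines[-1].upper() != "VERDICT: OK"


print(needs_revision("All math checks out.\nVERDICT: OK"))             # False
print(needs_revision("Citi reward should be $1.28.\nVERDICT: REVISE"))  # True
```

Defaulting to REVISE costs one extra model call when the critic ignores the format, which is usually cheaper than shipping an unchecked recommendation.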

---

## Confidence Scoring

```python
def calculate_confidence(mcp_results: dict) -> float:
    """
    Calculate a confidence score in [0, 1]; each factor multiplies it down from 1.0.
    """

    confidence = 1.0

    # Factor 1: Reward gap (a clear winner means higher confidence).
    # Assumes all_cards_comparison is sorted best-first, as format_recommendation does.
    comparison = mcp_results["smart_wallet"]["all_cards_comparison"]
    if len(comparison) > 1 and comparison[0]["rewards"] > 0:
        best_reward = comparison[0]["rewards"]
        second_best = comparison[1]["rewards"]
        reward_gap = (best_reward - second_best) / best_reward
        if reward_gap < 0.1:  # Less than 10% difference
            confidence *= 0.8

    # Factor 2: Merchant acceptance certainty (relevance of the top RAG source)
    sources = mcp_results.get("rewards_rag", {}).get("sources", [])
    if sources:
        confidence *= sources[0]["relevance_score"]

    # Factor 3: Cap warnings (Phase 2 stores forecast results under "spend_forecast")
    if mcp_results.get("spend_forecast", {}).get("warnings"):
        confidence *= 0.9

    # Factor 4: Data freshness (not implemented yet; stale data should lower confidence)

    return round(confidence, 2)
```
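
Applied to the Whole Foods example, the factors combine multiplicatively. The sketch below restates the same three rules as a pure function over plain numbers so the arithmetic can be checked in isolation; `confidence_from_signals` is a hypothetical name, and the RAG relevance of 0.92 is an illustrative value, not one from this guide:

```python
from typing import Optional


def confidence_from_signals(best_reward: float, second_best: float,
                            rag_relevance: Optional[float],
                            has_cap_warning: bool) -> float:
    """Multiplicative confidence using the same three rules as calculate_confidence."""
    confidence = 1.0
    # Rule 1: top two cards within 10% of each other -> near-tie penalty
    if best_reward > 0 and (best_reward - second_best) / best_reward < 0.1:
        confidence *= 0.8
    # Rule 2: scale by how relevant the top RAG source was (0-1)
    if rag_relevance is not None:
        confidence *= rag_relevance
    # Rule 3: any cap warning from the forecast
    if has_cap_warning:
        confidence *= 0.9
    return round(confidence, 2)


# Whole Foods example: $5.10 vs. $1.28 is a ~75% gap, so no near-tie penalty
print(confidence_from_signals(5.10, 1.28, rag_relevance=0.92, has_cap_warning=True))  # 0.83
```

Because every rule multiplies, penalties compound: a near-tie plus a cap warning plus imperfect retrieval drops the score quickly, which is the intended behavior for borderline recommendations.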

---

## Error Handling & Fallbacks

```python
import logging
import time

logger = logging.getLogger(__name__)


async def recommend_with_fallback(transaction: dict) -> dict:
    """Degrade gracefully if the LLMs or MCP servers fail."""

    start = time.perf_counter()
    try:
        # Try the full reasoning pipeline
        plan = await create_execution_plan(transaction)
        mcp_results = await execute_mcp_calls(plan, transaction)
        reasoning = await synthesize_reasoning(transaction, mcp_results, plan)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # model_dump() (pydantic v2) returns a plain dict, like the fallbacks below
        return format_recommendation(mcp_results, reasoning, elapsed_ms).model_dump()

    except Exception as e:
        logger.error(f"Full pipeline failed: {e}")

    try:
        # Fallback: use only the Smart Wallet MCP
        # (format_simple_recommendation is a helper not shown here)
        result = await call_smart_wallet(transaction)
        return format_simple_recommendation(result)

    except Exception as e2:
        logger.error(f"Fallback failed: {e2}")

    # Last resort: rule-based recommendation
    return rule_based_recommendation(transaction)


def rule_based_recommendation(transaction: dict) -> dict:
    """Simple rule-based fallback that needs no external services."""

    rules = {
        "Groceries": "Amex Gold (4x points)",
        "Dining": "Amex Gold (4x points)",
        "Travel": "Chase Sapphire Reserve (3x points)",
        "Gas": "Costco Anywhere Visa (4% cashback)",
        "Default": "Citi Double Cash (2% on everything)",
    }

    category = transaction.get("category", "Default")
    recommended = rules.get(category, rules["Default"])

    return {
        # Same nesting as the full response, so callers can read card_name uniformly
        "recommended_card": {"card_name": recommended},
        "reasoning": f"Based on category rules for {category}",
        "confidence": 0.60,  # Lower confidence for rule-based output
        "warnings": ["Recommendation based on simplified rules (MCP servers unavailable)"],
    }
```
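
The same degrade-in-order idea generalizes to any number of tiers. A hypothetical helper (not part of the original code) that tries async strategies in order, returns the first success, and reports which tier answered; the two tier functions in the usage lines are stand-ins, not the real pipeline:

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, List, Tuple

logger = logging.getLogger(__name__)

# A strategy is a (name, zero-argument coroutine factory) pair
Strategy = Tuple[str, Callable[[], Awaitable[Any]]]


async def first_successful(strategies: List[Strategy]) -> Tuple[str, Any]:
    """Try each strategy in order; return (name, result) of the first that succeeds."""
    errors = []
    for name, factory in strategies:
        try:
            return name, await factory()
        except Exception as exc:  # any failure falls through to the next tier
            logger.warning("strategy %s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))


# Usage with hypothetical stand-ins for the real tiers
async def full_pipeline():
    raise TimeoutError("MCP timeout")


async def rule_based():
    return {"recommended_card": {"card_name": "Amex Gold (4x points)"}, "confidence": 0.60}


tier, result = asyncio.run(first_successful([("full_pipeline", full_pipeline),
                                             ("rule_based", rule_based)]))
print(tier)  # rule_based
```

Returning the tier name lets the caller lower `confidence` or attach a warning whenever a degraded tier answered, as `rule_based_recommendation` does with its fixed 0.60.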

---

## Testing & Evaluation

### Unit Tests

These tests call the live Claude and Gemini APIs, so they need valid API keys and network access. The `@pytest.mark.asyncio` marker is provided by the pytest-asyncio plugin.

```python
import pytest


@pytest.mark.asyncio
async def test_planning_phase():
    """Test Claude's planning logic."""
    transaction = {
        "user_id": "test_user",  # required by the planning prompt
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411",
    }

    plan = await create_execution_plan(transaction)

    assert "strategy" in plan
    assert "mcp_calls" in plan
    assert len(plan["mcp_calls"]) > 0
    assert plan["confidence_threshold"] >= 0.5


@pytest.mark.asyncio
async def test_reasoning_phase():
    """Test Gemini's synthesis."""
    mcp_results = {
        "smart_wallet": {
            "recommended_card": {"card_name": "Amex Gold"},
            "rewards": {"cash_value": 5.10},
        }
    }
    plan = {"decision_factors": ["reward_rate"]}

    reasoning = await synthesize_reasoning({}, mcp_results, plan)

    assert "Amex Gold" in reasoning
    assert "$5.10" in reasoning
```

### Integration Tests

```python
@pytest.mark.asyncio
async def test_end_to_end_recommendation():
    """Test the full recommendation pipeline (needs live LLM and MCP services)."""
    transaction = {
        "user_id": "test_user",
        "merchant": "Whole Foods",
        "category": "Groceries",
        "amount_usd": 127.50,
        "mcc": "5411",
    }

    result = await recommend_with_fallback(transaction)

    assert result["recommended_card"]["card_name"]
    assert result["rewards"]["cash_value"] > 0
    assert result["confidence"] >= 0.5
    assert len(result["reasoning"]) > 100
```

---

**Related Documentation:**
- [MCP Server Implementation](./mcp_architecture.md)
- [Modal Deployment Guide](./modal_deployment.md)
- [LlamaIndex RAG Setup](./llamaindex_setup.md)

---