# LlamaIndex RAG Setup Guide
## Overview
RewardPilot uses LlamaIndex to build a semantic search system over 50+ credit card benefit documents. This enables the agent to answer complex questions like "Which card has the best travel insurance?" or "Does Amex Gold work at Costco?"
## Why LlamaIndex + RAG?
| Problem | Traditional Approach | RAG Solution |
|---------|---------------------|--------------|
| **Card benefits change** | Hardcode rules → outdated | Dynamic document retrieval |
| **Complex questions** | Manual lookup | Semantic search |
| **50+ cards** | Impossible to memorize | Vector similarity |
| **Nuanced rules** | Prone to errors | Context-aware answers |
**Example:**
- **Question:** "Can I use Chase Sapphire Reserve for airport lounge access when flying domestic?"
- **Traditional:** Check 10+ pages of terms
- **RAG:** Semantic search → "Yes, Priority Pass includes domestic lounges"
---
## Architecture
```
┌────────────────────────────────────────────────────┐
│                   User Question                    │
│       "Which card has best grocery rewards?"       │
└─────────────────────────┬──────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                Query Transformation                │
│        (Expand, rephrase, extract keywords)        │
└─────────────────────────┬──────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                  Embedding Model                   │
│           OpenAI text-embedding-3-small            │
│                 (1536 dimensions)                  │
└─────────────────────────┬──────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                    Vector Store                    │
│                      ChromaDB                      │
│                (50+ card documents)                │
│                  (10,000+ chunks)                  │
└─────────────────────────┬──────────────────────────┘
                          │  Retrieve top-k (k=5)
                          ▼
┌────────────────────────────────────────────────────┐
│                 Retrieved Context                  │
│ 1. Amex Gold: 4x points on U.S. supermarkets...    │
│ 2. Citi Custom Cash: 5% on top category...         │
│ 3. Chase Freedom Flex: 5% rotating categories...   │
└─────────────────────────┬──────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                     Reranking                      │
│          (Cohere Rerank or Cross-Encoder)          │
└─────────────────────────┬──────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                   LLM Synthesis                    │
│                Gemini 2.0 Flash Exp                │
│           (Generate answer from context)           │
└─────────────────────────┬──────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│                    Final Answer                    │
│ "Amex Gold offers 4x points (best rate) but has    │
│  $25k annual cap. Citi Custom Cash gives 5% but    │
│  only $500/month. For high spenders, use Amex."    │
└────────────────────────────────────────────────────┘
```
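
The Query Transformation stage is the only box in this diagram not covered by the implementation below. A minimal sketch of that stage using LlamaIndex's HyDE transform, assuming the `index` built in the Implementation section:

```python
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# HyDE writes a hypothetical answer and embeds that instead of the raw
# question, which often retrieves better-matching benefit chunks
hyde = HyDEQueryTransform(include_original=True)
hyde_engine = TransformQueryEngine(
    index.as_query_engine(similarity_top_k=5),
    query_transform=hyde
)

response = hyde_engine.query("Which card has best grocery rewards?")
```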
---
## Setup
### 1. Install Dependencies
```bash
pip install llama-index==0.12.5 \
    llama-index-vector-stores-chroma==0.4.1 \
    llama-index-embeddings-openai==0.3.1 \
    llama-index-llms-gemini==0.4.2 \
    chromadb==0.5.23 \
    pypdf==5.1.0 \
    beautifulsoup4==4.12.3
```
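
The server reads its API keys from the environment, so it is worth verifying them before the first indexing run. A small preflight sketch (`COHERE_API_KEY` is only needed if you enable the reranking section later):

```python
import os

for var in ("OPENAI_API_KEY", "GEMINI_API_KEY"):
    if not os.getenv(var):
        raise RuntimeError(f"Missing required environment variable: {var}")
```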
### 2. Prepare Card Documents
Create directory structure:
```
data/
├── cards/
│   ├── amex_gold.pdf
│   ├── chase_sapphire_reserve.pdf
│   ├── citi_custom_cash.pdf
│   └── ... (50+ cards)
├── terms/
│   ├── amex_terms.pdf
│   ├── chase_terms.pdf
│   └── ...
└── guides/
    ├── maximizing_rewards.md
    ├── category_codes.md
    └── ...
```
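
A convenience sketch to scaffold this layout before dropping documents in:

```python
from pathlib import Path

for sub in ("cards", "terms", "guides"):
    Path("data", sub).mkdir(parents=True, exist_ok=True)
```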
### 3. Document Sources
#### Option A: Scrape from Issuer Websites
```python
# scrape_card_docs.py
import requests
from bs4 import BeautifulSoup

CARD_URLS = {
    "amex_gold": "https://www.americanexpress.com/us/credit-cards/card/gold-card/",
    "chase_sapphire_reserve": "https://creditcards.chase.com/rewards-credit-cards/sapphire/reserve",
    # ... more cards
}

def scrape_card_benefits(card_name, url, output_file):
    """Scrape card benefits from an issuer website"""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract the benefits section (the class name varies by issuer)
    benefits = soup.find("div", class_="benefits-section")

    # Save to markdown
    with open(output_file, "w") as f:
        f.write(f"# {card_name}\n\n")
        f.write(benefits.get_text())

# Scrape all cards
for card_name, url in CARD_URLS.items():
    scrape_card_benefits(card_name, url, f"data/cards/{card_name}.md")
```
#### Option B: Manual Documentation
Create markdown files:
**File:** `data/cards/amex_gold.md`
```markdown
# American Express Gold Card
## Overview
- **Annual Fee:** $325
- **Rewards Rate:** 4x points on dining & U.S. supermarkets (up to $25k/year)
- **Welcome Bonus:** 90,000 points after $6k spend in 6 months
## Earning Structure
### 4x Points
- Restaurants worldwide (including takeout & delivery)
- U.S. supermarkets (up to $25,000 per year, then 1x)
### 3x Points
- Flights booked directly with airlines or on amextravel.com
### 1x Points
- All other purchases
## Monthly Credits
- $10 Uber Cash (Uber Eats eligible)
- $10 Grubhub/Seamless/The Cheesecake Factory/select Shake Shack
## Travel Benefits
- No foreign transaction fees
- Trip delay insurance
- Lost luggage insurance
- Car rental loss and damage insurance
## Merchant Acceptance
- **Accepted:** Most merchants worldwide
- **Not Accepted:** Costco warehouses (Costco.com works)
- **Not Accepted:** Some small businesses
## Redemption Options
- Transfer to 20+ airline/hotel partners (1:1 ratio)
- Pay with Points at Amazon (0.7 cents per point)
- Statement credits (0.6 cents per point)
- Book travel through Amex Travel (1 cent per point)
## Best For
- High grocery spending (up to $25k/year)
- Frequent dining out
- Travelers who value transfer partners
## Limitations
- $25,000 annual cap on 4x supermarket category
- Amex not accepted everywhere
- Annual fee not waived first year
```
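
Before indexing all 50+ cards, you can sanity-check that a hand-written file loads cleanly. A sketch using the same reader as the server:

```python
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader(input_files=["data/cards/amex_gold.md"]).load_data()
print(docs[0].metadata["file_name"], "-", len(docs[0].text), "chars")
```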
---
## Implementation
### File: `rewards_rag_server.py`
```python
"""
LlamaIndex RAG server for credit card benefits
"""
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings,
    load_index_from_storage
)
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.core.node_parser import SentenceSplitter
import chromadb
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
import os

# Initialize FastAPI
app = FastAPI(title="Rewards RAG MCP Server")

# Configure LlamaIndex (Settings replaces the removed ServiceContext)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)
Settings.llm = Gemini(
    model="models/gemini-2.0-flash-exp",
    api_key=os.getenv("GEMINI_API_KEY")
)
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Initialize ChromaDB
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("credit_cards")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```

---
## Document Loading & Indexing

```python
def load_and_index_documents():
    """Load card documents and create vector index"""
    # Load documents from directory
    documents = SimpleDirectoryReader(
        input_dir="./data",
        recursive=True,
        required_exts=[".pdf", ".md", ".txt"]
    ).load_data()
    print(f"Loaded {len(documents)} documents")

    # Parse into nodes (chunks)
    node_parser = SentenceSplitter(
        chunk_size=512,
        chunk_overlap=50
    )
    nodes = node_parser.get_nodes_from_documents(documents)
    print(f"Created {len(nodes)} nodes")

    # Create index
    index = VectorStoreIndex(
        nodes=nodes,
        storage_context=storage_context
    )

    # Persist to disk
    index.storage_context.persist(persist_dir="./storage")
    return index

# Load index on startup
try:
    # Try loading existing index
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store,
        persist_dir="./storage"
    )
    index = load_index_from_storage(storage_context)
    print("Loaded existing index")
except Exception:
    # Create new index
    print("Creating new index...")
    index = load_and_index_documents()

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"
)
```

---
## API Endpoints

```python
class QueryRequest(BaseModel):
    query: str
    card_name: Optional[str] = None
    top_k: int = 5

class QueryResponse(BaseModel):
    answer: str
    sources: list
    confidence: float

@app.post("/query", response_model=QueryResponse)
async def query_benefits(request: QueryRequest):
    """
    Query credit card benefits

    Example:
    POST /query
    {
        "query": "Which card has best grocery rewards?",
        "top_k": 5
    }
    """
    try:
        # Add card filter if specified
        if request.card_name:
            query = f"For {request.card_name}: {request.query}"
        else:
            query = request.query

        # Query the index
        response = query_engine.query(query)

        # Extract sources
        sources = []
        for node in response.source_nodes:
            sources.append({
                "card_name": node.metadata.get("file_name", "Unknown"),
                "content": node.text[:200] + "...",
                "relevance_score": float(node.score or 0.0)  # score can be None
            })

        # Calculate confidence based on top score
        confidence = sources[0]["relevance_score"] if sources else 0.0

        return QueryResponse(
            answer=str(response),
            sources=sources,
            confidence=confidence
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
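
# Example request against this endpoint (a sketch; host/port assume the
# local uvicorn setup from the Deployment section):
#
#   import requests
#   r = requests.post(
#       "http://localhost:7860/query",
#       json={"query": "Which card has best grocery rewards?", "top_k": 5},
#   )
#   print(r.json()["answer"])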
```

---
## Advanced Query Techniques

```python
@app.post("/compare")
async def compare_cards(request: dict):
    """
    Compare multiple cards on specific criteria

    Example:
    POST /compare
    {
        "cards": ["Amex Gold", "Chase Sapphire Reserve"],
        "criteria": "travel benefits"
    }
    """
    cards = request["cards"]
    criteria = request["criteria"]

    # Query each card
    comparisons = []
    for card in cards:
        query = f"What are the {criteria} for {card}?"
        response = query_engine.query(query)
        comparisons.append({
            "card": card,
            "benefits": str(response)
        })

    # Synthesize comparison
    synthesis_prompt = f"""
    Compare these cards on {criteria}:
    {comparisons}

    Provide a clear winner and reasoning.
    """
    final_response = Settings.llm.complete(synthesis_prompt)

    return {
        "comparison": str(final_response),
        "details": comparisons
    }
```

---
## Metadata Filtering

```python
def add_metadata_to_documents():
    """Add rich metadata for filtering"""
    documents = SimpleDirectoryReader("./data").load_data()

    for doc in documents:
        # Extract card name from filename
        card_name = doc.metadata["file_name"].replace(".md", "")

        # Add metadata (extract_issuer, extract_annual_fee and
        # extract_category are user-defined parsers, not shown here)
        doc.metadata.update({
            "card_name": card_name,
            "issuer": extract_issuer(card_name),
            "annual_fee": extract_annual_fee(doc.text),
            "category": extract_category(doc.text)
        })

    return documents

# Query with filters
@app.post("/query_filtered")
async def query_with_filters(request: dict):
    """
    Query with metadata filters

    Example:
    POST /query_filtered
    {
        "query": "best travel card",
        "filters": {
            "issuer": "Chase",
            "annual_fee": {"$lte": 500}
        }
    }
    """
    from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

    # Build filters (only the exact-match issuer filter is shown; a numeric
    # bound like annual_fee <= 500 needs a MetadataFilter with FilterOperator.LTE)
    filters = MetadataFilters(
        filters=[
            ExactMatchFilter(key="issuer", value=request["filters"]["issuer"])
        ]
    )

    # Query with filters (local engine so the global query_engine is untouched)
    filtered_engine = index.as_query_engine(
        similarity_top_k=5,
        filters=filters
    )
    response = filtered_engine.query(request["query"])

    return {"answer": str(response)}
```

---
## Hybrid Search (Keyword + Semantic)

```python
from llama_index.core.retrievers import VectorIndexRetriever, QueryFusionRetriever
# BM25 ships as a separate package: pip install llama-index-retrievers-bm25
from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core.query_engine import RetrieverQueryEngine

def create_hybrid_retriever():
    """Combine vector search + keyword search"""
    # Vector retriever
    vector_retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=10
    )

    # BM25 keyword retriever
    bm25_retriever = BM25Retriever.from_defaults(
        docstore=index.docstore,
        similarity_top_k=10
    )

    # Combine retrievers (num_queries=1 disables extra query generation)
    hybrid_retriever = QueryFusionRetriever(
        retrievers=[vector_retriever, bm25_retriever],
        similarity_top_k=5,
        num_queries=1
    )

    return RetrieverQueryEngine(retriever=hybrid_retriever)
```

---
## Reranking for Better Results

```python
# Requires: pip install llama-index-postprocessor-cohere-rerank
from llama_index.postprocessor.cohere_rerank import CohereRerank

def create_reranking_query_engine():
    """Add reranking for improved relevance"""
    # Cohere reranker
    reranker = CohereRerank(
        api_key=os.getenv("COHERE_API_KEY"),
        top_n=3
    )

    query_engine = index.as_query_engine(
        similarity_top_k=10,            # Retrieve more candidates
        node_postprocessors=[reranker]  # Rerank to top 3
    )

    return query_engine
```

---
## Evaluation & Metrics

```python
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator
)

async def evaluate_rag_quality():
    """Evaluate RAG system quality"""
    # Test queries
    test_queries = [
        "Which card has best grocery rewards?",
        "Does Amex Gold work at Costco?",
        "What are Chase Sapphire Reserve travel benefits?"
    ]

    # Ground truth answers (kept alongside the results for manual spot-checks;
    # the LLM evaluators below judge relevancy/faithfulness without them)
    ground_truth = [
        "Citi Custom Cash offers 5% on groceries...",
        "No, American Express is not accepted at Costco warehouses...",
        "Chase Sapphire Reserve includes Priority Pass..."
    ]

    # Evaluators
    relevancy_evaluator = RelevancyEvaluator(llm=Settings.llm)
    faithfulness_evaluator = FaithfulnessEvaluator(llm=Settings.llm)

    results = []
    for query, truth in zip(test_queries, ground_truth):
        response = query_engine.query(query)
        contexts = [node.text for node in response.source_nodes]

        # Evaluate relevancy
        relevancy_result = await relevancy_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )

        # Evaluate faithfulness
        faithfulness_result = await faithfulness_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )

        results.append({
            "query": query,
            "expected_answer": truth,
            "relevancy_score": relevancy_result.score,
            "faithfulness_score": faithfulness_result.score
        })

    return results
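
# Ad-hoc run (a sketch):
#
#   import asyncio
#   print(asyncio.run(evaluate_rag_quality()))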
```

---
## Deployment
### 1. Build Docker Image
**File:** `Dockerfile`
```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Download and index documents on build
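# NOTE: build-time indexing needs OPENAI_API_KEY available to the build
# (e.g. via a Docker build secret)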
RUN python -c "from rewards_rag_server import load_and_index_documents; load_and_index_documents()"
# Expose port
EXPOSE 7860
# Run server
CMD ["uvicorn", "rewards_rag_server:app", "--host", "0.0.0.0", "--port", "7860"]
```
### 2. Deploy to Hugging Face Spaces
```bash
# Create Space
huggingface-cli repo create rewardpilot-rewards-rag --type space --space_sdk docker
# Push files
git add .
git commit -m "Deploy RAG server"
git push
```
---
## Performance Optimization
### 1. Caching Embeddings
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text: str):
    """Cache embeddings for repeated queries"""
    return Settings.embed_model.get_text_embedding(text)
```
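
Note that `lru_cache` only helps when the exact same string is embedded twice; semantically similar but non-identical queries still trigger a fresh embedding call.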
### 2. Batch Processing
```python
async def batch_query(queries: list):
    """Process multiple queries in parallel"""
    import asyncio

    tasks = [query_engine.aquery(q) for q in queries]
    results = await asyncio.gather(*tasks)
    return results
```
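
For example, from a script (a sketch; `batch_query` as defined above):

```python
import asyncio

answers = asyncio.run(batch_query([
    "Which card has best grocery rewards?",
    "Does Amex Gold work at Costco?",
]))
for answer in answers:
    print(str(answer))
```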
### 3. Index Optimization
```python
# Use smaller embedding model for speed
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # 1536 dims
    # vs text-embedding-3-large (3072 dims)
)

# Reduce chunk size for faster retrieval
Settings.chunk_size = 256  # vs 512
```
---
## Monitoring
```python
import time
from prometheus_client import Counter, Histogram

# Metrics
query_counter = Counter('rag_queries_total', 'Total RAG queries')
query_duration = Histogram('rag_query_duration_seconds', 'RAG query duration')

# Replaces the /query handler shown earlier (FastAPI routes must be unique)
@app.post("/query")
async def query_with_monitoring(request: QueryRequest):
    query_counter.inc()
    start_time = time.time()

    response = query_engine.query(request.query)

    duration = time.time() - start_time
    query_duration.observe(duration)

    return {"answer": str(response), "duration_seconds": duration}
```
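
The counters above are collected in-process, but nothing serves them yet. One common pattern (a sketch using `prometheus_client`'s bundled ASGI app) is to mount a `/metrics` endpoint on the same FastAPI app:

```python
from prometheus_client import make_asgi_app

# Prometheus can now scrape http://<host>:7860/metrics
app.mount("/metrics", make_asgi_app())
```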
---
**Related Documentation:**
- [MCP Server Implementation](./mcp_architecture.md)
- [Modal Deployment Guide](./modal_deployment.md)
- [Agent Reasoning Flow](./agent_reasoning.md)