# LlamaIndex RAG Setup Guide

## Overview

RewardPilot uses LlamaIndex to build a semantic search system over 50+ credit card benefit documents. This enables the agent to answer complex questions like "Which card has the best travel insurance?" or "Does Amex Gold work at Costco?"

## Why LlamaIndex + RAG?

| Problem | Traditional Approach | RAG Solution |
|---------|---------------------|--------------|
| **Card benefits change** | Hardcode rules → outdated | Dynamic document retrieval |
| **Complex questions** | Manual lookup | Semantic search |
| **50+ cards** | Impossible to memorize | Vector similarity |
| **Nuanced rules** | Prone to errors | Context-aware answers |

**Example:**
- **Question:** "Can I use Chase Sapphire Reserve for airport lounge access when flying domestic?"
- **Traditional:** Check 10+ pages of terms
- **RAG:** Semantic search → "Yes, Priority Pass includes domestic lounges"

---

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│  User Question                                          │
│  "Which card has best grocery rewards?"                 │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│  Query Transformation                                   │
│  (Expand, rephrase, extract keywords)                   │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│  Embedding Model                                        │
│  OpenAI text-embedding-3-small                          │
│  (1536 dimensions)                                      │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│  Vector Store                                           │
│  ChromaDB                                               │
│  (50+ card documents)                                   │
│  (10,000+ chunks)                                       │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ Retrieve top-k (k=5)
                     ▼
┌─────────────────────────────────────────────────────────┐
│  Retrieved Context                                      │
│  1. Amex Gold: 4x points on U.S. supermarkets...        │
│  2. Citi Custom Cash: 5% on top category...             │
│  3. Chase Freedom Flex: 5% rotating categories...       │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│  Reranking                                              │
│  (Cohere Rerank or Cross-Encoder)                       │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│  LLM Synthesis                                          │
│  Gemini 2.0 Flash Exp                                   │
│  (Generate answer from context)                         │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│  Final Answer                                           │
│  "Amex Gold offers 4x points (best rate) but has        │
│  $25k annual cap. Citi Custom Cash gives 5% but         │
│  only $500/month. For high spenders, use Amex."         │
└─────────────────────────────────────────────────────────┘
```
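To make the retrieval stage of the diagram concrete, here is a minimal, dependency-free sketch of top-k retrieval by cosine similarity. It swaps in a toy bag-of-words vector for the OpenAI embedding model and an in-memory list for ChromaDB. The `embed`, `cosine`, and `retrieve` names and the sample chunks are illustrative assumptions, not part of RewardPilot.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Made-up chunks standing in for indexed card documents
chunks = [
    "amex gold earns 4x points at u.s. supermarkets",
    "chase sapphire reserve includes priority pass lounge access",
    "citi custom cash earns 5% on the top spending category",
]
top = retrieve("which card earns points at supermarkets", chunks, k=2)
```

A real embedding model captures meaning rather than exact word overlap, but the ranking step (score every chunk, keep the best k) has the same shape.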


---

## Setup

### 1. Install Dependencies

```bash
pip install llama-index==0.12.5 \
  llama-index-vector-stores-chroma==0.4.1 \
  llama-index-embeddings-openai==0.3.1 \
  llama-index-llms-gemini==0.4.2 \
  chromadb==0.5.23 \
  pypdf==5.1.0 \
  beautifulsoup4==4.12.3
```

### 2. Prepare Card Documents

Create directory structure:

```
data/
├── cards/
│   ├── amex_gold.pdf
│   ├── chase_sapphire_reserve.pdf
│   ├── citi_custom_cash.pdf
│   └── ... (50+ cards)
├── terms/
│   ├── amex_terms.pdf
│   ├── chase_terms.pdf
│   └── ...
└── guides/
    ├── maximizing_rewards.md
    ├── category_codes.md
    └── ...
```

### 3. Document Sources

**Option A: Scrape from Issuer Websites**

```python
# scrape_card_docs.py
import os

import requests
from bs4 import BeautifulSoup

CARD_URLS = {
    "amex_gold": "https://www.americanexpress.com/us/credit-cards/card/gold-card/",
    "chase_sapphire_reserve": "https://creditcards.chase.com/rewards-credit-cards/sapphire/reserve",
    # ... more cards
}

def scrape_card_benefits(card_name, url, output_file):
    """Scrape card benefits from an issuer website and save them as markdown"""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract benefits section (the class name is a placeholder; each issuer's markup differs)
    benefits = soup.find('div', class_='benefits-section')
    if benefits is None:
        raise ValueError(f"No benefits section found at {url}")

    # Save to markdown
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(f"# {card_name}\n\n")
        f.write(benefits.get_text(separator="\n", strip=True))

# Scrape all cards
os.makedirs("data/cards", exist_ok=True)
for card_name, url in CARD_URLS.items():
    scrape_card_benefits(card_name, url, f"data/cards/{card_name}.md")
```
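Because the scraper depends on live issuer pages, it can help to exercise the extraction step offline against a saved HTML snippet. The sketch below does that with the standard library's `html.parser` instead of BeautifulSoup. The `BenefitsExtractor` class, the `extract_benefits` helper, and the sample markup are hypothetical; real issuer pages will use different class names.

```python
from html.parser import HTMLParser

class BenefitsExtractor(HTMLParser):
    """Collect text inside the first <div> whose class list contains target_class."""

    def __init__(self, target_class: str = "benefits-section"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # <div> nesting depth while inside the target div
        self.done = False     # set once the target div has been closed
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())

def extract_benefits(html: str) -> str:
    """Return the text of the benefits section, one fragment per line."""
    parser = BenefitsExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

sample_html = (
    '<div class="hero">Ad</div>'
    '<div class="benefits-section main"><h2>Benefits</h2>'
    '<div><p>4x points on dining</p></div></div>'
    '<div>Footer</div>'
)
benefits_text = extract_benefits(sample_html)
```

An empty result signals that the page layout no longer matches the expected class name, which is worth checking before writing a file into `data/cards/`.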

**Option B: Manual Documentation**

Create markdown files:

**File:** `data/cards/amex_gold.md`

```markdown
# American Express Gold Card

## Overview
- **Annual Fee:** $325
- **Rewards Rate:** 4x points on dining & U.S. supermarkets (up to $25k/year)
- **Welcome Bonus:** 90,000 points after $6k spend in 6 months

## Earning Structure

### 4x Points
- Restaurants worldwide (including takeout & delivery)
- U.S. supermarkets (up to $25,000 per year, then 1x)

### 3x Points
- Flights booked directly with airlines or on amextravel.com

### 1x Points
- All other purchases

## Monthly Credits
- $10 Uber Cash (Uber Eats eligible)
- $10 Grubhub/Seamless/The Cheesecake Factory/select Shake Shack

## Travel Benefits
- No foreign transaction fees
- Trip delay insurance
- Lost luggage insurance
- Car rental loss and damage insurance

## Merchant Acceptance
- **Accepted:** Most merchants worldwide
- **Not Accepted:** Costco warehouses (Costco.com works)
- **Not Accepted:** Some small businesses

## Redemption Options
- Transfer to 20+ airline/hotel partners (1:1 ratio)
- Pay with Points at Amazon (0.7 cents per point)
- Statement credits (0.6 cents per point)
- Book travel through Amex Travel (1 cent per point)

## Best For
- High grocery spending (up to $25k/year)
- Frequent dining out
- Travelers who value transfer partners

## Limitations
- $25,000 annual cap on 4x supermarket category
- Amex not accepted everywhere
- Annual fee not waived first year
```

## Implementation

**File:** `rewards_rag_server.py`

```python
"""
LlamaIndex RAG server for credit card benefits
"""

# ServiceContext is not imported: configuration goes through Settings
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings
)
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.core.node_parser import SentenceSplitter
import chromadb
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os

# Initialize FastAPI
app = FastAPI(title="Rewards RAG MCP Server")

# Configure LlamaIndex
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)
Settings.llm = Gemini(
    model="models/gemini-2.0-flash-exp",
    api_key=os.getenv("GEMINI_API_KEY")
)
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Initialize ChromaDB
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("credit_cards")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```
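`os.getenv` returns `None` when a variable is unset, so a missing key may only surface later as an authentication error on the first API call. A small startup check can fail earlier with a clearer message. This is a sketch: the `missing_keys` helper is hypothetical, and only the two variable names come from the configuration above.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")

def missing_keys(env=os.environ, required=REQUIRED_KEYS) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping instead of the real environment
example_missing = missing_keys({"OPENAI_API_KEY": "sk-test"})

# At server startup one might do:
# if missing_keys():
#     raise RuntimeError(f"Missing environment variables: {missing_keys()}")
```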

---

## Document Loading & Indexing

```python
def load_and_index_documents():
    """Load card documents and create vector index"""

    # Load documents from directory
    documents = SimpleDirectoryReader(
        input_dir="./data",
        recursive=True,
        required_exts=[".pdf", ".md", ".txt"]
    ).load_data()

    print(f"Loaded {len(documents)} documents")

    # Parse into nodes (chunks)
    node_parser = SentenceSplitter(
        chunk_size=512,
        chunk_overlap=50
    )
    nodes = node_parser.get_nodes_from_documents(documents)

    print(f"Created {len(nodes)} nodes")

    # Create index; embeddings are written to the Chroma collection
    index = VectorStoreIndex(
        nodes=nodes,
        storage_context=storage_context
    )

    # Persist to disk
    index.storage_context.persist(persist_dir="./storage")

    return index

# Load index on startup
if chroma_collection.count() > 0:
    # Reuse the embeddings already stored in the Chroma collection
    index = VectorStoreIndex.from_vector_store(vector_store)
    print("Loaded existing index")
else:
    # Create new index
    print("Creating new index...")
    index = load_and_index_documents()

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"
)
```
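The `chunk_size` and `chunk_overlap` settings control how much text each node holds and how much neighbouring nodes share. The sketch below shows the sliding-window idea on a plain word list. It is a simplification: `SentenceSplitter` measures length in tokens and prefers sentence boundaries, so its output will differ. The `chunk_words` function is hypothetical.

```python
def chunk_words(words: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split a word list into windows of chunk_size that share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

words = [f"w{i}" for i in range(10)]
chunks = chunk_words(words, chunk_size=4, chunk_overlap=1)
# Each window starts on the last word of the previous one, so a sentence that
# straddles a boundary is still seen whole by at least one chunk.
```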

---

## API Endpoints

```python
from typing import Optional

class QueryRequest(BaseModel):
    query: str
    card_name: Optional[str] = None
    top_k: int = 5

class QueryResponse(BaseModel):
    answer: str
    sources: list
    confidence: float

@app.post("/query", response_model=QueryResponse)
async def query_benefits(request: QueryRequest):
    """
    Query credit card benefits

    Example:
    POST /query
    {
        "query": "Which card has best grocery rewards?",
        "top_k": 5
    }
    """
    try:
        # Add card filter if specified
        if request.card_name:
            query = f"For {request.card_name}: {request.query}"
        else:
            query = request.query

        # Build an engine that honors the requested top_k, then query the index
        engine = index.as_query_engine(
            similarity_top_k=request.top_k,
            response_mode="compact"
        )
        response = engine.query(query)

        # Extract sources
        sources = []
        for node in response.source_nodes:
            sources.append({
                "card_name": node.metadata.get("file_name", "Unknown"),
                "content": node.text[:200] + "...",
                # score can be None for some retrievers
                "relevance_score": float(node.score or 0.0)
            })

        # Calculate confidence based on top score
        confidence = sources[0]["relevance_score"] if sources else 0.0

        return QueryResponse(
            answer=str(response),
            sources=sources,
            confidence=confidence
        )

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
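For reference, a client call to this endpoint could be put together as follows. This is a minimal sketch using only the standard library. The `build_query_request` helper is hypothetical, and the `http://localhost:7860` base URL is an assumption based on the port used in the Deployment section.

```python
import json
import urllib.request

def build_query_request(base_url: str, query: str, card_name=None, top_k: int = 5):
    """Build a POST request for the /query endpoint without sending it."""
    payload = {"query": query, "top_k": top_k}
    if card_name is not None:
        payload["card_name"] = card_name
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("http://localhost:7860", "Which card has best grocery rewards?")

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["answer"])
```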

---

## Advanced Query Techniques

```python
@app.post("/compare")
async def compare_cards(request: dict):
    """
    Compare multiple cards on specific criteria

    Example:
    POST /compare
    {
        "cards": ["Amex Gold", "Chase Sapphire Reserve"],
        "criteria": "travel benefits"
    }
    """
    cards = request["cards"]
    criteria = request["criteria"]

    # Query each card
    comparisons = []
    for card in cards:
        query = f"What are the {criteria} for {card}?"
        response = query_engine.query(query)

        comparisons.append({
            "card": card,
            "benefits": str(response)
        })

    # Synthesize comparison
    synthesis_prompt = f"""
    Compare these cards on {criteria}:

    {comparisons}

    Provide a clear winner and reasoning.
    """

    final_response = Settings.llm.complete(synthesis_prompt)

    return {
        "comparison": str(final_response),
        "details": comparisons
    }
```
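Interpolating `{comparisons}` directly puts the Python repr of a list of dicts into the prompt. One alternative is to render each card's answer as a labeled bullet, which may be easier for the model to read. The `build_comparison_prompt` helper below is a hypothetical sketch of that idea, and the sample answers are placeholders rather than real query output.

```python
def build_comparison_prompt(criteria: str, comparisons: list[dict]) -> str:
    """Render per-card answers as a labeled list for the synthesis prompt."""
    lines = [f"Compare these cards on {criteria}:", ""]
    for item in comparisons:
        lines.append(f"- {item['card']}: {item['benefits']}")
    lines += ["", "Provide a clear winner and reasoning."]
    return "\n".join(lines)

prompt = build_comparison_prompt(
    "travel benefits",
    [
        {"card": "Amex Gold", "benefits": "Trip delay insurance"},
        {"card": "Chase Sapphire Reserve", "benefits": "Priority Pass lounge access"},
    ],
)
```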

---

## Metadata Filtering

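```python
def add_metadata_to_documents():
    """Add rich metadata for filtering"""

    # recursive=True is needed because the files live in subdirectories of ./data
    documents = SimpleDirectoryReader("./data", recursive=True).load_data()

    for doc in documents:
        # Extract card name from filename (strip any extension, not only .md)
        card_name = os.path.splitext(doc.metadata["file_name"])[0]

        # Add metadata. extract_issuer, extract_annual_fee and extract_category
        # are project-specific helpers that must be defined elsewhere.
        doc.metadata.update({
            "card_name": card_name,
            "issuer": extract_issuer(card_name),
            "annual_fee": extract_annual_fee(doc.text),
            "category": extract_category(doc.text)
        })

    return documents

# Query with filters
@app.post("/query_filtered")
async def query_with_filters(request: dict):
    """
    Query with metadata filters

    Example:
    POST /query_filtered
    {
        "query": "best travel card",
        "filters": {
            "issuer": "Chase",
            "annual_fee": {"$lte": 500}
        }
    }
    """
    from llama_index.core.vector_stores import (
        FilterOperator,
        MetadataFilter,
        MetadataFilters,
    )

    # Build filters; they only match documents indexed with this metadata
    requested = request.get("filters", {})
    filter_list = []
    if "issuer" in requested:
        filter_list.append(MetadataFilter(key="issuer", value=requested["issuer"]))
    fee = requested.get("annual_fee")
    if isinstance(fee, dict) and "$lte" in fee:
        filter_list.append(MetadataFilter(
            key="annual_fee", value=fee["$lte"], operator=FilterOperator.LTE
        ))

    # Query with filters (a separate engine, so the global one is not shadowed)
    filtered_engine = index.as_query_engine(
        similarity_top_k=5,
        filters=MetadataFilters(filters=filter_list) if filter_list else None
    )

    response = filtered_engine.query(request["query"])

    return {"answer": str(response)}
```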
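The metadata code calls `extract_issuer`, `extract_annual_fee`, and `extract_category`, which this guide does not define. The sketch below shows one possible shape for the first two, assuming filenames such as `amex_gold` and an overview line such as `- **Annual Fee:** $325` as in the sample card file. The prefix mapping and the regular expression are assumptions to adapt; `extract_category` is left out because the guide does not say what a category is.

```python
import re
from typing import Optional

# Hypothetical prefix-to-issuer mapping; extend it to match your filenames.
ISSUER_PREFIXES = {"amex": "American Express", "chase": "Chase", "citi": "Citi"}

def extract_issuer(card_name: str) -> str:
    """Guess the issuer from a filename stem such as 'amex_gold'."""
    prefix = card_name.split("_", 1)[0].lower()
    return ISSUER_PREFIXES.get(prefix, "Unknown")

def extract_annual_fee(text: str) -> Optional[int]:
    """Parse a line like '- **Annual Fee:** $325' and return the fee in dollars."""
    match = re.search(r"Annual Fee:\**\s*\$([\d,]+)", text)
    return int(match.group(1).replace(",", "")) if match else None

issuer = extract_issuer("chase_sapphire_reserve")
fee = extract_annual_fee("## Overview\n- **Annual Fee:** $325\n")
```

Storing the fee as an integer matters for the filter example: a numeric comparison such as "less than or equal to 500" needs a numeric metadata value to compare against.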

---

## Hybrid Search (Keyword + Semantic)

```python
# BM25Retriever ships in a separate package: pip install llama-index-retrievers-bm25
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever, VectorIndexRetriever
from llama_index.retrievers.bm25 import BM25Retriever

def create_hybrid_retriever():
    """Combine vector search + keyword search"""

    # Vector retriever
    vector_retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=10
    )

    # BM25 keyword retriever. This assumes index.docstore holds the nodes; with an
    # external vector store such as Chroma the docstore may be empty, in which
    # case pass the parsed nodes via nodes=... instead.
    bm25_retriever = BM25Retriever.from_defaults(
        docstore=index.docstore,
        similarity_top_k=10
    )

    # Combine retrievers; num_queries=1 disables LLM query generation
    hybrid_retriever = QueryFusionRetriever(
        retrievers=[vector_retriever, bm25_retriever],
        similarity_top_k=5,
        num_queries=1
    )

    return RetrieverQueryEngine(retriever=hybrid_retriever)
```
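To illustrate how two ranked result lists can be merged, here is a standalone sketch of reciprocal rank fusion (RRF), one common fusion strategy. It is not necessarily what `QueryFusionRetriever` does with the settings above, since that depends on its configured mode. The `reciprocal_rank_fusion` function, the constant `k=60`, and the document IDs are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Made-up result lists from a vector retriever and a keyword retriever
vector_hits = ["amex_gold", "citi_custom_cash", "chase_freedom_flex"]
keyword_hits = ["citi_custom_cash", "chase_sapphire_reserve", "amex_gold"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits], top_n=3)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever can still make the final list.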

---

## Reranking for Better Results

```python
# Requires a separate package: pip install llama-index-postprocessor-cohere-rerank
from llama_index.postprocessor.cohere_rerank import CohereRerank

def create_reranking_query_engine():
    """Add reranking for improved relevance"""

    # Cohere reranker
    reranker = CohereRerank(
        api_key=os.getenv("COHERE_API_KEY"),
        top_n=3
    )

    query_engine = index.as_query_engine(
        similarity_top_k=10,  # Retrieve more candidates
        node_postprocessors=[reranker]  # Rerank to top 3
    )

    return query_engine
```

---

## Evaluation & Metrics

```python
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator
)

async def evaluate_rag_quality():
    """Evaluate RAG system quality"""

    # Test queries
    test_queries = [
        "Which card has best grocery rewards?",
        "Does Amex Gold work at Costco?",
        "What are Chase Sapphire Reserve travel benefits?"
    ]

    # Ground truth answers (kept for manual review; the two evaluators below
    # judge the response against the retrieved context, not against these)
    ground_truth = [
        "Citi Custom Cash offers 5% on groceries...",
        "No, American Express is not accepted at Costco warehouses...",
        "Chase Sapphire Reserve includes Priority Pass..."
    ]

    # Evaluators
    relevancy_evaluator = RelevancyEvaluator(llm=Settings.llm)
    faithfulness_evaluator = FaithfulnessEvaluator(llm=Settings.llm)

    results = []
    for query, truth in zip(test_queries, ground_truth):
        response = query_engine.query(query)
        contexts = [node.text for node in response.source_nodes]

        # Evaluate relevancy (needs the retrieved contexts as well)
        relevancy_result = await relevancy_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )

        # Evaluate faithfulness
        faithfulness_result = await faithfulness_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )

        results.append({
            "query": query,
            "expected": truth,
            "relevancy_score": relevancy_result.score,
            "faithfulness_score": faithfulness_result.score
        })

    return results
```
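The per-query results are easier to act on once aggregated. The sketch below averages each score and lists the queries that fall short, assuming scores lie in the range 0 to 1 with 1.0 meaning a pass. The `summarize` function and the sample rows are hypothetical, not output from a real evaluation run.

```python
from statistics import mean

SCORE_KEYS = ("relevancy_score", "faithfulness_score")

def summarize(results: list[dict], threshold: float = 1.0) -> dict:
    """Average each score and list queries with any score below the threshold."""
    def avg(key):
        values = [r[key] for r in results if r[key] is not None]
        return mean(values) if values else None

    # A missing score (None) is treated as a failure
    failing = [
        r["query"] for r in results
        if any((r[k] or 0.0) < threshold for k in SCORE_KEYS)
    ]
    return {
        "relevancy": avg("relevancy_score"),
        "faithfulness": avg("faithfulness_score"),
        "failing_queries": failing,
    }

sample = [
    {"query": "q1", "relevancy_score": 1.0, "faithfulness_score": 1.0},
    {"query": "q2", "relevancy_score": 1.0, "faithfulness_score": 0.0},
]
report = summarize(sample)
```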

---

## Deployment

### 1. Build Docker Image

**File:** `Dockerfile`
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Build the index at image build time. Importing the module runs its startup
# code, which indexes ./data when no index exists yet, so calling
# load_and_index_documents() again would index everything twice.
# This step calls the embedding API, so the API keys must be available during the build.
RUN python -c "import rewards_rag_server"

# Expose port
EXPOSE 7860

# Run server
CMD ["uvicorn", "rewards_rag_server:app", "--host", "0.0.0.0", "--port", "7860"]
```

### 2. Deploy to Hugging Face Spaces

```bash
# Create Space
huggingface-cli repo create rewardpilot-rewards-rag --type space --space_sdk docker

# Push files
git add .
git commit -m "Deploy RAG server"
git push
```

## Performance Optimization

### 1. Caching Embeddings

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text: str):
    """Cache embeddings for repeated queries"""
    return Settings.embed_model.get_text_embedding(text)
```
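The effect of `lru_cache` can be checked without calling a paid API. The sketch below replaces the embedding model with a hypothetical `fake_embed` function that counts its own invocations, then inspects `cache_info()`. All names here are illustrative.

```python
from functools import lru_cache

calls = {"count": 0}

def fake_embed(text: str) -> tuple:
    """Stand-in for a paid embedding call; counts how often it runs."""
    calls["count"] += 1
    return tuple(float(ord(c)) for c in text[:4])

@lru_cache(maxsize=1000)
def cached_embedding(text: str) -> tuple:
    # Returning a tuple keeps the cached value immutable, so one caller
    # cannot accidentally modify the vector another caller receives.
    return fake_embed(text)

cached_embedding("best grocery card")
cached_embedding("best grocery card")   # served from the cache
cached_embedding("best travel card")
info = cached_embedding.cache_info()
```

The cache key is the exact string, so queries that differ only in case or whitespace are treated as distinct unless they are normalized first.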

### 2. Batch Processing

```python
import asyncio

async def batch_query(queries: list):
    """Process multiple queries in parallel"""
    tasks = [query_engine.aquery(q) for q in queries]
    results = await asyncio.gather(*tasks)

    return results
```
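An unbounded `gather` sends every query at once, which may run into provider rate limits for large batches. One way to cap concurrency is a semaphore, sketched below with a hypothetical `fake_query` coroutine standing in for `query_engine.aquery`. The limit values are arbitrary examples.

```python
import asyncio

async def fake_query(q: str) -> str:
    """Stand-in for query_engine.aquery; sleeps briefly instead of calling an LLM."""
    await asyncio.sleep(0.01)
    return f"answer to: {q}"

async def bounded_batch(queries: list[str], limit: int = 3) -> list[str]:
    """Run queries concurrently, but never more than `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(q: str) -> str:
        async with semaphore:
            return await fake_query(q)

    # gather returns results in the same order as the input list
    return await asyncio.gather(*(run_one(q) for q in queries))

answers = asyncio.run(bounded_batch(["a", "b", "c", "d"], limit=2))
```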

### 3. Index Optimization

```python
# Use smaller embedding model for speed
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # 1536 dims
    # vs text-embedding-3-large (3072 dims)
)

# Reduce chunk size for faster retrieval
Settings.chunk_size = 256  # vs 512
```
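A quick back-of-the-envelope calculation shows what the embedding dimension means for index size. The sketch assumes vectors are stored as 32-bit floats and uses the 10,000-chunk figure from the architecture diagram. It counts raw vector data only and ignores whatever overhead the vector store adds.

```python
BYTES_PER_FLOAT32 = 4

def index_size_mib(num_chunks: int, dims: int) -> float:
    """Raw size of the embedding matrix in MiB, assuming float32 vectors."""
    return num_chunks * dims * BYTES_PER_FLOAT32 / (1024 ** 2)

small = index_size_mib(10_000, 1536)   # text-embedding-3-small
large = index_size_mib(10_000, 3072)   # text-embedding-3-large
```

Here `small` comes to about 58.6 MiB and `large` to about 117.2 MiB. Halving the chunk size works in the opposite direction: the same corpus yields roughly twice as many chunks, so the number of stored vectors grows even though each chunk is shorter.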

## Monitoring

```python
import time
from prometheus_client import Counter, Histogram

# Metrics
query_counter = Counter('rag_queries_total', 'Total RAG queries')
query_duration = Histogram('rag_query_duration_seconds', 'RAG query duration')

# Instrumented variant of the /query handler; register only one handler per route
@app.post("/query")
async def query_with_monitoring(request: QueryRequest):
    query_counter.inc()

    start_time = time.perf_counter()
    response = query_engine.query(request.query)
    duration = time.perf_counter() - start_time

    query_duration.observe(duration)

    # Return a JSON-serializable payload rather than the raw response object
    return {"answer": str(response)}
```

---