# LlamaIndex RAG Setup Guide
## Overview
RewardPilot uses LlamaIndex to build a semantic search system over 50+ credit card benefit documents. This enables the agent to answer complex questions like "Which card has the best travel insurance?" or "Does Amex Gold work at Costco?"
## Why LlamaIndex + RAG?
| Problem | Traditional Approach | RAG Solution |
|---------|---------------------|--------------|
| **Card benefits change** | Hardcode rules → outdated | Dynamic document retrieval |
| **Complex questions** | Manual lookup | Semantic search |
| **50+ cards** | Impossible to memorize | Vector similarity |
| **Nuanced rules** | Prone to errors | Context-aware answers |
**Example:**
- **Question:** "Can I use Chase Sapphire Reserve for airport lounge access when flying domestic?"
- **Traditional:** Check 10+ pages of terms
- **RAG:** Semantic search → "Yes, Priority Pass includes domestic lounges"
---
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                      User Question                      │
│         "Which card has best grocery rewards?"          │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                  Query Transformation                   │
│          (Expand, rephrase, extract keywords)           │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                     Embedding Model                     │
│              OpenAI text-embedding-3-small              │
│                    (1536 dimensions)                    │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      Vector Store                       │
│                        ChromaDB                         │
│                  (50+ card documents)                   │
│                    (10,000+ chunks)                     │
└────────────────────────────┬────────────────────────────┘
                             │
                             │  Retrieve top-k (k=5)
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    Retrieved Context                    │
│  1. Amex Gold: 4x points on U.S. supermarkets...        │
│  2. Citi Custom Cash: 5% on top category...             │
│  3. Chase Freedom Flex: 5% rotating categories...       │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                        Reranking                        │
│            (Cohere Rerank or Cross-Encoder)             │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      LLM Synthesis                      │
│                  Gemini 2.0 Flash Exp                   │
│             (Generate answer from context)              │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                       Final Answer                      │
│  "Amex Gold offers 4x points (best rate) but has        │
│  $25k annual cap. Citi Custom Cash gives 5% but         │
│  only $500/month. For high spenders, use Amex."         │
└─────────────────────────────────────────────────────────┘
```
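The query-transformation stage in the diagram never reappears in the implementation below. A minimal sketch of one way to add it, assuming the `Settings.llm` configured later in `rewards_rag_server.py` (the prompt wording is illustrative):
```python
from llama_index.core import Settings

def transform_query(raw_query: str) -> str:
    """Rewrite a user question into a retrieval-friendly search query.

    A single LLM call that expands abbreviations and adds likely keywords.
    Assumes Settings.llm has already been configured (see rewards_rag_server.py).
    """
    prompt = (
        "Rewrite this credit card question as a short search query. "
        "Expand abbreviations and add relevant keywords.\n"
        f"Question: {raw_query}\n"
        "Search query:"
    )
    return Settings.llm.complete(prompt).text.strip()
```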
---
## Setup
### 1. Install Dependencies
```bash
pip install llama-index==0.12.5 \
    llama-index-vector-stores-chroma==0.4.1 \
    llama-index-embeddings-openai==0.3.1 \
    llama-index-llms-gemini==0.4.2 \
    chromadb==0.5.23 \
    pypdf==5.1.0 \
    beautifulsoup4==4.12.3
```
### 2. Prepare Card Documents
Create directory structure:
```
data/
├── cards/
│   ├── amex_gold.pdf
│   ├── chase_sapphire_reserve.pdf
│   ├── citi_custom_cash.pdf
│   └── ... (50+ cards)
├── terms/
│   ├── amex_terms.pdf
│   ├── chase_terms.pdf
│   └── ...
└── guides/
    ├── maximizing_rewards.md
    ├── category_codes.md
    └── ...
```
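Once the folders are populated, a quick sanity check is to load everything with `SimpleDirectoryReader` and count what was picked up (a throwaway sketch, not part of the server):
```python
import os
from collections import Counter
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    required_exts=[".pdf", ".md", ".txt"]
).load_data()

# Tally loaded documents by parent folder (cards/, terms/, guides/)
by_folder = Counter(
    os.path.basename(os.path.dirname(d.metadata["file_path"])) for d in docs
)
print(f"Loaded {len(docs)} documents: {dict(by_folder)}")
```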
### 3. Document Sources
#### Option A: Scrape from Issuer Websites
```python
# scrape_card_docs.py
import requests
from bs4 import BeautifulSoup

CARD_URLS = {
    "amex_gold": "https://www.americanexpress.com/us/credit-cards/card/gold-card/",
    "chase_sapphire_reserve": "https://creditcards.chase.com/rewards-credit-cards/sapphire/reserve",
    # ... more cards
}

def scrape_card_benefits(card_name, url, output_file):
    """Scrape card benefits from an issuer website"""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract the benefits section (the class name varies by issuer --
    # inspect each page and adjust the selector accordingly)
    benefits = soup.find('div', class_='benefits-section')
    if benefits is None:
        raise ValueError(f"No benefits section found for {card_name}")

    # Save to markdown
    with open(output_file, 'w') as f:
        f.write(f"# {card_name}\n\n")
        f.write(benefits.get_text(separator="\n", strip=True))

# Scrape all cards
for card_name, url in CARD_URLS.items():
    scrape_card_benefits(card_name, url, f"data/cards/{card_name}.md")
```
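Some issuers only publish benefit details as PDFs. Since `pypdf` is already in the dependency list, a minimal sketch for turning a PDF into a markdown file the indexer can pick up (paths and title are illustrative):
```python
# pdf_to_md.py -- minimal text extraction with pypdf
from pypdf import PdfReader

def pdf_to_markdown(pdf_path: str, output_file: str, title: str) -> None:
    """Extract raw text from a benefits PDF into a markdown file."""
    reader = PdfReader(pdf_path)
    text = "\n\n".join(page.extract_text() or "" for page in reader.pages)
    with open(output_file, "w") as f:
        f.write(f"# {title}\n\n{text}")

pdf_to_markdown("data/terms/amex_terms.pdf", "data/terms/amex_terms.md", "Amex Terms")
```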
#### Option B: Manual Documentation
Create markdown files:
**File:** `data/cards/amex_gold.md`
```markdown
# American Express Gold Card

## Overview
- **Annual Fee:** $325
- **Rewards Rate:** 4x points on dining & U.S. supermarkets (up to $25k/year)
- **Welcome Bonus:** 90,000 points after $6k spend in 6 months

## Earning Structure
### 4x Points
- Restaurants worldwide (including takeout & delivery)
- U.S. supermarkets (up to $25,000 per year, then 1x)
### 3x Points
- Flights booked directly with airlines or on amextravel.com
### 1x Points
- All other purchases

## Monthly Credits
- $10 Uber Cash (Uber Eats eligible)
- $10 Grubhub/Seamless/The Cheesecake Factory/select Shake Shack

## Travel Benefits
- No foreign transaction fees
- Trip delay insurance
- Lost luggage insurance
- Car rental loss and damage insurance

## Merchant Acceptance
- **Accepted:** Most merchants worldwide
- **Not Accepted:** Costco warehouses (Costco.com works)
- **Not Accepted:** Some small businesses

## Redemption Options
- Transfer to 20+ airline/hotel partners (1:1 ratio)
- Pay with Points at Amazon (0.7 cents per point)
- Statement credits (0.6 cents per point)
- Book travel through Amex Travel (1 cent per point)

## Best For
- High grocery spending (up to $25k/year)
- Frequent dining out
- Travelers who value transfer partners

## Limitations
- $25,000 annual cap on 4x supermarket category
- Amex not accepted everywhere
- Annual fee not waived first year
```
---
## Implementation
### File: `rewards_rag_server.py`
```python
"""
LlamaIndex RAG server for credit card benefits
"""
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings
)
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.core.node_parser import SentenceSplitter
import chromadb
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os

# Initialize FastAPI
app = FastAPI(title="Rewards RAG MCP Server")

# Configure LlamaIndex
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)
Settings.llm = Gemini(
    model="models/gemini-2.0-flash-exp",
    api_key=os.getenv("GEMINI_API_KEY")
)
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Initialize ChromaDB
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("credit_cards")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```
---
## Document Loading & Indexing
```python
def load_and_index_documents():
    """Load card documents and create vector index"""
    # Load documents from directory
    documents = SimpleDirectoryReader(
        input_dir="./data",
        recursive=True,
        required_exts=[".pdf", ".md", ".txt"]
    ).load_data()
    print(f"Loaded {len(documents)} documents")

    # Parse into nodes (chunks)
    node_parser = SentenceSplitter(
        chunk_size=512,
        chunk_overlap=50
    )
    nodes = node_parser.get_nodes_from_documents(documents)
    print(f"Created {len(nodes)} nodes")

    # Create index
    index = VectorStoreIndex(
        nodes=nodes,
        storage_context=storage_context
    )

    # Persist to disk
    index.storage_context.persist(persist_dir="./storage")
    return index

# Load index on startup: reuse the existing Chroma collection if it
# already holds embeddings, otherwise build the index from scratch
if chroma_collection.count() > 0:
    index = VectorStoreIndex.from_vector_store(vector_store)
    print("Loaded existing index")
else:
    print("Creating new index...")
    index = load_and_index_documents()

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"
)
```
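Before wrapping the engine in HTTP endpoints, it can be exercised directly; a quick smoke test (query text illustrative):
```python
# Quick smoke test once the index is built
response = query_engine.query("Does Amex Gold work at Costco?")
print(response)  # synthesized answer

for node in response.source_nodes:  # retrieved chunks with scores
    print(node.score, node.metadata.get("file_name"))
```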
---
## API Endpoints
```python
class QueryRequest(BaseModel):
    query: str
    card_name: str | None = None
    top_k: int = 5  # accepted for API compatibility; the engine's top_k is fixed at startup

class QueryResponse(BaseModel):
    answer: str
    sources: list
    confidence: float

@app.post("/query", response_model=QueryResponse)
async def query_benefits(request: QueryRequest):
    """
    Query credit card benefits

    Example:
    POST /query
    {
        "query": "Which card has best grocery rewards?",
        "top_k": 5
    }
    """
    try:
        # Add card filter if specified
        if request.card_name:
            query = f"For {request.card_name}: {request.query}"
        else:
            query = request.query

        # Query the index
        response = query_engine.query(query)

        # Extract sources
        sources = []
        for node in response.source_nodes:
            sources.append({
                "card_name": node.metadata.get("file_name", "Unknown"),
                "content": node.text[:200] + "...",
                "relevance_score": float(node.score or 0.0)
            })

        # Calculate confidence based on top score
        confidence = sources[0]["relevance_score"] if sources else 0.0

        return QueryResponse(
            answer=str(response),
            sources=sources,
            confidence=confidence
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
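A client-side sketch for calling the endpoint (assumes the server is running locally on port 7860, matching the Dockerfile below):
```python
import requests

resp = requests.post(
    "http://localhost:7860/query",
    json={"query": "Which card has best grocery rewards?", "top_k": 5},
    timeout=30,
)
resp.raise_for_status()
body = resp.json()
print(body["answer"])
print("confidence:", body["confidence"])
```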
---
## Advanced Query Techniques
```python
@app.post("/compare")
async def compare_cards(request: dict):
    """
    Compare multiple cards on specific criteria

    Example:
    POST /compare
    {
        "cards": ["Amex Gold", "Chase Sapphire Reserve"],
        "criteria": "travel benefits"
    }
    """
    cards = request["cards"]
    criteria = request["criteria"]

    # Query each card
    comparisons = []
    for card in cards:
        query = f"What are the {criteria} for {card}?"
        response = query_engine.query(query)
        comparisons.append({
            "card": card,
            "benefits": str(response)
        })

    # Synthesize comparison
    synthesis_prompt = f"""
    Compare these cards on {criteria}:
    {comparisons}
    Provide a clear winner and reasoning.
    """
    final_response = Settings.llm.complete(synthesis_prompt)

    return {
        "comparison": str(final_response),
        "details": comparisons
    }
```
---
## Metadata Filtering
```python
def add_metadata_to_documents():
    """Add rich metadata for filtering"""
    documents = SimpleDirectoryReader("./data").load_data()
    for doc in documents:
        # Extract card name from filename
        card_name = doc.metadata["file_name"].replace(".md", "")
        # Add metadata (extract_issuer / extract_annual_fee / extract_category
        # are project-specific helpers, not shown here)
        doc.metadata.update({
            "card_name": card_name,
            "issuer": extract_issuer(card_name),
            "annual_fee": extract_annual_fee(doc.text),
            "category": extract_category(doc.text)
        })
    return documents

# Query with filters
@app.post("/query_filtered")
async def query_with_filters(request: dict):
    """
    Query with metadata filters

    Example:
    POST /query_filtered
    {
        "query": "best travel card",
        "filters": {
            "issuer": "Chase",
            "annual_fee": {"$lte": 500}
        }
    }
    """
    from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

    # Build filters (only the exact-match issuer filter is applied here;
    # for the numeric annual_fee filter, see the sketch after this section)
    filters = MetadataFilters(
        filters=[
            ExactMatchFilter(key="issuer", value=request["filters"]["issuer"])
        ]
    )

    # Query with filters (a local engine, so the global one is untouched)
    filtered_engine = index.as_query_engine(
        similarity_top_k=5,
        filters=filters
    )
    response = filtered_engine.query(request["query"])
    return {"answer": str(response)}
```
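The docstring above advertises an `annual_fee` range filter that the handler never builds. One way to express it, assuming `add_metadata_to_documents()` stored `annual_fee` as a number (operator support varies by vector store; ChromaDB handles these basic comparisons):
```python
from llama_index.core.vector_stores import (
    MetadataFilters,
    MetadataFilter,
    FilterOperator,
)

# issuer == "Chase" AND annual_fee <= 500
fee_filters = MetadataFilters(filters=[
    MetadataFilter(key="issuer", value="Chase", operator=FilterOperator.EQ),
    MetadataFilter(key="annual_fee", value=500, operator=FilterOperator.LTE),
])
engine = index.as_query_engine(similarity_top_k=5, filters=fee_filters)
```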
---
## Hybrid Search (Keyword + Semantic)
```python
# BM25Retriever lives in a separate package:
#   pip install llama-index-retrievers-bm25
from llama_index.core.retrievers import VectorIndexRetriever, QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core.query_engine import RetrieverQueryEngine

def create_hybrid_retriever():
    """Combine vector search + keyword search"""
    # Vector retriever
    vector_retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=10
    )
    # BM25 keyword retriever
    bm25_retriever = BM25Retriever.from_defaults(
        docstore=index.docstore,
        similarity_top_k=10
    )
    # Fuse the two retrievers' result lists
    hybrid_retriever = QueryFusionRetriever(
        retrievers=[vector_retriever, bm25_retriever],
        similarity_top_k=5,
        num_queries=1  # skip query generation; fuse the two retrievers only
    )
    return RetrieverQueryEngine(retriever=hybrid_retriever)
```
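Usage mirrors the default engine (a sketch; query text illustrative):
```python
hybrid_engine = create_hybrid_retriever()
print(hybrid_engine.query("5% cash back grocery categories"))
```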
---
## Reranking for Better Results
```python
# Requires: pip install llama-index-postprocessor-cohere-rerank
from llama_index.postprocessor.cohere_rerank import CohereRerank

def create_reranking_query_engine():
    """Add reranking for improved relevance"""
    # Cohere reranker
    reranker = CohereRerank(
        api_key=os.getenv("COHERE_API_KEY"),
        top_n=3
    )
    query_engine = index.as_query_engine(
        similarity_top_k=10,  # Retrieve more candidates
        node_postprocessors=[reranker]  # Rerank to top 3
    )
    return query_engine
```
---
## Evaluation & Metrics
```python
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator
)

async def evaluate_rag_quality():
    """Evaluate RAG system quality"""
    # Test queries
    test_queries = [
        "Which card has best grocery rewards?",
        "Does Amex Gold work at Costco?",
        "What are Chase Sapphire Reserve travel benefits?"
    ]
    # Reference answers for manual spot-checks (the evaluators below are
    # reference-free and judge the response against its retrieved contexts)
    ground_truth = [
        "Citi Custom Cash offers 5% on groceries...",
        "No, American Express is not accepted at Costco warehouses...",
        "Chase Sapphire Reserve includes Priority Pass..."
    ]

    # Evaluators
    relevancy_evaluator = RelevancyEvaluator(llm=Settings.llm)
    faithfulness_evaluator = FaithfulnessEvaluator(llm=Settings.llm)

    results = []
    for query, truth in zip(test_queries, ground_truth):
        response = query_engine.query(query)
        contexts = [node.text for node in response.source_nodes]

        # Evaluate relevancy (is the answer relevant to the query?)
        relevancy_result = await relevancy_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )
        # Evaluate faithfulness (is the answer grounded in the contexts?)
        faithfulness_result = await faithfulness_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )
        results.append({
            "query": query,
            "relevancy_score": relevancy_result.score,
            "faithfulness_score": faithfulness_result.score
        })
    return results
```
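Since the evaluators are async, a small runner sketch for printing per-query scores:
```python
import asyncio

# Run the evaluation suite and print per-query scores
for row in asyncio.run(evaluate_rag_quality()):
    print(row["query"])
    print(f"  relevancy={row['relevancy_score']}  "
          f"faithfulness={row['faithfulness_score']}")
```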
---
## Deployment
### 1. Build Docker Image
**File:** `Dockerfile`
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Download and index documents on build
# (requires OPENAI_API_KEY at build time, e.g. via a BuildKit secret)
RUN python -c "from rewards_rag_server import load_and_index_documents; load_and_index_documents()"

# Expose port
EXPOSE 7860

# Run server
CMD ["uvicorn", "rewards_rag_server:app", "--host", "0.0.0.0", "--port", "7860"]
```
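The Dockerfile copies a `requirements.txt` that is not shown elsewhere in this guide; a minimal version would mirror the Setup install command plus the packages the server code imports (the unpinned entries are assumptions):
```text
llama-index==0.12.5
llama-index-vector-stores-chroma==0.4.1
llama-index-embeddings-openai==0.3.1
llama-index-llms-gemini==0.4.2
chromadb==0.5.23
pypdf==5.1.0
beautifulsoup4==4.12.3
fastapi
uvicorn[standard]
prometheus-client
```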
### 2. Deploy to Hugging Face Spaces
```bash
# Create Space
huggingface-cli repo create rewardpilot-rewards-rag --type space --space_sdk docker

# Push files
git add .
git commit -m "Deploy RAG server"
git push
```
---
## Performance Optimization
### 1. Caching Embeddings
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text: str):
    """Cache embeddings for repeated queries (exact-match hits only)"""
    return Settings.embed_model.get_text_embedding(text)
```
### 2. Batch Processing
```python
import asyncio

async def batch_query(queries: list):
    """Process multiple queries in parallel"""
    tasks = [query_engine.aquery(q) for q in queries]
    results = await asyncio.gather(*tasks)
    return results
```
### 3. Index Optimization
```python
# Use smaller embedding model for speed
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # 1536 dims
    # vs text-embedding-3-large (3072 dims)
)

# Reduce chunk size for faster retrieval
Settings.chunk_size = 256  # vs 512
```
---
## Monitoring
```python
import time
from prometheus_client import Counter, Histogram

# Metrics
query_counter = Counter('rag_queries_total', 'Total RAG queries')
query_duration = Histogram('rag_query_duration_seconds', 'RAG query duration')

# Instrumented variant of the /query handler -- register this in place of
# the earlier one rather than alongside it (with duplicate routes, only
# the first registered handler takes effect)
@app.post("/query")
async def query_with_monitoring(request: QueryRequest):
    query_counter.inc()
    start_time = time.time()
    response = query_engine.query(request.query)
    duration = time.time() - start_time
    query_duration.observe(duration)
    return {"answer": str(response)}
```
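The counters above are only useful if something can scrape them. `prometheus_client` ships an ASGI app that mounts straight onto FastAPI (a sketch):
```python
from prometheus_client import make_asgi_app

# Expose Prometheus metrics at /metrics
app.mount("/metrics", make_asgi_app())
```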
---
**Related Documentation:**
- [MCP Server Implementation](./mcp_architecture.md)
- [Modal Deployment Guide](./modal_deployment.md)
- [Agent Reasoning Flow](./agent_reasoning.md)
---