# LlamaIndex RAG Setup Guide
## Overview
RewardPilot uses LlamaIndex to build a semantic search system over 50+ credit card benefit documents. This enables the agent to answer complex questions like "Which card has the best travel insurance?" or "Does Amex Gold work at Costco?"
## Why LlamaIndex + RAG?
| Problem | Traditional Approach | RAG Solution |
|---------|---------------------|--------------|
| **Card benefits change** | Hardcode rules → outdated | Dynamic document retrieval |
| **Complex questions** | Manual lookup | Semantic search |
| **50+ cards** | Impossible to memorize | Vector similarity |
| **Nuanced rules** | Prone to errors | Context-aware answers |
**Example:**
- **Question:** "Can I use Chase Sapphire Reserve for airport lounge access when flying domestic?"
- **Traditional:** Check 10+ pages of terms
- **RAG:** Semantic search → "Yes, Priority Pass includes domestic lounges"
---
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                      User Question                      │
│         "Which card has best grocery rewards?"          │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                  Query Transformation                   │
│          (Expand, rephrase, extract keywords)           │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                     Embedding Model                     │
│              OpenAI text-embedding-3-small              │
│                    (1536 dimensions)                    │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      Vector Store                       │
│                        ChromaDB                         │
│                  (50+ card documents)                   │
│                    (10,000+ chunks)                     │
└────────────────────────────┬────────────────────────────┘
                             │
                             │  Retrieve top-k (k=5)
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    Retrieved Context                    │
│  1. Amex Gold: 4x points on U.S. supermarkets...        │
│  2. Citi Custom Cash: 5% on top category...             │
│  3. Chase Freedom Flex: 5% rotating categories...       │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                        Reranking                        │
│            (Cohere Rerank or Cross-Encoder)             │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      LLM Synthesis                      │
│                  Gemini 2.0 Flash Exp                   │
│             (Generate answer from context)              │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                       Final Answer                      │
│  "Amex Gold offers 4x points (best rate) but has        │
│  $25k annual cap. Citi Custom Cash gives 5% but         │
│  only $500/month. For high spenders, use Amex."         │
└─────────────────────────────────────────────────────────┘
```
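The query-transformation stage in the diagram never reappears in the implementation below. A minimal sketch of one way to add it, assuming the `Settings.llm` configured later in `rewards_rag_server.py` (the prompt wording is illustrative):
```python
from llama_index.core import Settings

def transform_query(raw_query: str) -> str:
    """Rewrite a user question into a retrieval-friendly search query.

    A single LLM call that expands abbreviations and adds likely keywords.
    Assumes Settings.llm has already been configured (see rewards_rag_server.py).
    """
    prompt = (
        "Rewrite this credit card question as a short search query. "
        "Expand abbreviations and add relevant keywords.\n"
        f"Question: {raw_query}\n"
        "Search query:"
    )
    return Settings.llm.complete(prompt).text.strip()
```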
---
## Setup
### 1. Install Dependencies
```bash
pip install llama-index==0.12.5 \
    llama-index-vector-stores-chroma==0.4.1 \
    llama-index-embeddings-openai==0.3.1 \
    llama-index-llms-gemini==0.4.2 \
    chromadb==0.5.23 \
    pypdf==5.1.0 \
    beautifulsoup4==4.12.3
```
### 2. Prepare Card Documents
Create directory structure:
```
data/
├── cards/
│   ├── amex_gold.pdf
│   ├── chase_sapphire_reserve.pdf
│   ├── citi_custom_cash.pdf
│   └── ... (50+ cards)
├── terms/
│   ├── amex_terms.pdf
│   ├── chase_terms.pdf
│   └── ...
└── guides/
    ├── maximizing_rewards.md
    ├── category_codes.md
    └── ...
```
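Once the folders are populated, a quick sanity check is to load everything with `SimpleDirectoryReader` and count what was picked up (a throwaway sketch, not part of the server):
```python
import os
from collections import Counter
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    required_exts=[".pdf", ".md", ".txt"]
).load_data()

# Tally loaded documents by parent folder (cards/, terms/, guides/)
by_folder = Counter(
    os.path.basename(os.path.dirname(d.metadata["file_path"])) for d in docs
)
print(f"Loaded {len(docs)} documents: {dict(by_folder)}")
```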
### 3. Document Sources
#### Option A: Scrape from Issuer Websites
```python
# scrape_card_docs.py
import requests
from bs4 import BeautifulSoup

CARD_URLS = {
    "amex_gold": "https://www.americanexpress.com/us/credit-cards/card/gold-card/",
    "chase_sapphire_reserve": "https://creditcards.chase.com/rewards-credit-cards/sapphire/reserve",
    # ... more cards
}

def scrape_card_benefits(card_name, url, output_file):
    """Scrape card benefits from an issuer website"""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract the benefits section (the class name varies by issuer --
    # inspect each page and adjust the selector accordingly)
    benefits = soup.find('div', class_='benefits-section')
    if benefits is None:
        raise ValueError(f"No benefits section found for {card_name}")

    # Save to markdown
    with open(output_file, 'w') as f:
        f.write(f"# {card_name}\n\n")
        f.write(benefits.get_text(separator="\n", strip=True))

# Scrape all cards
for card_name, url in CARD_URLS.items():
    scrape_card_benefits(card_name, url, f"data/cards/{card_name}.md")
```
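Some issuers only publish benefit details as PDFs. Since `pypdf` is already in the dependency list, a minimal sketch for turning a PDF into a markdown file the indexer can pick up (paths and title are illustrative):
```python
# pdf_to_md.py -- minimal text extraction with pypdf
from pypdf import PdfReader

def pdf_to_markdown(pdf_path: str, output_file: str, title: str) -> None:
    """Extract raw text from a benefits PDF into a markdown file."""
    reader = PdfReader(pdf_path)
    text = "\n\n".join(page.extract_text() or "" for page in reader.pages)
    with open(output_file, "w") as f:
        f.write(f"# {title}\n\n{text}")

pdf_to_markdown("data/terms/amex_terms.pdf", "data/terms/amex_terms.md", "Amex Terms")
```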
#### Option B: Manual Documentation
Create markdown files:
**File:** `data/cards/amex_gold.md`
```markdown
# American Express Gold Card

## Overview
- **Annual Fee:** $325
- **Rewards Rate:** 4x points on dining & U.S. supermarkets (up to $25k/year)
- **Welcome Bonus:** 90,000 points after $6k spend in 6 months

## Earning Structure
### 4x Points
- Restaurants worldwide (including takeout & delivery)
- U.S. supermarkets (up to $25,000 per year, then 1x)
### 3x Points
- Flights booked directly with airlines or on amextravel.com
### 1x Points
- All other purchases

## Monthly Credits
- $10 Uber Cash (Uber Eats eligible)
- $10 Grubhub/Seamless/The Cheesecake Factory/select Shake Shack

## Travel Benefits
- No foreign transaction fees
- Trip delay insurance
- Lost luggage insurance
- Car rental loss and damage insurance

## Merchant Acceptance
- **Accepted:** Most merchants worldwide
- **Not Accepted:** Costco warehouses (Costco.com works)
- **Not Accepted:** Some small businesses

## Redemption Options
- Transfer to 20+ airline/hotel partners (1:1 ratio)
- Pay with Points at Amazon (0.7 cents per point)
- Statement credits (0.6 cents per point)
- Book travel through Amex Travel (1 cent per point)

## Best For
- High grocery spending (up to $25k/year)
- Frequent dining out
- Travelers who value transfer partners

## Limitations
- $25,000 annual cap on 4x supermarket category
- Amex not accepted everywhere
- Annual fee not waived first year
```
---
## Implementation
### File: `rewards_rag_server.py`
```python
"""
LlamaIndex RAG server for credit card benefits
"""
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings
)
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.core.node_parser import SentenceSplitter
import chromadb
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os

# Initialize FastAPI
app = FastAPI(title="Rewards RAG MCP Server")

# Configure LlamaIndex
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)
Settings.llm = Gemini(
    model="models/gemini-2.0-flash-exp",
    api_key=os.getenv("GEMINI_API_KEY")
)
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Initialize ChromaDB
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("credit_cards")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```
---
## Document Loading & Indexing
```python
def load_and_index_documents():
    """Load card documents and create vector index"""
    # Load documents from directory
    documents = SimpleDirectoryReader(
        input_dir="./data",
        recursive=True,
        required_exts=[".pdf", ".md", ".txt"]
    ).load_data()
    print(f"Loaded {len(documents)} documents")

    # Parse into nodes (chunks)
    node_parser = SentenceSplitter(
        chunk_size=512,
        chunk_overlap=50
    )
    nodes = node_parser.get_nodes_from_documents(documents)
    print(f"Created {len(nodes)} nodes")

    # Create index
    index = VectorStoreIndex(
        nodes=nodes,
        storage_context=storage_context
    )

    # Persist to disk
    index.storage_context.persist(persist_dir="./storage")
    return index

# Load index on startup: reuse the existing Chroma collection if it
# already holds embeddings, otherwise build the index from scratch
if chroma_collection.count() > 0:
    index = VectorStoreIndex.from_vector_store(vector_store)
    print("Loaded existing index")
else:
    print("Creating new index...")
    index = load_and_index_documents()

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"
)
```
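Before wrapping the engine in HTTP endpoints, it can be exercised directly; a quick smoke test (query text illustrative):
```python
# Quick smoke test once the index is built
response = query_engine.query("Does Amex Gold work at Costco?")
print(response)  # synthesized answer

for node in response.source_nodes:  # retrieved chunks with scores
    print(node.score, node.metadata.get("file_name"))
```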
---
## API Endpoints
```python
class QueryRequest(BaseModel):
    query: str
    card_name: str | None = None
    top_k: int = 5  # accepted for API compatibility; the engine's top_k is fixed at startup

class QueryResponse(BaseModel):
    answer: str
    sources: list
    confidence: float

@app.post("/query", response_model=QueryResponse)
async def query_benefits(request: QueryRequest):
    """
    Query credit card benefits

    Example:
    POST /query
    {
        "query": "Which card has best grocery rewards?",
        "top_k": 5
    }
    """
    try:
        # Add card filter if specified
        if request.card_name:
            query = f"For {request.card_name}: {request.query}"
        else:
            query = request.query

        # Query the index
        response = query_engine.query(query)

        # Extract sources
        sources = []
        for node in response.source_nodes:
            sources.append({
                "card_name": node.metadata.get("file_name", "Unknown"),
                "content": node.text[:200] + "...",
                "relevance_score": float(node.score or 0.0)
            })

        # Calculate confidence based on top score
        confidence = sources[0]["relevance_score"] if sources else 0.0

        return QueryResponse(
            answer=str(response),
            sources=sources,
            confidence=confidence
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
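A client-side sketch for calling the endpoint (assumes the server is running locally on port 7860, matching the Dockerfile below):
```python
import requests

resp = requests.post(
    "http://localhost:7860/query",
    json={"query": "Which card has best grocery rewards?", "top_k": 5},
    timeout=30,
)
resp.raise_for_status()
body = resp.json()
print(body["answer"])
print("confidence:", body["confidence"])
```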
---
## Advanced Query Techniques
```python
@app.post("/compare")
async def compare_cards(request: dict):
    """
    Compare multiple cards on specific criteria

    Example:
    POST /compare
    {
        "cards": ["Amex Gold", "Chase Sapphire Reserve"],
        "criteria": "travel benefits"
    }
    """
    cards = request["cards"]
    criteria = request["criteria"]

    # Query each card
    comparisons = []
    for card in cards:
        query = f"What are the {criteria} for {card}?"
        response = query_engine.query(query)
        comparisons.append({
            "card": card,
            "benefits": str(response)
        })

    # Synthesize comparison
    synthesis_prompt = f"""
    Compare these cards on {criteria}:
    {comparisons}
    Provide a clear winner and reasoning.
    """
    final_response = Settings.llm.complete(synthesis_prompt)

    return {
        "comparison": str(final_response),
        "details": comparisons
    }
```
---
## Metadata Filtering
```python
def add_metadata_to_documents():
    """Add rich metadata for filtering"""
    documents = SimpleDirectoryReader("./data").load_data()
    for doc in documents:
        # Extract card name from filename
        card_name = doc.metadata["file_name"].replace(".md", "")
        # Add metadata (extract_issuer / extract_annual_fee / extract_category
        # are project-specific helpers, not shown here)
        doc.metadata.update({
            "card_name": card_name,
            "issuer": extract_issuer(card_name),
            "annual_fee": extract_annual_fee(doc.text),
            "category": extract_category(doc.text)
        })
    return documents

# Query with filters
@app.post("/query_filtered")
async def query_with_filters(request: dict):
    """
    Query with metadata filters

    Example:
    POST /query_filtered
    {
        "query": "best travel card",
        "filters": {
            "issuer": "Chase",
            "annual_fee": {"$lte": 500}
        }
    }
    """
    from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

    # Build filters (only the exact-match issuer filter is applied here;
    # for the numeric annual_fee filter, see the sketch after this section)
    filters = MetadataFilters(
        filters=[
            ExactMatchFilter(key="issuer", value=request["filters"]["issuer"])
        ]
    )

    # Query with filters (a local engine, so the global one is untouched)
    filtered_engine = index.as_query_engine(
        similarity_top_k=5,
        filters=filters
    )
    response = filtered_engine.query(request["query"])
    return {"answer": str(response)}
```
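The docstring above advertises an `annual_fee` range filter that the handler never builds. One way to express it, assuming `add_metadata_to_documents()` stored `annual_fee` as a number (operator support varies by vector store; ChromaDB handles these basic comparisons):
```python
from llama_index.core.vector_stores import (
    MetadataFilters,
    MetadataFilter,
    FilterOperator,
)

# issuer == "Chase" AND annual_fee <= 500
fee_filters = MetadataFilters(filters=[
    MetadataFilter(key="issuer", value="Chase", operator=FilterOperator.EQ),
    MetadataFilter(key="annual_fee", value=500, operator=FilterOperator.LTE),
])
engine = index.as_query_engine(similarity_top_k=5, filters=fee_filters)
```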
---
## Hybrid Search (Keyword + Semantic)
```python
# BM25Retriever lives in a separate package:
#   pip install llama-index-retrievers-bm25
from llama_index.core.retrievers import VectorIndexRetriever, QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core.query_engine import RetrieverQueryEngine

def create_hybrid_retriever():
    """Combine vector search + keyword search"""
    # Vector retriever
    vector_retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=10
    )
    # BM25 keyword retriever
    bm25_retriever = BM25Retriever.from_defaults(
        docstore=index.docstore,
        similarity_top_k=10
    )
    # Fuse the two retrievers' result lists
    hybrid_retriever = QueryFusionRetriever(
        retrievers=[vector_retriever, bm25_retriever],
        similarity_top_k=5,
        num_queries=1  # skip query generation; fuse the two retrievers only
    )
    return RetrieverQueryEngine(retriever=hybrid_retriever)
```
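Usage mirrors the default engine (a sketch; query text illustrative):
```python
hybrid_engine = create_hybrid_retriever()
print(hybrid_engine.query("5% cash back grocery categories"))
```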
---
## Reranking for Better Results
```python
# Requires: pip install llama-index-postprocessor-cohere-rerank
from llama_index.postprocessor.cohere_rerank import CohereRerank

def create_reranking_query_engine():
    """Add reranking for improved relevance"""
    # Cohere reranker
    reranker = CohereRerank(
        api_key=os.getenv("COHERE_API_KEY"),
        top_n=3
    )
    query_engine = index.as_query_engine(
        similarity_top_k=10,  # Retrieve more candidates
        node_postprocessors=[reranker]  # Rerank to top 3
    )
    return query_engine
```
---
## Evaluation & Metrics
```python
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator
)

async def evaluate_rag_quality():
    """Evaluate RAG system quality"""
    # Test queries
    test_queries = [
        "Which card has best grocery rewards?",
        "Does Amex Gold work at Costco?",
        "What are Chase Sapphire Reserve travel benefits?"
    ]
    # Reference answers for manual spot-checks (the evaluators below are
    # reference-free and judge the response against its retrieved contexts)
    ground_truth = [
        "Citi Custom Cash offers 5% on groceries...",
        "No, American Express is not accepted at Costco warehouses...",
        "Chase Sapphire Reserve includes Priority Pass..."
    ]

    # Evaluators
    relevancy_evaluator = RelevancyEvaluator(llm=Settings.llm)
    faithfulness_evaluator = FaithfulnessEvaluator(llm=Settings.llm)

    results = []
    for query, truth in zip(test_queries, ground_truth):
        response = query_engine.query(query)
        contexts = [node.text for node in response.source_nodes]

        # Evaluate relevancy (is the answer relevant to the query?)
        relevancy_result = await relevancy_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )
        # Evaluate faithfulness (is the answer grounded in the contexts?)
        faithfulness_result = await faithfulness_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )
        results.append({
            "query": query,
            "relevancy_score": relevancy_result.score,
            "faithfulness_score": faithfulness_result.score
        })
    return results
```
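Since the evaluators are async, a small runner sketch for printing per-query scores:
```python
import asyncio

# Run the evaluation suite and print per-query scores
for row in asyncio.run(evaluate_rag_quality()):
    print(row["query"])
    print(f"  relevancy={row['relevancy_score']}  "
          f"faithfulness={row['faithfulness_score']}")
```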
---
## Deployment
### 1. Build Docker Image
**File:** `Dockerfile`
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Download and index documents on build
# (requires OPENAI_API_KEY at build time, e.g. via a BuildKit secret)
RUN python -c "from rewards_rag_server import load_and_index_documents; load_and_index_documents()"

# Expose port
EXPOSE 7860

# Run server
CMD ["uvicorn", "rewards_rag_server:app", "--host", "0.0.0.0", "--port", "7860"]
```
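The Dockerfile copies a `requirements.txt` that is not shown elsewhere in this guide; a minimal version would mirror the Setup install command plus the packages the server code imports (the unpinned entries are assumptions):
```text
llama-index==0.12.5
llama-index-vector-stores-chroma==0.4.1
llama-index-embeddings-openai==0.3.1
llama-index-llms-gemini==0.4.2
chromadb==0.5.23
pypdf==5.1.0
beautifulsoup4==4.12.3
fastapi
uvicorn[standard]
prometheus-client
```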
### 2. Deploy to Hugging Face Spaces
```bash
# Create Space
huggingface-cli repo create rewardpilot-rewards-rag --type space --space_sdk docker

# Push files
git add .
git commit -m "Deploy RAG server"
git push
```
---
## Performance Optimization
### 1. Caching Embeddings
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text: str):
    """Cache embeddings for repeated queries (exact-match hits only)"""
    return Settings.embed_model.get_text_embedding(text)
```
### 2. Batch Processing
```python
import asyncio

async def batch_query(queries: list):
    """Process multiple queries in parallel"""
    tasks = [query_engine.aquery(q) for q in queries]
    results = await asyncio.gather(*tasks)
    return results
```
### 3. Index Optimization
```python
# Use smaller embedding model for speed
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # 1536 dims
    # vs text-embedding-3-large (3072 dims)
)

# Reduce chunk size for faster retrieval
Settings.chunk_size = 256  # vs 512
```
---
## Monitoring
```python
import time
from prometheus_client import Counter, Histogram

# Metrics
query_counter = Counter('rag_queries_total', 'Total RAG queries')
query_duration = Histogram('rag_query_duration_seconds', 'RAG query duration')

# Instrumented variant of the /query handler -- register this in place of
# the earlier one rather than alongside it (with duplicate routes, only
# the first registered handler takes effect)
@app.post("/query")
async def query_with_monitoring(request: QueryRequest):
    query_counter.inc()
    start_time = time.time()
    response = query_engine.query(request.query)
    duration = time.time() - start_time
    query_duration.observe(duration)
    return {"answer": str(response)}
```
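The counters above are only useful if something can scrape them. `prometheus_client` ships an ASGI app that mounts straight onto FastAPI (a sketch):
```python
from prometheus_client import make_asgi_app

# Expose Prometheus metrics at /metrics
app.mount("/metrics", make_asgi_app())
```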
---
**Related Documentation:**
- [MCP Server Implementation](./mcp_architecture.md)
- [Modal Deployment Guide](./modal_deployment.md)
- [Agent Reasoning Flow](./agent_reasoning.md)
---