sammy786 committed b7662d1 (verified) · 1 parent: c1f8982

Create llamaindex_setup.md

Files changed (1): docs/llamaindex_setup.md (added, +704 lines)
# LlamaIndex RAG Setup Guide

## Overview

RewardPilot uses LlamaIndex to build a semantic search system over 50+ credit card benefit documents. This enables the agent to answer complex questions like "Which card has the best travel insurance?" or "Does Amex Gold work at Costco?"

## Why LlamaIndex + RAG?

| Problem | Traditional Approach | RAG Solution |
|---------|---------------------|--------------|
| **Card benefits change** | Hardcoded rules → outdated | Dynamic document retrieval |
| **Complex questions** | Manual lookup | Semantic search |
| **50+ cards** | Impossible to memorize | Vector similarity |
| **Nuanced rules** | Prone to errors | Context-aware answers |

**Example:**
- **Question:** "Can I use Chase Sapphire Reserve for airport lounge access when flying domestic?"
- **Traditional:** Check 10+ pages of terms
- **RAG:** Semantic search → "Yes, Priority Pass includes domestic lounges"

---

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      User Question                      │
│         "Which card has best grocery rewards?"          │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                  Query Transformation                    │
│          (Expand, rephrase, extract keywords)            │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                     Embedding Model                      │
│              OpenAI text-embedding-3-small               │
│                    (1536 dimensions)                     │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                      Vector Store                        │
│                        ChromaDB                          │
│                  (50+ card documents)                    │
│                    (10,000+ chunks)                      │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ Retrieve top-k (k=5)
                     ▼
┌─────────────────────────────────────────────────────────┐
│                    Retrieved Context                     │
│  1. Amex Gold: 4x points on U.S. supermarkets...         │
│  2. Citi Custom Cash: 5% on top category...              │
│  3. Chase Freedom Flex: 5% rotating categories...        │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                        Reranking                         │
│            (Cohere Rerank or Cross-Encoder)              │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                      LLM Synthesis                       │
│                  Gemini 2.0 Flash Exp                    │
│             (Generate answer from context)               │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                      Final Answer                        │
│  "Amex Gold offers 4x points (best rate) but has         │
│   $25k annual cap. Citi Custom Cash gives 5% but         │
│   only $500/month. For high spenders, use Amex."         │
└─────────────────────────────────────────────────────────┘
```

---

## Setup

### 1. Install Dependencies

```bash
pip install llama-index==0.12.5 \
    llama-index-vector-stores-chroma==0.4.1 \
    llama-index-embeddings-openai==0.3.1 \
    llama-index-llms-gemini==0.4.2 \
    chromadb==0.5.23 \
    pypdf==5.1.0 \
    beautifulsoup4==4.12.3

# Also used later in this guide (pin versions to match your environment):
pip install fastapi uvicorn requests prometheus-client \
    llama-index-retrievers-bm25 llama-index-postprocessor-cohere-rerank
```

### 2. Prepare Card Documents

Create the directory structure:

```
data/
├── cards/
│   ├── amex_gold.pdf
│   ├── chase_sapphire_reserve.pdf
│   ├── citi_custom_cash.pdf
│   └── ... (50+ cards)
├── terms/
│   ├── amex_terms.pdf
│   ├── chase_terms.pdf
│   └── ...
└── guides/
    ├── maximizing_rewards.md
    ├── category_codes.md
    └── ...
```

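If you are starting from scratch, a small helper can create this layout (the folder names simply mirror the tree above; adjust as needed):

```python
# create_data_dirs.py — convenience helper; folder names mirror the tree above
from pathlib import Path

for subdir in ("cards", "terms", "guides"):
    Path("data", subdir).mkdir(parents=True, exist_ok=True)
    print(f"ensured data/{subdir}/")
```
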
### 3. Document Sources

#### Option A: Scrape from Issuer Websites

```python
# scrape_card_docs.py
import requests
from bs4 import BeautifulSoup

CARD_URLS = {
    "amex_gold": "https://www.americanexpress.com/us/credit-cards/card/gold-card/",
    "chase_sapphire_reserve": "https://creditcards.chase.com/rewards-credit-cards/sapphire/reserve",
    # ... more cards
}

def scrape_card_benefits(card_name, url, output_file):
    """Scrape card benefits from an issuer website."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract the benefits section (the CSS class varies by issuer site)
    benefits = soup.find("div", class_="benefits-section")
    if benefits is None:
        benefits = soup.body  # fall back to the whole page if the section is missing

    # Save to markdown
    with open(output_file, "w") as f:
        f.write(f"# {card_name}\n\n")
        f.write(benefits.get_text(separator="\n", strip=True))

# Scrape all cards
for card_name, url in CARD_URLS.items():
    scrape_card_benefits(card_name, url, f"data/cards/{card_name}.md")
```

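The scraper above targets HTML pages. The PDF agreements in `data/terms/` can be loaded as-is by `SimpleDirectoryReader` (its default PDF reader is built on `pypdf`), but if you prefer to normalize everything to markdown first, here is a minimal sketch using `pypdf`; the script name and output convention are illustrative:

```python
# pdf_to_md.py — optional normalization of PDF terms into markdown
from pathlib import Path
from pypdf import PdfReader

for pdf_path in Path("data/terms").glob("*.pdf"):
    reader = PdfReader(str(pdf_path))
    # Concatenate extracted text from every page (empty string if a page has no text layer)
    text = "\n\n".join(page.extract_text() or "" for page in reader.pages)
    out_path = pdf_path.with_suffix(".md")
    out_path.write_text(f"# {pdf_path.stem}\n\n{text}")
    print(f"wrote {out_path}")
```
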
#### Option B: Manual Documentation

Create markdown files:

**File:** `data/cards/amex_gold.md`
```markdown
# American Express Gold Card

## Overview
- **Annual Fee:** $325
- **Rewards Rate:** 4x points on dining & U.S. supermarkets (up to $25k/year)
- **Welcome Bonus:** 90,000 points after $6k spend in 6 months

## Earning Structure

### 4x Points
- Restaurants worldwide (including takeout & delivery)
- U.S. supermarkets (up to $25,000 per year, then 1x)

### 3x Points
- Flights booked directly with airlines or on amextravel.com

### 1x Points
- All other purchases

## Monthly Credits
- $10 Uber Cash (Uber Eats eligible)
- $10 Grubhub/Seamless/The Cheesecake Factory/select Shake Shack

## Travel Benefits
- No foreign transaction fees
- Trip delay insurance
- Lost luggage insurance
- Car rental loss and damage insurance

## Merchant Acceptance
- **Accepted:** Most merchants worldwide
- **Not Accepted:** Costco warehouses (Costco.com works)
- **Not Accepted:** Some small businesses

## Redemption Options
- Transfer to 20+ airline/hotel partners (1:1 ratio)
- Pay with Points at Amazon (0.7 cents per point)
- Statement credits (0.6 cents per point)
- Book travel through Amex Travel (1 cent per point)

## Best For
- High grocery spending (up to $25k/year)
- Frequent dining out
- Travelers who value transfer partners

## Limitations
- $25,000 annual cap on 4x supermarket category
- Amex not accepted everywhere
- Annual fee not waived first year
```

---

## Implementation

### File: `rewards_rag_server.py`

```python
"""
LlamaIndex RAG server for credit card benefits
"""

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings,
    load_index_from_storage,
)
# Note: ServiceContext is gone in recent llama-index releases; the global
# Settings object below replaces it.
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.core.node_parser import SentenceSplitter
import chromadb
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os

# Initialize FastAPI
app = FastAPI(title="Rewards RAG MCP Server")

# Configure LlamaIndex
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)
Settings.llm = Gemini(
    model="models/gemini-2.0-flash-exp",
    api_key=os.getenv("GEMINI_API_KEY")
)
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Initialize ChromaDB
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("credit_cards")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```

---

## Document Loading & Indexing

```python
def load_and_index_documents():
    """Load card documents and create the vector index."""

    # Load documents from directory
    documents = SimpleDirectoryReader(
        input_dir="./data",
        recursive=True,
        required_exts=[".pdf", ".md", ".txt"]
    ).load_data()

    print(f"Loaded {len(documents)} documents")

    # Parse into nodes (chunks)
    node_parser = SentenceSplitter(
        chunk_size=512,
        chunk_overlap=50
    )
    nodes = node_parser.get_nodes_from_documents(documents)

    print(f"Created {len(nodes)} nodes")

    # Create index
    index = VectorStoreIndex(
        nodes=nodes,
        storage_context=storage_context
    )

    # Persist to disk
    index.storage_context.persist(persist_dir="./storage")

    return index

# Load index on startup
try:
    # Try loading an existing index
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store,
        persist_dir="./storage"
    )
    index = load_index_from_storage(storage_context)
    print("Loaded existing index")
except Exception:
    # Create a new index
    print("Creating new index...")
    index = load_and_index_documents()

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"
)
```

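With the engine in place, a quick smoke test (the question is just an example) confirms retrieval and synthesis work end to end:

```python
# Quick sanity check after startup (example question)
response = query_engine.query("Does Amex Gold work at Costco?")
print(response)                    # synthesized answer
print(len(response.source_nodes))  # number of retrieved chunks
```
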
---

## API Endpoints

```python
class QueryRequest(BaseModel):
    query: str
    card_name: str | None = None
    top_k: int = 5

class QueryResponse(BaseModel):
    answer: str
    sources: list
    confidence: float

@app.post("/query", response_model=QueryResponse)
async def query_benefits(request: QueryRequest):
    """
    Query credit card benefits

    Example:
    POST /query
    {
        "query": "Which card has best grocery rewards?",
        "top_k": 5
    }
    """
    try:
        # Add card filter if specified
        if request.card_name:
            query = f"For {request.card_name}: {request.query}"
        else:
            query = request.query

        # Query the index
        response = query_engine.query(query)

        # Extract sources
        sources = []
        for node in response.source_nodes:
            sources.append({
                "card_name": node.metadata.get("file_name", "Unknown"),
                "content": node.text[:200] + "...",
                "relevance_score": float(node.score or 0.0)
            })

        # Confidence is approximated by the top retrieval score
        confidence = sources[0]["relevance_score"] if sources else 0.0

        return QueryResponse(
            answer=str(response),
            sources=sources,
            confidence=confidence
        )

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```

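A minimal client call against a locally running server (the host and port are assumed from the `uvicorn` command in the Dockerfile below):

```python
import requests

resp = requests.post(
    "http://localhost:7860/query",
    json={"query": "Which card has best grocery rewards?", "top_k": 5},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["answer"])
```
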
---

## Advanced Query Techniques

```python
@app.post("/compare")
async def compare_cards(request: dict):
    """
    Compare multiple cards on specific criteria

    Example:
    POST /compare
    {
        "cards": ["Amex Gold", "Chase Sapphire Reserve"],
        "criteria": "travel benefits"
    }
    """
    cards = request["cards"]
    criteria = request["criteria"]

    # Query each card
    comparisons = []
    for card in cards:
        query = f"What are the {criteria} for {card}?"
        response = query_engine.query(query)

        comparisons.append({
            "card": card,
            "benefits": str(response)
        })

    # Synthesize comparison
    synthesis_prompt = f"""
    Compare these cards on {criteria}:

    {comparisons}

    Provide a clear winner and reasoning.
    """

    final_response = Settings.llm.complete(synthesis_prompt)

    return {
        "comparison": str(final_response),
        "details": comparisons
    }
```

---

## Metadata Filtering

```python
def add_metadata_to_documents():
    """Add rich metadata for filtering."""

    documents = SimpleDirectoryReader("./data").load_data()

    for doc in documents:
        # Extract card name from filename
        card_name = doc.metadata["file_name"].replace(".md", "")

        # Add metadata (extract_issuer / extract_annual_fee / extract_category
        # are project-specific helpers; see the sketch below)
        doc.metadata.update({
            "card_name": card_name,
            "issuer": extract_issuer(card_name),
            "annual_fee": extract_annual_fee(doc.text),
            "category": extract_category(doc.text)
        })

    return documents

# Query with filters
@app.post("/query_filtered")
async def query_with_filters(request: dict):
    """
    Query with metadata filters

    Example:
    POST /query_filtered
    {
        "query": "best travel card",
        "filters": {
            "issuer": "Chase",
            "annual_fee": {"$lte": 500}
        }
    }
    """
    from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

    # Build filters (only the exact-match issuer filter is applied here;
    # range filters such as the annual_fee example need operator-based filters)
    filters = MetadataFilters(
        filters=[
            ExactMatchFilter(key="issuer", value=request["filters"]["issuer"])
        ]
    )

    # Query with filters
    filtered_engine = index.as_query_engine(
        similarity_top_k=5,
        filters=filters
    )

    response = filtered_engine.query(request["query"])

    return {"answer": str(response)}
```

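The metadata helpers referenced above are not part of LlamaIndex; they are project-specific. A naive, illustrative sketch (regex-based, and the issuer/category keyword lists are assumptions you should replace with your own):

```python
import re

def extract_issuer(card_name: str) -> str:
    """Guess the issuer from the file/card name (assumed naming convention)."""
    issuers = {"amex": "American Express", "chase": "Chase", "citi": "Citi"}
    for key, issuer in issuers.items():
        if key in card_name.lower():
            return issuer
    return "Unknown"

def extract_annual_fee(text: str) -> int:
    """Pull the first 'Annual Fee: $NNN' style figure out of the document."""
    match = re.search(r"Annual Fee:?\*{0,2}\s*\$([\d,]+)", text)
    return int(match.group(1).replace(",", "")) if match else 0

def extract_category(text: str) -> str:
    """Very rough category guess based on keyword counts."""
    lowered = text.lower()
    scores = {
        "travel": lowered.count("travel") + lowered.count("lounge"),
        "dining": lowered.count("dining") + lowered.count("restaurant"),
        "groceries": lowered.count("supermarket") + lowered.count("grocer"),
    }
    return max(scores, key=scores.get)
```
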
---

## Hybrid Search (Keyword + Semantic)

```python
from llama_index.core.retrievers import VectorIndexRetriever, QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
# BM25Retriever lives in a separate package: pip install llama-index-retrievers-bm25
from llama_index.retrievers.bm25 import BM25Retriever

def create_hybrid_retriever():
    """Combine vector search + keyword search."""

    # Vector retriever
    vector_retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=10
    )

    # BM25 keyword retriever
    bm25_retriever = BM25Retriever.from_defaults(
        docstore=index.docstore,
        similarity_top_k=10
    )

    # Combine retrievers
    hybrid_retriever = QueryFusionRetriever(
        retrievers=[vector_retriever, bm25_retriever],
        similarity_top_k=5,
        num_queries=1  # disable extra query generation
    )

    return RetrieverQueryEngine.from_args(retriever=hybrid_retriever)
```

---

## Reranking for Better Results

```python
from llama_index.postprocessor.cohere_rerank import CohereRerank

def create_reranking_query_engine():
    """Add reranking for improved relevance."""

    # Cohere reranker
    reranker = CohereRerank(
        api_key=os.getenv("COHERE_API_KEY"),
        top_n=3
    )

    query_engine = index.as_query_engine(
        similarity_top_k=10,            # retrieve more candidates
        node_postprocessors=[reranker]  # rerank down to top 3
    )

    return query_engine
```

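The architecture diagram also lists a cross-encoder as an alternative to Cohere. If you would rather rerank locally without another API key, LlamaIndex ships a `SentenceTransformerRerank` postprocessor; this sketch assumes the `sentence-transformers` package is installed and uses a common MS MARCO cross-encoder purely as an example model:

```python
from llama_index.core.postprocessor import SentenceTransformerRerank

def create_local_reranking_query_engine():
    """Rerank with a local cross-encoder instead of the Cohere API."""
    reranker = SentenceTransformerRerank(
        model="cross-encoder/ms-marco-MiniLM-L-6-v2",  # example model
        top_n=3,
    )
    return index.as_query_engine(
        similarity_top_k=10,
        node_postprocessors=[reranker],
    )
```
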
---

## Evaluation & Metrics

```python
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator
)

async def evaluate_rag_quality():
    """Evaluate RAG system quality."""

    # Test queries
    test_queries = [
        "Which card has best grocery rewards?",
        "Does Amex Gold work at Costco?",
        "What are Chase Sapphire Reserve travel benefits?"
    ]

    # Ground-truth answers (kept for manual spot-checking; the evaluators
    # below judge relevancy/faithfulness without needing them)
    ground_truth = [
        "Citi Custom Cash offers 5% on groceries...",
        "No, American Express is not accepted at Costco warehouses...",
        "Chase Sapphire Reserve includes Priority Pass..."
    ]

    # Evaluators
    relevancy_evaluator = RelevancyEvaluator(llm=Settings.llm)
    faithfulness_evaluator = FaithfulnessEvaluator(llm=Settings.llm)

    results = []
    for query, truth in zip(test_queries, ground_truth):
        response = query_engine.query(query)
        contexts = [node.text for node in response.source_nodes]

        # Evaluate relevancy (is the answer relevant to the query?)
        relevancy_result = await relevancy_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )

        # Evaluate faithfulness (is the answer grounded in the retrieved context?)
        faithfulness_result = await faithfulness_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )

        results.append({
            "query": query,
            "relevancy_score": relevancy_result.score,
            "faithfulness_score": faithfulness_result.score
        })

    return results
```

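Since `evaluate_rag_quality` is async, a small runner (illustrative) can execute it from a script and print per-query and aggregate scores:

```python
import asyncio

if __name__ == "__main__":
    results = asyncio.run(evaluate_rag_quality())
    for row in results:
        print(f"{row['query']}: relevancy={row['relevancy_score']}, "
              f"faithfulness={row['faithfulness_score']}")
    avg = sum(r["faithfulness_score"] or 0 for r in results) / len(results)
    print(f"Average faithfulness: {avg:.2f}")
```
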
---

## Deployment

### 1. Build Docker Image

**File:** `Dockerfile`
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies (requirements.txt should pin the packages from the Setup section)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Build the index at image-build time.
# NOTE: this calls the OpenAI embedding API, so OPENAI_API_KEY must be available
# during the build (e.g. as a build secret); otherwise move this step to container startup.
RUN python -c "from rewards_rag_server import load_and_index_documents; load_and_index_documents()"

# Expose port
EXPOSE 7860

# Run server
CMD ["uvicorn", "rewards_rag_server:app", "--host", "0.0.0.0", "--port", "7860"]
```

### 2. Deploy to Hugging Face Spaces

```bash
# Create Space
huggingface-cli repo create rewardpilot-rewards-rag --type space --space_sdk docker

# Push files
git add .
git commit -m "Deploy RAG server"
git push
```

---

## Performance Optimization

### 1. Caching Embeddings

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text: str):
    """Cache embeddings for repeated queries."""
    return Settings.embed_model.get_text_embedding(text)
```

### 2. Batch Processing

```python
import asyncio

async def batch_query(queries: list):
    """Process multiple queries in parallel."""
    tasks = [query_engine.aquery(q) for q in queries]
    results = await asyncio.gather(*tasks)

    return results
```

### 3. Index Optimization

```python
# Use the smaller embedding model for speed
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # 1536 dims
    # vs text-embedding-3-large (3072 dims)
)

# Reduce chunk size for faster retrieval
Settings.chunk_size = 256  # vs 512

# NOTE: changing the embedding model or chunk size requires rebuilding the index,
# since the stored vectors must match the model/chunking used at query time.
```

---

## Monitoring

```python
import time
from prometheus_client import Counter, Histogram

# Metrics
query_counter = Counter('rag_queries_total', 'Total RAG queries')
query_duration = Histogram('rag_query_duration_seconds', 'RAG query duration')

# Instrumented version of the /query endpoint shown earlier
# (replace the original handler rather than registering both routes)
@app.post("/query")
async def query_with_monitoring(request: QueryRequest):
    query_counter.inc()

    start_time = time.time()
    response = query_engine.query(request.query)
    duration = time.time() - start_time

    query_duration.observe(duration)

    return {"answer": str(response)}
```

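The counters above are only useful if Prometheus can scrape them. One way to expose them is to mount `prometheus_client`'s standard ASGI app on the same FastAPI server; the `/metrics` path is the usual convention:

```python
from prometheus_client import make_asgi_app

# Serve Prometheus metrics at /metrics on the same server
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)
```
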
---

**Related Documentation:**
- [MCP Server Implementation](./mcp_architecture.md)
- [Modal Deployment Guide](./modal_deployment.md)
- [Agent Reasoning Flow](./agent_reasoning.md)

---