sammy786 committed b7662d1 (verified) · 1 parent: c1f8982

Create llamaindex_setup.md

Files changed (1): docs/llamaindex_setup.md (added, +704 lines)
# LlamaIndex RAG Setup Guide

## Overview

RewardPilot uses LlamaIndex to build a semantic search system over 50+ credit card benefit documents. This enables the agent to answer complex questions like "Which card has the best travel insurance?" or "Does Amex Gold work at Costco?"

## Why LlamaIndex + RAG?

| Problem | Traditional Approach | RAG Solution |
|---------|---------------------|--------------|
| **Card benefits change** | Hardcoded rules → outdated | Dynamic document retrieval |
| **Complex questions** | Manual lookup | Semantic search |
| **50+ cards** | Impossible to memorize | Vector similarity |
| **Nuanced rules** | Prone to errors | Context-aware answers |

**Example:**
- **Question:** "Can I use Chase Sapphire Reserve for airport lounge access when flying domestic?"
- **Traditional:** Check 10+ pages of terms
- **RAG:** Semantic search → "Yes, Priority Pass includes domestic lounges"

---

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                      User Question                      │
│         "Which card has best grocery rewards?"          │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                  Query Transformation                    │
│          (Expand, rephrase, extract keywords)            │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                     Embedding Model                      │
│              OpenAI text-embedding-3-small               │
│                    (1536 dimensions)                     │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                      Vector Store                        │
│                        ChromaDB                          │
│                  (50+ card documents)                    │
│                    (10,000+ chunks)                      │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ Retrieve top-k (k=5)
                     ▼
┌─────────────────────────────────────────────────────────┐
│                    Retrieved Context                     │
│  1. Amex Gold: 4x points on U.S. supermarkets...         │
│  2. Citi Custom Cash: 5% on top category...              │
│  3. Chase Freedom Flex: 5% rotating categories...        │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                        Reranking                         │
│            (Cohere Rerank or Cross-Encoder)              │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                      LLM Synthesis                       │
│                  Gemini 2.0 Flash Exp                    │
│             (Generate answer from context)               │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                      Final Answer                        │
│  "Amex Gold offers 4x points (best rate) but has         │
│   $25k annual cap. Citi Custom Cash gives 5% but         │
│   only $500/month. For high spenders, use Amex."         │
└─────────────────────────────────────────────────────────┘
```

---

## Setup

### 1. Install Dependencies

```bash
pip install llama-index==0.12.5 \
    llama-index-vector-stores-chroma==0.4.1 \
    llama-index-embeddings-openai==0.3.1 \
    llama-index-llms-gemini==0.4.2 \
    chromadb==0.5.23 \
    pypdf==5.1.0 \
    beautifulsoup4==4.12.3

# Also used later in this guide (pin versions to match your environment):
pip install fastapi uvicorn requests prometheus-client \
    llama-index-retrievers-bm25 llama-index-postprocessor-cohere-rerank
```

### 2. Prepare Card Documents

Create the directory structure:

```
data/
├── cards/
│   ├── amex_gold.pdf
│   ├── chase_sapphire_reserve.pdf
│   ├── citi_custom_cash.pdf
│   └── ... (50+ cards)
├── terms/
│   ├── amex_terms.pdf
│   ├── chase_terms.pdf
│   └── ...
└── guides/
    ├── maximizing_rewards.md
    ├── category_codes.md
    └── ...
```

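If you are starting from scratch, a small helper can create this layout (the folder names simply mirror the tree above; adjust as needed):

```python
# create_data_dirs.py — convenience helper; folder names mirror the tree above
from pathlib import Path

for subdir in ("cards", "terms", "guides"):
    Path("data", subdir).mkdir(parents=True, exist_ok=True)
    print(f"ensured data/{subdir}/")
```
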
### 3. Document Sources

#### Option A: Scrape from Issuer Websites

```python
# scrape_card_docs.py
import requests
from bs4 import BeautifulSoup

CARD_URLS = {
    "amex_gold": "https://www.americanexpress.com/us/credit-cards/card/gold-card/",
    "chase_sapphire_reserve": "https://creditcards.chase.com/rewards-credit-cards/sapphire/reserve",
    # ... more cards
}

def scrape_card_benefits(card_name, url, output_file):
    """Scrape card benefits from an issuer website."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract the benefits section (the CSS class varies by issuer site)
    benefits = soup.find("div", class_="benefits-section")
    if benefits is None:
        benefits = soup.body  # fall back to the whole page if the section is missing

    # Save to markdown
    with open(output_file, "w") as f:
        f.write(f"# {card_name}\n\n")
        f.write(benefits.get_text(separator="\n", strip=True))

# Scrape all cards
for card_name, url in CARD_URLS.items():
    scrape_card_benefits(card_name, url, f"data/cards/{card_name}.md")
```

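The scraper above targets HTML pages. The PDF agreements in `data/terms/` can be loaded as-is by `SimpleDirectoryReader` (its default PDF reader is built on `pypdf`), but if you prefer to normalize everything to markdown first, here is a minimal sketch using `pypdf`; the script name and output convention are illustrative:

```python
# pdf_to_md.py — optional normalization of PDF terms into markdown
from pathlib import Path
from pypdf import PdfReader

for pdf_path in Path("data/terms").glob("*.pdf"):
    reader = PdfReader(str(pdf_path))
    # Concatenate extracted text from every page (empty string if a page has no text layer)
    text = "\n\n".join(page.extract_text() or "" for page in reader.pages)
    out_path = pdf_path.with_suffix(".md")
    out_path.write_text(f"# {pdf_path.stem}\n\n{text}")
    print(f"wrote {out_path}")
```
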
#### Option B: Manual Documentation

Create markdown files:

**File:** `data/cards/amex_gold.md`
```markdown
# American Express Gold Card

## Overview
- **Annual Fee:** $325
- **Rewards Rate:** 4x points on dining & U.S. supermarkets (up to $25k/year)
- **Welcome Bonus:** 90,000 points after $6k spend in 6 months

## Earning Structure

### 4x Points
- Restaurants worldwide (including takeout & delivery)
- U.S. supermarkets (up to $25,000 per year, then 1x)

### 3x Points
- Flights booked directly with airlines or on amextravel.com

### 1x Points
- All other purchases

## Monthly Credits
- $10 Uber Cash (Uber Eats eligible)
- $10 Grubhub/Seamless/The Cheesecake Factory/select Shake Shack

## Travel Benefits
- No foreign transaction fees
- Trip delay insurance
- Lost luggage insurance
- Car rental loss and damage insurance

## Merchant Acceptance
- **Accepted:** Most merchants worldwide
- **Not Accepted:** Costco warehouses (Costco.com works)
- **Not Accepted:** Some small businesses

## Redemption Options
- Transfer to 20+ airline/hotel partners (1:1 ratio)
- Pay with Points at Amazon (0.7 cents per point)
- Statement credits (0.6 cents per point)
- Book travel through Amex Travel (1 cent per point)

## Best For
- High grocery spending (up to $25k/year)
- Frequent dining out
- Travelers who value transfer partners

## Limitations
- $25,000 annual cap on 4x supermarket category
- Amex not accepted everywhere
- Annual fee not waived first year
```

---

## Implementation

### File: `rewards_rag_server.py`

```python
"""
LlamaIndex RAG server for credit card benefits
"""

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings,
    load_index_from_storage,
)
# Note: ServiceContext is gone in recent llama-index releases; the global
# Settings object below replaces it.
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.gemini import Gemini
from llama_index.core.node_parser import SentenceSplitter
import chromadb
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os

# Initialize FastAPI
app = FastAPI(title="Rewards RAG MCP Server")

# Configure LlamaIndex
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)
Settings.llm = Gemini(
    model="models/gemini-2.0-flash-exp",
    api_key=os.getenv("GEMINI_API_KEY")
)
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Initialize ChromaDB
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("credit_cards")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```

---

## Document Loading & Indexing

```python
def load_and_index_documents():
    """Load card documents and create the vector index."""

    # Load documents from directory
    documents = SimpleDirectoryReader(
        input_dir="./data",
        recursive=True,
        required_exts=[".pdf", ".md", ".txt"]
    ).load_data()

    print(f"Loaded {len(documents)} documents")

    # Parse into nodes (chunks)
    node_parser = SentenceSplitter(
        chunk_size=512,
        chunk_overlap=50
    )
    nodes = node_parser.get_nodes_from_documents(documents)

    print(f"Created {len(nodes)} nodes")

    # Create index
    index = VectorStoreIndex(
        nodes=nodes,
        storage_context=storage_context
    )

    # Persist to disk
    index.storage_context.persist(persist_dir="./storage")

    return index

# Load index on startup
try:
    # Try loading an existing index
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store,
        persist_dir="./storage"
    )
    index = load_index_from_storage(storage_context)
    print("Loaded existing index")
except Exception:
    # Create a new index
    print("Creating new index...")
    index = load_and_index_documents()

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"
)
```

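With the engine in place, a quick smoke test (the question is just an example) confirms retrieval and synthesis work end to end:

```python
# Quick sanity check after startup (example question)
response = query_engine.query("Does Amex Gold work at Costco?")
print(response)                    # synthesized answer
print(len(response.source_nodes))  # number of retrieved chunks
```
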
---

## API Endpoints

```python
class QueryRequest(BaseModel):
    query: str
    card_name: str | None = None
    top_k: int = 5

class QueryResponse(BaseModel):
    answer: str
    sources: list
    confidence: float

@app.post("/query", response_model=QueryResponse)
async def query_benefits(request: QueryRequest):
    """
    Query credit card benefits

    Example:
    POST /query
    {
        "query": "Which card has best grocery rewards?",
        "top_k": 5
    }
    """
    try:
        # Add card filter if specified
        if request.card_name:
            query = f"For {request.card_name}: {request.query}"
        else:
            query = request.query

        # Query the index
        response = query_engine.query(query)

        # Extract sources
        sources = []
        for node in response.source_nodes:
            sources.append({
                "card_name": node.metadata.get("file_name", "Unknown"),
                "content": node.text[:200] + "...",
                "relevance_score": float(node.score or 0.0)
            })

        # Confidence is approximated by the top retrieval score
        confidence = sources[0]["relevance_score"] if sources else 0.0

        return QueryResponse(
            answer=str(response),
            sources=sources,
            confidence=confidence
        )

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```

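A minimal client call against a locally running server (the host and port are assumed from the `uvicorn` command in the Dockerfile below):

```python
import requests

resp = requests.post(
    "http://localhost:7860/query",
    json={"query": "Which card has best grocery rewards?", "top_k": 5},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["answer"])
```
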
---

## Advanced Query Techniques

```python
@app.post("/compare")
async def compare_cards(request: dict):
    """
    Compare multiple cards on specific criteria

    Example:
    POST /compare
    {
        "cards": ["Amex Gold", "Chase Sapphire Reserve"],
        "criteria": "travel benefits"
    }
    """
    cards = request["cards"]
    criteria = request["criteria"]

    # Query each card
    comparisons = []
    for card in cards:
        query = f"What are the {criteria} for {card}?"
        response = query_engine.query(query)

        comparisons.append({
            "card": card,
            "benefits": str(response)
        })

    # Synthesize comparison
    synthesis_prompt = f"""
    Compare these cards on {criteria}:

    {comparisons}

    Provide a clear winner and reasoning.
    """

    final_response = Settings.llm.complete(synthesis_prompt)

    return {
        "comparison": str(final_response),
        "details": comparisons
    }
```

---

## Metadata Filtering

```python
def add_metadata_to_documents():
    """Add rich metadata for filtering."""

    documents = SimpleDirectoryReader("./data").load_data()

    for doc in documents:
        # Extract card name from filename
        card_name = doc.metadata["file_name"].replace(".md", "")

        # Add metadata (extract_issuer / extract_annual_fee / extract_category
        # are project-specific helpers; see the sketch below)
        doc.metadata.update({
            "card_name": card_name,
            "issuer": extract_issuer(card_name),
            "annual_fee": extract_annual_fee(doc.text),
            "category": extract_category(doc.text)
        })

    return documents

# Query with filters
@app.post("/query_filtered")
async def query_with_filters(request: dict):
    """
    Query with metadata filters

    Example:
    POST /query_filtered
    {
        "query": "best travel card",
        "filters": {
            "issuer": "Chase",
            "annual_fee": {"$lte": 500}
        }
    }
    """
    from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

    # Build filters (only the exact-match issuer filter is applied here;
    # range filters such as the annual_fee example need operator-based filters)
    filters = MetadataFilters(
        filters=[
            ExactMatchFilter(key="issuer", value=request["filters"]["issuer"])
        ]
    )

    # Query with filters
    filtered_engine = index.as_query_engine(
        similarity_top_k=5,
        filters=filters
    )

    response = filtered_engine.query(request["query"])

    return {"answer": str(response)}
```

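The metadata helpers referenced above are not part of LlamaIndex; they are project-specific. A naive, illustrative sketch (regex-based, and the issuer/category keyword lists are assumptions you should replace with your own):

```python
import re

def extract_issuer(card_name: str) -> str:
    """Guess the issuer from the file/card name (assumed naming convention)."""
    issuers = {"amex": "American Express", "chase": "Chase", "citi": "Citi"}
    for key, issuer in issuers.items():
        if key in card_name.lower():
            return issuer
    return "Unknown"

def extract_annual_fee(text: str) -> int:
    """Pull the first 'Annual Fee: $NNN' style figure out of the document."""
    match = re.search(r"Annual Fee:?\*{0,2}\s*\$([\d,]+)", text)
    return int(match.group(1).replace(",", "")) if match else 0

def extract_category(text: str) -> str:
    """Very rough category guess based on keyword counts."""
    lowered = text.lower()
    scores = {
        "travel": lowered.count("travel") + lowered.count("lounge"),
        "dining": lowered.count("dining") + lowered.count("restaurant"),
        "groceries": lowered.count("supermarket") + lowered.count("grocer"),
    }
    return max(scores, key=scores.get)
```
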
---

## Hybrid Search (Keyword + Semantic)

```python
from llama_index.core.retrievers import VectorIndexRetriever, QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
# BM25Retriever lives in a separate package: pip install llama-index-retrievers-bm25
from llama_index.retrievers.bm25 import BM25Retriever

def create_hybrid_retriever():
    """Combine vector search + keyword search."""

    # Vector retriever
    vector_retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=10
    )

    # BM25 keyword retriever
    bm25_retriever = BM25Retriever.from_defaults(
        docstore=index.docstore,
        similarity_top_k=10
    )

    # Combine retrievers
    hybrid_retriever = QueryFusionRetriever(
        retrievers=[vector_retriever, bm25_retriever],
        similarity_top_k=5,
        num_queries=1  # disable extra query generation
    )

    return RetrieverQueryEngine.from_args(retriever=hybrid_retriever)
```

---

## Reranking for Better Results

```python
from llama_index.postprocessor.cohere_rerank import CohereRerank

def create_reranking_query_engine():
    """Add reranking for improved relevance."""

    # Cohere reranker
    reranker = CohereRerank(
        api_key=os.getenv("COHERE_API_KEY"),
        top_n=3
    )

    query_engine = index.as_query_engine(
        similarity_top_k=10,            # retrieve more candidates
        node_postprocessors=[reranker]  # rerank down to top 3
    )

    return query_engine
```

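The architecture diagram also lists a cross-encoder as an alternative to Cohere. If you would rather rerank locally without another API key, LlamaIndex ships a `SentenceTransformerRerank` postprocessor; this sketch assumes the `sentence-transformers` package is installed and uses a common MS MARCO cross-encoder purely as an example model:

```python
from llama_index.core.postprocessor import SentenceTransformerRerank

def create_local_reranking_query_engine():
    """Rerank with a local cross-encoder instead of the Cohere API."""
    reranker = SentenceTransformerRerank(
        model="cross-encoder/ms-marco-MiniLM-L-6-v2",  # example model
        top_n=3,
    )
    return index.as_query_engine(
        similarity_top_k=10,
        node_postprocessors=[reranker],
    )
```
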
---

## Evaluation & Metrics

```python
from llama_index.core.evaluation import (
    RelevancyEvaluator,
    FaithfulnessEvaluator
)

async def evaluate_rag_quality():
    """Evaluate RAG system quality."""

    # Test queries
    test_queries = [
        "Which card has best grocery rewards?",
        "Does Amex Gold work at Costco?",
        "What are Chase Sapphire Reserve travel benefits?"
    ]

    # Ground-truth answers (kept for manual spot-checking; the evaluators
    # below judge relevancy/faithfulness without needing them)
    ground_truth = [
        "Citi Custom Cash offers 5% on groceries...",
        "No, American Express is not accepted at Costco warehouses...",
        "Chase Sapphire Reserve includes Priority Pass..."
    ]

    # Evaluators
    relevancy_evaluator = RelevancyEvaluator(llm=Settings.llm)
    faithfulness_evaluator = FaithfulnessEvaluator(llm=Settings.llm)

    results = []
    for query, truth in zip(test_queries, ground_truth):
        response = query_engine.query(query)
        contexts = [node.text for node in response.source_nodes]

        # Evaluate relevancy (is the answer relevant to the query?)
        relevancy_result = await relevancy_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )

        # Evaluate faithfulness (is the answer grounded in the retrieved context?)
        faithfulness_result = await faithfulness_evaluator.aevaluate(
            query=query,
            response=str(response),
            contexts=contexts
        )

        results.append({
            "query": query,
            "relevancy_score": relevancy_result.score,
            "faithfulness_score": faithfulness_result.score
        })

    return results
```

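Since `evaluate_rag_quality` is async, a small runner (illustrative) can execute it from a script and print per-query and aggregate scores:

```python
import asyncio

if __name__ == "__main__":
    results = asyncio.run(evaluate_rag_quality())
    for row in results:
        print(f"{row['query']}: relevancy={row['relevancy_score']}, "
              f"faithfulness={row['faithfulness_score']}")
    avg = sum(r["faithfulness_score"] or 0 for r in results) / len(results)
    print(f"Average faithfulness: {avg:.2f}")
```
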
---

## Deployment

### 1. Build Docker Image

**File:** `Dockerfile`
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies (requirements.txt should pin the packages from the Setup section)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Build the index at image-build time.
# NOTE: this calls the OpenAI embedding API, so OPENAI_API_KEY must be available
# during the build (e.g. as a build secret); otherwise move this step to container startup.
RUN python -c "from rewards_rag_server import load_and_index_documents; load_and_index_documents()"

# Expose port
EXPOSE 7860

# Run server
CMD ["uvicorn", "rewards_rag_server:app", "--host", "0.0.0.0", "--port", "7860"]
```

### 2. Deploy to Hugging Face Spaces

```bash
# Create Space
huggingface-cli repo create rewardpilot-rewards-rag --type space --space_sdk docker

# Push files
git add .
git commit -m "Deploy RAG server"
git push
```

---

## Performance Optimization

### 1. Caching Embeddings

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text: str):
    """Cache embeddings for repeated queries."""
    return Settings.embed_model.get_text_embedding(text)
```

### 2. Batch Processing

```python
import asyncio

async def batch_query(queries: list):
    """Process multiple queries in parallel."""
    tasks = [query_engine.aquery(q) for q in queries]
    results = await asyncio.gather(*tasks)

    return results
```

### 3. Index Optimization

```python
# Use the smaller embedding model for speed
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # 1536 dims
    # vs text-embedding-3-large (3072 dims)
)

# Reduce chunk size for faster retrieval
Settings.chunk_size = 256  # vs 512

# NOTE: changing the embedding model or chunk size requires rebuilding the index,
# since the stored vectors must match the model/chunking used at query time.
```

---

## Monitoring

```python
import time
from prometheus_client import Counter, Histogram

# Metrics
query_counter = Counter('rag_queries_total', 'Total RAG queries')
query_duration = Histogram('rag_query_duration_seconds', 'RAG query duration')

# Instrumented version of the /query endpoint shown earlier
# (replace the original handler rather than registering both routes)
@app.post("/query")
async def query_with_monitoring(request: QueryRequest):
    query_counter.inc()

    start_time = time.time()
    response = query_engine.query(request.query)
    duration = time.time() - start_time

    query_duration.observe(duration)

    return {"answer": str(response)}
```

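The counters above are only useful if Prometheus can scrape them. One way to expose them is to mount `prometheus_client`'s standard ASGI app on the same FastAPI server; the `/metrics` path is the usual convention:

```python
from prometheus_client import make_asgi_app

# Serve Prometheus metrics at /metrics on the same server
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)
```
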
---

**Related Documentation:**
- [MCP Server Implementation](./mcp_architecture.md)
- [Modal Deployment Guide](./modal_deployment.md)
- [Agent Reasoning Flow](./agent_reasoning.md)

---