Paid Contract: Deploy Fine-Tuned Qwen3-235B-A22B + Full Production Integration

#38
by philosop

I have a private fine-tuned Qwen3-235B-A22B with LoRA adapters (13.3GB) that I need deployed and fully integrated into a production system.
What I have:

- Fine-tuned LoRA adapters on Hugging Face (private repo)
- RunPod account with 4x H100 access
- Existing front-end UI (React/Next.js)
- Existing back-end API (FastAPI)
- Existing RAG system (ChromaDB)

What I need (end-to-end):
Model Deployment:

1. Merge the LoRA adapters with the base Qwen3-235B-A22B
2. Quantize to FP8 (NOT BitsAndBytes; I know BNB 8-bit doesn't work with vLLM for MoE architectures)
3. Deploy on RunPod (4x H100) using SGLang or vLLM

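For anyone scoping steps 1-3, the workflow looks roughly like this. A minimal sketch, assuming the standard `peft` merge path and vLLM's online FP8 quantization: the repo IDs and output paths are placeholders, and actually merging a 235B-parameter MoE needs multi-GPU sharding or heavy CPU offload, so treat this as the shape of the work, not a turnkey script.

```python
# Sketch only: placeholder repo IDs/paths; merging a 235B MoE needs far more
# memory than a single device (device_map="auto" shown for illustration).

def merge_lora(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold LoRA adapter weights into the base model and save a merged copy."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy deps, imported lazily
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    model = model.merge_and_unload()          # adds the LoRA deltas into the base weights
    model.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)

def vllm_serve_cmd(model_dir: str, tp: int = 4) -> list[str]:
    """Build a vLLM launch command: tensor-parallel across the H100s, online FP8."""
    return [
        "vllm", "serve", model_dir,
        "--tensor-parallel-size", str(tp),   # 4x H100 from the post
        "--quantization", "fp8",             # FP8, not BitsAndBytes
        "--max-model-len", "131072",         # 128K context window
    ]

# Example (placeholder paths):
# merge_lora("Qwen/Qwen3-235B-A22B", "your-org/qwen3-lora", "./qwen3-merged")
# print(" ".join(vllm_serve_cmd("./qwen3-merged")))
```

SGLang would be a drop-in alternative at the serving step; the merge side is the same either way.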
Full Integration:
4. Streaming API endpoint (SSE or WebSocket) handling responses of 2,000+ tokens
5. Connect existing front-end UI to deployed model
6. Integrate existing RAG system with conversational AI
7. Session memory/persistence (Redis + PostgreSQL)
8. Long-context support (128K window, 30-50 message exchanges)
9. Mobile browser compatibility: must work on Vanadium (GrapheneOS) with <3s response time
10. Start/stop controls for GPU cost management
11. Documentation and handoff
End result: a fully functional AI companion with streaming responses, deep multi-turn conversations, relational memory across sessions, and RAG-enhanced knowledge retrieval, all production-ready.
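The "deep multi-turn conversations" requirement (step 8) comes down to keeping 30-50 exchanges inside the 128K window while leaving room for the streamed reply. A minimal sketch of one trimming strategy: the chars/4 token estimate is a crude stand-in for the real Qwen tokenizer, and the eviction policy (keep the system prompt, drop the oldest turns) is an assumption, not the only option.

```python
# Keep the chat history under a token budget before each generation call.
# approx_tokens is a placeholder heuristic; production code would count
# tokens with the actual Qwen tokenizer.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)   # rough chars/4 estimate, NOT the Qwen tokenizer

def trim_history(messages: list[dict], budget: int = 131072, reserve: int = 4096) -> list[dict]:
    """Drop the oldest non-system turns until the history fits budget - reserve
    (the reserve leaves headroom for the model's streamed reply)."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    limit = budget - reserve - sum(approx_tokens(m["content"]) for m in system)
    while turns and sum(approx_tokens(m["content"]) for m in turns) > limit:
        turns.pop(0)                 # evict the oldest turn first
    return system + turns
```

For step 7, the same message list would live in Redis for the hot session and be flushed to PostgreSQL for long-term relational memory; trimming happens only on what is sent to the model, not on what is stored.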
Budget: Open, but it needs to be reasonable; I am accepting offers.
Timeline: 2 weeks
Requirements:

- Hands-on experience deploying 100B+ MoE models (not API wrappers)
- Verifiable proof of previous deployments (I've been burned multiple times by fake credentials and will verify all claims)
- Full-stack capability, or willingness to collaborate with my existing back-end developer

If you can only do the ML/deployment side (steps 1-3), still reach out - I may split this into two roles.
Reply here if you have genuine experience. Serious inquiries only, please.