Personalized RAG Chatbot
A retrieval-augmented chatbot running on a locally hosted Mistral-7B with GPU-accelerated inference, grounded in a curated knowledge base and served through a custom web interface.
The problem
General LLMs don't know your specific context, and sending private data to hosted APIs isn't always acceptable. I wanted a chatbot that answers from a controlled knowledge base while running entirely on local hardware.
Approach
I ran Mistral-7B-Instruct locally with GPU-accelerated inference on an RTX 3060 Ti, built a retrieval step that injects relevant context into each prompt, and wrapped it in a Flask API. A custom web interface, exposed online through Cloudflare, made it usable from anywhere while the model stayed on local hardware.
My role
Independent project — retrieval design, prompt construction, the Flask API, local model hosting, and deployment.