Skip to content
Back to work
AI & ML2025

Personalized RAG Chatbot

A retrieval-augmented chatbot running on a locally hosted Mistral-7B with GPU-accelerated inference, grounded in a curated knowledge base and served through a custom web interface.

The problem

General LLMs don't know your specific context, and sending private data to hosted APIs isn't always acceptable. I wanted a chatbot that answers from a controlled knowledge base while running entirely on local hardware.

Approach

I ran Mistral-7B-Instruct locally with GPU-accelerated inference on an RTX 3060 Ti, built a retrieval step that injects relevant context into each prompt, and wrapped it in a Flask API. A custom web interface, exposed online through Cloudflare, made it usable from anywhere while the model stayed on local hardware.

My role

Independent project — retrieval design, prompt construction, the Flask API, local model hosting, and deployment.