NeuralRAG
NeuralRAG is a production-grade, self-correcting RAG pipeline that goes beyond basic retrieve-and-generate: it validates its own outputs, catches hallucinations, and automatically reformulates queries when an answer falls short.
Self-Correcting Pipeline
Built with LangGraph, the system runs a multi-step correction loop (a minimal wiring sketch follows this list):
- Retrieval: Hybrid BM25 keyword + ChromaDB vector search
- Relevance Grading: LLM-based assessment of how relevant the retrieved docs actually are
- Generation: Context-aware answer generation with LLaMA3
- Hallucination Check: Verifies that generated answers are grounded in the source documents
- Quality Assessment: Checks if the answer actually addresses the original question
- Auto-Correction: Reformulates the query and re-retrieves whenever either check fails (up to 3 cycles)
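A minimal sketch of how that loop might be wired in LangGraph. The state fields, node names, and stub bodies are illustrative rather than the project's actual code, and the relevance grade is folded into a conditional edge for brevity:

```python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class RAGState(TypedDict):
    question: str
    documents: List[str]
    answer: str
    retries: int


# Stub nodes: each stands in for a real retriever/LLM call in the pipeline.
def retrieve(state: RAGState) -> dict:
    # Hybrid BM25 + vector search over ChromaDB would run here.
    return {"documents": ["retrieved chunk..."]}


def generate(state: RAGState) -> dict:
    # Context-aware generation with LLaMA3 (via Ollama) would run here.
    return {"answer": "draft answer"}


def rewrite(state: RAGState) -> dict:
    # Query reformulation; bump the retry counter.
    return {"question": state["question"] + " (rephrased)",
            "retries": state.get("retries", 0) + 1}


def grade_docs(state: RAGState) -> str:
    # LLM relevance grade; reformulate when nothing useful came back.
    return "generate" if state["documents"] else "rewrite"


def check_answer(state: RAGState) -> str:
    # Hallucination + answer-quality checks; stop after 3 correction cycles.
    grounded = bool(state["answer"])  # placeholder for the two LLM graders
    return "accept" if grounded or state.get("retries", 0) >= 3 else "rewrite"


graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.add_node("rewrite", rewrite)

graph.set_entry_point("retrieve")
graph.add_conditional_edges("retrieve", grade_docs,
                            {"generate": "generate", "rewrite": "rewrite"})
graph.add_conditional_edges("generate", check_answer,
                            {"accept": END, "rewrite": "rewrite"})
graph.add_edge("rewrite", "retrieve")

pipeline = graph.compile()
result = pipeline.invoke({"question": "What is NeuralRAG?", "retries": 0})
```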
Multi-Format Document Processing
- PDF, DOCX, TXT, Markdown ingestion with intelligent chunking
- Recursive text splitting with configurable chunk size and overlap (see the sketch below)
- Metadata preservation for source attribution
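Assuming the splitter comes from LangChain (a natural fit for this stack), the chunking step might look like the sketch below; the file name and the chunk_size/chunk_overlap values are placeholders:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Size and overlap are illustrative; the pipeline exposes both as config.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

text = "..."  # text extracted from a PDF, DOCX, TXT, or Markdown file
chunks = splitter.create_documents(
    [text],
    metadatas=[{"source": "example.pdf"}],  # preserved for attribution
)
```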
Key Features
- Hybrid Search combining BM25 keyword matching with semantic vector similarity (sketched after this list)
- Self-Correction Loop with up to 3 automatic retry cycles and query reformulation
- Hallucination Detection ensuring answers stay grounded in retrieved context
- Real-time Streaming of answers over WebSockets (see the endpoint sketch at the end of this document)
- Modern Dashboard built with Next.js for document upload and chat
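One way the hybrid retriever might be assembled, assuming LangChain's EnsembleRetriever over a BM25Retriever and a ChromaDB vector store with local Ollama embeddings; the corpus, weights, and k values are illustrative:

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

texts = ["RAG combines retrieval with generation.",
         "BM25 ranks documents by keyword overlap."]  # sample corpus

# Keyword side: classic BM25 over the raw chunks.
bm25 = BM25Retriever.from_texts(texts)
bm25.k = 4

# Semantic side: ChromaDB queried by embedding similarity.
vectordb = Chroma.from_texts(texts, OllamaEmbeddings(model="llama3"))
vector = vectordb.as_retriever(search_kwargs={"k": 4})

# Blend the two rankings; the weights here are untuned placeholders.
hybrid = EnsembleRetriever(retrievers=[bm25, vector], weights=[0.4, 0.6])
docs = hybrid.invoke("How does hybrid search work?")
```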
Technologies
- Python: Backend pipeline
- LLaMA3 (Ollama): Local LLM inference
- ChromaDB: Vector database
- LangGraph: Orchestration framework
- FastAPI: REST API
- Next.js: Frontend dashboard
- Docker: Containerized deployment
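To make the streaming feature concrete: a WebSocket endpoint in FastAPI might look roughly like this. The route path, message framing, and the canned token stream are assumptions, not the project's actual API:

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()


@app.websocket("/ws/chat")  # hypothetical route, not the project's real path
async def chat(ws: WebSocket) -> None:
    await ws.accept()
    question = await ws.receive_text()
    # Real pipeline: forward tokens as the LangGraph run emits them.
    for token in ("Streamed ", "answer ", "to: ", question):
        await ws.send_text(token)
    await ws.close()
```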