Medical RAG Chatbot (LangChain + Pinecone + Flask)

CareSage

AI-Powered Medical Knowledge Assistant

Retrieval-Augmented Generation (RAG) Chatbot

Project Overview

Built CareSage, a Retrieval-Augmented Generation (RAG) medical chatbot that answers user questions using knowledge pulled from PDF documents rather than relying on the model's memory alone. The app ingests PDFs with PyPDF, chunks and embeds content using sentence-transformers, stores vectors in Pinecone, and retrieves the most relevant passages at query time.

A LangChain pipeline then combines the retrieved context with an OpenAI LLM to generate grounded, helpful responses.

The system is served through a lightweight Flask backend with a clean chat UI, environment-based configuration via python-dotenv, and modular components for document ingestion, indexing, retrieval, and response generation—making it easy to extend to new medical document sets or internal knowledge bases.

Key Features

📄

PDF Document Ingestion

Automatically extracts and processes medical literature from PDF documents using PyPDF for comprehensive knowledge base building.

🧠

Intelligent Chunking & Embedding

Uses sentence-transformers to create semantic embeddings of document chunks, ensuring accurate context retrieval.

🔍

Vector Search with Pinecone

Stores and retrieves document vectors in Pinecone for lightning-fast semantic search across medical knowledge.

🤖

RAG-Powered Responses

Combines LangChain orchestration with an OpenAI LLM to generate accurate, context-grounded medical answers rather than hallucinated ones.

💬

Clean Chat Interface

Lightweight Flask-powered web UI for seamless conversational interactions with the medical knowledge base.

⚙️

Modular Architecture

Separated components for ingestion, indexing, retrieval, and generation—easy to extend to new document sets or knowledge bases.

Technical Architecture

📥 Document Ingestion Pipeline

  • PDF Parsing: PyPDF extracts text from medical documents
  • Text Chunking: Intelligent splitting into semantic segments
  • Embedding Generation: sentence-transformers creates vector representations
  • Vector Storage: Embeddings stored in Pinecone vector database
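The chunking step above can be sketched as a simple overlapping sliding-window splitter, a minimal pure-Python stand-in for the LangChain text splitters the project uses (the `chunk_size` and `overlap` values are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, preferring whitespace boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Back off to the last space so words are not cut mid-token.
        if end < len(text):
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Overlap keeps context across chunk boundaries; max() guarantees progress.
        start = max(end - overlap, start + 1)
    return chunks
```

The overlap matters for retrieval quality: a sentence split across two chunks remains intact in at least one of them.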

🔄 Query Processing Flow

  1. User submits a medical question through the chat UI
  2. Question is embedded using the same sentence-transformer model
  3. Pinecone retrieves top-k most relevant document chunks
  4. LangChain constructs a prompt with retrieved context
  5. OpenAI LLM generates a grounded, accurate response
  6. Answer is displayed in the chat interface with source references
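Steps 2 through 4 of the flow above can be sketched in plain Python, with cosine similarity standing in for Pinecone's similarity search (function names and the prompt wording are illustrative, not the project's actual code):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a grounded prompt from retrieved context, as in step 4."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

In the real system the query vector comes from the same sentence-transformer model used at ingestion time, and the ranked lookup is delegated to Pinecone; the grounding instruction in the prompt is what keeps the LLM tied to retrieved content.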

🏗️ System Components

  • Backend: Flask application with RESTful API endpoints
  • Configuration: python-dotenv for environment management
  • Orchestration: LangChain for RAG pipeline management
  • Vector DB: Pinecone for scalable similarity search
  • LLM: OpenAI GPT models for response generation

Technology Stack

🐍 Python
🌐 Flask
🔗 LangChain
📌 Pinecone
🤖 OpenAI
🔤 sentence-transformers
📄 PyPDF
🔐 python-dotenv
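With python-dotenv, the environment-based configuration mentioned above typically lives in a `.env` file along these lines (the key names are illustrative; the actual code defines its own):

```shell
# .env — loaded at startup via python-dotenv's load_dotenv()
OPENAI_API_KEY=sk-...          # OpenAI credentials
PINECONE_API_KEY=pc-...        # Pinecone credentials
PINECONE_INDEX_NAME=caresage   # name of the vector index to query
```

Keeping secrets in `.env` (and out of version control) is what allows the same codebase to move between development and deployment without changes.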

Use Cases & Applications

🏥 Medical Documentation Search

Healthcare professionals can quickly search through vast medical literature, research papers, and clinical guidelines to find relevant information.

📚 Internal Knowledge Base

Organizations can build custom knowledge bases from internal documents, SOPs, and training materials for instant employee access.

🎓 Educational Assistant

Medical students and researchers can ask questions about complex topics and receive answers grounded in authoritative sources.

⚕️ Clinical Decision Support

Provides evidence-based information to support clinical decision-making by referencing validated medical documents.

Why Retrieval-Augmented Generation?

✅

Accuracy

Responses are grounded in actual document content, not LLM hallucinations

🔒

Source Attribution

Every answer can be traced back to specific documents for verification

🔄

Up-to-Date Knowledge

Knowledge base can be updated anytime by adding new PDFs

🎯

Domain-Specific

Focused on medical knowledge without irrelevant information

Key Achievements

  • Grounded Responses: answers are constrained to retrieved document context, sharply reducing hallucinations
  • Modular Design: Easy to extend to new medical specialties or document types
  • Scalable Architecture: Pinecone vector DB handles millions of embeddings efficiently
  • Fast Response Time: Semantic search and LLM generation complete in seconds
  • Production-Ready: Environment-based config and error handling for deployment
  • Cost-Effective: Only retrieves relevant context, minimizing LLM token usage

Future Enhancements

🔊 Voice Interface

Add speech-to-text and text-to-speech for hands-free medical queries

📊 Analytics Dashboard

Track common queries, usage patterns, and knowledge gaps

🌐 Multi-Language Support

Extend to support medical documents in multiple languages

🔐 Access Control

Role-based permissions for different user types and document sets

📂

View Project on GitHub

Complete source code and documentation available on GitHub.

View on GitHub →

Interested in RAG Solutions?

I can build custom knowledge assistants for your organization’s internal documents, medical literature, or specialized knowledge bases.

Let’s Build Something Amazing
