When you’re knee-deep in PDF notes, textbooks, or research papers, wouldn’t it be nice to just ask a question and get a clear answer from your documents? That’s exactly why I built StudyBuddy — an intelligent chatbot that turns static study material into an interactive conversation.

And it's powered by some amazing tools:

🧭 LangGraph for managing multi-step, tool-aware reasoning.
🤖 Gemini for both LLM-powered answers and document embeddings.
📚 ChromaDB as the vector database that enables fast, semantic search over your PDFs.

💡 Why I Built This
I’ve been experimenting with Retrieval-Augmented Generation (RAG) as part of the GenAI course on Kaggle, and I wanted a system that could:

Let me upload any document (PDFs for now; uploading directly in chat is still to come),
Ask natural language questions about it,
Get accurate, contextual answers — with memory and topic awareness.
Instead of just bolting together an LLM and a vector DB, I wanted actual conversation flow — so I turned to LangGraph.

🛠️ How It Works
Here's the high-level setup:

  • Embeddings
    When you upload a PDF, I chunk the text and generate embeddings using Gemini's gemini-embedding-03 model (the chunking and storage steps are sketched together right after this list).

  • Storage
    Those chunks are stored in ChromaDB, a fast, local vector database that lets me search by semantic meaning, not just keywords.

  • Conversational Reasoning with LangGraph
    The chatbot is a graph, and each step is a node: human_node handles the user-facing side of the conversation, while study_node runs the underlying tools (summarize_chunks, define_term, clarify_question, and so on) that LangGraph routes to based on the user's input. A minimal sketch of this wiring follows the list.

  • LLM Output
    Gemini acts as the brain behind it all: once the graph finishes routing through tools, it generates a thoughtful, helpful response from the retrieved context, often with references to the source material (this step is also sketched briefly below).
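
Here's a minimal sketch of the Embeddings and Storage steps, assuming the google-generativeai and chromadb Python packages. The helper names, chunk sizes, and collection name are illustrative rather than StudyBuddy's actual code, and the embedding model string is simply the one named above.

```python
import google.generativeai as genai
import chromadb

genai.configure(api_key="YOUR_GEMINI_API_KEY")

# Local, persistent vector store for the PDF chunks
client = chromadb.PersistentClient(path="./studybuddy_db")
collection = client.get_or_create_collection("pdf_chunks")

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split raw PDF text into overlapping chunks so ideas aren't cut mid-sentence."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def embed(texts: list[str], task_type: str) -> list[list[float]]:
    """Embed text with the Gemini embedding model mentioned in the post."""
    return [
        genai.embed_content(
            model="models/gemini-embedding-03",  # model name as given above
            content=t,
            task_type=task_type,
        )["embedding"]
        for t in texts
    ]

def index_pdf(doc_name: str, text: str) -> None:
    """Chunk a document and store chunks, embeddings, and metadata in ChromaDB."""
    chunks = chunk_text(text)
    collection.add(
        ids=[f"{doc_name}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embed(chunks, task_type="retrieval_document"),
        metadatas=[{"source": doc_name, "chunk": i} for i in range(len(chunks))],
    )

def retrieve(question: str, k: int = 4) -> list[str]:
    """Semantic search: return the k chunks closest to the question in embedding space."""
    results = collection.query(
        query_embeddings=embed([question], task_type="retrieval_query"),
        n_results=k,
    )
    return results["documents"][0]
```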
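
And here's roughly how the LangGraph wiring could look, assuming the langgraph, langchain-core, and langchain-google-genai packages and reusing the retrieve() helper from the sketch above. The node and tool names mirror the ones mentioned earlier, but the exact split of responsibilities in StudyBuddy may differ.

```python
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition

@tool
def summarize_chunks(topic: str) -> str:
    """Gather the stored chunks that match a topic so the model can summarize them."""
    return "\n\n".join(retrieve(topic))

@tool
def define_term(term: str) -> str:
    """Look up how the uploaded documents define a term."""
    return "\n\n".join(retrieve(f"definition of {term}"))

@tool
def clarify_question(question: str) -> str:
    """Ask the user for more detail when their question is ambiguous."""
    return f"Could you clarify what you mean by: {question}?"

tools = [summarize_chunks, define_term, clarify_question]
# Expects GOOGLE_API_KEY in the environment; the chat model name is illustrative.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash").bind_tools(tools)

def human_node(state: MessagesState):
    """User-facing node: reply to the user, or request one of the study tools."""
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("human_node", human_node)
graph.add_node("study_node", ToolNode(tools))  # executes whichever tool the model asked for
graph.add_edge(START, "human_node")
# Route to study_node when the model requested a tool, otherwise finish the turn.
graph.add_conditional_edges(
    "human_node", tools_condition, {"tools": "study_node", END: END}
)
graph.add_edge("study_node", "human_node")  # feed tool output back for the final answer
app = graph.compile()
```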
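
Finally, the generation step on its own, without the graph around it: retrieve context, then ask Gemini to answer from it. This reuses the genai setup and retrieve() helper from the first sketch; the prompt wording and chat model name are illustrative.

```python
def answer(question: str) -> str:
    """Plain RAG answer: ground Gemini's reply in the retrieved chunks."""
    context = "\n\n".join(retrieve(question))
    model = genai.GenerativeModel("gemini-1.5-flash")  # any Gemini chat model works here
    prompt = (
        "Answer the question using only the context below, and point to the "
        "source document where you can.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return model.generate_content(prompt).text
```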

🧪 Example Use Case
User: "What’s the difference between supervised and unsupervised learning?"
StudyBuddy: (retrieves relevant chunks from your ML notes, then responds)
“Supervised learning uses labeled datasets… while unsupervised learning identifies patterns without labels. See page 3 of ‘ml_notes.pdf’ for a breakdown.”
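
For completeness, this is how that exchange could be pushed through the compiled graph from the LangGraph sketch above (hypothetical usage; StudyBuddy's actual chat loop may differ):

```python
from langchain_core.messages import HumanMessage

# One-shot invocation of the compiled graph; the real app keeps a running conversation.
result = app.invoke({"messages": [HumanMessage(
    "What's the difference between supervised and unsupervised learning?"
)]})
print(result["messages"][-1].content)
```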