Hey Devs! 👋
Have you ever wanted to chat with your favorite books like you're texting a friend? 📖💬
Well, you're in the right place! In this blog post, I’ll walk you through how I built BookChatBot, an AI-powered chatbot that can answer questions about a book, using:
- 🧠 LangChain (for LLM logic)
- 🌲 Pinecone (for vector search)
- 🧾 PDF loading and splitting
- ⚡ Google Gemini (for answering questions)
- 🧪 Flask (as the web framework)
You can find the full code on GitHub:
👉 GitHub Repo
💡 What are we building?
We're building a chatbot web app that can read PDFs (like a book 📘), store them in Pinecone’s vector database, and allow users to ask questions about the content!
The AI will retrieve the most relevant chunks and generate human-like answers using Google's Gemini model.
💸 Note: Pinecone’s free tier only allows one index. So for now, you can't dynamically upload new books — but once set up, it's super efficient for Q&A!
🗂️ Project Structure
bookchatbot-/
├── app.py          # Flask app and RAG chain
├── helper.py       # PDF loading, chunking, and embeddings
├── src/
│   └── prompt.py   # System prompt for LLM
├── data/           # Folder with your PDF files
├── templates/
│   └── chat.html   # Simple frontend
└── .env            # API keys (not shared!)
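The .env file just holds the two API keys that the rest of the code reads with os.getenv (key names taken from the snippets below):

PINECONE_API_KEY=your-pinecone-key
GOOGLE_API_KEY=your-google-ai-studio-key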
🧠 How does it work?
This is a RAG (Retrieval-Augmented Generation) pipeline:
- Load and split PDFs into chunks
- Convert chunks into vector embeddings
- Store in Pinecone (vector DB)
- Accept user question
- Find the top relevant chunks (via Pinecone)
- Use Gemini to answer based on retrieved content
🧾 helper.py – Preprocessing the PDFs
from langchain_community.document_loaders import DirectoryLoader, PyMuPDFLoader

def load_pdf(data):
    loader = DirectoryLoader(data, glob="*.pdf", loader_cls=PyMuPDFLoader)
    return loader.load()
📥 We load all PDFs from the data/ folder.
from langchain.text_splitter import RecursiveCharacterTextSplitter

def text_splitter(extracted_data):
    text_split = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    return text_split.split_documents(extracted_data)
📚 We split documents into manageable 500-character chunks (with a 20-character overlap) to improve retrieval.
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings

def load_gemini_embeddings():
    embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=os.getenv("GOOGLE_API_KEY")
    )
    return embeddings
🔍 We use Google's embedding model to turn text chunks into vectors!
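One piece the post doesn't show is the one-time ingestion step that actually pushes the embeddings into Pinecone. Here's a minimal sketch of what it could look like, assuming a hypothetical store_index.py script and a Pinecone index named "bookchat" already created in the console (the same index name app.py uses below):

# store_index.py (hypothetical one-time ingestion script)
from dotenv import load_dotenv
from langchain_pinecone import PineconeVectorStore
from helper import load_pdf, text_splitter, load_gemini_embeddings

load_dotenv()  # makes PINECONE_API_KEY / GOOGLE_API_KEY available

# 1. Load and chunk the book(s) in data/
documents = load_pdf("data/")
chunks = text_splitter(documents)

# 2. Embed the chunks and upsert them into the existing "bookchat" index
embeddings = load_gemini_embeddings()
PineconeVectorStore.from_documents(
    documents=chunks,
    embedding=embeddings,
    index_name="bookchat",
)

You only need to run this once per book; after that, app.py can attach to the existing index.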
🚀 app.py – The Flask App + AI Brain
We start by setting up Pinecone:
from pinecone import Pinecone
from langchain_pinecone import PineconeVectorStore

pc = Pinecone(api_key=PINECONE_API_KEY)
docsearch = PineconeVectorStore.from_existing_index(index_name="bookchat", embedding=embeddings)
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k": 3})
🧠 This lets us retrieve the 3 most similar chunks from our stored book.
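If you want to sanity-check retrieval before wiring up the full chain, you can query the retriever directly (a quick hypothetical test, not part of the app itself):

# Quick check: fetch the top-3 chunks for a sample question
docs = retriever.invoke("Who is the main character?")
for doc in docs:
    print(doc.metadata.get("source"), doc.page_content[:100])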
Then we build a prompt + Gemini LLM:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from src.prompt import system_prompt

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])
💬 system_prompt defines how the AI should behave (e.g., polite, detailed).
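The post doesn't show src/prompt.py, but since create_stuff_documents_chain injects the retrieved documents into a {context} variable by default, the system prompt presumably looks something like this (a hypothetical sketch, not the repo's exact wording):

# src/prompt.py (hypothetical example)
system_prompt = (
    "You are a helpful assistant answering questions about a book. "
    "Use only the retrieved context below to answer. "
    "If the answer isn't in the context, say you don't know.\n\n"
    "{context}"
)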
Create the RAG chain:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
💡 This is the brain of the chatbot — retrieval + generation.
Finally, the Flask endpoints:
@app.route("/")
def index():
return render_template('chat.html')
@app.route("/get", methods=["GET", "POST"])
def chat():
msg = request.form["msg"]
response = rag_chain.invoke({"input": msg})
return str(response["answer"])
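One detail the routes above don't show is actually starting the server. Assuming the app uses Flask's built-in server on port 8080 (that's the URL used in the next section), the entry point would look roughly like:

if __name__ == "__main__":
    # Run the dev server on port 8080 so http://localhost:8080 works
    app.run(host="0.0.0.0", port=8080, debug=True)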
📡 The front end sends a message → gets a smart reply from the AI!
🧪 Testing it Out
Just run:
python app.py
Then open http://localhost:8080 and start chatting with your book! 🗨️📕
⚠️ Limitations
- Pinecone’s free tier = only one index. So, you can't upload new books at runtime unless you upgrade or manage your own embedding storage.
- Static loading: you must re-run the app if you want to embed a different book.
- Basic HTML frontend – could be upgraded with React, Tailwind, or Chat UI kits.
🛠️ Ideas for Improvements
- Add file upload (if using a paid Pinecone plan or a local vector store like FAISS; see the sketch after this list)
- Use streaming responses for a more chat-like feel
- Add authentication and user-specific history
- Display source chunk(s) below each answer for transparency
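For the FAISS idea in particular, here's a rough sketch of how the Pinecone store could be swapped for a local index (hypothetical code, reusing the helpers from helper.py; it assumes faiss-cpu is installed):

from langchain_community.vectorstores import FAISS
from helper import load_pdf, text_splitter, load_gemini_embeddings

embeddings = load_gemini_embeddings()
chunks = text_splitter(load_pdf("data/"))

# Build a local index; no Pinecone account or index limit involved
docsearch = FAISS.from_documents(chunks, embeddings)
docsearch.save_local("faiss_index")

# The rest of app.py stays the same:
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k": 3})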
🌐 Conclusion
Building an AI chatbot like this is easier than ever thanks to:
- 🧠 LangChain for chaining LLM workflows
- 🌲 Pinecone for fast vector search
- ⚡ Google Gemini for intelligent responses
- 🧪 Flask for quick APIs
If you liked this post, don’t forget to ⭐ the GitHub repo and follow me here on Dev.to!
Got questions or ideas? Drop them below! 💬👇