Hey Devs! 👋
Have you ever wanted to chat with your favorite books like you're texting a friend? 📖💬
Well, you're in the right place! In this blog post, I’ll walk you through how I built BookChatBot, an AI-powered chatbot that can answer questions about a book, using:

  • 🧠 LangChain (for LLM logic)
  • 🌲 Pinecone (for vector search)
  • 🧾 PDF loading and splitting
  • ⚡ Google Gemini (for answering questions)
  • 🧪 Flask (as the web framework)

You can find the full code on GitHub:
👉 GitHub Repo


💡 What are we building?

We're building a chatbot web app that reads PDFs (like a book 📘), stores their content as vector embeddings in Pinecone, and lets users ask questions about it!
The AI will retrieve the most relevant chunks and generate human-like answers using Google's Gemini model.

💸 Note: Pinecone’s free tier only allows one index. So for now, you can't dynamically upload new books — but once set up, it's super efficient for Q&A!


🗂️ Project Structure

bookchatbot/
├── app.py               # Flask app and RAG chain
├── helper.py            # PDF loading, chunking, and embeddings
├── src/
│   ├── prompt.py        # System prompt for LLM
├── data/                # Folder with your PDF files
├── templates/
│   └── chat.html        # Simple frontend
├── .env                 # API keys (not shared!)

🧠 How does it work?

This is a RAG (Retrieval-Augmented Generation) pipeline:

  1. Load and split PDFs into chunks
  2. Convert chunks into vector embeddings
  3. Store in Pinecone (vector DB)
  4. Accept user question
  5. Find the top relevant chunks (via Pinecone)
  6. Use Gemini to answer based on retrieved content

🧾 helper.py – Preprocessing the PDFs

from langchain_community.document_loaders import DirectoryLoader, PyMuPDFLoader

def load_pdf(data):
    loader = DirectoryLoader(data, glob="*.pdf", loader_cls=PyMuPDFLoader)
    return loader.load()

📥 We load all PDFs from the data/ folder.

from langchain_text_splitters import RecursiveCharacterTextSplitter

def text_splitter(extracted_data):
    text_split = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    return text_split.split_documents(extracted_data)

📚 We split documents into manageable 500-character chunks (with a 20-character overlap) to help with better retrieval.

import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings

def load_gemini_embeddings():
    embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=os.getenv("GOOGLE_API_KEY")
    )
    return embeddings

🔍 We use Google's embedding model to turn text chunks into vectors!
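🗃️ One piece the snippets above don't show is the one-time indexing step that actually pushes the book's vectors into Pinecone. Here's a minimal sketch of how that step could look, using the helper functions defined above (the script name, index dimension, and serverless cloud/region are my assumptions, not taken from the repo):

# store_index.py (hypothetical name) -- run once to embed the book into Pinecone
import os
from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore
from helper import load_pdf, text_splitter, load_gemini_embeddings

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Create the single free-tier index if it doesn't exist yet.
# 768 dimensions matches models/embedding-001; cloud/region are assumptions.
if "bookchat" not in pc.list_indexes().names():
    pc.create_index(
        name="bookchat",
        dimension=768,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

# Load, chunk, embed, and upsert everything into the "bookchat" index
chunks = text_splitter(load_pdf("data/"))
embeddings = load_gemini_embeddings()
PineconeVectorStore.from_documents(documents=chunks, embedding=embeddings, index_name="bookchat")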


🚀 app.py – The Flask App + AI Brain

We start by setting up Pinecone:

from pinecone import Pinecone
from langchain_pinecone import PineconeVectorStore

pc = Pinecone(api_key=PINECONE_API_KEY)
docsearch = PineconeVectorStore.from_existing_index(index_name="bookchat", embedding=embeddings)
retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k": 3})

🧠 This allows us to retrieve the 3 most similar chunks from our stored book.
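🔎 Want to sanity-check retrieval before wiring up the LLM? You can call the retriever on its own (the question below is just an example):

# Returns a list of the 3 most similar Document chunks
docs = retriever.invoke("Who is the main character of the book?")
for doc in docs:
    print(doc.page_content[:200])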

Then we build a prompt + Gemini LLM:

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from src.prompt import system_prompt

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

💬 system_prompt defines how the AI should behave (e.g., polite, detailed).
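The actual prompt lives in src/prompt.py and isn't shown here, but a minimal version could look like the sketch below. Note the {context} placeholder: create_stuff_documents_chain fills it with the retrieved chunks, so the system prompt needs to include it.

# src/prompt.py -- a minimal example, not the exact prompt from the repo
system_prompt = (
    "You are a helpful assistant that answers questions about a book. "
    "Use only the following retrieved context to answer. "
    "If the answer is not in the context, say you don't know.\n\n"
    "{context}"
)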

Create the RAG chain:

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

💡 This is the brain of the chatbot — retrieval + generation.
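💡 Handy detail: the dict returned by create_retrieval_chain contains the retrieved documents as well as the answer, which makes the "show your sources" improvement mentioned later easy to add. For example:

response = rag_chain.invoke({"input": "What is the book about?"})
print(response["answer"])           # the generated answer
for doc in response["context"]:     # the chunks that were retrieved
    print(doc.metadata.get("source"), doc.page_content[:100])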

Finally, the Flask endpoints:

@app.route("/")
def index():
    return render_template('chat.html')

@app.route("/get", methods=["GET", "POST"])
def chat():
    msg = request.form["msg"]
    response = rag_chain.invoke({"input": msg})
    return str(response["answer"])

📡 The front end sends a message → gets a smart reply from the AI!


🧪 Testing it Out

Just run:

python app.py

Then open http://localhost:8080 and start chatting with your book! 🗨️📕
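⚙️ For the app to listen on port 8080, app.py presumably ends with something like this (the host and debug settings are my assumptions):

if __name__ == "__main__":
    # Port 8080 matches the URL above; host/debug values are assumptions
    app.run(host="0.0.0.0", port=8080, debug=True)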


⚠️ Limitations

  • Pinecone’s free tier = only one index. So, you can't upload new books at runtime unless you upgrade or manage your own embedding storage.
  • Static loading: you must re-run the app if you want to embed a different book.
  • Basic HTML frontend – could be upgraded with React, Tailwind, or Chat UI kits.

🛠️ Ideas for Improvements

  • Add file upload (if using a paid Pinecone plan or a local vector store like FAISS; see the sketch after this list)
  • Use streaming responses for a more chat-like feel
  • Add authentication and user-specific history
  • Display source chunk(s) below each answer for transparency
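For the FAISS idea above, the swap is pretty small because LangChain vector stores share the same interface. A rough sketch, assuming the same helper functions from helper.py and the faiss-cpu package installed:

# Local alternative to Pinecone: no index limit, persisted on disk
from langchain_community.vectorstores import FAISS
from helper import load_pdf, text_splitter, load_gemini_embeddings

embeddings = load_gemini_embeddings()
chunks = text_splitter(load_pdf("data/"))

docsearch = FAISS.from_documents(chunks, embeddings)
docsearch.save_local("faiss_index")   # persist between runs
# later: FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k": 3})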

🌐 Conclusion

Building an AI chatbot like this is easier than ever thanks to:

  • 🧠 LangChain for chaining LLM workflows
  • 🌲 Pinecone for fast vector search
  • ⚡ Google Gemini for intelligent responses
  • 🧪 Flask for quick APIs

If you liked this post, don’t forget to ⭐ the GitHub repo and follow me here on Dev.to!

Got questions or ideas? Drop them below! 💬👇