📄 AI with Java & Spring Boot – Part 4: Ask Questions About PDF Files with LangChain4j

Hey devs! 👋

In Part 3, we added memory to our Java-based AI assistant using LangChain4j and OpenAI.

Now it’s time to take it to the next level:

🤖 "Upload a PDF and ask anything about it."

We’re going to:

  • Parse documents
  • Store their content in vector format
  • Ask questions and get context-aware answers

Let’s build a file Q&A bot using LangChain4j, Spring Boot, and OpenAI embeddings!


🧠 What You’ll Learn

  • How to read PDFs in Java
  • Embed document chunks using OpenAI
  • Store and retrieve chunks from an in-memory vector store (or connect to a real one later)
  • Query the content with LangChain4j

⚙️ Tools Used

  • Java 17+
  • Spring Boot 3
  • LangChain4j 0.25.0
  • Apache PDFBox (PDF reading)
  • OpenAI Embeddings
  • In-memory vector store

🛠 Step-by-Step: PDF Q&A Bot

1. Add Maven Dependencies

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>0.25.0</version>
</dependency>

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>0.25.0</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.29</version>
</dependency>

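If you're starting the project from scratch, you'll also want the Spring Web starter for the REST endpoints and multipart uploads (no version needed when the Spring Boot parent manages it):

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
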
2. Read PDF Content

Use Apache PDFBox to pull the raw text out of the uploaded file. The controller in Step 5 calls this helper directly, so add it to that class (or to a small service of its own):

public String extractTextFromPdf(MultipartFile file) throws IOException {
    try (PDDocument document = PDDocument.load(file.getInputStream())) {
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(document);
    }
}
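
Uploads are user-controlled, so it's worth a couple of guards before handing the bytes to PDFBox. A sketch of a stricter variant (the content-type check and error messages are my own choices):

// Sketch: reject empty or non-PDF uploads before parsing
public String extractTextFromPdf(MultipartFile file) throws IOException {
    if (file.isEmpty()) {
        throw new IllegalArgumentException("Uploaded file is empty");
    }
    if (!"application/pdf".equals(file.getContentType())) {
        throw new IllegalArgumentException("Only PDF uploads are supported");
    }
    try (PDDocument document = PDDocument.load(file.getInputStream())) {
        return new PDFTextStripper().getText(document);
    }
}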

3. Embed and Store Document

@Component
public class DocumentEmbedService {

    private final EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
        .apiKey("YOUR_API_KEY")
        .modelName("text-embedding-ada-002")
        .build();

    private final InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

    public void index(String content) {
        // Split the text into overlapping ~500-character chunks
        DocumentSplitter splitter = DocumentSplitters.recursive(500, 50);
        List<TextSegment> segments = splitter.split(Document.from(content));

        List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
        store.addAll(embeddings, segments);
    }

    public List<TextSegment> search(String question) {
        Embedding queryEmbedding = embeddingModel.embed(question).content();
        return store.findRelevant(queryEmbedding, 3).stream() // top 3 matches
            .map(EmbeddingMatch::embedded)
            .toList();
    }
}
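
If you'd rather not wire the split → embed → store steps by hand, LangChain4j ships an EmbeddingStoreIngestor that bundles them. A minimal sketch of an alternative index(...) using the same model and store:

// Sketch: EmbeddingStoreIngestor handles splitting, embedding, and storing in one call
public void index(String content) {
    EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
        .documentSplitter(DocumentSplitters.recursive(500, 50))
        .embeddingModel(embeddingModel)
        .embeddingStore(store)
        .build();

    ingestor.ingest(Document.from(content));
}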

4. Create Q&A Logic

@Component
public class FileQAService {

    private final DocumentEmbedService embedService;
    private final ChatLanguageModel model;

    public FileQAService(DocumentEmbedService embedService) {
        this.embedService = embedService;
        this.model = OpenAiChatModel.builder()
            .apiKey("YOUR_API_KEY")
            .modelName("gpt-3.5-turbo")
            .build();
    }

    public String answer(String question) {
        List<TextSegment> context = embedService.search(question);

        StringBuilder prompt = new StringBuilder("Context:\n");
        context.forEach(segment -> prompt.append(segment.text()).append("\n"));
        prompt.append("\nQuestion: ").append(question);

        return model.generate(prompt.toString());
    }
}
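
String concatenation works, but LangChain4j also has a PromptTemplate that keeps the instruction, context, and question separate. A sketch of an alternative answer(...) body (the instruction wording and variable names are my own):

// Sketch: same prompt built with LangChain4j's PromptTemplate
PromptTemplate template = PromptTemplate.from(
        "Answer the question using only the context below.\n\n"
        + "Context:\n{{context}}\n\n"
        + "Question: {{question}}");

String contextText = context.stream()
        .map(TextSegment::text)
        .collect(Collectors.joining("\n"));

Prompt prompt = template.apply(Map.of("context", contextText, "question", question));
return model.generate(prompt.text());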

5. Create the REST Controller

@RestController
@RequestMapping("/api/docs")
public class FileQAController {

    private final DocumentEmbedService embedService;
    private final FileQAService fileQAService;

    public FileQAController(DocumentEmbedService embedService, FileQAService fileQAService) {
        this.embedService = embedService;
        this.fileQAService = fileQAService;
    }

    @PostMapping("/upload")
    public ResponseEntity<String> uploadPdf(@RequestParam("file") MultipartFile file) throws IOException {
        String content = extractTextFromPdf(file); // helper from Step 2, added to this class
        embedService.index(content);
        return ResponseEntity.ok("File processed and indexed.");
    }

    @PostMapping("/ask")
    public ResponseEntity<String> ask(@RequestBody Map<String, String> body) {
        String question = body.get("question");
        return ResponseEntity.ok(fileQAService.answer(question));
    }
}
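
One practical note: Spring Boot limits multipart uploads to 1 MB by default, so larger PDFs will be rejected before they ever reach the controller. Raise the limits in application.properties if needed (10 MB here is just an example):

spring.servlet.multipart.max-file-size=10MB
spring.servlet.multipart.max-request-size=10MB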

🧪 How to Use

  1. Upload a file:
curl -X POST -F 'file=@document.pdf' http://localhost:8080/api/docs/upload
  2. Ask a question:
curl -X POST http://localhost:8080/api/docs/ask \
-H "Content-Type: application/json" \
-d '{"question": "What is the main conclusion of the document?"}'

Boom. Your AI just read a document and answered your question.


📌 Bonus Ideas

  • Add vector DB support (ChromaDB, Pinecone, Qdrant)
  • Store PDF chunks in a DB with metadata (see the sketch after this list)
  • Track history per user
  • Add web frontend to upload/ask
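
On the metadata idea: LangChain4j can attach key/value metadata to a Document, and the splitter carries it over to the resulting TextSegments, so each match can be traced back to its source file. A sketch of an index(...) variant (the "source" key and the extra filename parameter are my own):

// Sketch: tag every chunk with the file it came from before embedding
public void index(String content, String filename) {
    Document document = Document.from(content, Metadata.from("source", filename));
    List<TextSegment> segments = DocumentSplitters.recursive(500, 50).split(document);

    List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
    store.addAll(embeddings, segments);
}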

🔜 Coming in Part 5...

We’ll wrap the series with:

  • Multi-modal prompts (images + text)
  • Java-based tool integrations (calculators, web browsing, etc.)
  • LangChain agents + reasoning flows

🧡 Enjoying the series?

Follow, bookmark, and drop a comment! I’m building this in public, and your feedback helps guide the journey.

Until next time,

RF 👨‍💻