🧠 AI with Java & Spring Boot – Part 2: Streaming ChatGPT Responses

Hey again, devs! 👋

In Part 1 of this series, we built a text summarizer using Java, Spring Boot, and the OpenAI GPT API. If you haven’t checked that out yet, I recommend starting there.

Now, in Part 2, let’s level up and make our app more dynamic by streaming responses from the ChatGPT API — just like the real thing. 🔥


💡 What’s Streaming?

When using OpenAI’s API, instead of waiting for the full response, you can receive the output as it's generated, delivered chunk by chunk over Server-Sent Events (SSE). This is:

  • ⚡ Faster to first word (you see results immediately)
  • 💬 More conversational
  • 🧠 Great for chatbot-style apps
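
Under the hood, each event is a `data:` line carrying a JSON chunk whose `delta` holds a small piece of the answer, and the stream ends with a `[DONE]` sentinel. Simplified (real chunks carry extra fields like `id` and `model`), the wire traffic looks like this:

```text
data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}

data: {"choices":[{"index":0,"delta":{"content":"Hello"}}]}

data: {"choices":[{"index":0,"delta":{"content":" there"}}]}

data: [DONE]
```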

Let’s make that happen in Java!


⚙️ What We'll Build

We’ll extend our Spring Boot app to:

  • Hit OpenAI’s API with stream=true
  • Read data chunk-by-chunk using Server-Sent Events (SSE)
  • Stream it to the client
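
One prerequisite: `WebClient` and `Flux` come from Spring WebFlux, so the reactive starter must be on the classpath (assuming a Maven build; adjust for Gradle):

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
```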

🛠️ Step-by-Step Guide

1. Enable Streaming in OpenAI Request

Update your OpenAIService.java:

// Requires Spring WebFlux (WebClient) and Jackson (ObjectMapper, JsonNode)
public Flux<String> streamChatResponse(String userPrompt) {
    WebClient webClient = WebClient.builder()
        .baseUrl("https://api.openai.com/v1/chat/completions")
        .defaultHeader(HttpHeaders.AUTHORIZATION, "Bearer " + apiKey)
        .build();

    Map<String, Object> message = Map.of(
        "role", "user",
        "content", userPrompt
    );

    Map<String, Object> requestBody = Map.of(
        "model", model,
        "messages", List.of(message),
        "stream", true
    );

    // ObjectMapper is thread-safe for readTree, so create it once per call, not per chunk
    ObjectMapper mapper = new ObjectMapper();

    return webClient.post()
        .contentType(MediaType.APPLICATION_JSON)
        .accept(MediaType.TEXT_EVENT_STREAM) // ask the API for Server-Sent Events
        .bodyValue(requestBody)
        .retrieve()
        .bodyToFlux(String.class) // one element per SSE data line
        .flatMap(response -> {
            // WebClient usually strips the "data: " prefix when decoding SSE;
            // handle it defensively in case the raw line comes through
            if (response.startsWith("data: ")) {
                response = response.substring(6);
            }
            // OpenAI signals the end of the stream with a [DONE] sentinel
            if (response.trim().equals("[DONE]")) {
                return Flux.empty();
            }

            try {
                JsonNode json = mapper.readTree(response);
                JsonNode content = json.at("/choices/0/delta/content");
                // The first chunk carries only the role; skip chunks without content
                return content.isMissingNode() ? Flux.empty() : Flux.just(content.asText());
            } catch (Exception e) {
                return Flux.empty();
            }
        });
}
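
The SSE plumbing inside that `flatMap` is easy to get subtly wrong, so here it is pulled out as a tiny dependency-free sketch (the class and method names are mine; the real service still hands the surviving payload to Jackson):

```java
import java.util.Optional;

public class SseLineFilter {

    // Returns the JSON payload of one SSE line, or empty for
    // the [DONE] sentinel and blank keep-alive lines.
    static Optional<String> payload(String line) {
        String s = line.startsWith("data: ") ? line.substring(6) : line;
        if (s.isBlank() || s.trim().equals("[DONE]")) {
            return Optional.empty();
        }
        return Optional.of(s);
    }

    public static void main(String[] args) {
        System.out.println(payload("data: {\"choices\":[]}").orElse("<skipped>")); // {"choices":[]}
        System.out.println(payload("data: [DONE]").orElse("<skipped>"));           // <skipped>
    }
}
```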

2. Create Controller Endpoint for SSE

@RestController
@RequestMapping("/api/ai")
public class AIStreamController {

    private final OpenAIService openAIService;

    public AIStreamController(OpenAIService openAIService) {
        this.openAIService = openAIService;
    }

    @GetMapping(value = "/chat-stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@RequestParam String prompt) {
        return openAIService.streamChatResponse(prompt);
    }
}

3. Test with curl or JS (Client-Side)

curl (the -N flag disables output buffering so chunks print as they arrive):

curl -N "http://localhost:8080/api/ai/chat-stream?prompt=Tell+me+a+joke"

JavaScript (SSE):

const evtSource = new EventSource("/api/ai/chat-stream?prompt=Tell+me+a+joke");

evtSource.onmessage = function(event) {
    console.log("🧠", event.data);
    // Update the DOM here
};

// EventSource auto-reconnects when the server closes the stream,
// which would re-send the prompt — close it once the stream ends.
evtSource.onerror = function() {
    evtSource.close();
};

✅ Output

Once everything's up, you’ll get a streaming ChatGPT-style response, chunk by chunk. Much more responsive and realistic!


🔚 What’s Next?

In Part 3, we’ll explore:

  • Using LangChain4J to build AI agents in Java
  • Creating a memory-aware chat session
  • Possibly even Q&A over uploaded documents

💬 Thoughts?

💡 Suggestions for next topics?

Drop them in the comments below!