Modern ticketing platforms like BookMyShow and DISTRICT handle massive traffic spikes (100k+ users) while maintaining fairness and real-time updates. This guide explores how to architect such a system using WebSockets, Redis, and a scalable backend.

📌 Problem Statement & Requirements
When selling limited tickets for high-demand events (concerts, festivals), we need:

✔ Massive Scalability – Handle 100k+ concurrent users.
✔ Fair Queueing – First-come-first-served (FCFS) using a real-time queue.
✔ Resilience – Survive server crashes, network issues.
✔ Real-Time Updates – Users see their live position in the queue.
✔ Purchase Notifications – Alert users when it's their turn to buy.

🏗 System Architecture Overview
🔹 Core Components

  1. Client (Web/Mobile) – Connects via WebSocket for real-time updates.
  2. Load Balancer (NGINX, AWS ALB) – Distributes WebSocket connections.
  3. WebSocket Servers (Node.js, Go, etc.) – Maintain live connections.
  4. API Server (REST/GraphQL) – Handles ticket purchases.
  5. Redis – Manages queue (Sorted Set) and Pub/Sub for real-time sync.

Sequence Diagram - LLD
Let's understand the whole flow with the help of a sequence diagram:

BookMyShow Ticketing Queue for a Coldplay Concert in India

Explaining the Sequence Diagram (Detailed Full Lifecycle Flow)
This sequence diagram represents a complete lifecycle of a user joining the queue, getting updates in real time, buying a ticket, and how the system maintains synchronization across multiple servers.

Let's break it down step by step:

👤 1. User Connects to Queue via WebSocket

  1. A user opens the app or website and initiates a WebSocket connection.
  2. The Load Balancer (e.g., NGINX, AWS ALB) routes this connection to one of the WebSocket servers (WS).

➕ 2. User is Added to the Queue in Redis

  1. The WebSocket server adds the user to the sorted set (ZADD) in Redis.
  2. Additionally, it stores a mapping of user to server (HSET socket_clients) for monitoring/debugging (optional).

✅ 3. User Receives Confirmation & Position

  1. Redis acknowledges the insert.
  2. The server calculates the user's rank in the queue (ZRANK) and sends the initial position to the user.
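Steps 2 and 3 can be sketched end to end. The block below models the Redis sorted set with a plain in-memory Map so it runs standalone; a real server would issue the equivalent async Redis calls (ZADD, HSET, ZRANK) through a client like ioredis, and names such as joinQueue are illustrative, not part of any library.

```javascript
// In-memory stand-ins for the Redis structures (for demonstration only).
const queue = new Map();         // userId -> score (join timestamp), our "sorted set"
const socketClients = new Map(); // userId -> serverId (the HSET socket_clients mapping)

function zadd(userId, score) {
  if (!queue.has(userId)) queue.set(userId, score);
}

function zrank(userId) {
  // Rank = index after sorting by score (earliest join time first), like ZRANK.
  const sorted = [...queue.entries()].sort((a, b) => a[1] - b[1]);
  return sorted.findIndex(([id]) => id === userId); // -1 if absent, like Redis nil
}

function joinQueue(userId, serverId, now) {
  zadd(userId, now);                   // ZADD concert_queue <timestamp> <userId>
  socketClients.set(userId, serverId); // HSET socket_clients <userId> <serverId>
  return { type: "position_update", position: zrank(userId) };
}

console.log(joinQueue("userA", "ws-1", 1000)); // first to join -> position 0
console.log(joinQueue("userB", "ws-2", 1001)); // joined later -> position 1
```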

🔁 4. Periodic Heartbeat (Health Check)

  1. Client sends periodic ping messages to keep the connection alive.
  2. Server responds with pong. This prevents timeout and helps detect disconnections.
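The heartbeat in step 4 boils down to one question: did we see a pong recently? Here is a minimal, dependency-free sketch of that bookkeeping (the 30-second window and function names are assumptions for illustration; a production server would typically use the ws library's built-in ping/pong frames):

```javascript
// Minimal heartbeat tracker: a connection counts as alive if a pong was
// seen within the timeout window. Timestamps are passed in to keep it testable.
function createHeartbeat(timeoutMs = 30_000) {
  let lastPongAt = 0;
  return {
    recordPong(now) { lastPongAt = now; },                  // call on every pong
    isAlive(now) { return now - lastPongAt <= timeoutMs; }, // periodic check
  };
}

const hb = createHeartbeat(30_000);
hb.recordPong(1_000);
console.log(hb.isAlive(20_000)); // true  (19s since last pong)
console.log(hb.isAlive(40_000)); // false (39s since last pong -> treat as disconnected)
```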

🎟️ 5. User Buys a Ticket via API

  1. User clicks “Buy Ticket”.
  2. The API server (which can be the same process as the WS server or a separate one) removes them from the queue using ZREM.
  3. Publishes an event via PUBLISH to the queue_update_channel in Redis.

📢 6. Redis Pub/Sub Broadcasts to All Servers

  1. All WebSocket servers are subscribed to queue_update_channel.
  2. They receive the published message (userA-left) and act accordingly.

📬 7. Servers Calculate and Send Updated Queue Positions

  1. After someone leaves the queue, everyone behind them needs a new position.
  2. Each WebSocket server queries Redis (ZRANK) for the latest positions of their connected users.
  3. Sends updates via WebSocket.

🧹 8. Cleanup After Leaving
Once the user leaves or buys the ticket, we remove their socket mapping (optional).

🤝 Server-to-Server Communication (Why This Works)

Servers don’t talk to each other directly. Instead:

  1. All servers are subscribers to the same Redis Pub/Sub channel.
  2. When a queue change happens (buy/leave), a message is published to Redis.
  3. Redis broadcasts it to all subscribers (all WS servers).
  4. Each server reacts by recalculating and pushing fresh positions to its own connected clients.
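The fan-out described above can be modeled in a few lines. This toy in-process version of Pub/Sub (not real Redis, just a Map of callbacks) demonstrates the key property: every subscribed server receives each published message exactly once.

```javascript
// Toy stand-in for Redis Pub/Sub: channel -> list of subscriber callbacks.
const channels = new Map();

function subscribe(channel, callback) {
  if (!channels.has(channel)) channels.set(channel, []);
  channels.get(channel).push(callback);
}

function publish(channel, message) {
  (channels.get(channel) || []).forEach((cb) => cb(message));
}

// Two "WebSocket servers" subscribe to the same channel.
const received = { ws1: [], ws2: [] };
subscribe("queue_update_channel", (m) => received.ws1.push(m));
subscribe("queue_update_channel", (m) => received.ws2.push(m));

publish("queue_update_channel", "userA-left");
console.log(received); // both servers saw the event exactly once
```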

🚀 Benefits of This Architecture

  1. No tight coupling between servers.
  2. Highly scalable — just add more WS servers.
  3. Single source of truth (Redis).
  4. Can easily recover from failure (via reconnect + Redis state).

🔌 Deep Dive: Key Mechanisms
1️⃣ Redis Data Structure (Sorted Set - ZSET)

  1. ZADD concert_queue <timestamp> <userId> → Adds the user with their join time as the score.
  2. ZRANK concert_queue <userId> → Returns position (0 = first).
  3. ZREM concert_queue <userId> → Removes user after purchase.

✅ Why Sorted Set?

  1. O(log N) time complexity for inserts/removals.
  2. Atomic operations prevent race conditions.

2️⃣ WebSocket Flow (Real-Time Updates)
Client-Side Connection
// ReconnectingWebSocket comes from the "reconnecting-websocket" npm package
const socket = new ReconnectingWebSocket("wss://tickets.example.com/queue");

socket.onopen = () => {
  socket.send(JSON.stringify({ 
    type: "join_queue", 
    userId: "user123", 
    authToken: "Bearer xyz" 
  }));
};

socket.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === "position_update") {
    console.log(`Your position: ${data.position}`);
  }
};

Server-Side Handling

// Assumes the "ws" and "ioredis" npm packages
const { WebSocketServer } = require("ws");
const Redis = require("ioredis");

const redis = new Redis();
const wss = new WebSocketServer({ port: 8080 });
const WebSocketMap = new Map(); // userId → WebSocket

wss.on("connection", (socket) => {
  socket.on("message", async (msg) => {
    const { type, userId } = JSON.parse(msg);

    if (type === "join_queue") {
      WebSocketMap.set(userId, socket);
      const position = await redis.zrank("concert_queue", userId);
      socket.send(JSON.stringify({ type: "position_update", position }));
    }
  });
});

3️⃣ Pub/Sub for Cross-Server Sync
When a user buys a ticket (or leaves), we remove them from the queue and notify all servers:

// API Server (after ticket purchase)
await redis.zrem("concert_queue", userId);
await redis.publish("queue_updates", JSON.stringify({ 
  event: "user_left", 
  userId 
}));

// All WebSocket Servers (subscribed)
// Note: a Redis connection in subscriber mode cannot run other commands,
// so use a dedicated duplicate connection for Pub/Sub and keep `redis`
// free for queries like ZRANK.
const subscriber = redis.duplicate();
subscriber.subscribe("queue_updates");
subscriber.on("message", (channel, msg) => {
  const { event } = JSON.parse(msg);
  if (event === "user_left") {
    // Recalculate & broadcast positions for this server's connected users
    WebSocketMap.forEach(async (socket, userId) => {
      const position = await redis.zrank("concert_queue", userId);
      socket.send(JSON.stringify({ type: "position_update", position }));
    });
  }
});

Let's understand it by going one step deeper.

🔌 WebSocket Flow (Client ↔️ Server)

  1. Client Connects to WebSocket
// Wait for the connection to open before sending the join message
const socket = new WebSocket("wss://ticket.com/ws");
socket.onopen = () => {
  socket.send(JSON.stringify({ type: "join", userId: "userA" }));
};
  2. Server Handles Connection
wss.on("connection", (socket) => {
  socket.on("message", async (msg) => {
    const data = JSON.parse(msg);
    if (data.type === "join") {
      WebSocketMap.set(data.userId, socket);
      const position = await redis.zrank("concert_queue", data.userId);
      socket.send(JSON.stringify({ type: "queue_update", position }));
    }
  });
});
  3. Queue Position Update (Purchase or Leave)
// Example: userB buys ticket
await redis.zrem("concert_queue", "userB");

// Publish event
redis.publish("queue_update_channel", JSON.stringify({ userId: "userB" }));
  4. Pub/Sub Broadcasting
Every server listens:
// A subscriber-mode Redis connection can't run queries like ZRANK,
// so use a dedicated duplicate connection for Pub/Sub.
const subscriber = redis.duplicate();
subscriber.subscribe("queue_update_channel");

subscriber.on("message", async (channel, msg) => {
  // Recompute and broadcast positions for this server's connected clients
  WebSocketMap.forEach(async (socket, userId) => {
    const position = await redis.zrank("concert_queue", userId);
    socket.send(JSON.stringify({ type: "queue_update", position }));
  });
});

🧩 Handling Edge Cases
🔴 1. WebSocket Server Crashes
Problem: In-memory WebSocketMap is lost.

Solution:

  1. Clients auto-reconnect using ReconnectingWebSocket.
  2. On reconnect, server re-fetches position from Redis.

🔴 2. Client Disconnects (Network Issues)
Solution:

  1. Heartbeat checks (ping/pong).
  2. If there is no pong for 30s, remove the user from the queue and notify the other servers (including the event type the subscribers check for):
await redis.zrem("concert_queue", userId);
await redis.publish("queue_updates", JSON.stringify({ event: "user_left", userId }));
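Putting the sweep together: the sketch below runs the timeout check against in-memory stand-ins for the sorted set and the Pub/Sub channel, so it executes without a Redis server. In production the delete and the notification would be the async redis.zrem / redis.publish calls shown above; removeStaleUsers is an illustrative name, not an existing API.

```javascript
// In-memory stand-ins for the Redis sorted set and the queue_updates channel.
const queue = new Map(); // userId -> join timestamp (score)
const published = [];    // messages "published" to queue_updates

// Remove every user whose last pong is older than timeoutMs, then notify.
function removeStaleUsers(lastPong, now, timeoutMs = 30_000) {
  for (const [userId, pongAt] of [...lastPong]) {
    if (now - pongAt > timeoutMs && queue.delete(userId)) { // ZREM concert_queue
      published.push({ event: "user_left", userId });       // PUBLISH queue_updates
      lastPong.delete(userId);
    }
  }
}

queue.set("userA", 0).set("userB", 1_000);
const lastPong = new Map([["userA", 0], ["userB", 25_000]]);
removeStaleUsers(lastPong, 40_000);
console.log([...queue.keys()]); // [ 'userB' ]  (userA timed out and was removed)
```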

🔴 3. Redis High Availability
Solution:

  1. Use Redis Cluster + Persistent Storage.
  2. Fallback to database-backed queue if Redis fails.

⚡ Scaling Strategies
Component -> Scaling Approach

  1. WebSocket Servers -> Horizontal scaling + sticky sessions.
  2. Redis -> Sharding (if the queue exceeds memory).
  3. API Servers -> Stateless, auto-scaling (Kubernetes).

✅ Final Architecture - FlowChart Diagram

FlowChart demonstrating the scalable ticket queue architecture

🚀 Conclusion
This architecture ensures:

✔ Real-time queue updates via WebSockets.
✔ Scalability with Redis + horizontal scaling.
✔ Resilience against crashes and disconnects.

Used by: DICE, DISTRICT, Ticketmaster’s virtual queues.

What are the Next Steps?

  1. Add rate limiting to prevent abuse.
  2. Implement priority queues (e.g., premium users).
  3. Use Kubernetes for auto-scaling.

This is how we can build scalable, reliable, and maintainable real-time architectures that handle millions of users!

Read this far? Hope you liked the article 😊 If yes, give it a thumbs up and leave your feedback. It gives me motivation to write more and share with the community!

LinkedIn: https://www.linkedin.com/in/gauravsingh9356/