Goroutines in Go are famously lightweight, far lighter than the OS threads that back the traditional threading models of languages like Java and C++. But what makes goroutines so efficient and scalable? The secret lies in Go’s runtime scheduler.

In this blog, we’ll explore the core concepts behind Go’s concurrency model by diving deep into the Go Scheduler — the engine that powers millions of goroutines behind the scenes. We’ll look at how it works, why it’s different from thread-based models, and how features like M:N scheduling, work-stealing, preemption, and network polling all work together to make Go a concurrency powerhouse.


🧵 Why Are Goroutines So Lightweight?

At a high level:

  • 🪶 Goroutines start with very small stacks (as little as 2KB), which grow and shrink dynamically.
  • 🔄 They are multiplexed onto a smaller set of OS threads, instead of 1:1 mapping.
  • 🧠 The Go runtime manages scheduling — avoiding OS-level context-switching overheads.

This efficiency is made possible by the Go scheduler, a core part of the Go runtime.


⚙️ The M:N Scheduling Model

Go uses an M:N model to schedule goroutines:

  • M (Machine): an OS thread managed by the Go runtime
  • P (Processor): a logical processor holding scheduling context; there is one per GOMAXPROCS
  • G (Goroutine): a lightweight task managed entirely by the runtime

📌 How it works:

  • Each P holds a queue of G (goroutines).
  • A P must have an M to execute the goroutines.
  • If a goroutine blocks in a syscall, its M detaches from the P and another M takes over that P. (Blocking network I/O is instead parked in the netpoller, covered below.)


🔁 Lifecycle of a Goroutine

A goroutine can be in one of the following states:

| State | Description |
| --- | --- |
| Runnable ✅ | Ready to run, waiting for a P |
| Running 🚀 | Actively executing on a P |
| Waiting ⏳ | Blocked on I/O, a syscall, a channel, etc. |
| Dead ⚰️ | Finished execution |

🔍 The scheduler tracks these using:

  • ✅ Local run queues per P
  • 🌍 A global run queue for overflow
  • 🌐 Network pollers for I/O-bound goroutines

🔄 Work Stealing

To keep CPUs busy and avoid idling:

  • Every P has its own local run queue.
  • If a P runs out of work, it steals goroutines from another P.

💡 In the real runtime, a P steals half of another P’s local queue at a time, using atomic operations to keep the fast path lock-free.


⏱️ Preemptive Scheduling

Go originally used cooperative scheduling: goroutines yielded only at safe points such as function calls, so a tight loop with no calls could monopolize its P.

🚀 Since Go 1.14, asynchronous preemption has been in place:

  • The runtime sends a signal (SIGURG on Unix) to interrupt a goroutine that has run too long.
  • Prevents a goroutine from hogging the CPU.
  • Increases fairness and responsiveness across the system.

🔧 Handling Syscalls and Blocking I/O

Go handles blocking calls smartly:

  • When a goroutine makes a blocking syscall, its M detaches from the P, taking the G with it.
  • Another M is assigned to the P so that other goroutines can continue running.
  • Once the blocking call completes, the goroutine re-enters the scheduling system.

🔋 This keeps the system non-blocking and highly scalable.


🌐 Network Polling

For efficient I/O, Go uses OS-specific mechanisms:

  • 🐧 epoll on Linux
  • 🍎 kqueue on macOS/BSD
  • 🪟 IOCP on Windows

A dedicated M (OS thread) watches for I/O readiness:

  • 🛌 Sleeps until network events occur
  • 🔔 Wakes the appropriate goroutines
  • 🔄 Keeps blocked network I/O from tying up OS threads or stalling the scheduler

🚀 The Custom Go Scheduler

To make all of this concrete, here is the core loop of a simplified scheduler simulation. Types like M, P, G, and Scheduler come from the full code in the repo linked at the end.

func (s *Scheduler) RunMachine(m *M) {
    m.running = true // assign a kernel thread
    m.boundP = s.Ps[m.id % len(s.Ps)] // static round-robin
    p := m.boundP

    fmt.Printf("M[%d] BOUND to P[%d]\n", m.id, p.id)

    var g *G // Assigned goroutine

    for {
        select {
        case g = <-p.runQueue:
        case g = <-s.globalQueue:
        default:
            // Work stealing from other Ps
            for _, otherP := range s.Ps {
                if otherP.id != p.id {
                    select {
                    case g = <-otherP.runQueue:
                        fmt.Printf("M[%d] STEALING FROM P[%d]\n", m.id, otherP.id)
                        goto EXEC
                    default:
                    }
                }
            }

            // Network Polling
            select {
            case g = <-s.networkPoller:
                fmt.Printf("M[%d] WOKE G[%d] NETWORK POLLER\n", m.id, g.id)
                goto EXEC
            default:
            }

            // Sleep to avoid busy waiting
            time.Sleep(10 * time.Millisecond)
            continue
        }

    EXEC:
        g.state = "running"
        fmt.Printf("[State] G[%d] state changed to RUNNING by M[%d]\n", g.id, m.id)

        done := make(chan struct{})
        blocked := make(chan struct{})

        go func() {
            if rand.Intn(10) < 2 {
                fmt.Printf("[SysCall] G[%d] performing BLOCKING syscall\n", g.id)
                g.state = "blocked"
                s.blockedG <- g // hand G to the blocked pool; re-enqueued when the "syscall" returns
                close(blocked)
                return
            }
            g.task()
            close(done)
        }()

        select {
        case <-done:
            fmt.Printf("[State] G[%d] finished\n", g.id)
        case <-blocked:
            // G is parked on a syscall; this M moves on to other work.
        case <-time.After(100 * time.Millisecond):
            fmt.Printf("[Preempt] G[%d] preempted\n", g.id)
            g.state = "runnable"
            p.runQueue <- g // re-enqueue for another time slice
        }
    }
}

🔧 RunMachine Explained

This function represents an OS thread (M) executing goroutines (G) bound to a logical processor (P). Here's the high-level idea:

  • Each M is statically bound to a P using round-robin.
  • Goroutines are fetched from:
    • The bound P’s run queue
    • The global queue
    • Other Ps’ queues (via work stealing)
    • The network poller queue
  • Tasks may be preempted or blocked (e.g., on syscall).
  • Blocking syscalls detach the M (carrying its G) from the P; the P’s remaining work is picked up by another M.

Output

M[0] BOUND to P[0]
M[1] BOUND to P[1]
M[2] BOUND to P[0]
M[2] WOKE G[10] NETWORK POLLER
[State] G[10] state changed to RUNNING by M[2]
[SysCall] G[10] performing BLOCKING syscall
[SyscallReturn]: G[10] returning from Syscall

M[0] WOKE G[11] NETWORK POLLER
[State] G[11] state changed to RUNNING by M[0]
[NetPoll]: Handling network Event
[State] G[11] finished

M[1] WOKE G[12] NETWORK POLLER
[State] G[12] state changed to RUNNING by M[1]
[NetPoll]: Handling network Event
[State] G[12] finished

🧾 Conclusion

Go’s scheduler is what allows programs to scale to thousands, or even millions, of goroutines with minimal resource usage.

Whether you're:

  • Running concurrent web servers
  • Managing network services
  • Building real-time systems

Goroutines and the Go Scheduler ensure scalability, responsiveness, and developer simplicity.

If you liked this post, give it a ❤️ or 🦄, and leave your thoughts or questions below!

➡️ Code Repo: github.com/mery-top/GO-Scheduler

➡️ Proc.go Source Code: Go Runtime Scheduler Source