Goroutines in Go are famously lightweight — much more so than OS threads or traditional language-level threads like those in Java or C++. But what makes goroutines so efficient and scalable? The secret lies in Go’s runtime scheduler.
In this blog, we’ll explore the core concepts behind Go’s concurrency model by diving deep into the Go Scheduler — the engine that powers millions of goroutines behind the scenes. We’ll look at how it works, why it’s different from thread-based models, and how features like M:N scheduling, work-stealing, preemption, and network polling all work together to make Go a concurrency powerhouse.
🧵 Why Are Goroutines So Lightweight?
At a high level:
- 🪶 Goroutines start with very small stacks (as little as 2KB), which grow and shrink dynamically.
- 🔄 They are multiplexed onto a smaller set of OS threads, instead of 1:1 mapping.
- 🧠 The Go runtime manages scheduling — avoiding OS-level context-switching overheads.
This efficiency is made possible by the Go scheduler, a core part of the Go runtime.
⚙️ The M:N Scheduling Model
Go uses an M:N model to schedule goroutines:
- M (Machine): an OS thread
- P (Processor): a logical processor
- G (Goroutine): a lightweight task
📌 How it works:
- Each `P` holds a queue of `G`s (goroutines).
- A `P` must have an `M` to execute its goroutines.
- If a goroutine blocks (on I/O or a syscall), its `M` detaches from the `P`, and another `M` is scheduled in its place.
🔁 Lifecycle of a Goroutine
A goroutine can be in one of the following states:
| State | Description |
|---|---|
| Runnable | ✅ Ready to run, waiting for a `P` |
| Running | 🚀 Actively executing on a `P` |
| Waiting | ⏳ Blocked on I/O, syscall, channel, etc. |
| Dead | ⚰️ Finished execution |
🔍 The scheduler tracks these using:
- ✅ Local run queues per `P`
- 🌍 A global run queue for overflow
- 🌐 Network pollers for I/O-bound goroutines
🔄 Work Stealing
To keep CPUs busy and avoid idling:
- Every `P` has its own local run queue.
- If a `P` runs out of work, it steals goroutines from another `P`.
💡 This strategy uses lock-free and atomic operations for efficiency and minimal overhead.
⏱️ Preemptive Scheduling
Go initially used cooperative scheduling (goroutines yielded at safe points like function calls).
🚀 From Go 1.14+, preemptive scheduling was introduced:
- The runtime injects asynchronous preemption signals.
- Prevents a goroutine from hogging the CPU.
- Increases fairness and responsiveness across the system.
🔧 Handling Syscalls and Blocking I/O
Go handles blocking calls smartly:
- When a goroutine makes a blocking syscall, its `M` is detached from the `P`.
- Another `M` is assigned to the `P` so that other goroutines can continue running.
- Once the blocking call completes, the goroutine re-enters the scheduling system.
🔋 This keeps the system non-blocking and highly scalable.
🌐 Network Polling
For efficient I/O, Go uses OS-specific mechanisms:
- 🐧 `epoll` on Linux
- 🍎 `kqueue` on macOS/BSD
- 🪟 `IOCP` on Windows
A dedicated M (OS thread) watches for I/O readiness:
- 🛌 Sleeps until network events occur
- 🔔 Wakes the appropriate goroutines
- 🔄 Keeps scheduler threads from blocking on I/O waits
🚀 The Custom Go Scheduler
The `RunMachine` loop below (an excerpt from the repo linked at the end) simulates how an `M` picks up and executes goroutines:
```go
func (s *Scheduler) RunMachine(m *M) {
	m.running = true                // this M now has a kernel thread
	m.boundP = s.Ps[m.id%len(s.Ps)] // static round-robin M→P binding
	p := m.boundP
	fmt.Printf("M[%d] BOUND to P[%d]\n", m.id, p.id)

	var g *G // goroutine currently assigned to this M
	for {
		select {
		case g = <-p.runQueue: // 1. bound P's local run queue
		case g = <-s.globalQueue: // 2. global overflow queue
		default:
			// 3. Work stealing from other Ps
			for _, otherP := range s.Ps {
				if otherP.id != p.id {
					select {
					case g = <-otherP.runQueue:
						fmt.Printf("M[%d] STEALING FROM P[%d]\n", m.id, otherP.id)
						goto EXEC
					default:
					}
				}
			}
			// 4. Network polling
			select {
			case g = <-s.networkPoller:
				fmt.Printf("M[%d] WOKE G[%d] NETWORK POLLER\n", m.id, g.id)
				goto EXEC
			default:
			}
			// Nothing runnable: sleep briefly to avoid busy waiting
			time.Sleep(10 * time.Millisecond)
			continue
		}
	EXEC:
		g.state = "running"
		fmt.Printf("[State] G[%d] state changed to RUNNING by M[%d]\n", g.id, m.id)
		done := make(chan struct{})
		go func() {
			// ~20% chance this G enters a simulated blocking syscall
			if rand.Intn(10) < 2 {
				fmt.Printf("[SysCall] G[%d] performing BLOCKING syscall\n", g.id)
				g.state = "blocked"
				s.blockedG <- g // hand off to the syscall handler
				return          // done is never closed; the timeout below fires
			}
			g.task()
			close(done)
		}()
		select {
		case <-done:
			fmt.Printf("[State] G[%d] finished\n", g.id)
		case <-time.After(100 * time.Millisecond):
			// Preemption: the G used up its time slice, requeue it
			fmt.Printf("[Preempt] G[%d] preempted\n", g.id)
			g.state = "runnable"
			p.runQueue <- g
		}
	}
}
```
🔧 RunMachine Explained
This function represents an OS thread (`M`) executing goroutines (`G`) bound to a logical processor (`P`). Here's the high-level idea:
- Each M is statically bound to a P using round-robin.
- Goroutines are fetched from:
  - The bound P’s run queue
  - The global queue
  - Other Ps’ queues (via work stealing)
  - The network poller queue
- Tasks may be preempted or blocked (e.g., on a syscall).
- A blocking syscall detaches the M from its P, and the remaining work is rescheduled on that P.
Output:
```
M[0] BOUND to P[0]
M[1] BOUND to P[1]
M[2] BOUND to P[0]
M[2] WOKE G[10] NETWORK POLLER
[State] G[10] state changed to RUNNING by M[2]
[SysCall] G[10] performing BLOCKING syscall
[Preempt] G[10] preempted
[SyscallReturn]: G[10] returning from Syscall
M[0] WOKE G[11] NETWORK POLLER
[State] G[11] state changed to RUNNING by M[0]
[NetPoll]: Handling network Event
[State] G[11] finished
M[1] WOKE G[12] NETWORK POLLER
[State] G[12] state changed to RUNNING by M[1]
[NetPoll]: Handling network Event
[State] G[12] finished
```
🧾 Conclusion
Go’s scheduler is what allows a program to scale to thousands or even millions of goroutines with minimal resource usage.
Whether you're:
- Running concurrent web servers
- Managing network services
- Building real-time systems
Goroutines and the Go Scheduler ensure scalability, responsiveness, and developer simplicity.
If you liked this post, give it a ❤️ or 🦄, and leave your thoughts or questions below!
➡️ Code Repo: github.com/mery-top/GO-Scheduler
➡️ Proc.go Source Code: Go Runtime Scheduler Source