Have you ever been assigned a project where you designed an architecture using all the latest state-of-the-art tools — sharded databases, message queues, event buses, and more? At first glance, the architecture looks impressive. It sounds cool. But does it really solve the core problem you're facing?
Even if your CTO gives you the green light to build it, can you be sure the system will perform as expected? Once you start implementing it, doubt often creeps in. You begin questioning the performance, wondering how to validate your assumptions.
One simple but powerful method to validate your system design early is through back-of-the-envelope calculations. It helps you estimate, reason, and catch potential issues long before they become expensive mistakes.
Back-of-the-envelope calculations let you build estimates from a combination of thought experiments and common performance numbers, giving you a good feel for which designs will meet your requirements.
🧮 Operation Latency Table
No | Operation | Activity | Component | Time (ns) | Time (ms) | Time (min) | Time (hr) |
---|---|---|---|---|---|---|---|
1 | L1 cache reference | read | cache | 0.5 | 0.0000005 | 0.00000000000833 | 0.000000000000139 |
2 | Branch mispredict | misc | cpu | 5 | 0.000005 | 0.00000000008333 | 0.00000000000139 |
3 | L2 cache reference | read | cache | 7 | 0.000007 | 0.00000000011667 | 0.00000000000194 |
4 | Mutex lock/unlock | sync | cpu | 100 | 0.0001 | 0.00000000166667 | 0.00000000002778 |
5 | Main memory reference | read | memory | 100 | 0.0001 | 0.00000000166667 | 0.00000000002778 |
6 | Compress 1K bytes with Zippy | compute | cpu | 10000 | 0.01 | 0.000000166667 | 0.000000002778 |
7 | Send 2K bytes over 1 Gbps network | write | network | 20000 | 0.02 | 0.000000333333 | 0.000000005556 |
8 | Read 1 MB sequentially from memory | read | memory | 250000 | 0.25 | 0.000004166667 | 0.000000069444 |
9 | Round trip within same datacenter | network | network | 500000 | 0.5 | 0.000008333333 | 0.000000138889 |
10 | Disk seek | read | disk | 10000000 | 10 | 0.000166667 | 0.000002778 |
11 | Read 1 MB sequentially from network | read | network | 10000000 | 10 | 0.000166667 | 0.000002778 |
12 | Read 1 MB sequentially from disk | read | disk | 30000000 | 30 | 0.0005 | 0.000008333 |
13 | Send packet CA→Netherlands→CA | network | network | 150000000 | 150 | 0.0025 | 0.000041667 |
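If you find yourself reusing these numbers, it helps to keep them in code. Below is a minimal Python sketch that mirrors the nanosecond column of the table above; the dictionary keys and the helper function are illustrative names, not from any library.

```python
# Rough operation costs in nanoseconds, taken from the table above.
LATENCY_NS = {
    "l1_cache_ref": 0.5,
    "branch_mispredict": 5,
    "l2_cache_ref": 7,
    "mutex_lock_unlock": 100,
    "main_memory_ref": 100,
    "compress_1kb_zippy": 10_000,
    "send_2kb_1gbps": 20_000,
    "read_1mb_memory": 250_000,
    "datacenter_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "read_1mb_network": 10_000_000,
    "read_1mb_disk": 30_000_000,
    "packet_ca_nl_ca": 150_000_000,
}

def cost_ms(operation: str, count: int = 1) -> float:
    """Total cost of `count` operations, in milliseconds."""
    return LATENCY_NS[operation] * count / 1_000_000

# Example: 30 disk seeks cost about 300 ms.
print(cost_ms("disk_seek", 30))  # 300.0
```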
💡 The Lessons
- Writes are 40 times more expensive than reads.
  - Frequent writes/updates will have high contention.
  - To scale writes, you need to partition, and once you do, it becomes difficult to maintain shared state like counters (see the sharded-counter sketch below).
- Global shared data is expensive.
  - This is a fundamental limitation of distributed systems.
  - Lock contention on heavily written shared objects kills performance, because transactions become serialized and slow.
- Architect for scaling writes.
  - Optimize for low write contention.
  - Optimize wide: make writes as parallel as you can.
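A common way to keep a counter workable after you partition writes is to shard it: spread increments across several independent slots and sum them on read. The sketch below is a minimal in-memory illustration of the idea; in a real datastore each shard would be its own row or key, and the class and method names here are my own.

```python
import random

class ShardedCounter:
    """Spread increments over several shards so concurrent writers
    rarely contend on the same slot; reads sum all shards."""

    def __init__(self, num_shards: int = 16):
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> None:
        # Pick a random shard; concurrent writers mostly hit different slots.
        shard = random.randrange(len(self.shards))
        self.shards[shard] += amount

    def value(self) -> int:
        # Reads pay the cost of summing every shard (and may be slightly
        # stale in a real distributed store).
        return sum(self.shards)

counter = ShardedCounter()
for _ in range(1000):
    counter.increment()
print(counter.value())  # 1000
```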
🔥 Writes Are Expensive!
- Datastores are transactional: writes require disk access.
- Disk access means disk seeks.
- 🧠 Rule of thumb:
  - 1 disk seek ≈ 10 ms
  - → 1 s / 10 ms = 100 seeks/second (max per disk)
- Throughput depends on:
  - The size and shape of your data
  - Doing work in batches (batch puts/gets)
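As a quick sanity check of that rule of thumb, here is the arithmetic in Python; the 10 ms seek cost comes from the table above, and the batch size is an assumption for illustration.

```python
DISK_SEEK_MS = 10                       # from the latency table

seeks_per_second = 1000 / DISK_SEEK_MS  # ~100 seeks/sec per disk
print(seeks_per_second)                 # 100.0

# Batching: if one seek lets you write a batch of, say, 50 entities,
# effective write throughput scales with the batch size.
BATCH_SIZE = 50                         # illustrative assumption
print(seeks_per_second * BATCH_SIZE)    # 5000.0 writes/sec
```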
⚡ Reads Are Cheap!
- Reads don’t have to be transactional — just consistent.
- After the first disk load, data is cached in memory.
- Subsequent reads are super fast.
- 🧠 Rule of thumb:
  - Read 1 MB from memory ≈ 250 μs
  - → 1 s / 250 μs = 4000 reads/sec ≈ 4 GB/sec
  - → For 1 MB entities: ~4000 fetches/sec
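The same arithmetic for reads, using the 250 μs per MB figure from the table:

```python
READ_1MB_FROM_MEMORY_US = 250        # from the latency table

fetches_per_second = 1_000_000 / READ_1MB_FROM_MEMORY_US   # 1 MB reads per second
print(fetches_per_second)            # 4000.0 fetches/sec for 1 MB entities
print(fetches_per_second / 1000)     # ~4 GB/sec of sequential memory bandwidth
```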
🧪 Example: Generate Image Results Page of 30 Thumbnails
❌ Design 1 – Serial
- Read images one by one:
  - Each image = disk seek + read 256 KB at 30 MB/s
- Calculation:
  - 30 seeks × 10 ms = 300 ms
  - 30 × (256 KB / 30 MB/s) ≈ 250 ms
  - → Total: 300 + 250 = 550 ms
✅ Design 2 – Parallel
- Issue all 30 reads in parallel.
- Calculation:
  - 1 seek = 10 ms
  - Read 256 KB at 30 MB/s ≈ 8.5 ms
  - → Total: 10 + 8.5 ≈ 18.5 ms
- Expect variance in the real world: ~30–60 ms range
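The two designs are easy to compare in a few lines of Python; the constants below are the same ones used in the estimates above.

```python
DISK_SEEK_MS = 10          # cost of one disk seek
DISK_READ_MB_PER_S = 30    # sequential disk read throughput
THUMB_MB = 0.25            # 256 KB thumbnail
NUM_THUMBS = 30

read_ms = THUMB_MB / DISK_READ_MB_PER_S * 1000      # ~8.3 ms per thumbnail

serial_ms = NUM_THUMBS * (DISK_SEEK_MS + read_ms)   # each image waits for the previous one
parallel_ms = DISK_SEEK_MS + read_ms                # all reads issued at once

print(round(serial_ms))    # 550 -> matches the ~550 ms estimate
print(round(parallel_ms))  # 18  -> matches the ~18.5 ms estimate
```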
🧠 Simplified Mental Models
Insight | What It Means (Simplified) |
---|---|
💾 Disk is super slow | Like walking to the garage. You don’t want to do this often. |
🧠 RAM is much faster than disk | Like grabbing from your desk instead of walking to the cabinet. |
⚡ CPU is rarely the bottleneck | Your processor is fast. If your system is slow, it’s not the CPU’s fault. |
🔁 Cache is insanely fast | Think of L1/L2 cache like stuff in your pocket — instant access. |
🌐 Network trips are expensive | Talking to another datacenter is like mailing a letter to Europe. Avoid it. |
🔃 Batching is your friend | Instead of reading 1 comment at a time, grab 100 at once. |
🧵 Avoid shared locks | Waiting for someone to unlock the bathroom wastes time. |
📦 Design for locality | Keep data close to where it’s processed — like keeping your tools nearby. |
"Cache beats RAM. RAM beats disk. Disk is lava. Network is long-distance love."
🧠 Conclusion
Back-of-the-envelope calculations won’t give you perfect answers, but they give you fast, rough estimates. That’s often all you need to:
- Avoid wasteful engineering
- Identify bottlenecks early
- Make sound architecture decisions without building the wrong thing first
Before building that real-time dashboard or scaling out another microservice, ask yourself:
“Did I run the numbers? Even roughly?”
You might just save yourself days of debugging.