Have you ever been assigned a project where you designed an architecture using all the latest state-of-the-art tools — sharded databases, message queues, event buses, and more? At first glance, the architecture looks impressive. It sounds cool. But does it really solve the core problem you're facing?
Even if your CTO gives you the green light to build it, can you be sure the system will perform as expected? Once you start implementing it, doubt often creeps in. You begin questioning the performance, wondering how to validate your assumptions.
One simple but powerful method to validate your system design early is through back-of-the-envelope calculations. It helps you estimate, reason, and catch potential issues long before they become expensive mistakes.
Back-of-the-envelope calculations let you build estimates from a combination of thought experiments and common performance numbers, giving you a good feel for which designs will meet your requirements.
🧮 Operation Latency Table
No | Operation | Activity | Component | Time (ns) | Time (ms) | Time (min) | Time (hr) |
---|---|---|---|---|---|---|---|
1 | L1 cache reference | read | cache | 0.5 | 0.0000005 | 0.00000000000833 | 0.000000000000139 |
2 | Branch mispredict | misc | cpu | 5 | 0.000005 | 0.00000000008333 | 0.00000000000139 |
3 | L2 cache reference | read | cache | 7 | 0.000007 | 0.00000000011667 | 0.00000000000194 |
4 | Mutex lock/unlock | sync | cpu | 100 | 0.0001 | 0.00000000166667 | 0.00000000002778 |
5 | Main memory reference | read | memory | 100 | 0.0001 | 0.00000000166667 | 0.00000000002778 |
6 | Compress 1K bytes with Zippy | compute | cpu | 10000 | 0.01 | 0.000000166667 | 0.000000002778 |
7 | Send 2K bytes over 1 Gbps network | write | network | 20000 | 0.02 | 0.000000333333 | 0.000000005556 |
8 | Read 1 MB sequentially from memory | read | memory | 250000 | 0.25 | 0.000004166667 | 0.000000069444 |
9 | Round trip within same datacenter | network | network | 500000 | 0.5 | 0.000008333333 | 0.000000138889 |
10 | Disk seek | read | disk | 10000000 | 10 | 0.000166667 | 0.000002778 |
11 | Read 1 MB sequentially from network | read | network | 10000000 | 10 | 0.000166667 | 0.000002778 |
12 | Read 1 MB sequentially from disk | read | disk | 30000000 | 30 | 0.0005 | 0.000008333 |
13 | Send packet CA→Netherlands→CA | network | network | 150000000 | 150 | 0.0025 | 0.000041667 |
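If you find yourself reusing these numbers, it helps to keep them in code. Below is a minimal Python sketch that mirrors the nanosecond column of the table above; the dictionary keys and the helper function are illustrative names, not from any library.

```python
# Rough operation costs in nanoseconds, taken from the table above.
LATENCY_NS = {
    "l1_cache_ref": 0.5,
    "branch_mispredict": 5,
    "l2_cache_ref": 7,
    "mutex_lock_unlock": 100,
    "main_memory_ref": 100,
    "compress_1kb_zippy": 10_000,
    "send_2kb_1gbps": 20_000,
    "read_1mb_memory": 250_000,
    "datacenter_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "read_1mb_network": 10_000_000,
    "read_1mb_disk": 30_000_000,
    "packet_ca_nl_ca": 150_000_000,
}

def cost_ms(operation: str, count: int = 1) -> float:
    """Total cost of `count` operations, in milliseconds."""
    return LATENCY_NS[operation] * count / 1_000_000

# Example: 30 disk seeks cost about 300 ms.
print(cost_ms("disk_seek", 30))  # 300.0
```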
💡 The Lessons
- Writes are 40 times more expensive than reads.
  - Frequent writes/updates will have high contention.
  - To scale writes, you need to partition, and once you do, it becomes difficult to maintain shared state like counters (see the sharded-counter sketch below).
- Global shared data is expensive.
  - This is a fundamental limitation of distributed systems.
  - Lock contention on heavily written shared objects kills performance, because transactions become serialized and slow.
- Architect for scaling writes.
  - Optimize for low write contention.
  - Optimize wide: make writes as parallel as you can.
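A common way to keep a counter workable after you partition writes is to shard it: spread increments across several independent slots and sum them on read. The sketch below is a minimal in-memory illustration of the idea; in a real datastore each shard would be its own row or key, and the class and method names here are my own.

```python
import random

class ShardedCounter:
    """Spread increments over several shards so concurrent writers
    rarely contend on the same slot; reads sum all shards."""

    def __init__(self, num_shards: int = 16):
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> None:
        # Pick a random shard; concurrent writers mostly hit different slots.
        shard = random.randrange(len(self.shards))
        self.shards[shard] += amount

    def value(self) -> int:
        # Reads pay the cost of summing every shard (and may be slightly
        # stale in a real distributed store).
        return sum(self.shards)

counter = ShardedCounter()
for _ in range(1000):
    counter.increment()
print(counter.value())  # 1000
```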
🔥 Writes Are Expensive!
- Datastores are transactional: writes require disk access.
- Disk access means disk seeks.
- 🧠 Rule of thumb:
  - 1 disk seek ≈ 10 ms
  - → 1 s / 10 ms = 100 seeks/second (max per disk)
- Throughput depends on:
  - The size and shape of your data
  - Doing work in batches (batch puts/gets)
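As a quick sanity check of that rule of thumb, here is the arithmetic in Python; the 10 ms seek cost comes from the table above, and the batch size is an assumption for illustration.

```python
DISK_SEEK_MS = 10                       # from the latency table

seeks_per_second = 1000 / DISK_SEEK_MS  # ~100 seeks/sec per disk
print(seeks_per_second)                 # 100.0

# Batching: if one seek lets you write a batch of, say, 50 entities,
# effective write throughput scales with the batch size.
BATCH_SIZE = 50                         # illustrative assumption
print(seeks_per_second * BATCH_SIZE)    # 5000.0 writes/sec
```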
⚡ Reads Are Cheap!
- Reads don’t have to be transactional — just consistent.
- After the first disk load, data is cached in memory.
- Subsequent reads are super fast.
- 🧠 Rule of thumb:
  - Read 1 MB from memory ≈ 250 μs
  - → 1 s / 250 μs = 4000 reads/sec ≈ 4 GB/sec
  - → For 1 MB entities: ~4000 fetches/sec
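The same arithmetic for reads, using the 250 μs per MB figure from the table:

```python
READ_1MB_FROM_MEMORY_US = 250        # from the latency table

fetches_per_second = 1_000_000 / READ_1MB_FROM_MEMORY_US   # 1 MB reads per second
print(fetches_per_second)            # 4000.0 fetches/sec for 1 MB entities
print(fetches_per_second / 1000)     # ~4 GB/sec of sequential memory bandwidth
```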
🧪 Example: Generate Image Results Page of 30 Thumbnails
❌ Design 1 – Serial
- Read images one by one:
  - Each image = disk seek + read 256 KB at 30 MB/s
- Calculation:
  - 30 seeks × 10 ms = 300 ms
  - 30 × (256 KB / 30 MB/s) ≈ 250 ms
  - → Total: 300 + 250 = 550 ms
✅ Design 2 – Parallel
- Issue all 30 reads in parallel.
- Calculation:
  - 1 seek = 10 ms
  - Read 256 KB at 30 MB/s ≈ 8.5 ms
  - → Total: 10 + 8.5 ≈ 18.5 ms
- Expect variance in the real world: ~30–60 ms range
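The two designs are easy to compare in a few lines of Python; the constants below are the same ones used in the estimates above.

```python
DISK_SEEK_MS = 10          # cost of one disk seek
DISK_READ_MB_PER_S = 30    # sequential disk read throughput
THUMB_MB = 0.25            # 256 KB thumbnail
NUM_THUMBS = 30

read_ms = THUMB_MB / DISK_READ_MB_PER_S * 1000      # ~8.3 ms per thumbnail

serial_ms = NUM_THUMBS * (DISK_SEEK_MS + read_ms)   # each image waits for the previous one
parallel_ms = DISK_SEEK_MS + read_ms                # all reads issued at once

print(round(serial_ms))    # 550 -> matches the ~550 ms estimate
print(round(parallel_ms))  # 18  -> matches the ~18.5 ms estimate
```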
🧠 Simplified Mental Models
Insight | What It Means (Simplified) |
---|---|
💾 Disk is super slow | Like walking to the garage. You don’t want to do this often. |
🧠 RAM is much faster than disk | Like grabbing from your desk instead of walking to the cabinet. |
⚡ CPU is rarely the bottleneck | Your processor is fast. If your system is slow, it’s not the CPU’s fault. |
🔁 Cache is insanely fast | Think of L1/L2 cache like stuff in your pocket — instant access. |
🌐 Network trips are expensive | Talking to another datacenter is like mailing a letter to Europe. Avoid it. |
🔃 Batching is your friend | Instead of reading 1 comment at a time, grab 100 at once. |
🧵 Avoid shared locks | Waiting for someone to unlock the bathroom wastes time. |
📦 Design for locality | Keep data close to where it’s processed — like keeping your tools nearby. |
"Cache beats RAM. RAM beats disk. Disk is lava. Network is long-distance love."
🧠 Conclusion
Back-of-the-envelope calculations won’t give you perfect answers, but they give you fast, rough estimates. That’s often all you need to:
- Avoid wasteful engineering
- Identify bottlenecks early
- Make sound architecture decisions without building the wrong thing first
Before building that real-time dashboard or scaling out another microservice, ask yourself:
“Did I run the numbers? Even roughly?”
You might just save yourself days of debugging.