"Why scaling startups need more than just lean practices to survive and thrive."
Scaling a tech organization in the financial industry, particularly in sensitive domains like stocks and crypto, introduces unique challenges. What works for a 10-person startup no longer holds when the team grows to 80+ engineers. While lean principles foster agility early on, structure and governance become critical for sustainable growth.
This article draws heavily from the Team Topologies framework by Matthew Skelton and Manuel Pais, supported by Cognitive Load Theory, and validated by real-world scaling practices from Amazon, Spotify, and Google. Together, these references form the academic and practical foundation for our approach.
We explore the typical scaling pains, diagnose the root causes behind them, and outline a step-by-step guide to solving these issues.
Startup Growth Pains: From Lean Beginnings to Structured Necessity
You are no longer a "startup" at 80 engineers. You are a mid-sized tech company.
In the early stages, small teams rely on flexibility: informal communication, rapid decisions, and blurred responsibilities. However, as the team size grows, these strengths become liabilities:
- Delivery slows due to coordination overhead.
- Infrastructure strains under increasing demand.
- Internal politics and confusion rise.
Lean practices help small teams move fast, but beyond a certain scale, intentional structure must complement speed. Without it, chaos, instability, and organizational mistrust set in.
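One rough way to see why coordination overhead grows so quickly is to count the possible one-to-one communication paths in a group, which is n(n-1)/2 for n people. The sketch below is only an illustration of that arithmetic, not a model taken from the sources cited in this article; the function name is ours.

```python
def pairwise_channels(people: int) -> int:
    """Number of distinct one-to-one communication paths among `people`."""
    if people < 0:
        raise ValueError("people must be non-negative")
    return people * (people - 1) // 2


# Team sizes taken from the scaling stages discussed in this article.
for size in (10, 30, 80):
    print(f"{size} engineers -> {pairwise_channels(size)} possible paths")
```

Growing from 10 to 80 engineers multiplies headcount by 8 but multiplies the possible paths by roughly 70, which is why informal communication stops working.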
Team Size and Scaling Needs
Team Size | Typical State | Scaling Requirement |
---|---|---|
1-10 engineers | Chaos is acceptable | Maximize flexibility and exploration |
10-30 engineers | Growing pains start | Light processes, early team ownership |
30-80 engineers | Structured chaos | Formalize team types, start Platform teams |
80-150 engineers | Scaling complexity | Introduce IDP, enforce clear boundaries, governance |
150+ engineers | Large-scale organization | Split into Tribes, strong Platform engineering culture |
Our example of 80 engineers sits right at the boundary between "Structured chaos" and "Scaling complexity", the point where building an Internal Developer Platform and enforcing clear team structures becomes mandatory.
Big tech companies and published work on team design have run into exactly the same problem and converged on similar solutions.
1. Amazon (early 2000s) — “You build it, you run it” with platform guardrails
Problem:
Amazon was scaling fast. Developers needed to move fast but infra/security couldn’t let them “touch everything.”
Solution:
- Split into 2-pizza teams (small, independent Stream-aligned Teams).
- Mandatory self-service platforms (deployment, logging, monitoring).
- No manual infra work: Developers use platforms built by platform teams.
- Teams own their service end-to-end within predefined guardrails.
Key Idea:
“You build it, you run it” (Werner Vogels, ACM Queue interview, 2006), applied inside the constraints provided by the central platform.
Impact:
Allowed Amazon to scale to thousands of services without losing security, compliance, or control.
2. Spotify (2012–2014) — “Squads, Tribes, Chapters, Guilds” model
Problem:
Growing fast, too much friction between teams.
Solution:
- Squads: Stream-aligned teams (own one part of the product).
- Chapters: Shared function across squads (e.g., Infra Chapter).
- Guilds: Loose, voluntary knowledge sharing (e.g., Security Guild).
- Platform Teams: Build enabling platforms, not manual ops.
Special Rule:
- Infra teams acted as Internal Service Providers.
- Developers self-serve infra through APIs, not by asking infra engineers.
Reference:
Spotify Engineering Culture, by Henrik Kniberg (published by Spotify; a practitioner source rather than an academic one, but widely referenced).
3. Google SRE Model — “Error Budgets” and strict production control
Problem:
At Google scale, uncoordinated developer changes carry massive risk.
Solution:
- Developers are responsible for code and minor ops.
- SREs own production environment stability.
- Error Budgets: Each service has a reliability target, and the gap between that target and 100% is the budget developers are allowed to spend on failures. Once the budget is exhausted, feature releases pause until reliability recovers.
The idea, paraphrasing the Google SRE Book:
Letting developers deploy freely without accountability is a path to ruin.
Impact:
- Developers move fast but inside a mathematically defined safety zone.
- SREs protect core infra and enforce reliability.
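To make the "mathematically defined safety zone" concrete, here is a minimal sketch of the error budget arithmetic. The 99.9% target, the request counts, and the class and method names are all assumptions chosen for illustration; this is not Google's implementation.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ErrorBudget:
    slo: Fraction          # target success ratio, e.g. Fraction("0.999") (assumed)
    total_requests: int    # requests served in the measurement window
    failed_requests: int   # requests that failed in the same window

    @property
    def allowed_failures(self) -> Fraction:
        # The budget is the gap between the target and 100%, scaled by traffic.
        return (1 - self.slo) * self.total_requests

    @property
    def remaining(self) -> Fraction:
        return self.allowed_failures - self.failed_requests

    def releases_allowed(self) -> bool:
        # Hypothetical policy: freeze feature releases once the budget is spent.
        return self.remaining > 0


healthy = ErrorBudget(Fraction("0.999"), 1_000_000, 400)
exhausted = ErrorBudget(Fraction("0.999"), 1_000_000, 1_200)
print(healthy.allowed_failures, healthy.releases_allowed())  # 1000 True
print(exhausted.remaining, exhausted.releases_allowed())     # -200 False
```

Exact fractions are used instead of floats so that a budget that is exactly spent is never reported as slightly positive because of rounding.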
4. Reference: Cognitive Load Theory applied to software teams
(Skelton and Pais, Team Topologies, 2019; the underlying theory comes from John Sweller's work in educational psychology)
Thesis:
- Developers cannot own too many unrelated concerns at once.
- Infra must be productized into easy-to-use platforms.
- Team boundaries must be designed to optimize flow and minimize handoffs.
They argue that high cognitive load (developers doing development, infrastructure, and security work manually) leads to slower delivery, more burnout, and a higher incident rate.
Diagnosing the Core Problems
The problems faced by growing startups often trace back to fundamental organizational issues:
Organizational Maturity Mismatch
Small team behaviors persist even as the organization demands more maturity. Teams lack clear boundaries, and developers are expected to juggle responsibilities across development, infrastructure, operations, and security.
Cognitive Load Overload
Cognitive Load Theory teaches that individuals and teams can only handle a limited amount of complexity effectively. When teams handle too many unrelated domains, delivery becomes error-prone and slow.
Tech Politics: Erosion of Trust
Opaque decision-making processes create mistrust. Engineers begin competing for resources and priorities in an unhealthy way, leading to favoritism and internal alliances.
Root Cause: All these issues stem from the absence of deliberate team structures and communication models, a concept central to Team Topologies.
Principles for Scaling Successfully
Drawing directly from Team Topologies and Cognitive Load Theory, organizations must adopt three core principles:
1. Design Clear Team Boundaries
Team Topologies prescribes explicit team types to reduce cognitive load and improve flow:
- Stream-aligned Teams: Build and run product features end-to-end.
- Platform Teams: Create internal platforms that other teams consume.
- Enabling Teams: Help other teams build missing capabilities.
- Complicated Subsystem Teams: Handle highly specialized areas that require deep expertise.
Amazon demonstrates this with their internal platform systems. Developers own services completely but operate within strict platform guardrails, minimizing unnecessary complexity.
2. Build Systems of Trust
To reduce political behavior, decision-making must be transparent and predictable:
- RFCs (Request for Comments): Publicly document and discuss major technical decisions.
- Open Architecture Boards: Ensure that decisions are made based on merit, not hierarchy.
- Public OKRs: Make team goals visible and measurable.
Spotify applied these principles with their Squads, Tribes, Chapters, and Guilds model. Squads operated independently but within a framework that encouraged transparency and cross-team collaboration.
3. Empower Developers Inside Guardrails
Developers should have autonomy but within safe, automated boundaries:
- Infrastructure must be self-service.
- Access must be controlled and audited.
- Guardrails must automate security and compliance.
Google practices this balance through their Site Reliability Engineering (SRE) model. Developers own their services, but SREs enforce reliability through Error Budgets, aligning freedom with operational excellence.
If you allow full infra access "for speed" today, you borrow time against massive technical debt and existential risk later.
Allowing unrestricted access for the sake of moving fast might provide short-term gains, but it compromises the long-term stability of the organization. Technical debt accumulates invisibly, security vulnerabilities grow unnoticed, and incident recovery becomes slower. In regulated industries like finance and crypto, these risks aren't just technical — they are existential.
Building robust guardrails through an Internal Developer Platform protects the organization without throttling developer productivity.
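As an illustration of what "automated guardrails" can mean in practice, the sketch below checks a deployment request against a few rules before it is allowed through. Everything here is hypothetical: the registry address, the field names, and the three rules are assumptions picked for the example, not a prescribed policy set.

```python
from dataclasses import dataclass
from typing import Callable, Optional

APPROVED_REGISTRY = "registry.internal.example/"  # hypothetical internal registry


@dataclass(frozen=True)
class DeployRequest:
    service: str
    owner_team: str
    environment: str      # e.g. "staging" or "production"
    image: str
    public_ingress: bool


# A guardrail returns a violation message, or None when the request passes.
Guardrail = Callable[[DeployRequest], Optional[str]]


def image_from_approved_registry(req: DeployRequest) -> Optional[str]:
    if not req.image.startswith(APPROVED_REGISTRY):
        return f"image {req.image!r} is not from the approved registry"
    return None


def image_tag_is_pinned(req: DeployRequest) -> Optional[str]:
    # Look at the last path segment only, so a registry port is not mistaken for a tag.
    _, _, tag = req.image.rsplit("/", 1)[-1].partition(":")
    if tag in ("", "latest"):
        return "image must be pinned to an explicit version tag"
    return None


def no_public_ingress_in_production(req: DeployRequest) -> Optional[str]:
    if req.environment == "production" and req.public_ingress:
        return "public ingress in production needs a separate approval"
    return None


GUARDRAILS = [
    image_from_approved_registry,
    image_tag_is_pinned,
    no_public_ingress_in_production,
]


def evaluate(req: DeployRequest, guardrails: list) -> list:
    """Run every guardrail and collect the violation messages."""
    return [msg for rule in guardrails if (msg := rule(req)) is not None]


ok = DeployRequest("payments", "team-payments", "production",
                   "registry.internal.example/payments:1.4.2", False)
bad = DeployRequest("payments", "team-payments", "production",
                    "docker.io/payments:latest", True)
print(evaluate(ok, GUARDRAILS))        # []
print(len(evaluate(bad, GUARDRAILS)))  # 3
```

Because every rule is a plain function, the same list can run in a pipeline and be logged for audit, so developers get an immediate answer instead of waiting for a manual review.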
Step-by-Step Solution
Step 1: Define Proper Team Structures
Clearly establish team types:
- Stream-aligned Teams own and deliver complete product features.
- Platform Teams abstract and simplify complex infrastructure needs.
- Enabling Teams improve capability without owning delivery.
- Complicated Subsystem Teams manage specialized technical areas.
Each team has a distinct mission, reducing overlap and conflict. This structure directly follows Team Topologies principles.
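One lightweight way to keep these boundaries honest is to record team types and service ownership as data and check them automatically. The following is a minimal sketch under our own assumptions: the team and service names are invented, and the two checks (no service with two owners, no Enabling team that owns delivery) are our reading of the rules above rather than anything prescribed by Team Topologies.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    PLATFORM = "platform"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated-subsystem"


@dataclass(frozen=True)
class Team:
    name: str
    type: TeamType
    owns: tuple = ()   # names of services this team owns


def ownership_conflicts(teams: list) -> dict:
    """Map each service claimed by more than one team to its claimants."""
    owners = defaultdict(list)
    for team in teams:
        for service in team.owns:
            owners[service].append(team.name)
    return {svc: names for svc, names in owners.items() if len(names) > 1}


def enabling_teams_owning_delivery(teams: list) -> list:
    """Enabling teams should coach, not own services."""
    return [t.name for t in teams if t.type is TeamType.ENABLING and t.owns]


# Hypothetical organisation used only to exercise the checks.
teams = [
    Team("trading", TeamType.STREAM_ALIGNED, ("order-router", "ledger")),
    Team("payments", TeamType.STREAM_ALIGNED, ("payouts", "ledger")),
    Team("developer-platform", TeamType.PLATFORM, ("deploy-pipeline",)),
    Team("security-enablement", TeamType.ENABLING),
    Team("pricing-engine", TeamType.COMPLICATED_SUBSYSTEM, ("pricing-core",)),
]
print(ownership_conflicts(teams))             # {'ledger': ['trading', 'payments']}
print(enabling_teams_owning_delivery(teams))  # []
```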
Step 2: Define Developer and Infra Responsibilities
Responsibilities must be split clearly:
- Developers own application code, the pipeline configuration for their own services, and service-level monitoring and alerts.
- Infra teams provide the secured, templatized pipelines and observability tools that developers build on, and enforce security policies.
This division helps manage cognitive load and supports faster, safer delivery.
Step 3: Introduce RFCs for Major Changes
Every significant architectural or infrastructural change must go through an RFC process:
- Written proposals are discussed openly.
- Decisions are transparent and based on technical merit.
This process builds organizational memory and eliminates backchannel decision-making, reinforcing trust systems.
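The RFC process can be kept simple. As a sketch, assuming a four-state lifecycle (draft, in review, accepted, rejected) that we made up for illustration, the record below refuses any decision that skips the open review or lacks a written rationale. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in-review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class RFC:
    title: str
    author: str
    status: Status = Status.DRAFT
    comments: list = field(default_factory=list)
    decision_rationale: str = ""

    def open_for_review(self) -> None:
        if self.status is not Status.DRAFT:
            raise ValueError("only a draft can be opened for review")
        self.status = Status.IN_REVIEW

    def comment(self, reviewer: str, text: str) -> None:
        if self.status is not Status.IN_REVIEW:
            raise ValueError("comments are only accepted during review")
        self.comments.append(f"{reviewer}: {text}")

    def decide(self, accepted: bool, rationale: str) -> None:
        # No decision without an open review and a written reason:
        # this is what turns the RFC into organizational memory.
        if self.status is not Status.IN_REVIEW:
            raise ValueError("only an RFC in review can be decided")
        if not rationale.strip():
            raise ValueError("a written rationale is required")
        self.decision_rationale = rationale
        self.status = Status.ACCEPTED if accepted else Status.REJECTED


rfc = RFC("Adopt GitOps deployments", author="alice")
rfc.open_for_review()
rfc.comment("bob", "How do we handle rollbacks?")
rfc.decide(accepted=True, rationale="Rollback is a Git revert; audit trail improves.")
print(rfc.status.value, len(rfc.comments))  # accepted 1
```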
Step 4: Leadership Rituals to Maintain Trust
Leadership must reinforce trust continuously:
- Weekly Leads Meetings ensure alignment.
- Public OKRs make priorities clear.
- Rotating Architecture Review Boards distribute authority and expertise fairly.
These rituals align with building transparent, predictable decision-making systems.
Step 5: Build an Internal Developer Platform (IDP)
An Internal Developer Platform provides the foundation for developer autonomy without sacrificing safety. It must include:
- Infrastructure as Code: Tools like Terraform and Pulumi to create pre-approved, self-service modules.
- GitOps Deployments: Tools like ArgoCD or FluxCD automate deployment through Git workflows.
- Self-Service Portal: Platforms like Backstage allow developers to launch services, view documentation, and manage their environments easily.
- Secrets Management: Vault or AWS Secrets Manager centralizes secret handling and improves security.
- Observability: Prometheus and Grafana provide monitoring and alerting out-of-the-box.
- Incident Management: Slack integrations with Alertmanager or tools like PagerDuty enable professional on-call rotations.
Minimal Viable Stack Recommendation:
Category | Tool Choices |
---|---|
Infrastructure | Terraform + Atlantis |
Deployments | GitHub Actions + ArgoCD |
Portal | Backstage |
Secrets | Vault or AWS Secrets Manager |
Observability | Prometheus + Grafana |
Incident Management | Slack + Alertmanager or PagerDuty |
Building this platform aligns with Team Topologies' goal of enabling fast, secure, and independent delivery.
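To show the self-service idea independently of any particular tool, the sketch below models a developer requesting a new service from a catalogue of pre-approved templates. It does not use the Backstage, Terraform, or ArgoCD APIs; the template names, the naming rule, and the resources each template yields are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical catalogue: each pre-approved template lists what it provisions.
APPROVED_TEMPLATES = {
    "http-api": ["git repository", "deployment pipeline", "dashboard", "secrets path"],
    "batch-job": ["git repository", "deployment pipeline", "secrets path"],
}

# Assumed naming rule: lowercase, starts with a letter, 3 to 31 characters.
SERVICE_NAME = re.compile(r"[a-z][a-z0-9-]{2,30}")


@dataclass(frozen=True)
class ServiceRequest:
    name: str
    owner_team: str
    template: str


def plan(request: ServiceRequest) -> list:
    """Return the resources the platform would create for this request."""
    if not SERVICE_NAME.fullmatch(request.name):
        raise ValueError(f"invalid service name: {request.name!r}")
    if request.template not in APPROVED_TEMPLATES:
        raise ValueError(f"template {request.template!r} is not pre-approved")
    return [
        f"{resource} for {request.name} (owner: {request.owner_team})"
        for resource in APPROVED_TEMPLATES[request.template]
    ]


for step in plan(ServiceRequest("order-router", "team-trading", "http-api")):
    print(step)
```

The point of the design is that developers choose from options the platform team has already secured, so a new service arrives with its pipeline, dashboard, and secrets path in place without a ticket to the infra team.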
Conclusion
The desire for developers to "own everything end-to-end" is natural but risky when scaling in regulated industries. True ownership must happen within well-designed systems that balance speed, safety, and organizational trust.
The principles presented here are deeply grounded in the Team Topologies framework and Cognitive Load Theory, and validated by real-world practices at Amazon, Spotify, and Google. These references provide a solid foundation for any growing tech organization to scale successfully.
By applying structured team models, building internal platforms, and fostering trust through transparent processes, financial tech startups can achieve sustainable, scalable growth without chaos.
Build systems, not heroes. Move fast, but move safely.
Appendix: References
- Team Topologies by Matthew Skelton and Manuel Pais
- Cognitive Load Theory in Software Engineering
- Google SRE Book
- Spotify Engineering Culture (Henrik Kniberg)
- Werner Vogels, "A Conversation with Werner Vogels", ACM Queue, 2006 ("You build it, you run it")
- Ruth Malan: Thoughts on Systems and Architecture