Introduction
Hey builders! 👋
So I recently took on an exciting challenge: transforming a full-stack FastAPI and React application into a production-ready system with robust monitoring. While the application itself was already well-structured, my role was to bring it to production standards through proper containerization, orchestration, and monitoring/observability. Let me walk you through this journey.
Want to see the complete code? Check out the GitHub repository.
Prerequisites
Some things you'll need for this project:
- Docker and Docker Compose installed. You can check the official documentation for how to install them on Ubuntu.
- A domain name (for the SSL/TLS setup). I recommend getting one from Hostinger.
- A basic understanding of:
  - Docker containers and images
  - Reverse proxies
  - Monitoring concepts
  - The Linux command line
- Sufficient system resources:
  - At least 4GB RAM
  - 2 CPU cores
  - 20GB storage
- A clone of the GitHub repo containing the frontend, backend, and database.
- A firewall configured to allow the necessary ports (22, 3000, 9090, etc.).
The Challenge 🎯
When I first looked at the project, I saw a typical full-stack application with:
- A FastAPI backend
- A React frontend
- PostgreSQL database
- Basic authentication
My mission? Transform this into a production-grade system with:
- Proper containerization
- Automated SSL/TLS
- Comprehensive monitoring
- Efficient log management
- Zero-downtime deployments
The Solution Architecture
I designed a modern DevOps architecture that looks like this (if you've been following my articles recently, you'll know I like these kinds of diagrams now):
┌──────────────────────────────────────────────────────────────┐
│                       Client Requests                        │
└──────────────────────────────┬───────────────────────────────┘
                               │
                       ┌───────▼───────┐
                       │    Traefik    │
                       │ Reverse Proxy │
                       │   (SSL/TLS)   │
                       └───────┬───────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
  ┌───────▼───────┐    ┌───────▼───────┐    ┌───────▼───────┐
  │   Frontend    │    │    Backend    │    │    Adminer    │
  │ (React/Nginx) │───►│   (FastAPI)   │───►│  (DB Admin)   │
  └───────┬───────┘    └───────┬───────┘    └───────┬───────┘
          │                    │                    │
          │            ┌───────▼───────┐            │
          └───────────►│  PostgreSQL   │◄───────────┘
                       │   Database    │
                       └───────┬───────┘
                               │
                       ┌───────▼───────┐
                       │  Monitoring   │
                       │     Stack     │
                       └───────┬───────┘
                               │
     ┌────────────┬────────────┼────────────┬────────────┐
     │            │            │            │            │
┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ cAdvisor │ │ Promtail │ │   Loki   │ │Prometheus│ │ Grafana  │
│Container │ │   Logs   │ │   Log    │ │ Metrics  │ │Dashboards│
│ Metrics  │ │Collector │ │ Storage  │ │ Database │ │ & Alerts │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
Application in Action
Containerization Strategy: Building Efficient Images 🐳
Multi-stage Builds: Optimizing Image Size
So let's look at how I achieved proper containerization.
One of the first challenges I faced was keeping the Docker images small and efficient. That's where multi-stage builds came in. Let me show you how I implemented this for both the frontend and the backend:
Frontend Container
# Build stage
FROM node:18-alpine as build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Production stage
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Why did I adopt multi-stage builds?
- They reduce the final image size by excluding build tools.
- For the frontend, the image went from a whopping ~590MB with a single-stage build down to 50.1MB with the multi-stage build.
- They separate build dependencies from runtime dependencies.
- They improve security by reducing the attack surface.
- They make deployments faster thanks to smaller images.
Single-stage build...
vs Multi-stage build...
Backend Container
I also applied a multi-stage build here:
# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app
# Install only necessary build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
libpq-dev \
&& rm -rf /var/lib/apt/lists/*
# Install specific version of poetry with export support
RUN pip install --no-cache-dir poetry==1.5.1
# Copy only dependency files first for better caching
COPY pyproject.toml poetry.lock* ./
# Generate requirements.txt - using the correct syntax for poetry export
RUN poetry export --without-hashes --format=requirements.txt > requirements.txt
# Copy the rest of the application
COPY . .
# Production stage
FROM python:3.11-slim
WORKDIR /app
# Install runtime dependencies only
RUN apt-get update && apt-get install -y --no-install-recommends \
libpq5 \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements and install directly with pip
COPY --from=builder /app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY --from=builder /app .
# Make startup script executable
RUN chmod +x /app/prestart.sh
# Set correct PYTHONPATH to ensure app imports work properly
ENV PYTHONPATH="${PYTHONPATH}:/app"
EXPOSE 8000
CMD ["bash", "-c", "cd /app && ./prestart.sh && uvicorn app.main:app --host 0.0.0.0 --port 8000"]
Database Management with Adminer
I added Adminer to the application stack so I could inspect and manage the database, and I configured secure access to it through Traefik.
I did this by simply adding the Traefik labels to the adminer service section of the docker-compose file:
adminer:
  image: adminer
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.adminer.rule=Host(`michaeloxo.tech`) && PathPrefix(`/adminer`)"
    - "traefik.http.routers.adminer.entrypoints=websecure"
    - "traefik.http.routers.adminer.tls=true"
Adminer also supports multiple database types, so you can try it in your own application stacks too.
Volume Management
For data persistence, I implemented named volumes for PostgreSQL, Prometheus, and Grafana:
volumes:
  postgres_data:
    driver: local
  prometheus_data:
    driver: local
  grafana_data:
    driver: local
This helps persist data across container restarts.
Network Isolation
I implemented proper network isolation using Docker networks:
networks:
  app-network:
    driver: bridge
Attaching all the services to this dedicated bridge network lets them reach each other by service name while keeping them isolated from anything else running on the host.
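For completeness, here's roughly how a service opts into that network in the compose file. This is a sketch with illustrative service names, not a copy of the repo's file:
services:
  backend:
    networks:
      - app-network   # joins the shared bridge network
  db:
    networks:
      - app-network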
Health Checks
Every service includes health checks. The database, for example, has this health check:
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
  interval: 5s
  timeout: 5s
  retries: 5
These health checks are important: they tell Docker and the dependent services when a container is actually ready, so problems are detected early.
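The database isn't the only service worth probing. As an illustration (the endpoint and timings here are my own assumptions, not values from the repo), the FastAPI container can be checked with the Python interpreter it already ships with, since python:3.11-slim doesn't include curl:
backend:
  healthcheck:
    # Hypothetical probe: succeeds once the app answers HTTP on port 8000
    test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/docs')"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 15s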
Detailed Component Breakdown: The Monitoring Stack
Now let's talk about observability and monitoring.
When I first approached this project, I knew I needed a monitoring solution that would be both powerful and maintainable.
After careful consideration, I settled on a modern stack that combines the best tools for container monitoring, metrics collection, and log aggregation. Let me walk you through each component and why I chose them.
The Foundation: cAdvisor and Container Metrics
The first piece of the puzzle was finding a way to monitor the containers effectively. That's where cAdvisor came in. What makes cAdvisor special is its zero-configuration approach - just mount the right volumes, and it starts collecting metrics automatically. In my setup, it watches over every container, tracking CPU usage, memory consumption, network I/O, and disk usage in real-time.
cadvisor:
  image: gcr.io/cadvisor/cadvisor:v0.47.0
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
The beauty of cAdvisor lies in its simplicity. It exposes metrics in Prometheus format out of the box, making it a perfect fit for my monitoring stack. Every container's performance is now visible at a glance, helping us identify potential issues before they become problems.
Prometheus and Metrics Storage
For storing and querying the metrics, I chose Prometheus. Yeah, I know, Prometheus is pretty much everyone's go-to when it comes to collecting and storing metrics. That's because of its pull-based architecture, which is more reliable than push-based systems, especially in containerized environments. My Prometheus configuration is clean and straightforward:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    metrics_path: '/prometheus/metrics'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
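This config only defines scrape jobs, but Prometheus can also evaluate alerting rules loaded through rule_files (not shown above). As a sketch of the kind of threshold I mean later when I talk about alerts, here's a rule that fires when the cAdvisor target stops responding; the file name, rule name, and timings are illustrative:
# alerts.yml - would be referenced from prometheus.yml via rule_files
groups:
  - name: availability
    rules:
      - alert: CadvisorDown
        # "up" is the built-in per-target metric: 1 when a scrape succeeds, 0 when it fails
        expr: up{job="cadvisor"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "cAdvisor has been unreachable for 2 minutes"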
The Log Aggregation Duo: Loki and Promtail
For log management, I implemented a combination of Loki and Promtail. This choice was driven by the need for a lightweight yet powerful logging solution. Unlike traditional ELK stacks that can be resource-intensive, Loki and Promtail provide efficient log aggregation with minimal overhead.
loki:
  image: grafana/loki:latest
  volumes:
    - ./loki/loki-config.yml:/etc/loki/config.yml
promtail:
  image: grafana/promtail:latest
  volumes:
    - ./promtail/promtail-config.yml:/etc/promtail/config.yml
    - /var/log:/var/log:ro
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
The synergy between these tools is impressive. Promtail collects logs from both the system and containers, while Loki stores them efficiently. What's particularly useful is how they use the same labeling system as Prometheus, making it easy to correlate logs with metrics.
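To make that concrete, here's a minimal sketch of what a promtail-config.yml along these lines can look like, assuming Docker's default JSON-file log driver; the actual file in the repo may label things differently:
server:
  http_listen_port: 9080
positions:
  # Remembers how far each file has been read, so restarts don't re-ship old logs
  filename: /tmp/positions.yaml
clients:
  # Push collected logs to the Loki service from docker-compose
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: docker-containers
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          # Matches the /var/lib/docker/containers mount in the compose file
          __path__: /var/lib/docker/containers/*/*.log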
The Visualization Layer: Grafana
To bring all this data to life, I chose and implemented Grafana. I mean, Grafana just wonderfully ties everything together, providing beautiful dashboards and powerful querying capabilities. My Grafana setup is configured to work seamlessly with both Prometheus and Loki:
grafana:
  image: grafana/grafana:latest
  volumes:
    - grafana_data:/var/lib/grafana
  environment:
    - GF_SERVER_ROOT_URL=http://grafana:3000/grafana
    - GF_SERVER_SERVE_FROM_SUB_PATH=true
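Grafana's data sources can also be provisioned from YAML instead of being clicked together in the UI. Here's a sketch of what such a provisioning file might contain, using the internal service URLs from the compose file; the Prometheus URL in particular depends on how the sub-path routing is configured, so treat it as illustrative:
# grafana/provisioning/datasources/datasources.yml (illustrative path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090/prometheus
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100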
Here's what the Grafana dashboards look like in production:
The Traffic Manager: Traefik
Finally, to tie everything together, I implemented Traefik as my reverse proxy. This modern reverse proxy stands out for its automatic service discovery and dynamic configuration capabilities. My Traefik setup ensures secure access to all my monitoring tools. Just add these labels to the desired service in the docker-compose file:
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.[service].rule=Host(`michaeloxo.tech`) && PathPrefix(`/[service]`)"
  - "traefik.http.routers.[service].entrypoints=websecure"
  - "traefik.http.routers.[service].tls=true"
  - "traefik.http.routers.[service].middlewares=global-middleware@file"
What makes this setup particularly effective is how all components work together.
cAdvisor collects metrics, Prometheus stores them, Loki and Promtail handle logs, Grafana visualizes everything, and Traefik ensures secure access. It's a well-oiled machine where each part plays its role perfectly.
The result? A comprehensive monitoring solution that provides real-time insights into our application's performance, helps us identify and troubleshoot issues quickly, and ensures we have the data we need to make informed decisions about our infrastructure.
(Btw, I recently implemented a similar but more complex monitoring stack on a product we're building, and of course I wrote an article about it. Read it here.)
Lessons Learned
Throughout this DevOps implementation journey, I've gathered some valuable insights that are worth noting:
Traefik was like magic!
Automatic SSL was a game-changer. Before Traefik, I spent hours manually configuring and renewing SSL certificates; now I never have to worry about certificate renewals again.
Middleware Chains Simplified Security. Creating reusable security configurations with middleware chains was a marvel. I could easily apply consistent security headers across all services with a single reference.
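For context on what makes the SSL part automatic: Traefik only needs an ACME certificate resolver declared once, and it then obtains and renews Let's Encrypt certificates on its own. Below is a rough sketch of that kind of service definition; the image tag, resolver name, email, and paths are placeholders, not the repo's actual values:
traefik:
  image: traefik:v2.10
  command:
    - "--providers.docker=true"
    - "--providers.docker.exposedbydefault=false"
    - "--providers.file.directory=/etc/traefik/dynamic"
    - "--entrypoints.web.address=:80"
    - "--entrypoints.websecure.address=:443"
    # ACME resolver: Traefik requests and renews certificates by itself
    - "--certificatesresolvers.letsencrypt.acme.email=admin@example.com"
    - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
    - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
    # Make websecure use the resolver by default, so routers only need tls=true
    - "--entrypoints.websecure.http.tls.certresolver=letsencrypt"
  ports:
    - "80:80"
    - "443:443"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - ./letsencrypt:/letsencrypt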
For containerization best practices
Multi-Stage Builds Transformed My Images
- The satisfaction of seeing the frontend image size drop from 590MB to 50MB (over 90%) was incredible
- Eliminated unnecessary build tools from production images
- Significantly improved deployment speed and reduced bandwidth usage
Implementing Proper Service Dependencies Using Healthchecks
One thing that can make a setup like this even better is implementing proper service dependencies, using healthchecks to gate the startup order.
This ensures services start in the right order and only when their dependencies are truly ready, not just running. In my stack it eliminated those annoying "connection refused" errors during startup and made the system much more resilient to restarts.
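Here's a minimal sketch of what that looks like in compose, pairing the pg_isready healthcheck from earlier with a depends_on condition (service names are illustrative):
services:
  backend:
    depends_on:
      db:
        # Start the backend only once the healthcheck reports healthy, not merely "started"
        condition: service_healthy
  db:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 5s
      timeout: 5s
      retries: 5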
The biggest takeaway from this project was how the right tooling can transform complex tasks into manageable ones. Traefik turned what would have been several hours of reverse proxy configuration into minutes, while the monitoring stack gave me insights I didn't even know I needed until I had them.
Conclusion
Phew! What a rewarding journey this has been! Taking a regular app and turning it into a containerized, monitored production system was quite the adventure.
Is it perfect? Nah, nothing ever is. But that's the beauty of DevOps - it's all about continuous improvement. I'm still tinkering with container sizes, playing with alert thresholds, and learning new security tricks. And tbh, that's the fun part!
The biggest win for me wasn't just getting everything up and running (though that felt amazing!), but seeing how all these pieces work together. Watching container metrics flow into Prometheus, visualizing them in Grafana, and catching issues before they become problems - it's like having superpowers lol.
I still have a list of improvements I want to make. I've just thought of adding a CI/CD pipeline and integrating Terraform and Ansible to automate deployment, and I know I'll think of more.
But for now, I'm pretty good with this setup. It's running smoothly, it's secure, and most importantly - it's giving us the insights we need to keep getting better.
I do hope sharing my experience helps you in your own containerization and monitoring journey. Again, don't forget to check out my article exclusively on monitoring here.
If you'd like to explore the complete implementation, the entire project is available on GitHub. I welcome your feedback, issues, and pull requests!
Till the next, happy building!