📊 YouTube API – Data Warehouse & Analytics Solution
This repository demonstrates a complete data pipeline that extracts data from the YouTube Data API, models it using the Medallion Architecture, and delivers business-ready insights via Grafana dashboards.
📦 Project Summary
This project implements a modern analytics pipeline with:
- Medallion Architecture: Structured into Bronze, Silver, and Gold layers for scalable data processing.
- ETL Workflows: Automated extraction, transformation, and loading using Apache Airflow.
- Data Modeling: Dimensional modeling in PostgreSQL for optimized querying.
- Dashboards: Real-time reporting using Grafana, powered by SQL.
🧰 Tech Stack
- PostgreSQL – Central data warehouse
- Apache Airflow – Workflow orchestration
- Grafana – Real-time data visualization
- Linux VM – Compute environment for pipeline execution
- Python – API ingestion & transformation logic
🎯 Project Objectives
Build a production-ready analytics solution to analyze YouTube channel and video performance:
- Source structured data from the YouTube Data API
- Clean, validate, and model for business intelligence
- Persist historical metrics (views, likes, etc.) for trend analysis
- Deliver actionable insights via dashboards and SQL queries
🗃️ Data Architecture (Medallion Model)
This project follows a Bronze → Silver → Gold pipeline:
🔹 Bronze Layer
Raw ingestion from the YouTube API (JSON format)
🔸 Silver Layer
Cleaned, validated, and structured data (see data flow and model below)
Data Flow
Data Model
🟡 Gold Layer
Aggregated data used to generate KPIs and dashboards in Grafana
-
Visualization Sample
📈 BI Use Cases
Dashboards and SQL queries answer key questions such as:
- What are the top-performing videos per channel?
- How is each channel performing over time?
- What are the daily trends for views and engagement?
📁 Repository Structure
├── README.md
├── channel_lists.py
├── channel_overview.py
├── channel_videos.py
├── __pycache__/ # Compiled Python files
├── project_files/
│ ├── Architecture/ # Draw.io and PNG files for architecture
│ └── ddl_update_scripts/ # SQL DDLs and procedures
│ ├── dim_channels.sql
│ ├── dim_videos.sql
│ ├── fct_subscribers_views_video_count.sql
│ └── fct_video_statistics.sql
└── requirements.txt # Python dependencies
🔗 Access the Code
Browse the full codebase here