๐Ÿ“Š YouTube API โ€“ Data Warehouse & Analytics Solution

This repository demonstrates a complete data pipeline that extracts data from the YouTube Data API, models it using the Medallion Architecture, and delivers business-ready insights via Grafana dashboards.


๐Ÿ“ฆ Project Summary

This project implements a modern analytics pipeline with:

  • Medallion Architecture: Structured into Bronze, Silver, and Gold layers for scalable data processing.
  • ETL Workflows: Automated extraction, transformation, and loading using Apache Airflow.
  • Data Modeling: Dimensional modeling in PostgreSQL for optimized querying.
  • Dashboards: Real-time reporting using Grafana, powered by SQL.

๐Ÿงฐ Tech Stack

  • PostgreSQL โ€“ Central data warehouse
  • Apache Airflow โ€“ Workflow orchestration
  • Grafana โ€“ Real-time data visualization
  • Linux VM โ€“ Compute environment for pipeline execution
  • Python โ€“ API ingestion & transformation logic

๐ŸŽฏ Project Objectives

Build a production-ready analytics solution to analyze YouTube channel and video performance:

  • Source structured data from the YouTube Data API
  • Clean, validate, and model for business intelligence
  • Persist historical metrics (views, likes, etc.) for trend analysis
  • Deliver actionable insights via dashboards and SQL queries

๐Ÿ—ƒ๏ธ Data Architecture (Medallion Model)

This project follows a Bronze โ†’ Silver โ†’ Gold pipeline:

Architecture

๐Ÿ”น Bronze Layer

Raw ingestion from the YouTube API (JSON format)

๐Ÿ”ธ Silver Layer

Cleaned, validated, and structured data (see data flow and model below)

  • Data Flow
    DataFlow

  • Data Model
    Data Model

๐ŸŸก Gold Layer

Aggregated data used to generate KPIs and dashboards in Grafana

  • Visualization Sample Visualization

๐Ÿ“ˆ BI Use Cases

Dashboards and SQL queries answer key questions such as:

  • What are the top-performing videos per channel?
  • How is each channel performing over time?
  • What are the daily trends for views and engagement?

๐Ÿ“ Repository Structure

โ”œโ”€โ”€ README.md
โ”œโ”€โ”€ channel_lists.py
โ”œโ”€โ”€ channel_overview.py
โ”œโ”€โ”€ channel_videos.py
โ”œโ”€โ”€ __pycache__/                        # Compiled Python files
โ”œโ”€โ”€ project_files/
โ”‚   โ”œโ”€โ”€ Architecture/                  # Draw.io and PNG files for architecture
โ”‚   โ””โ”€โ”€ ddl_update_scripts/           # SQL DDLs and procedures
โ”‚       โ”œโ”€โ”€ dim_channels.sql
โ”‚       โ”œโ”€โ”€ dim_videos.sql
โ”‚       โ”œโ”€โ”€ fct_subscribers_views_video_count.sql
โ”‚       โ””โ”€โ”€ fct_video_statistics.sql
โ””โ”€โ”€ requirements.txt                   # Python dependencies

๐Ÿ”— Access the Code

Browse the full codebase here