Database schema design sits at the core of any successful web application. After years of building database systems for web platforms, I've found that proper schema design can make or break performance, scalability, and maintainability. In this article, I'll share eight schema design patterns that have consistently proven valuable across various projects.
Normalization: The Foundation of Good Schema Design
Normalization creates a solid foundation for most databases. By organizing data into tables and eliminating redundancy, we establish data integrity while optimizing storage. I typically start with third normal form (3NF) and adjust as needed.
The basic implementation involves breaking down data into related tables and connecting them through foreign keys:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
phone VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
street VARCHAR(255),
city VARCHAR(100),
state VARCHAR(100),
postal_code VARCHAR(20),
country VARCHAR(100),
is_primary BOOLEAN DEFAULT false
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status VARCHAR(50)
);
This approach provides clear relationships, minimizes data duplication, and makes data modifications less error-prone. However, I've learned that strict normalization isn't always the answer for web applications with heavy read operations.
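For instance, pulling recent orders together with each customer's email and primary shipping city is a straightforward join across the normalized tables (an illustrative query against the schema above):
-- Recent orders with customer email and primary shipping city
SELECT o.id, o.order_date, o.status, c.email, a.city
FROM orders o
JOIN customers c ON c.id = o.customer_id
LEFT JOIN addresses a ON a.customer_id = c.id AND a.is_primary = true
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY o.order_date DESC;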
Strategic Denormalization for Performance
While normalization establishes data integrity, web applications often benefit from controlled denormalization to optimize read performance. I've found that selectively duplicating data can dramatically improve query response times for frequently accessed data patterns.
For example, in an e-commerce system, we might store aggregated order statistics directly in the customer table:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
first_name VARCHAR(100),
last_name VARCHAR(100),
total_orders INTEGER DEFAULT 0,
total_spent DECIMAL(10,2) DEFAULT 0.00,
last_order_date TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
This approach eliminates complex joins and aggregate calculations for frequently accessed metrics. The trade-off is maintaining this denormalized data through triggers or application code.
I maintain a set of rules for denormalization: only denormalize data that's frequently read but infrequently changed, document all denormalized fields, and establish reliable methods to refresh denormalized values when source data changes.
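When triggers are the chosen mechanism, the logic is usually small. Here's a minimal sketch that keeps the customer totals current; it assumes the orders table carries an amount column, which isn't part of the earlier schema:
-- Hypothetical trigger keeping denormalized customer totals in sync
-- (assumes orders has an amount column)
CREATE OR REPLACE FUNCTION update_customer_totals()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE customers
    SET total_orders = total_orders + 1,
        total_spent = total_spent + NEW.amount,
        last_order_date = NEW.order_date
    WHERE id = NEW.customer_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_update_customer_totals
AFTER INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION update_customer_totals();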
Polymorphic Associations for Flexible Relationships
Polymorphic associations have been invaluable when I need to model relationships where an entity can belong to different types of records. Rather than creating separate join tables for each relationship type, a polymorphic pattern provides a more elegant solution.
Consider a system where comments can be attached to different entities like posts, products, or user profiles:
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title VARCHAR(255),
content TEXT
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
description TEXT
);
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
commentable_type VARCHAR(50) NOT NULL,
commentable_id INTEGER NOT NULL,
user_id INTEGER REFERENCES users(id),
content TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(commentable_type, commentable_id, user_id)
);
The commentable_type field stores the table name (such as "posts" or "products"), while commentable_id references the specific record's ID. This pattern reduces table proliferation and adds flexibility.
In PostgreSQL, I often create a custom index to optimize queries:
CREATE INDEX idx_comments_polymorphic
ON comments(commentable_type, commentable_id);
The primary challenge with this pattern is maintaining referential integrity, as the database can't enforce foreign key constraints on polymorphic relationships. I typically implement application-level validations to ensure data consistency.
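One partial safeguard I add at the database level is a CHECK constraint that restricts commentable_type to the entity types I actually support (a sketch; adjust the list to your own tables):
-- Limit the polymorphic type to known tables
ALTER TABLE comments
ADD CONSTRAINT chk_commentable_type
CHECK (commentable_type IN ('posts', 'products', 'user_profiles'));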
JSON Column Types for Semi-Structured Data
Modern databases offer strong JSON support: PostgreSQL and MySQL provide native JSON column types, and SQL Server ships built-in JSON functions. This has transformed how I handle semi-structured data. Instead of creating numerous columns for optional attributes or building complex entity-attribute-value tables, JSON columns provide flexibility with good query performance.
For example, storing product attributes that vary by product type:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
base_price DECIMAL(10,2) NOT NULL,
attributes JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert different product types with varying attributes
INSERT INTO products (name, base_price, attributes) VALUES
('Laptop', 999.99, '{"processor": "i7", "ram": "16GB", "storage": "512GB SSD", "screen": "15.6 inch"}'),
('T-shirt', 19.99, '{"size": "M", "color": "blue", "material": "cotton"}'),
('Coffee Table', 149.99, '{"width": "120cm", "length": "60cm", "material": "oak"}');
I can then query specific JSON attributes efficiently:
-- Find products with an i7 processor and at least 16 GB of RAM
-- (RAM is stored as text like '16GB', so strip the unit before comparing numerically)
SELECT * FROM products
WHERE attributes->>'processor' = 'i7'
AND (regexp_replace(attributes->>'ram', '[^0-9]', '', 'g'))::integer >= 16;
-- Create an index on specific JSON fields
CREATE INDEX idx_product_processor ON products ((attributes->>'processor'));
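When queries filter on many different attributes, a single GIN index over the whole JSONB column can be more practical than one expression index per field (a sketch using PostgreSQL's containment operator):
-- One GIN index supports containment queries across all attributes
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);
-- Find all cotton products, whatever their other attributes
SELECT * FROM products
WHERE attributes @> '{"material": "cotton"}';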
This approach balances structure and flexibility, particularly useful for applications with evolving data requirements. However, I'm careful not to overuse JSON columns, as they can lead to schema-less designs that sacrifice query performance and data validation.
Materialized Views for Precomputed Results
Materialized views store the results of complex queries, making them excellent for reporting and analytics in web applications. Unlike regular views, materialized views physically store data, providing significantly faster access at the cost of needing periodic refreshes.
Here's an example creating a materialized view for sales analytics:
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
DATE_TRUNC('month', order_date) AS month,
p.category,
COUNT(oi.id) AS items_sold,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.id
JOIN orders o ON oi.order_id = o.id
WHERE o.status = 'completed'
GROUP BY DATE_TRUNC('month', order_date), p.category
WITH DATA;
-- Create an index to speed up queries on the materialized view
CREATE INDEX idx_monthly_sales_month ON monthly_sales(month);
To refresh the data when needed:
REFRESH MATERIALIZED VIEW monthly_sales;
PostgreSQL doesn't yet support truly incremental materialized view refreshes, but REFRESH MATERIALIZED VIEW CONCURRENTLY rebuilds the view without blocking readers. It requires a unique, non-partial index on plain columns of the view:
-- A unique index is required before refreshing concurrently
CREATE UNIQUE INDEX idx_monthly_sales_unique
ON monthly_sales (month, category);
-- Refresh without locking out readers of the view
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;
I've found materialized views particularly effective for dashboard data, reducing complex real-time calculations to simple queries against precomputed data.
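A typical dashboard query then reads directly from the precomputed view (illustrative):
-- Last six months of revenue by category, straight from the view
SELECT month, category, items_sold, revenue
FROM monthly_sales
WHERE month >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '6 months'
ORDER BY month, revenue DESC;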
Temporal Data Modeling for Historical Records
Many web applications need to track changes over time. I've implemented temporal data models in various systems to maintain historical records while keeping current queries efficient.
A basic approach uses effective dating with start and end dates:
CREATE TABLE product_prices (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id),
price DECIMAL(10,2) NOT NULL,
effective_from DATE NOT NULL,
effective_to DATE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CHECK (effective_from < effective_to)
);
-- Insert initial price
INSERT INTO product_prices (product_id, price, effective_from, effective_to)
VALUES (1, 99.99, '2023-01-01', NULL);
-- Update price while maintaining history
UPDATE product_prices
SET effective_to = '2023-06-30'
WHERE product_id = 1 AND effective_to IS NULL;
INSERT INTO product_prices (product_id, price, effective_from, effective_to)
VALUES (1, 129.99, '2023-07-01', NULL);
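With this structure, a point-in-time lookup is a simple range query (an illustrative example against the table above):
-- Price in effect on 2023-08-15
SELECT price
FROM product_prices
WHERE product_id = 1
AND effective_from <= '2023-08-15'
AND (effective_to IS NULL OR effective_to >= '2023-08-15');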
For more complex scenarios, I implement history tables that track all changes to records:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(255),
status VARCHAR(50),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE customer_history (
history_id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
email VARCHAR(255) NOT NULL,
name VARCHAR(255),
status VARCHAR(50),
changed_at TIMESTAMP NOT NULL,
changed_by INTEGER REFERENCES users(id)
);
-- Create a trigger to capture changes
CREATE OR REPLACE FUNCTION log_customer_changes()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO customer_history (
customer_id, email, name, status, changed_at, changed_by
) VALUES (
OLD.id, OLD.email, OLD.name, OLD.status,
CURRENT_TIMESTAMP, current_setting('app.current_user_id')::integer
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER customer_history_trigger
BEFORE UPDATE ON customers
FOR EACH ROW EXECUTE FUNCTION log_customer_changes();
This pattern enables point-in-time analysis and auditing while keeping the main tables focused on current data. The trade-off is increased storage requirements and slightly more complex queries for historical data.
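Pulling an audit trail for a single record then becomes a plain query against the history table (illustrative):
-- All recorded changes for one customer, newest first
SELECT email, name, status, changed_at, changed_by
FROM customer_history
WHERE customer_id = 42
ORDER BY changed_at DESC;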
Soft Deletion Pattern for Data Preservation
I've found that actual deletion of data often creates problems in web applications. Instead, implementing soft deletion (logical deletion) maintains referential integrity while allowing "deleted" data to disappear from standard queries.
The basic implementation adds a deletion flag and timestamp:
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
content TEXT,
author_id INTEGER REFERENCES users(id),
is_deleted BOOLEAN DEFAULT false,
deleted_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Soft delete an article
UPDATE articles
SET is_deleted = true, deleted_at = CURRENT_TIMESTAMP
WHERE id = 123;
-- Query that excludes soft-deleted records
SELECT * FROM articles WHERE is_deleted = false;
For better query ergonomics, I often implement this pattern with database views:
CREATE VIEW active_articles AS
SELECT * FROM articles WHERE is_deleted = false;
-- Now queries can be simpler
SELECT * FROM active_articles;
This approach preserves data relationships, maintains history, and allows for data recovery. It's especially important in systems with complex relationships between entities, where hard deletion could orphan related records or violate business rules.
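Because most queries only ever touch live rows, I also add a partial index so they don't pay for the soft-deleted ones (a sketch against the articles table above):
-- Index only the rows that standard queries actually read
CREATE INDEX idx_articles_active
ON articles (author_id, created_at)
WHERE is_deleted = false;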
Database Sharding for Horizontal Scalability
As web applications grow, database performance can become a bottleneck. Database sharding distributes data across multiple database instances, allowing horizontal scaling beyond what a single database server can handle.
I typically implement sharding by customer, geography, or time periods, depending on the application's natural partitioning boundaries.
Here's a simplified example of sharding by customer ID in a multi-tenant application:
-- On each shard, tables have the same structure
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
external_id UUID UNIQUE NOT NULL,
name VARCHAR(255),
email VARCHAR(255)
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
external_id UUID UNIQUE NOT NULL,
customer_id INTEGER REFERENCES customers(id),
amount DECIMAL(10,2),
status VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
The application layer handles shard selection logic:
import uuid

NUM_SHARDS = 4  # number of physical shards; set this from configuration


def get_shard_for_customer(customer_external_id):
    # Simple modulo-based sharding on the customer's UUID
    shard_id = uuid.UUID(customer_external_id).int % NUM_SHARDS
    return f"shard_{shard_id}"


def get_customer_data(customer_external_id):
    shard = get_shard_for_customer(customer_external_id)
    # get_database_connection() is an application-specific helper that
    # returns a connection to the named shard
    db_connection = get_database_connection(shard)
    return db_connection.query(
        "SELECT * FROM customers WHERE external_id = %s",
        [customer_external_id],
    )
The challenges with sharding include managing cross-shard queries, handling distributed transactions, and generating identifiers that stay unique across shards. I address these by using UUIDs as globally unique external identifiers, designing queries to minimize cross-shard operations, and implementing application-level coordination for distributed transactions.
Sharding requires significant architectural planning but can dramatically increase database capacity for large-scale web applications.
Conclusion
After implementing numerous database schemas for web applications, I've found these eight patterns consistently valuable. The key is applying them judiciously based on your application's specific requirements.
Start with normalization as your foundation, introduce denormalization where read performance matters, use polymorphic associations for flexible relationships, leverage JSON columns for variable attributes, implement materialized views for complex reporting queries, employ temporal data modeling when history matters, use soft deletion to preserve data relationships, and consider sharding for very large datasets.
Remember that no single pattern solves all problems. The most effective database schemas combine these approaches strategically, creating a balanced system that delivers good performance, maintainability, and scalability as your web application grows.