PostgreSQL Functions, Procedures, and Normalization
Functions in PostgreSQL
What are Functions?
Functions in PostgreSQL are named blocks of code that:
- Accept input parameters
- Perform computations
- Return a single value or a table (for table functions)
- Can be used in SQL statements
Creating a Function
CREATE OR REPLACE FUNCTION calculate_tax(amount numeric)
RETURNS numeric AS $$
BEGIN
RETURN amount * 0.15; -- 15% tax
END;
$$ LANGUAGE plpgsql;
Calling a Function
SELECT calculate_tax(100); -- Returns 15.00
Key Features:
- Support multiple languages (SQL, PL/pgSQL, Python, etc.)
- Can be overloaded (same name, different parameters)
- Can return complex types
- Support default parameter values
Procedures in PostgreSQL
What are Procedures?
Introduced in PostgreSQL 11, procedures:
- Can perform transactions (COMMIT/ROLLBACK inside)
- Don't return values (unlike functions)
- Are called with CALL statement
- Used for operations that modify data
Creating a Procedure
CREATE OR REPLACE PROCEDURE update_salary(emp_id int, increase_percent numeric)
AS $$
BEGIN
UPDATE employees
SET salary = salary * (1 + increase_percent/100)
WHERE id = emp_id;
COMMIT;
END;
$$ LANGUAGE plpgsql;
Calling a Procedure
CALL update_salary(101, 10); -- Gives 10% raise to employee 101
Normalization in Database Design
What is Normalization?
A process to organize data to minimize redundancy and dependency by dividing tables and defining relationships.
Normal Forms:
1NF (First Normal Form)
- Each table cell should contain atomic values
- Each record needs to be unique
- No repeating groups
Example:
Before 1NF:
Orders(order_id, [product1, product2, product3])
After 1NF:
Orders(order_id, product_id, quantity) -- In separate rows
2NF (Second Normal Form)
- Must be in 1NF
- All non-key attributes depend on the entire primary key
Example:
Before 2NF:
Order_Details(order_id, product_id, product_name, quantity)
After 2NF:
Order_Details(order_id, product_id, quantity)
Products(product_id, product_name)
3NF (Third Normal Form)
- Must be in 2NF
- No transitive dependencies (non-key attributes shouldn't depend on other non-key attributes)
Example:
Before 3NF:
Employees(emp_id, name, dept_id, dept_name, dept_location)
After 3NF:
Employees(emp_id, name, dept_id)
Departments(dept_id, dept_name, dept_location)
BCNF (Boyce-Codd Normal Form)
- Stricter version of 3NF
- Every determinant must be a candidate key
4NF and 5NF
- Deal with multi-valued dependencies and join dependencies
- Rarely implemented in practice
Benefits of Normalization:
- Reduces data redundancy
- Improves data integrity
- Simplifies maintenance
- Makes the database more flexible
Trade-offs:
- More complex queries (more JOINs)
- Potential performance impact
- Sometimes denormalization is used for performance
Practical Example Combining All Concepts
-- Normalized tables
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
order_date DATE NOT NULL DEFAULT CURRENT_DATE
);
CREATE TABLE order_items (
item_id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(order_id),
product_id INT REFERENCES products(product_id),
quantity INT NOT NULL CHECK (quantity > 0)
);
-- Function to calculate order total
CREATE OR REPLACE FUNCTION get_order_total(order_id int)
RETURNS numeric AS $$
DECLARE
total numeric;
BEGIN
SELECT SUM(p.price * oi.quantity) INTO total
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
WHERE oi.order_id = get_order_total.order_id;
RETURN COALESCE(total, 0);
END;
$$ LANGUAGE plpgsql;
-- Procedure to process orders
CREATE OR REPLACE PROCEDURE process_order(customer_id int, product_ids int[], quantities int[])
AS $$
DECLARE
new_order_id int;
BEGIN
-- Create new order
INSERT INTO orders (customer_id) VALUES (customer_id)
RETURNING order_id INTO new_order_id;
-- Add order items
FOR i IN 1..array_length(product_ids, 1) LOOP
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (new_order_id, product_ids[i], quantities[i]);
END LOOP;
COMMIT;
END;
$$ LANGUAGE plpgsql;
This example demonstrates how PostgreSQL functions and procedures can work with a normalized database structure to provide efficient data operations while maintaining data integrity.
Ranking Functions in PostgreSQL: RANK, ROW_NUMBER, and DENSE_RANK
PostgreSQL provides three main ranking functions that are commonly used in window functions. Here's a detailed comparison with examples:
1. ROW_NUMBER()
Assigns a unique sequential integer to rows within a partition (no ties).
SELECT
employee_name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;
Behavior:
- Always produces consecutive numbers (1, 2, 3, ...)
- Different rows always get different numbers, even with ties
- Resets for each partition if PARTITION BY is used
2. RANK()
Assigns a rank with gaps for ties.
SELECT
employee_name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;
Behavior:
- Rows with equal values get the same rank
- Subsequent ranks have gaps (e.g., 1, 1, 3, 4, 4, 6...)
- Resets for each partition
3. DENSE_RANK()
Assigns a rank without gaps for ties.
SELECT
employee_name,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
Behavior:
- Rows with equal values get the same rank
- Subsequent ranks are consecutive (e.g., 1, 1, 2, 3, 3, 4...)
- Resets for each partition
Comparison Example
SELECT
employee_name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
Sample Output:
employee_name | salary | row_num | rank | dense_rank |
---|---|---|---|---|
Alice | 9000 | 1 | 1 | 1 |
Bob | 9000 | 2 | 1 | 1 |
Charlie | 8000 | 3 | 3 | 2 |
David | 7500 | 4 | 4 | 3 |
Eve | 7500 | 5 | 4 | 3 |
Frank | 7000 | 6 | 6 | 4 |
Practical Use Cases
- Top N per group:
WITH ranked_employees AS (
SELECT
employee_name,
department,
salary,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees
)
SELECT * FROM ranked_employees WHERE dept_rank <= 3;
- Finding duplicates:
SELECT *
FROM (
SELECT
email,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS row_num
FROM users
) t
WHERE row_num > 1;
- Pagination:
SELECT *
FROM (
SELECT
product_name,
price,
ROW_NUMBER() OVER (ORDER BY price DESC) AS row_num
FROM products
) t
WHERE row_num BETWEEN 11 AND 20;
Performance Considerations
- Window functions execute after WHERE, GROUP BY, and HAVING clauses
- Include only necessary columns in PARTITION BY for better performance
- Add appropriate indexes on columns used in ORDER BY of window functions