PostgreSQL Functions, Procedures, and Normalization

Functions in PostgreSQL

What are Functions?

Functions in PostgreSQL are named blocks of code that:

  • Accept input parameters
  • Perform computations
  • Return a single value or a table (for table functions)
  • Can be used in SQL statements

Creating a Function

CREATE OR REPLACE FUNCTION calculate_tax(amount numeric)
RETURNS numeric AS $$
BEGIN
    RETURN amount * 0.15; -- 15% tax
END;
$$ LANGUAGE plpgsql;

Calling a Function

SELECT calculate_tax(100); -- Returns 15.00

Key Features:

  • Support multiple languages (SQL, PL/pgSQL, Python, etc.)
  • Can be overloaded (same name, different parameters)
  • Can return complex types
  • Support default parameter values

Procedures in PostgreSQL

What are Procedures?

Introduced in PostgreSQL 11, procedures:

  • Can perform transactions (COMMIT/ROLLBACK inside)
  • Don't return values (unlike functions)
  • Are called with CALL statement
  • Used for operations that modify data

Creating a Procedure

CREATE OR REPLACE PROCEDURE update_salary(emp_id int, increase_percent numeric)
AS $$
BEGIN
    UPDATE employees 
    SET salary = salary * (1 + increase_percent/100)
    WHERE id = emp_id;

    COMMIT;
END;
$$ LANGUAGE plpgsql;

Calling a Procedure

CALL update_salary(101, 10); -- Gives 10% raise to employee 101

Normalization in Database Design

What is Normalization?

A process to organize data to minimize redundancy and dependency by dividing tables and defining relationships.

Normal Forms:

1NF (First Normal Form)

  • Each table cell should contain atomic values
  • Each record needs to be unique
  • No repeating groups

Example:

Before 1NF:
Orders(order_id, [product1, product2, product3])

After 1NF:
Orders(order_id, product_id, quantity) -- In separate rows

2NF (Second Normal Form)

  • Must be in 1NF
  • All non-key attributes depend on the entire primary key

Example:

Before 2NF:
Order_Details(order_id, product_id, product_name, quantity)

After 2NF:
Order_Details(order_id, product_id, quantity)
Products(product_id, product_name)

3NF (Third Normal Form)

  • Must be in 2NF
  • No transitive dependencies (non-key attributes shouldn't depend on other non-key attributes)

Example:

Before 3NF:
Employees(emp_id, name, dept_id, dept_name, dept_location)

After 3NF:
Employees(emp_id, name, dept_id)
Departments(dept_id, dept_name, dept_location)

BCNF (Boyce-Codd Normal Form)

  • Stricter version of 3NF
  • Every determinant must be a candidate key

4NF and 5NF

  • Deal with multi-valued dependencies and join dependencies
  • Rarely implemented in practice

Benefits of Normalization:

  • Reduces data redundancy
  • Improves data integrity
  • Simplifies maintenance
  • Makes the database more flexible

Trade-offs:

  • More complex queries (more JOINs)
  • Potential performance impact
  • Sometimes denormalization is used for performance

Practical Example Combining All Concepts

-- Normalized tables
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE NOT NULL DEFAULT CURRENT_DATE
);

CREATE TABLE order_items (
    item_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity INT NOT NULL CHECK (quantity > 0)
);

-- Function to calculate order total
CREATE OR REPLACE FUNCTION get_order_total(order_id int)
RETURNS numeric AS $$
DECLARE
    total numeric;
BEGIN
    SELECT SUM(p.price * oi.quantity) INTO total
    FROM order_items oi
    JOIN products p ON oi.product_id = p.product_id
    WHERE oi.order_id = get_order_total.order_id;

    RETURN COALESCE(total, 0);
END;
$$ LANGUAGE plpgsql;

-- Procedure to process orders
CREATE OR REPLACE PROCEDURE process_order(customer_id int, product_ids int[], quantities int[])
AS $$
DECLARE
    new_order_id int;
BEGIN
    -- Create new order
    INSERT INTO orders (customer_id) VALUES (customer_id)
    RETURNING order_id INTO new_order_id;

    -- Add order items
    FOR i IN 1..array_length(product_ids, 1) LOOP
        INSERT INTO order_items (order_id, product_id, quantity)
        VALUES (new_order_id, product_ids[i], quantities[i]);
    END LOOP;

    COMMIT;
END;
$$ LANGUAGE plpgsql;

This example demonstrates how PostgreSQL functions and procedures can work with a normalized database structure to provide efficient data operations while maintaining data integrity.

Ranking Functions in PostgreSQL: RANK, ROW_NUMBER, and DENSE_RANK

PostgreSQL provides three main ranking functions that are commonly used in window functions. Here's a detailed comparison with examples:

1. ROW_NUMBER()

Assigns a unique sequential integer to rows within a partition (no ties).

SELECT 
    employee_name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;

Behavior:

  • Always produces consecutive numbers (1, 2, 3, ...)
  • Different rows always get different numbers, even with ties
  • Resets for each partition if PARTITION BY is used

2. RANK()

Assigns a rank with gaps for ties.

SELECT 
    employee_name,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;

Behavior:

  • Rows with equal values get the same rank
  • Subsequent ranks have gaps (e.g., 1, 1, 3, 4, 4, 6...)
  • Resets for each partition

3. DENSE_RANK()

Assigns a rank without gaps for ties.

SELECT 
    employee_name,
    salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

Behavior:

  • Rows with equal values get the same rank
  • Subsequent ranks are consecutive (e.g., 1, 1, 2, 3, 3, 4...)
  • Resets for each partition

Comparison Example

SELECT 
    employee_name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK() OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

Sample Output:

employee_name salary row_num rank dense_rank
Alice 9000 1 1 1
Bob 9000 2 1 1
Charlie 8000 3 3 2
David 7500 4 4 3
Eve 7500 5 4 3
Frank 7000 6 6 4

Practical Use Cases

  1. Top N per group:
WITH ranked_employees AS (
    SELECT 
        employee_name,
        department,
        salary,
        DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
    FROM employees
)
SELECT * FROM ranked_employees WHERE dept_rank <= 3;
  1. Finding duplicates:
SELECT *
FROM (
    SELECT 
        email,
        ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS row_num
    FROM users
) t
WHERE row_num > 1;
  1. Pagination:
SELECT *
FROM (
    SELECT 
        product_name,
        price,
        ROW_NUMBER() OVER (ORDER BY price DESC) AS row_num
    FROM products
) t
WHERE row_num BETWEEN 11 AND 20;

Performance Considerations

  • Window functions execute after WHERE, GROUP BY, and HAVING clauses
  • Include only necessary columns in PARTITION BY for better performance
  • Add appropriate indexes on columns used in ORDER BY of window functions