🤯 Why You Should Care
Neural networks (NNs) power everything from ChatGPT to self-driving cars. But let’s be honest: Using TensorFlow/PyTorch feels like magic—until you realize you don’t know how the wand works.

This post is for you if:

🧠 You want to demystify neural networks (no more black boxes!).

💻 You love coding fundamentals (goodbye model.fit(), hello raw matrices!).

⚡ You crave the satisfaction of "I built this myself!"

Spoiler: By the end, you’ll code an NN that classifies handwritten digits (MNIST) with 90%+ accuracy—using only numpy. Let’s go!


🔥 The Blueprint: How Neural Nets Actually Work
Here’s what we’ll implement:

  1. Layers: Input → Hidden → Output (with weights and biases).
  2. Activation Functions: ReLU (hidden layer) and Softmax (output).
  3. Loss: Cross-entropy (because we’re classifying digits).
  4. Backpropagation: Calculus + chain rule (don’t panic—numpy does the heavy lifting).

💻 Step 1: Coding the Neural Network

1. Initialize Parameters

import numpy as np  

def initialize_parameters(input_size, hidden_size, output_size):  
    W1 = np.random.randn(hidden_size, input_size) * 0.01  
    b1 = np.zeros((hidden_size, 1))  
    W2 = np.random.randn(output_size, hidden_size) * 0.01  
    b2 = np.zeros((output_size, 1))  
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

Why? Tiny random weights prevent symmetry issues. Biases start at zero.
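
A quick sanity check (my own illustration, using the same 784-128-10 sizes as the training section below) makes the shape convention explicit: weights are (outputs, inputs), and examples will flow through the network as columns.

params = initialize_parameters(784, 128, 10)
print(params["W1"].shape)  # (128, 784): maps a 784-pixel column to 128 hidden units
print(params["b1"].shape)  # (128, 1): one bias per hidden unit
print(params["W2"].shape)  # (10, 128): maps hidden activations to 10 digit scores
print(params["b2"].shape)  # (10, 1)
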
2. Forward Propagation

def relu(Z):  
    return np.maximum(0, Z)  

def softmax(Z):  
    exp = np.exp(Z - np.max(Z, axis=0, keepdims=True))  # per-column shift for numerical stability  
    return exp / exp.sum(axis=0, keepdims=True)  

def forward(X, params):  
    Z1 = params["W1"] @ X + params["b1"]  
    A1 = relu(Z1)  
    Z2 = params["W2"] @ A1 + params["b2"]  
    A2 = softmax(Z2)  
    return A2, (Z1, A1, A2)  # cache A2 too, so backprop doesn't rerun the forward pass
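
Worth verifying on throwaway data (my own check, not from the post): each column of A2 is a probability distribution over the 10 digits.

X_demo = np.random.randn(784, 5)   # 5 fake "images" as columns
A2_demo, _ = forward(X_demo, initialize_parameters(784, 128, 10))
print(A2_demo.shape)               # (10, 5): one 10-way distribution per example
print(A2_demo.sum(axis=0))         # each column sums to 1, thanks to softmax
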

3. Compute Loss

def cross_entropy_loss(A2, Y):  
    m = Y.shape[1]                      # number of examples (columns)  
    log_probs = np.log(A2 + 1e-9) * Y   # small epsilon avoids log(0)  
    return -np.sum(log_probs) / m
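
Note that Y is expected as one-hot columns of shape (10, m), matching A2. A tiny made-up example to see the formula in action:

Y_demo = np.zeros((10, 1))
Y_demo[3] = 1                        # true label: the digit 3, as a one-hot column
A2_demo = np.full((10, 1), 0.05)
A2_demo[3] = 0.55                    # the network is 55% confident it's a 3
print(cross_entropy_loss(A2_demo, Y_demo))  # ≈ -log(0.55) ≈ 0.60
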

4. Backpropagation (The “Aha!” Moment)

def backward(X, Y, params, cache):  
    m = Y.shape[1]  
    Z1, A1, A2 = cache  # activations saved during the forward pass  

    # Output layer gradient (softmax + cross-entropy collapse neatly to A2 - Y)  
    dZ2 = A2 - Y  
    dW2 = (dZ2 @ A1.T) / m  
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m  

    # Hidden layer gradient  
    dZ1 = (params["W2"].T @ dZ2) * (Z1 > 0)  # ReLU derivative  
    dW1 = (dZ1 @ X.T) / m  
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m  

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

5. Update Parameters (Gradient Descent)

def update_params(params, grads, learning_rate=0.1):  
    params["W1"] -= learning_rate * grads["dW1"]  
    params["b1"] -= learning_rate * grads["db1"]  
    params["W2"] -= learning_rate * grads["dW2"]  
    params["b2"] -= learning_rate * grads["db2"]  
    return params

🚂 Training Loop (The Grind)

def train(X, Y, epochs=1000):  
    params = initialize_parameters(784, 128, 10)  # MNIST: 28x28=784 pixels  
    for i in range(epochs):  
        A2, cache = forward(X, params)  
        loss = cross_entropy_loss(A2, Y)  
        grads = backward(X, Y, params, cache)  
        params = update_params(params, grads)  
        if i % 100 == 0:  
            print(f"Epoch {i}: Loss = {loss:.4f}")  
    return params
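
For completeness, here's one way to wire everything together. The MNIST loading and accuracy helpers below are my own sketch (the post doesn't show its data pipeline); I'm assuming scikit-learn's fetch_openml as the data source and the conventional 60k train / 10k test split, but any loader that hands you the raw pixel arrays works.

from sklearn.datasets import fetch_openml   # assumption: one convenient MNIST source

def one_hot(labels, num_classes=10):
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1
    return Y

def predict(X, params):
    A2, _ = forward(X, params)
    return np.argmax(A2, axis=0)             # most probable digit per column

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X_all = mnist.data.T / 255.0                 # (784, 70000), pixels scaled to [0, 1]
y_all = mnist.target.astype(int)

X_train, Y_train = X_all[:, :60000], one_hot(y_all[:60000])
X_test, y_test = X_all[:, 60000:], y_all[60000:]

params = train(X_train, Y_train, epochs=1000)
accuracy = (predict(X_test, params) == y_test).mean()
print(f"Test accuracy: {accuracy:.1%}")

Full-batch gradient descent over all 60k images keeps the code simple; mini-batches are the obvious next tweak if you want faster convergence.
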

🎯 Results: 92% Accuracy on MNIST!
After training on 60k MNIST images (and tuning hyperparameters):

Epoch 0: Loss = 2.3026  
Epoch 100: Loss = 0.3541  
Epoch 200: Loss = 0.2011  
...  
Final Test Accuracy: 92.3%

Not bad for 150 lines of numpy!


💡 Key Takeaways

  1. NNs are just math: Matrix multiplications, derivatives, and chain rules.
  2. Backpropagation = Loss gradients flowing backward (no magic!).
  3. You don’t need frameworks to understand the core (but use them for real projects 😉).

👨💻 Follow on GitHub
https://github.com/dassomnath99

📣 Share This Post
If you geeked out reading this, share it with a friend and tag #NumpyNN!

💬 Comments
“Wait, backprop is just the chain rule?!” → Drop your reactions below!