Building a neural network from scratch in Python is an excellent way to understand how deep learning works under the hood. While libraries like TensorFlow and PyTorch simplify the process, coding a neural network manually helps solidify core concepts such as forward propagation, backpropagation, and optimization.

In this guide, we’ll walk through the step-by-step process of creating a simple neural network using only NumPy. By the end, you’ll have a working neural network that can make predictions based on given inputs, and you’ll understand how it learns from data.

Why Build a Neural Network from Scratch?

Before diving into the code, here are some key reasons why manually implementing a neural network is beneficial:

  • Deep understanding: Learn how neural networks process information at a fundamental level.
  • Mathematical intuition: Understand how weights, biases, and activation functions contribute to learning.
  • Customization: Modify and experiment with different architectures without framework restrictions.
  • Better debugging skills: Gain insights into how training works and how to troubleshoot neural network issues.

Prerequisites

Before we begin, make sure you have the following installed:

  • Python (3.x recommended)
  • NumPy (pip install numpy)
  • Matplotlib (optional, for visualization: pip install matplotlib)

Step 1: Import Necessary Libraries

We'll use NumPy for numerical computations, particularly for matrix operations and random number generation.

import numpy as np

Step 2: Initialize Neural Network Parameters

We will build a simple neural network with:

  • 2 input neurons (for two input features)
  • 2 hidden layer neurons (for learning patterns)
  • 1 output neuron (to make a prediction)

Neural networks have weights (connections between neurons) and biases (threshold adjustments). We initialize the weights randomly and the biases to zero:

# Set random seed for reproducibility
np.random.seed(42)

# Initialize weights and biases
weights_input_hidden = np.random.randn(2, 2)
bias_hidden = np.zeros((1, 2))

weights_hidden_output = np.random.randn(2, 1)
bias_output = np.zeros((1, 1))
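
As a quick sanity check (purely illustrative), you can print the weight shapes to confirm the layer dimensions line up: an input batch of shape (n, 2) multiplied by a (2, 2) matrix yields a (n, 2) hidden activation, which a (2, 1) matrix then maps to a (n, 1) output.

# Optional shape check
print(weights_input_hidden.shape)   # (2, 2): 2 inputs -> 2 hidden neurons
print(weights_hidden_output.shape)  # (2, 1): 2 hidden neurons -> 1 output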

Step 3: Define the Activation Function

Activation functions introduce non-linearity, allowing the network to model complex relationships. We use the sigmoid function because it maps any input to a value between 0 and 1.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative of sigmoid, written in terms of its output:
    # x is expected to be an already-activated value, i.e. x = sigmoid(z)
    return x * (1 - x)
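
To build intuition (with purely illustrative values), you can evaluate the function at a few points; note that the derivative helper expects an already-activated value:

print(sigmoid(np.array([-2.0, 0.0, 2.0])))           # approx [0.119, 0.5, 0.881]
print(sigmoid_derivative(sigmoid(np.array([0.0]))))  # approx [0.25], sigmoid's steepest slope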

Step 4: Implement Forward Propagation

Forward propagation calculates the output of the neural network by passing inputs through the layers:

  • Multiply the input values by the weights and add the biases.
  • Apply the activation function to get the hidden layer values.
  • Multiply the hidden layer values by the output weights, add the output bias, and apply the activation function again to get the final output.

def forward_propagation(X):
    global hidden_layer_input, hidden_layer_output, output_layer_input, output

    # Compute hidden layer
    hidden_layer_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_layer_output = sigmoid(hidden_layer_input)

    # Compute output layer
    output_layer_input = np.dot(hidden_layer_output, weights_hidden_output) + bias_output
    output = sigmoid(output_layer_input)

    return output
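
For example, a single untrained forward pass on one input pair (a hypothetical sample, just to show the shapes) produces one prediction between 0 and 1:

sample = np.array([[0, 1]])               # shape (1, 2): one example, two features
prediction = forward_propagation(sample)  # shape (1, 1)
print(prediction)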

Step 5: Compute the Loss

The loss function measures how different the predictions are from the actual values. We use Mean Squared Error (MSE):

def compute_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
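
As a quick worked example with made-up values: for targets 0 and 1 with predictions 0.25 and 0.9, the squared errors are 0.0625 and 0.01, so the MSE is 0.03625.

print(compute_loss(np.array([[0], [1]]), np.array([[0.25], [0.9]])))  # 0.03625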

Step 6: Implement Backpropagation

Backpropagation is the learning step in which the network adjusts its weights and biases based on the prediction error.

  • Calculate the error between predicted and actual values.
  • Compute gradients using the derivative of the activation function.
  • Update the weights and biases using gradient descent.

def backpropagation(X, y, learning_rate=0.1):
    global weights_input_hidden, bias_hidden, weights_hidden_output, bias_output

    # Calculate error (uses the output from the most recent forward pass)
    error = y - output

    # Compute gradients for output layer
    d_output = error * sigmoid_derivative(output)

    # Compute error for hidden layer
    error_hidden = d_output.dot(weights_hidden_output.T)
    d_hidden = error_hidden * sigmoid_derivative(hidden_layer_output)

    # Update weights and biases
    weights_hidden_output += hidden_layer_output.T.dot(d_output) * learning_rate
    bias_output += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    weights_input_hidden += X.T.dot(d_hidden) * learning_rate
    bias_hidden += np.sum(d_hidden, axis=0, keepdims=True) * learning_rate
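
As an optional sanity check (using hypothetical inputs; note this nudges the global weights slightly, which is harmless for this demo), one backpropagation step on a batch should usually reduce the loss on that same batch:

X_demo = np.array([[0, 1], [1, 0]])
y_demo = np.array([[1], [1]])
loss_before = compute_loss(y_demo, forward_propagation(X_demo))
backpropagation(X_demo, y_demo, learning_rate=0.1)
loss_after = compute_loss(y_demo, forward_propagation(X_demo))
print(loss_before, loss_after)  # loss_after is usually smaller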

Step 7: Train the Neural Network

Now, let's define the training loop to update weights over multiple iterations:

# Sample dataset (XOR problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Training the network
epochs = 10000
learning_rate = 0.1

for epoch in range(epochs):
    output = forward_propagation(X)
    backpropagation(X, y, learning_rate)

    # Print loss every 1000 epochs
    if epoch % 1000 == 0:
        loss = compute_loss(y, output)
        print(f'Epoch {epoch}, Loss: {loss:.5f}')
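
If you installed Matplotlib, a small variant of the loop can record the loss each epoch and plot the training curve. This is a sketch: run it instead of the loop above (or re-run the Step 2 initialization first), since training continues from the current weights.

import matplotlib.pyplot as plt

losses = []
for epoch in range(epochs):
    output = forward_propagation(X)
    backpropagation(X, y, learning_rate)
    losses.append(compute_loss(y, output))

plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("MSE loss")
plt.title("Training loss")
plt.show()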

Step 8: Test the Neural Network

After training, let's test the network:

output = forward_propagation(X)
print("Final Predictions:")
print(output)
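
If you want hard 0/1 class labels rather than values between 0 and 1, you can threshold the sigmoid outputs at 0.5:

predictions = (output > 0.5).astype(int)
print("Rounded predictions:")
print(predictions)  # expected: [[0], [1], [1], [0]]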

Understanding the Results

The network should approximate the XOR function. The output values should be close to:

  • 0, 1, 1, 0

Since we’re using the sigmoid function, the outputs will never be exactly 0 or 1, but after training they should be close to the expected targets.

Optimizing the Neural Network

To improve accuracy, consider:

  • Increasing the number of hidden neurons.
  • Using different activation functions (e.g., ReLU; see the sketch after this list).
  • Adjusting the learning rate.
  • Training for more epochs.
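
As a minimal sketch of the ReLU idea mentioned above, here is a drop-in pair written to match the post-activation derivative convention used by sigmoid_derivative. You would typically use it in the hidden layer only, keep sigmoid on the output so predictions stay between 0 and 1, and possibly adjust the weight initialization, so results may vary.

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    # Works on post-activation values: ReLU's output is positive
    # exactly where its input was positive
    return (x > 0).astype(float)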

FAQs

  • Can I build deeper networks with this method? Yes, you can add more hidden layers.
  • Why use NumPy instead of TensorFlow? NumPy helps you understand the core logic before using advanced libraries.
  • What is the role of activation functions? They introduce non-linearity to model complex relationships.

Conclusion

Building a neural network from scratch in Python helps you understand fundamental deep learning concepts. By implementing forward propagation, backpropagation, and gradient descent, you can train a simple network to make predictions.

Want to explore more? Try modifying the architecture or training the model on a different dataset!