Building a neural network from scratch in Python is an excellent way to understand how deep learning works under the hood. While libraries like TensorFlow and PyTorch simplify the process, coding a neural network manually helps solidify core concepts such as forward propagation, backpropagation, and optimization.

In this guide, we’ll walk through the step-by-step process of creating a simple neural network using only NumPy. By the end, you’ll have a working neural network that can make predictions based on given inputs, and you’ll understand how it learns from data.

Why Build a Neural Network from Scratch?

Before diving into the code, here are some key reasons why manually implementing a neural network is beneficial:

  • Deep understanding: Learn how neural networks process information at a fundamental level.
  • Mathematical intuition: Understand how weights, biases, and activation functions contribute to learning.
  • Customization: Modify and experiment with different architectures without framework restrictions.
  • Better debugging skills: Gain insights into how training works and how to troubleshoot neural network issues.

Prerequisites

Before we begin, make sure you have the following installed:

  • Python (3.x recommended)
  • NumPy (pip install numpy)
  • Matplotlib (optional, for visualization: pip install matplotlib)

Step 1: Import Necessary Libraries

We'll use NumPy for numerical computations, particularly for matrix operations and random number generation.

import numpy as np

Step 2: Initialize Neural Network Parameters

We will build a simple neural network with:

  • 2 input neurons (for two input features)
  • 2 hidden layer neurons (for learning patterns)
  • 1 output neuron (to make a prediction)

Neural networks have weights (connections between neurons) and biases (threshold adjustments). We initialize the weights randomly and the biases to zero:

# Set random seed for reproducibility
np.random.seed(42)

# Initialize weights and biases
weights_input_hidden = np.random.randn(2, 2)
bias_hidden = np.zeros((1, 2))

weights_hidden_output = np.random.randn(2, 1)
bias_output = np.zeros((1, 1))
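
As a quick sanity check (purely illustrative), you can print the weight shapes to confirm the layer dimensions line up: an input batch of shape (n, 2) multiplied by a (2, 2) matrix yields a (n, 2) hidden activation, which a (2, 1) matrix then maps to a (n, 1) output.

# Optional shape check
print(weights_input_hidden.shape)   # (2, 2): 2 inputs -> 2 hidden neurons
print(weights_hidden_output.shape)  # (2, 1): 2 hidden neurons -> 1 output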

Step 3: Define the Activation Function

Activation functions introduce non-linearity, allowing the network to model complex relationships. We use the sigmoid function because it maps any input to a value between 0 and 1.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative of sigmoid, written in terms of its output:
    # x is expected to be an already-activated value, i.e. x = sigmoid(z)
    return x * (1 - x)
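
To build intuition (with purely illustrative values), you can evaluate the function at a few points; note that the derivative helper expects an already-activated value:

print(sigmoid(np.array([-2.0, 0.0, 2.0])))           # approx [0.119, 0.5, 0.881]
print(sigmoid_derivative(sigmoid(np.array([0.0]))))  # approx [0.25], sigmoid's steepest slope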

Step 4: Implement Forward Propagation

Forward propagation calculates the output of the neural network by passing inputs through the layers:

  • Multiply the input values by the weights and add the biases.
  • Apply the activation function to get the hidden layer values.
  • Multiply the hidden layer values by the output weights, add the output bias, and apply the activation function again to get the final output.

def forward_propagation(X):
    global hidden_layer_input, hidden_layer_output, output_layer_input, output

    # Compute hidden layer
    hidden_layer_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_layer_output = sigmoid(hidden_layer_input)

    # Compute output layer
    output_layer_input = np.dot(hidden_layer_output, weights_hidden_output) + bias_output
    output = sigmoid(output_layer_input)

    return output
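
For example, a single untrained forward pass on one input pair (a hypothetical sample, just to show the shapes) produces one prediction between 0 and 1:

sample = np.array([[0, 1]])               # shape (1, 2): one example, two features
prediction = forward_propagation(sample)  # shape (1, 1)
print(prediction)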

Step 5: Compute the Loss

The loss function measures how different the predictions are from the actual values. We use Mean Squared Error (MSE):

def compute_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
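
As a quick worked example with made-up values: for targets 0 and 1 with predictions 0.25 and 0.9, the squared errors are 0.0625 and 0.01, so the MSE is 0.03625.

print(compute_loss(np.array([[0], [1]]), np.array([[0.25], [0.9]])))  # 0.03625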

Step 6: Implement Backpropagation

Backpropagation is the learning step in which the network adjusts its weights and biases based on the prediction error.

  • Calculate the error between predicted and actual values.
  • Compute gradients using the derivative of the activation function.
  • Update the weights and biases using gradient descent.

def backpropagation(X, y, learning_rate=0.1):
    global weights_input_hidden, bias_hidden, weights_hidden_output, bias_output

    # Calculate error (uses the output from the most recent forward pass)
    error = y - output

    # Compute gradients for output layer
    d_output = error * sigmoid_derivative(output)

    # Compute error for hidden layer
    error_hidden = d_output.dot(weights_hidden_output.T)
    d_hidden = error_hidden * sigmoid_derivative(hidden_layer_output)

    # Update weights and biases
    weights_hidden_output += hidden_layer_output.T.dot(d_output) * learning_rate
    bias_output += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    weights_input_hidden += X.T.dot(d_hidden) * learning_rate
    bias_hidden += np.sum(d_hidden, axis=0, keepdims=True) * learning_rate
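
As an optional sanity check (using hypothetical inputs; note this nudges the global weights slightly, which is harmless for this demo), one backpropagation step on a batch should usually reduce the loss on that same batch:

X_demo = np.array([[0, 1], [1, 0]])
y_demo = np.array([[1], [1]])
loss_before = compute_loss(y_demo, forward_propagation(X_demo))
backpropagation(X_demo, y_demo, learning_rate=0.1)
loss_after = compute_loss(y_demo, forward_propagation(X_demo))
print(loss_before, loss_after)  # loss_after is usually smaller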

Step 7: Train the Neural Network

Now, let's define the training loop to update weights over multiple iterations:

# Sample dataset (XOR problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Training the network
epochs = 10000
learning_rate = 0.1

for epoch in range(epochs):
    output = forward_propagation(X)
    backpropagation(X, y, learning_rate)

    # Print loss every 1000 epochs
    if epoch % 1000 == 0:
        loss = compute_loss(y, output)
        print(f'Epoch {epoch}, Loss: {loss:.5f}')
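
If you installed Matplotlib, a small variant of the loop can record the loss each epoch and plot the training curve. This is a sketch: run it instead of the loop above (or re-run the Step 2 initialization first), since training continues from the current weights.

import matplotlib.pyplot as plt

losses = []
for epoch in range(epochs):
    output = forward_propagation(X)
    backpropagation(X, y, learning_rate)
    losses.append(compute_loss(y, output))

plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("MSE loss")
plt.title("Training loss")
plt.show()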

Step 8: Test the Neural Network

After training, let's test the network:

output = forward_propagation(X)
print("Final Predictions:")
print(output)
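
If you want hard 0/1 class labels rather than values between 0 and 1, you can threshold the sigmoid outputs at 0.5:

predictions = (output > 0.5).astype(int)
print("Rounded predictions:")
print(predictions)  # expected: [[0], [1], [1], [0]]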

Understanding the Results

The network should approximate the XOR function. The output values should be close to:

  • 0, 1, 1, 0

Since we’re using the sigmoid function, the outputs will never be exactly 0 or 1, but after training they should be close to the expected targets.

Optimizing the Neural Network

To improve accuracy, consider:

  • Increasing the number of hidden neurons.
  • Using different activation functions (e.g., ReLU; see the sketch after this list).
  • Adjusting the learning rate.
  • Training for more epochs.
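
As a minimal sketch of the ReLU idea mentioned above, here is a drop-in pair written to match the post-activation derivative convention used by sigmoid_derivative. You would typically use it in the hidden layer only, keep sigmoid on the output so predictions stay between 0 and 1, and possibly adjust the weight initialization, so results may vary.

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    # Works on post-activation values: ReLU's output is positive
    # exactly where its input was positive
    return (x > 0).astype(float)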

FAQs

  • Can I build deeper networks with this method? Yes, you can add more hidden layers.
  • Why use NumPy instead of TensorFlow? NumPy helps you understand the core logic before using advanced libraries.
  • What is the role of activation functions? They introduce non-linearity to model complex relationships.

Conclusion

Building a neural network from scratch in Python helps you understand fundamental deep learning concepts. By implementing forward propagation, backpropagation, and gradient descent, you can train a simple network to make predictions.

Want to explore more? Try modifying the architecture or training the model on a different dataset!