Building a neural network from scratch in Python is an excellent way to understand how deep learning works under the hood. While libraries like TensorFlow and PyTorch simplify the process, coding a neural network manually helps solidify core concepts such as forward propagation, backpropagation, and optimization.
In this guide, we’ll walk through the step-by-step process of creating a simple neural network using only NumPy. By the end, you’ll have a working neural network that can make predictions based on given inputs, and you’ll understand how it learns from data.
Why Build a Neural Network from Scratch?
Before diving into the code, here are some key reasons why manually implementing a neural network is beneficial:
- Deep understanding: Learn how neural networks process information at a fundamental level.
- Mathematical intuition: Understand how weights, biases, and activation functions contribute to learning.
- Customization: Modify and experiment with different architectures without framework restrictions.
- Better debugging skills: Gain insights into how training works and how to troubleshoot neural network issues.

Prerequisites
Before we begin, make sure you have the following installed:
- Python (3.x recommended)
- NumPy (pip install numpy)
- Matplotlib (optional, for visualization: pip install matplotlib)
Step 1: Import Necessary Libraries
We'll use NumPy for numerical computations, particularly for matrix operations and random number generation.
import numpy as np
Step 2: Initialize Neural Network Parameters
We will build a simple neural network with:
- 2 input neurons (for two input features)
- 2 hidden layer neurons (for learning patterns)
- 1 output neuron (to make a prediction)
Neural networks have weights (connections between neurons) and biases (threshold adjustments). We initialize these randomly:
# Set random seed for reproducibility
np.random.seed(42)
# Initialize weights and biases
weights_input_hidden = np.random.randn(2, 2)
bias_hidden = np.zeros((1, 2))
weights_hidden_output = np.random.randn(2, 1)
bias_output = np.zeros((1, 1))
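As an optional sanity check (not part of the original code), you can print the parameter shapes to confirm they match the 2-2-1 architecture described above:

# Optional sanity check: parameter shapes for the 2-2-1 architecture
print(weights_input_hidden.shape)   # (2, 2)
print(bias_hidden.shape)            # (1, 2)
print(weights_hidden_output.shape)  # (2, 1)
print(bias_output.shape)            # (1, 1)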
Step 3: Define the Activation Function
Activation functions introduce non-linearity, allowing the network to model complex relationships. We use the sigmoid function because it maps values between 0 and 1.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Used for computing gradients; assumes x is already the sigmoid output
    return x * (1 - x)
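As a quick illustration (not part of the network itself), evaluating the sigmoid on a few sample values shows that every output falls between 0 and 1:

# Quick illustration: sigmoid squashes any real number into (0, 1)
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # approximately [0.119, 0.5, 0.881]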
Step 4: Implement Forward Propagation
Forward propagation calculates the output of the neural network by passing inputs through the layers:
- Multiply input values by weights and add biases.
- Apply the activation function to get hidden layer values.
- Multiply hidden layer values by the output-layer weights, add the output bias, and apply the activation again to get the final output.
def forward_propagation(X):
    global hidden_layer_input, hidden_layer_output, output_layer_input, output
    # Compute hidden layer
    hidden_layer_input = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_layer_output = sigmoid(hidden_layer_input)
    # Compute output layer
    output_layer_input = np.dot(hidden_layer_output, weights_hidden_output) + bias_output
    output = sigmoid(output_layer_input)
    return output
Step 5: Compute the Loss
The loss function measures how different the predictions are from the actual values. We use Mean Squared Error (MSE):
def compute_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
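For example, if the true values are 0 and 1 and the network predicts 0.2 and 0.9, the squared errors are 0.04 and 0.01, so the MSE is their average:

# Small worked example: ((0 - 0.2)^2 + (1 - 0.9)^2) / 2 = 0.025
print(compute_loss(np.array([[0], [1]]), np.array([[0.2], [0.9]])))  # 0.025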
Step 6: Implement Backpropagation
Backpropagation is the learning process where the network adjusts its weights based on the error.
- Calculate the error between predicted and actual values.
- Compute gradients using the derivative of the activation function.
- Update the weights and biases using gradient descent.
def backpropagation(X, y, learning_rate=0.1):
    global weights_input_hidden, bias_hidden, weights_hidden_output, bias_output
    # Calculate error (y - output); because of this sign, adding the updates
    # below moves the weights in the negative-gradient direction
    error = y - output
    # Compute gradients for output layer
    d_output = error * sigmoid_derivative(output)
    # Propagate the error back to the hidden layer
    error_hidden = d_output.dot(weights_hidden_output.T)
    d_hidden = error_hidden * sigmoid_derivative(hidden_layer_output)
    # Update weights and biases via gradient descent
    weights_hidden_output += hidden_layer_output.T.dot(d_output) * learning_rate
    bias_output += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    weights_input_hidden += X.T.dot(d_hidden) * learning_rate
    bias_hidden += np.sum(d_hidden, axis=0, keepdims=True) * learning_rate
Step 7: Train the Neural Network
Now, let's define the training loop to update weights over multiple iterations:
# Sample dataset (XOR problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
# Training the network
epochs = 10000
learning_rate = 0.1
for epoch in range(epochs):
    output = forward_propagation(X)
    backpropagation(X, y, learning_rate)
    # Print loss every 1000 epochs
    if epoch % 1000 == 0:
        loss = compute_loss(y, output)
        print(f'Epoch {epoch}, Loss: {loss:.5f}')
Step 8: Test the Neural Network
After training, let's test the network:
output = forward_propagation(X)
print("Final Predictions:")
print(output)
Understanding the Results
The network should approximate the XOR function. The output values should be close to:
- 0, 1, 1, 0
Since we’re using the sigmoid function, the outputs will fall strictly between 0 and 1, but after training they should be close to the expected values.
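If you want hard 0/1 labels rather than probabilities, a common approach (not part of the original code above) is to threshold the sigmoid outputs at 0.5:

# Threshold the sigmoid outputs to get hard 0/1 predictions
predictions = (output > 0.5).astype(int)
print("Rounded Predictions:")
print(predictions)  # expected: [[0], [1], [1], [0]] after successful training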
Optimizing the Neural Network
To improve accuracy, consider:
- Increasing the number of hidden neurons.
- Using different activation functions (e.g., ReLU; see the sketch after this list).
- Adjusting the learning rate.
- Training for more epochs.
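As a rough sketch of the ReLU suggestion above, you could define the activation and its derivative as follows. Note that this is a starting point rather than a drop-in replacement: unlike sigmoid_derivative, relu_derivative here expects the pre-activation values, and the output layer would still typically keep sigmoid for this binary task.

def relu(x):
    # ReLU keeps positive values and zeroes out negatives
    return np.maximum(0, x)

def relu_derivative(x):
    # Gradient is 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)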
FAQs
- Can I build deeper networks with this method? Yes, you can add more hidden layers (see the sketch after this list).
- Why use NumPy instead of TensorFlow? NumPy helps you understand the core logic before using advanced libraries.
- What is the role of activation functions? They introduce non-linearity to model complex relationships.
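For the first question, a deeper network just means another set of weights, biases, and activations chained into forward propagation, plus a matching extra gradient step in backpropagation. A minimal sketch, assuming a hypothetical second hidden layer of 2 neurons, might look like this:

# Hypothetical second hidden layer (2 neurons) between the first hidden layer and the output
weights_hidden_hidden = np.random.randn(2, 2)
bias_hidden2 = np.zeros((1, 2))

# Inside forward_propagation, the extra step would be:
# hidden2_input = np.dot(hidden_layer_output, weights_hidden_hidden) + bias_hidden2
# hidden2_output = sigmoid(hidden2_input)
# ...and the output layer would then read from hidden2_output instead of hidden_layer_output.
# Backpropagation would need a corresponding extra gradient step for the new layer.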
Conclusion
Building a neural network from scratch in Python helps you understand fundamental deep learning concepts. By implementing forward propagation, backpropagation, and gradient descent, you can train a simple network to make predictions.
Want to explore more? Try modifying the architecture or training the model on a different dataset!
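For example, to train on the AND function instead of XOR, you only need to swap the labels; the inputs stay the same:

# AND problem: output is 1 only when both inputs are 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [0], [0], [1]])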