
Building a neural network from scratch in Python is one of the best ways to truly understand how deep learning works under the hood. While high-level libraries like TensorFlow or PyTorch make building and training neural networks easy, writing your own implementation forces you to engage with the core mechanics — forward propagation, back-propagation, weight updates, activation functions — everything that drives a model’s ability to learn.

In this guide, we’ll walk step by step through creating a simple feed-forward neural network using only NumPy. By the end, you’ll have built a functioning network capable of learning from data, and you’ll have developed insight into how each component contributes to that learning.

Why Build a Neural Network from Scratch?

There are several compelling reasons to roll up your sleeves and build a neural network ‘by hand’:

  • Deep understanding: You’ll learn exactly how inputs, weights, biases, and activations interact to produce outputs and enable learning.
  • Mathematical intuition: When you manually code the derivatives and update steps, the theory becomes clear and meaningful.
  • Customization at will: Without a framework’s abstractions, you’re free to explore unusual architectures or tweak every detail.
  • Better debugging skills: Since you’ve written each component yourself, when things break you’ll know exactly where the problem lies.

Prerequisites

Before we dive in, make sure you’re prepared:

  • Python (3.x recommended)
  • NumPy (pip install numpy)
  • Matplotlib (optional, for visualising training progress: pip install matplotlib)

Step 1: Import the Basics

We’ll use NumPy for all numerical computations and random initialisations:

import numpy as np

Step 2: Initialise Network Parameters

The network we’ll build has 2 input neurons (two features), 2 hidden neurons, and 1 output neuron. We will randomly initialise weights and biases for both layers:

# Set random seed for reproducibility
np.random.seed(42)

# Initialise weights and biases
weights_input_hidden  = np.random.randn(2, 2)  # (2 input features -> 2 hidden neurons)
bias_hidden           = np.zeros((1, 2))       # one bias per hidden neuron
weights_hidden_output = np.random.randn(2, 1)  # (2 hidden neurons -> 1 output neuron)
bias_output           = np.zeros((1, 1))       # one bias for the output neuron

Step 3: Define the Activation Function

We’ll start with the sigmoid activation function, which transforms inputs into values between 0 and 1 — a useful range when you’re interpreting outputs as probabilities.

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Used for back-propagation; assumes x is already a sigmoid output
    return x * (1 - x)

Step 4: Implement Forward Propagation

Forward propagation computes predictions by passing input through the network:

  • Multiply inputs by weights and add biases to compute the hidden layer.
  • Apply the activation function on the hidden layer.
  • Multiply hidden layer output by output weights, add bias, then apply activation again to compute the final output.

def forward_propagation(X):
    # Intermediate values are kept global so back-propagation can reuse them
    global hidden_layer_input, hidden_layer_output, output_layer_input, output
    hidden_layer_input  = np.dot(X, weights_input_hidden) + bias_hidden
    hidden_layer_output = sigmoid(hidden_layer_input)
    output_layer_input  = np.dot(hidden_layer_output, weights_hidden_output) + bias_output
    output              = sigmoid(output_layer_input)
    return output
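
Before training, it can help to confirm the shapes line up. Here is a quick sanity check; the sample input below is arbitrary and the weights are still untrained:

sample = np.array([[0, 1]])            # one example with two features
prediction = forward_propagation(sample)
print(prediction.shape)                # (1, 1): one prediction for one example
print(prediction)                      # a single value between 0 and 1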

Step 5: Compute the Loss

The loss function tells us how far the model’s predictions are from the true values. Here, we use Mean Squared Error (MSE):

def compute_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
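
As a quick check with made-up numbers: for targets [1, 0] and predictions [0.9, 0.2], the squared errors are 0.01 and 0.04, so the mean is 0.025:

y_true_example = np.array([[1.0], [0.0]])
y_pred_example = np.array([[0.9], [0.2]])
print(compute_loss(y_true_example, y_pred_example))  # ~0.025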

Step 6: Implement Back-Propagation

Back-propagation is the heart of learning — it computes gradients and updates weights and biases via gradient descent:

  • Compute the error at output layer: (true − predicted).
  • Use the derivative of the activation function to compute gradients.
  • Propagate that error back to the hidden layer.
  • Update weights and biases accordingly.

def backpropagation(X, y, learning_rate=0.1):
    global weights_input_hidden, bias_hidden, weights_hidden_output, bias_output

    # Error at output
    error      = y - output
    d_output   = error * sigmoid_derivative(output)

    # Error at hidden layer
    error_hidden = d_output.dot(weights_hidden_output.T)
    d_hidden     = error_hidden * sigmoid_derivative(hidden_layer_output)

    # Update weights and biases
    weights_hidden_output    += hidden_layer_output.T.dot(d_output) * learning_rate
    bias_output              += np.sum(d_output, axis=0, keepdims=True) * learning_rate
    weights_input_hidden     += X.T.dot(d_hidden) * learning_rate
    bias_hidden              += np.sum(d_hidden, axis=0, keepdims=True) * learning_rate

Step 7: Train the Neural Network

Now we’ll train the network on a simple dataset (the XOR problem) and observe how the loss decreases over time:

# Sample dataset (XOR problem)
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([[0],
              [1],
              [1],
              [0]])

epochs        = 10000
learning_rate = 0.1

for epoch in range(epochs):
    output = forward_propagation(X)
    backpropagation(X, y, learning_rate)

    if epoch % 1000 == 0:
        loss = compute_loss(y, output)
        print(f'Epoch {epoch}, Loss: {loss:.5f}')
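
If you installed Matplotlib, a minimal variation of the loop above (run with freshly initialised weights) records the loss at every epoch so you can plot the training curve:

import matplotlib.pyplot as plt

loss_history = []
for epoch in range(epochs):
    output = forward_propagation(X)
    backpropagation(X, y, learning_rate)
    loss_history.append(compute_loss(y, output))

plt.plot(loss_history)
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.title('Training loss over time')
plt.show()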

Step 8: Test the Neural Network

After training, let’s inspect the network’s predictions:

output = forward_propagation(X)
print("Final Predictions:")
print(output)

Ideally, the model outputs values close to 0, 1, 1, 0 for the four XOR inputs. Since the output passes through a sigmoid, the raw predictions are values between 0 and 1 rather than exact labels, but after training they should sit close to the targets.
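
If you want hard 0/1 labels rather than raw sigmoid values, one simple option is to threshold the predictions at 0.5:

predictions = (output > 0.5).astype(int)
print("Class labels:")
print(predictions)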

Understanding the Results

Even though this is a small network, you now understand how each layer, weight, bias and activation contributes to learning. You can feel confident modifying the architecture, experimenting with different activation functions, or tuning the learning rate.

Optimising the Network

Here are some ways to move beyond this simple model:

  • Increase the number of hidden neurons or add more hidden layers.
  • Switch to an activation like ReLU to improve convergence (a minimal sketch follows this list).
  • Tune the learning rate or apply optimisers like Adam instead of vanilla gradient descent.
  • Train on a more complex dataset and visualise your results with Matplotlib.
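
As an illustration of the second point above, here is a minimal ReLU sketch (the function names are just for this example). If you used ReLU for the hidden layer, forward_propagation would call relu instead of sigmoid on the hidden layer, and backpropagation would call relu_derivative(hidden_layer_output) instead of sigmoid_derivative; keeping sigmoid on the output layer still makes sense for the 0/1 XOR targets.

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    # 1 where the activation is positive, 0 elsewhere
    return (x > 0).astype(float)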

FAQs

  • Can I build deeper networks with this manual method?
    Yes — you can add more hidden layers and neurons. Just update parameter shapes and loops accordingly (a minimal sketch follows these FAQs).
  • Why use NumPy instead of TensorFlow or PyTorch?
    Implementing the core logic yourself gives you insight into how neural networks really work before using higher-level libraries.
  • What is the role of activation functions?
    Activations introduce non-linearities into your network so it can model complex patterns.
  • How do I know if the network is learning?
    Track the loss over epochs — if it decreases and predictions improve, your network is learning.
  • What’s the next step after this?
    Try small real-world datasets, build a network for regression or multiclass classification, and gradually move to TensorFlow or PyTorch.
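
As mentioned in the first FAQ, a deeper network mostly means more weight matrices and a loop over layers. Here is a minimal forward-pass sketch; the layer sizes and names are just an example, and it reuses the sigmoid defined earlier:

layer_sizes = [2, 4, 4, 1]   # input, two hidden layers, output
weights = [np.random.randn(layer_sizes[i], layer_sizes[i + 1])
           for i in range(len(layer_sizes) - 1)]
biases  = [np.zeros((1, layer_sizes[i + 1]))
           for i in range(len(layer_sizes) - 1)]

def forward_deep(X):
    activation = X
    activations = [X]
    for W, b in zip(weights, biases):
        activation = sigmoid(np.dot(activation, W) + b)
        activations.append(activation)
    return activations   # the last entry is the network's prediction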

Conclusion

Building a neural network from scratch in Python is more than a tutorial — it’s a deep dive into how machines learn from data. By creating every component yourself — forward propagation, back-propagation, loss functions — you gain a level of clarity most developers never experience.

Armed with this foundation, you’re ready to explore advanced topics like convolutional networks, recurrent networks, and production-ready deployments. Start experimenting now, and the “magic” of deep learning will quickly become much clearer.
