Koding Books

Professional, free coding tutorials

Neural Networks, an introduction and implementation in C++

Neural networks are a type of machine learning algorithm that is modelled after the structure and function of the human brain. They comprise layers of interconnected nodes, or “neurons,” that process and transmit information. Each neuron receives input from other neurons, processes that input, and then sends its output to other neurons in the next layer.

Neural networks are used for various tasks, including image and speech recognition, natural language processing, and predictive analytics. They are particularly useful for tasks that involve large amounts of data and complex patterns.

To build a neural network, you typically start by defining the architecture of the network, including the number of layers and neurons in each layer. You then train the network on a dataset, adjusting the weights and biases of the neurons to minimize the error between the network’s predictions and the actual values in the dataset.

Once the network is trained, you can use it to make predictions on new, unseen data. Neural networks are powerful tools for machine learning.
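As a rough illustration of that workflow, here is a minimal outline in C++. The Network class and its train_step and predict methods are hypothetical placeholder names chosen for this sketch, not part of any library, and their bodies are deliberately left as stubs; a concrete, working implementation follows later in this article.

#include <vector>

struct Network {
    // e.g. {3, 4, 1}: 3 inputs, one hidden layer of 4 neurons, 1 output
    explicit Network(std::vector<int> layer_sizes) { (void)layer_sizes; /* allocate weights and biases here */ }

    double train_step(const std::vector<double>& x, double target) {
        (void)x; (void)target;
        // forward pass, error calculation, backpropagation, weight update would go here
        return 0.0; // would return the current error
    }

    double predict(const std::vector<double>& x) const {
        (void)x;
        // forward pass only
        return 0.0;
    }
};

int main() {
    Network net({3, 4, 1});   // 1. define the architecture
    // 2. call net.train_step(sample, target) repeatedly over the dataset for many epochs
    // 3. call net.predict(new_sample) on new data once training has converged
    return 0;
}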

Types of neural network

There are many different types of neural networks, each with its own architecture and typical applications. Here are a few examples:

  1. Feedforward neural networks: These are the most basic type of neural network, consisting of an input layer, one or more hidden layers, and an output layer. The information flows in one direction, from the input to the output layer, without feedback loops.
  2. Convolutional neural networks (CNNs): These are commonly used for image and video recognition tasks. They use convolutional layers to extract features from the input data and pooling layers to reduce the dimensionality of the data.
  3. Recurrent neural networks (RNNs): These are commonly used for natural language processing and speech recognition tasks. They use feedback loops to allow information to persist over time and can process sequences of input data (a minimal sketch of a single recurrent step follows this list).
  4. Long short-term memory (LSTM) networks: These are a type of RNN designed to handle the vanishing gradient problem, which can occur when training RNNs on long data sequences.
  5. Autoencoder neural networks: These are used for unsupervised learning tasks, such as dimensionality reduction and data compression. They consist of an encoder network that maps the input data to a lower-dimensional representation and a decoder network that reconstructs the original data from the lower-dimensional representation.
  6. Generative adversarial networks (GANs): These are used for generating new data similar to a given dataset. They consist of two networks: a generator network that generates new data and a discriminator network that distinguishes between generated and real data.
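To make the feedback loop in item 3 concrete, here is a minimal sketch of a single recurrent step using Eigen. The sizes, weights and input sequence are illustrative only; the point is that the hidden state h from one step feeds into the next.

#include <iostream>
#include <vector>
#include <Eigen/Dense>

using namespace Eigen;

int main()
{
    // One recurrent step: h_t = tanh(W_x * x_t + W_h * h_{t-1} + b)
    const int input_size = 3;
    const int hidden_size = 4;

    MatrixXd W_x = MatrixXd::Random(hidden_size, input_size);   // input-to-hidden weights
    MatrixXd W_h = MatrixXd::Random(hidden_size, hidden_size);  // hidden-to-hidden (feedback) weights
    VectorXd b = VectorXd::Zero(hidden_size);

    VectorXd h = VectorXd::Zero(hidden_size);  // hidden state carried from step to step

    // A short, made-up input sequence of three time steps
    std::vector<VectorXd> sequence = { VectorXd::Random(input_size),
                                       VectorXd::Random(input_size),
                                       VectorXd::Random(input_size) };

    for (const VectorXd& x_t : sequence) {
        VectorXd pre = W_x * x_t + W_h * h + b;   // the previous h feeds back into this step
        h = pre.array().tanh().matrix();           // tanh keeps the state bounded
    }

    std::cout << "Final hidden state:" << std::endl << h << std::endl;
    return 0;
}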

Activation Functions

An activation function is a mathematical function applied to a neuron's weighted input to produce its output. Its purpose is to introduce non-linearity into the network, which allows it to learn more complex patterns in the input data.

  1. Sigmoid function: Maps any input value to a value between 0 and 1. It is often used in the output layer for binary classification problems.
  2. ReLU (Rectified Linear Unit) function: Returns the input value if it is positive and 0 if it is negative. It is often used in the hidden layers of a neural network.
  3. Tanh (Hyperbolic Tangent) function: Maps any input value to a value between -1 and 1. It is often used in hidden layers when zero-centred activations are desirable.
  4. Softmax function: Maps a vector of input values to a probability distribution over the classes. It is often used in the output layer for multi-class classification problems.

Many other activation functions can be used in neural networks, and the choice of function depends on the specific problem being solved and the architecture of the network.
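As a concrete reference, here is a small, self-contained C++ sketch of the four functions listed above. These are plain illustrative implementations written for this article, not code from any particular framework.

#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

double sigmoid(double x)  { return 1.0 / (1.0 + std::exp(-x)); }   // output in (0, 1)
double relu(double x)     { return std::max(0.0, x); }             // 0 for negative input, identity otherwise
double tanh_act(double x) { return std::tanh(x); }                  // output in (-1, 1)

// Softmax over a vector: subtract the maximum first for numerical stability.
std::vector<double> softmax(const std::vector<double>& z)
{
    double max_z = *std::max_element(z.begin(), z.end());
    double sum = 0.0;
    std::vector<double> out(z.size());
    for (std::size_t i = 0; i < z.size(); ++i) { out[i] = std::exp(z[i] - max_z); sum += out[i]; }
    for (double& v : out) v /= sum;   // the outputs now sum to 1, forming a probability distribution
    return out;
}

int main()
{
    std::cout << "sigmoid(1.5) = " << sigmoid(1.5) << "\n";
    std::cout << "relu(-2.0)   = " << relu(-2.0) << "\n";
    std::cout << "tanh(0.5)    = " << tanh_act(0.5) << "\n";
    std::cout << "softmax(1, 2, 3) = ";
    for (double p : softmax({1.0, 2.0, 3.0})) std::cout << p << " ";
    std::cout << std::endl;
    return 0;
}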

Backpropagation

Backpropagation is a common algorithm used to train neural networks. It is a supervised learning algorithm that adjusts the weights and biases of the neurons in the network to minimize the error between the network’s predictions and the actual values in the training dataset.

The backpropagation algorithm propagates the error backwards through the network, from the output layer to the input layer. It calculates the gradient of the error with respect to the weights and biases of each neuron, and then updates the weights and biases in the direction opposite to that gradient.

This update step is repeated over multiple passes through the training data, called epochs, until the error is reduced to an acceptable level. The learning rate, which determines the size of the weight and bias updates, is an important hyperparameter that can affect the network’s performance.
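To make the update rule concrete, here is a sketch of the gradients for the single-hidden-layer network used in the example later in this article (ReLU hidden layer, sigmoid output), assuming a squared-error loss:

\[
\delta_2 = (\hat{y} - y) \odot \hat{y} \odot (1 - \hat{y}), \qquad
\frac{\partial E}{\partial W_2} = a_1^{\top} \delta_2, \qquad
\frac{\partial E}{\partial b_2} = \textstyle\sum_{\text{rows}} \delta_2
\]

\[
\delta_1 = (\delta_2 W_2^{\top}) \odot \mathbf{1}[z_1 > 0], \qquad
\frac{\partial E}{\partial W_1} = X^{\top} \delta_1, \qquad
\frac{\partial E}{\partial b_1} = \textstyle\sum_{\text{rows}} \delta_1
\]

\[
W \leftarrow W - \eta \frac{\partial E}{\partial W}, \qquad
b \leftarrow b - \eta \frac{\partial E}{\partial b}
\]

Here \(\odot\) is the element-wise product, \(\mathbf{1}[z_1 > 0]\) is the derivative of the ReLU, and \(\eta\) is the learning rate. These quantities correspond directly to the variables d3, d2, dW1, dW2, db1, db2 and learning_rate in the code below.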

Backpropagation is a powerful algorithm that allows neural networks to learn complex patterns in data. However, it can also be computationally expensive and prone to overfitting if not carefully tuned. Many variations and extensions of the backpropagation algorithm have been developed to address these issues and improve the performance of neural networks.

Example | A Basic Neural Network in C++

This code defines a simple neural network with one hidden layer and trains it on a small dataset. The network uses the ReLU activation function in the hidden layer and the sigmoid activation function in the output layer. The code also includes a test on new data to demonstrate how the trained network can be used for prediction.

Code

#include <iostream>
#include <Eigen/Dense>

using namespace Eigen;

int main()
{
    // Define the input data
    MatrixXd X(4, 3);
    X << 1, 2, 3,
         4, 5, 6,
         7, 8, 9,
         10, 11, 12;

    // Define the output data
    MatrixXd y(4, 1);
    y << 0.5,
         0.8,
         0.2,
         0.6;

    // Define the neural network architecture
    int input_size = 3;
    int hidden_size = 4;
    int output_size = 1;

    // Initialize the weights and biases
    MatrixXd W1 = MatrixXd::Random(input_size, hidden_size);
    MatrixXd b1 = MatrixXd::Zero(1, hidden_size);
    MatrixXd W2 = MatrixXd::Random(hidden_size, output_size);
    MatrixXd b2 = MatrixXd::Zero(1, output_size);

    // Define the learning rate and number of epochs
    double learning_rate = 0.01;
    int num_epochs = 1000;

    // Train the neural network using backpropagation
    for (int i = 0; i < num_epochs; i++) {
        // Forward pass
        MatrixXd z1 = X * W1 + b1.replicate(X.rows(), 1);
        MatrixXd a1 = z1.array().max(0.0).matrix();                          // ReLU activation
        MatrixXd z2 = a1 * W2 + b2.replicate(a1.rows(), 1);
        MatrixXd y_pred = ((-z2.array()).exp() + 1.0).inverse().matrix();    // sigmoid activation

        // Backward pass
        MatrixXd d3 = ((y_pred - y).array() * y_pred.array() * (1.0 - y_pred.array())).matrix();   // output-layer delta (sigmoid derivative)
        MatrixXd d2 = ((d3 * W2.transpose()).array() * (z1.array() > 0.0).cast<double>()).matrix(); // hidden-layer delta (ReLU derivative)
        MatrixXd dW2 = a1.transpose() * d3;
        MatrixXd db2 = d3.colwise().sum();
        MatrixXd dW1 = X.transpose() * d2;
        MatrixXd db1 = d2.colwise().sum();

        // Update the weights and biases
        W1 -= learning_rate * dW1;
        b1 -= learning_rate * db1;
        W2 -= learning_rate * dW2;
        b2 -= learning_rate * db2;
    }

    // Test the neural network on new data
    MatrixXd X_test(2, 3);
    X_test << 2, 4, 6,
              8, 10, 12;

    MatrixXd z1_test = X_test * W1 + b1.replicate(X_test.rows(), 1);
    MatrixXd a1_test = z1_test.array().max(0.0).matrix();
    MatrixXd z2_test = a1_test * W2 + b2.replicate(a1_test.rows(), 1);
    MatrixXd y_pred_test = ((-z2_test.array()).exp() + 1.0).inverse().matrix();

    std::cout << "Predictions: " << std::endl << y_pred_test << std::endl;

    return 0;
}
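Eigen is a header-only library, so no linking step is needed. Assuming the Eigen headers live at /path/to/eigen (an illustrative path, adjust it to your installation) and the file is saved as nn.cpp, a command along the lines of g++ -std=c++11 -I /path/to/eigen nn.cpp -o nn should build the example. The initial weights come from MatrixXd::Random, so the printed predictions depend on that initialisation and should not be expected to match any particular values.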

Testing the neural network

#include <cmath>
#include <iostream>
#include <Eigen/Dense>

using namespace Eigen;

int main()
{
    // Define the input data
    MatrixXd X(4, 3);
    X << 1, 2, 3,
         4, 5, 6,
         7, 8, 9,
         10, 11, 12;

    // Define the output data
    MatrixXd y(4, 1);
    y << 0.5,
         0.8,
         0.2,
         0.6;

    // Define the neural network architecture
    int input_size = 3;
    int hidden_size = 4;
    int output_size = 1;

    // Initialize the weights and biases
    MatrixXd W1 = MatrixXd::Random(input_size, hidden_size);
    MatrixXd b1 = MatrixXd::Zero(1, hidden_size);
    MatrixXd W2 = MatrixXd::Random(hidden_size, output_size);
    MatrixXd b2 = MatrixXd::Zero(1, output_size);

    // Define the learning rate and number of epochs
    double learning_rate = 0.01;
    int num_epochs = 1000;

    // Train the neural network using backpropagation
    for (int i = 0; i < num_epochs; i++) {
        // Forward pass
        MatrixXd z1 = X * W1 + b1.replicate(X.rows(), 1);
        MatrixXd a1 = z1.array().max(0.0).matrix();                          // ReLU activation
        MatrixXd z2 = a1 * W2 + b2.replicate(a1.rows(), 1);
        MatrixXd y_pred = ((-z2.array()).exp() + 1.0).inverse().matrix();    // sigmoid activation

        // Backward pass
        MatrixXd d3 = ((y_pred - y).array() * y_pred.array() * (1.0 - y_pred.array())).matrix();   // output-layer delta (sigmoid derivative)
        MatrixXd d2 = ((d3 * W2.transpose()).array() * (z1.array() > 0.0).cast<double>()).matrix(); // hidden-layer delta (ReLU derivative)
        MatrixXd dW2 = a1.transpose() * d3;
        MatrixXd db2 = d3.colwise().sum();
        MatrixXd dW1 = X.transpose() * d2;
        MatrixXd db1 = d2.colwise().sum();

        // Update the weights and biases
        W1 -= learning_rate * dW1;
        b1 -= learning_rate * db1;
        W2 -= learning_rate * dW2;
        b2 -= learning_rate * db2;
    }

    // Test the neural network on new data
    MatrixXd X_test(2, 3);
    X_test << 2, 4, 6,
              8, 10, 12;

    MatrixXd z1_test = X_test * W1 + b1.replicate(X_test.rows(), 1);
    MatrixXd a1_test = z1_test.array().max(0.0).matrix();
    MatrixXd z2_test = a1_test * W2 + b2.replicate(a1_test.rows(), 1);
    MatrixXd y_pred_test = ((-z2_test.array()).exp() + 1.0).inverse().matrix();

    // Check that the predictions are within a certain tolerance of the expected values
    double tolerance = 0.1;
    if (std::abs(y_pred_test(0, 0) - 0.7) > tolerance ||
        std::abs(y_pred_test(1, 0) - 0.9) > tolerance) {
        std::cout << "Test failed: predictions are not within tolerance" << std::endl;
        return 1;
    }

    std::cout << "Test passed: predictions are within tolerance" << std::endl;
    return 0;
}

This test checks that the neural network’s predictions on new data fall within a given tolerance of the expected values. If they do not, the program prints a failure message and returns a non-zero exit code; otherwise it reports success and returns zero.

The Last Byte…

Neural networks are a type of machine learning algorithm that is inspired by the structure and function of the human brain. They consist of interconnected nodes, or neurons, that process information and learn to make predictions or decisions based on that information.

Neural networks can be used for various tasks, including image and speech recognition, natural language processing, and predictive modelling. They are particularly useful for tasks that involve complex patterns in the input data that are difficult to capture using traditional machine learning algorithms.

The architecture of a neural network can vary depending on the problem being solved but typically consists of an input layer, one or more hidden layers, and an output layer. Each layer is composed of multiple neurons, which are connected to neurons in the previous and next layers.

The weights and biases of the neurons in a neural network are adjusted during the training process to optimize the network’s performance on a given task. This is typically done using a supervised learning algorithm called backpropagation, which adjusts the weights and biases based on the error between the network’s predictions and the actual values in the training dataset.

There are many different types of neural networks, each with its own architecture and applications. Some common types include feedforward neural networks, convolutional neural networks, recurrent neural networks, and autoencoder neural networks.

Neural networks have become increasingly popular in recent years due to their ability to learn complex patterns in data and achieve state-of-the-art performance on many machine-learning tasks. However, they can also be computationally expensive and prone to overfitting if not carefully tuned.

Ali Kayani

https://www.linkedin.com/in/ali-kayani-silvercoder007/
