Why Initialize a Neural Network with Random Weights?

Introduction

Neural networks are among the basic building blocks around which the whole deep learning sphere revolves. Before looking at why we initialize a neural network with random weights, it helps to recall the components involved in training one:

Layers, which are combined into a network (or model)
Input data and corresponding targets
A loss function, which defines the feedback signal used for learning
An optimizer, which determines how the weights of the network are adjusted

In short, a neural network model is a stack of layers whose weights are adjusted to minimize a loss function on the training data, while validation data is used to check how well the model generalizes. The loss value serves as the feedback signal, and an optimizer such as stochastic gradient descent (SGD) uses it to update the weights of each layer.

This is where the adjustment of weights comes into the picture. At each iteration, the optimizer updates the weights from the previous iteration in the direction that lowers the loss, which in turn tends to improve accuracy. A model trained with stochastic gradient descent also relies on randomness while searching for weights that map the inputs to the outputs being learned.
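
To make the update step concrete, here is a minimal sketch of plain (full-batch) gradient descent on a one-weight model. The toy data, the starting weight and the learning rate are all assumptions chosen for illustration; a real optimizer works on many weights at once and usually on mini-batches.

```python
import numpy as np

# Toy data for the hypothetical model y = w * x (the true weight is 3.0).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

def loss(w):
    """Mean squared error of the one-weight model."""
    return np.mean((w * x - y) ** 2)

def gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return np.mean(2.0 * (w * x - y) * x)

w = 0.5               # arbitrary starting weight
learning_rate = 0.01  # assumed step size

losses = [loss(w)]
for _ in range(50):
    # The optimizer step: move the weight against the gradient.
    w = w - learning_rate * gradient(w)
    losses.append(loss(w))

print(f"final weight: {w:.3f}, final loss: {losses[-1]:.6f}")
```

Each pass through the loop uses the weight from the previous iteration, and the recorded loss shrinks as the weight approaches the value that generated the data.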

In this blog, we take a closer look at the optimization process behind SGD and at why neural network models need random initialization.

To find a suitable set of weights for the specific input-to-output mapping in your data, stochastic gradient descent makes use of randomness. As a consequence, each time the training algorithm is run on the same training data, it produces a slightly different network with a different level of model skill.

Weights sit between every pair of consecutive layers in a neural network. The values of the next layer are computed by applying a linear transformation, defined by these weights, to the values of the preceding layer and then passing the result through a non-linear activation function. During forward propagation the input is passed through the layers one after another, and backpropagation then computes how each weight should change to reduce the loss.
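
The forward pass described above can be sketched in a few lines of NumPy. The layer sizes, the tanh activation and the input vector are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

layer_size = [3, 4, 2]           # assumed sizes: input, hidden, output

# One weight matrix and one bias vector between each pair of layers.
weights = [rng.standard_normal((layer_size[l], layer_size[l - 1])) * 0.01
           for l in range(1, len(layer_size))]
biases = [np.zeros(layer_size[l]) for l in range(1, len(layer_size))]

def forward(x):
    """Pass one input vector through every layer in turn."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b       # linear transformation
        a = np.tanh(z)      # non-linear activation
    return a

x = np.array([0.5, -1.0, 2.0])
output = forward(x)
print(output.shape)
```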

Stochastic gradient descent and its variants remain the most widely used approach for training deep neural networks and finding a model that is sufficiently skilled and generalizes well to unseen data.

In particular, stochastic gradient descent requires that the weights of the network are initialized to small random values (random, but close to zero, such as in [0.0, 0.1]). Randomness is also used during the search itself: the training dataset is shuffled before each epoch, which in turn leads to a different gradient estimate for each batch.
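
Both sources of randomness can be seen in the following sketch of mini-batch SGD on a linear model. The dataset, the batch size and the learning rate are hypothetical, and a linear model stands in for a full network to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy dataset: 64 samples, 3 features, targets from known weights.
X = rng.standard_normal((64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = rng.uniform(0.0, 0.1, size=3)   # small random initial weights
learning_rate, batch_size = 0.1, 8  # assumed hyperparameters

def mse(w):
    """Mean squared error over the whole dataset."""
    return np.mean((X @ w - y) ** 2)

initial_loss = mse(w)
for epoch in range(20):
    order = rng.permutation(len(X))                # reshuffle before each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = X[idx] @ w - y[idx]
        grad = 2.0 * X[idx].T @ error / len(idx)   # gradient estimate from one batch
        w -= learning_rate * grad
final_loss = mse(w)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Changing the seed changes both the starting weights and the order of the batches, so the path taken by the weights differs from run to run.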

What if we initialize the weights with Zero?

Zero initialization does not work, because it never breaks the symmetry between neurons. If all weights start at zero, every neuron in a layer computes the same output and receives the same gradient, so the equations of the learning algorithm fail to make any useful change to the network weights and the model gets stuck. Note that the bias weight of each neuron is the exception: it is set to zero by default rather than to a small random value.

# layer_size lists the number of units per layer; l is the index of the current layer
w = np.zeros((layer_size[l], layer_size[l - 1]))
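
The effect can be checked on a tiny network. The sketch below is only an illustration under assumed settings (a 2-3-1 network with tanh hidden units, squared error, and a hypothetical `train` helper); with other activation functions the weights may move, but the neurons of a layer still stay identical to each other.

```python
import numpy as np

def train(W1, W2, steps=100, learning_rate=0.1):
    """Train a tiny 2-3-1 tanh network with squared error; return the weights."""
    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
    y = np.array([[1.0], [1.0], [0.0], [0.0]])
    b1, b2 = np.zeros(3), np.zeros(1)          # biases start at zero
    for _ in range(steps):
        # Forward pass.
        h = np.tanh(X @ W1.T + b1)             # hidden activations, shape (4, 3)
        out = h @ W2.T + b2                    # linear output, shape (4, 1)
        # Backward pass for the mean squared error.
        d_out = 2.0 * (out - y) / len(X)
        dW2 = d_out.T @ h
        d_h = (d_out @ W2) * (1.0 - h ** 2)
        dW1 = d_h.T @ X
        # Gradient descent updates.
        W1 = W1 - learning_rate * dW1
        W2 = W2 - learning_rate * dW2
        b1 = b1 - learning_rate * d_h.sum(axis=0)
        b2 = b2 - learning_rate * d_out.sum(axis=0)
    return W1, W2

layer_size = [2, 3, 1]
W1_zero, W2_zero = train(np.zeros((layer_size[1], layer_size[0])),
                         np.zeros((layer_size[2], layer_size[1])))
print(W1_zero)   # still all zeros
print(W2_zero)   # still all zeros
```

In this setup the hidden activations are zero, so the gradient of the output weights is zero, and because the output weights are zero no gradient flows back to the hidden weights either. Only the output bias changes.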

Random Initialization

Random initialization breaks the symmetry between neurons, which lets the neurons in a layer learn different features and improves the accuracy of the model. In this technique the weights are drawn at random, but kept very close to zero.

# small random weights: samples from a standard normal distribution, scaled by 0.01
w = np.random.randn(layer_size[l], layer_size[l - 1]) * 0.01
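
As a rough check that small random weights break the symmetry, the sketch below runs one forward and backward pass and compares the gradients that reach different hidden neurons. The network size, the toy data and the tanh activation are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_size = [2, 3, 1]   # assumed sizes: input, hidden, output

# Small random weights, scaled by 0.01 as in the line above.
W1 = rng.standard_normal((layer_size[1], layer_size[0])) * 0.01
W2 = rng.standard_normal((layer_size[2], layer_size[1])) * 0.01

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([[1.0], [1.0], [0.0], [0.0]])

# One forward and backward pass of a tanh network with squared error.
h = np.tanh(X @ W1.T)
out = h @ W2.T
d_out = 2.0 * (out - y) / len(X)
dW1 = ((d_out @ W2) * (1.0 - h ** 2)).T @ X

# Each row of dW1 is the gradient for one hidden neuron.
print(dW1)
rows_differ = not np.allclose(dW1[0], dW1[1])
print("hidden neurons receive different gradients:", rows_differ)
```

Because each hidden neuron now receives its own gradient, the neurons drift apart during training instead of remaining copies of each other.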

Python's Keras library provides initializer classes that can be used when building deep learning models. They are listed in the next section.

Keras Random Initialization Methods

Keras provides the following initializer classes for the weights of a neural network. Most of them are random; a few, such as Zeros, Ones, Constant and Identity, are deterministic.

RandomNormal class - Initializer that generates tensors with a normal distribution.
RandomUniform class - Initializer that generates tensors with a uniform distribution.
TruncatedNormal class - Initializer that generates a truncated normal distribution.
Zeros class - Initializer that generates tensors initialized to 0.
Ones class - Initializer that generates tensors initialized to 1.
GlorotNormal class - The Glorot normal initializer, also called Xavier normal initializer.
GlorotUniform class - The Glorot uniform initializer, also called Xavier uniform initializer.
HeNormal class - He normal initializer.
HeUniform class - He uniform variance scaling initializer.
Identity class - Initializer that generates the identity matrix.
Orthogonal class - Initializer that generates an orthogonal matrix.
Constant class - Initializer that generates tensors with constant values.
VarianceScaling class - Initializer capable of adapting its scale to the shape of weights tensors.
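
As a short usage sketch, an initializer from the list above is passed to a layer through its `kernel_initializer` and `bias_initializer` arguments. The layer sizes, input shapes, seeds and the particular initializers chosen here are assumptions for illustration, and the snippet assumes that the `keras` package is installed.

```python
from keras import initializers, layers

# Pass an initializer instance (or its string name) when creating a layer.
hidden = layers.Dense(
    4,
    activation="relu",
    kernel_initializer=initializers.HeNormal(seed=0),   # random weights
    bias_initializer=initializers.Zeros(),              # biases start at zero
)
output = layers.Dense(
    1,
    kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.05, seed=0),
    bias_initializer="zeros",
)

# Building the layers creates the weights; the input sizes here are assumed.
hidden.build((None, 8))
output.build((None, 4))

kernel, bias = hidden.get_weights()
print(kernel.shape, bias.shape)
```

If no initializer is given, a Dense layer uses `glorot_uniform` for its kernel and `zeros` for its bias, which matches the earlier remark that biases start at zero by default.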

Benefit of Random initialization of weights in Neural Network model

A good use case is the word embeddings used to vectorize tokens in NLP tasks. Pre-trained word embeddings such as Word2vec or GloVe are trained on large, general-purpose text corpora such as Wikipedia. They can be reused in NLP tasks, but they may fail to capture the semantics that matter for a particular task. An embedding layer initialized with random weights, in contrast, is adjusted gradually via backpropagation, which structures the embedding space into something the downstream model can exploit. The resulting structure is specialized for the specific problem that we are solving.
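
The idea can be sketched with a randomly initialized embedding table in NumPy. Everything here is hypothetical: the vocabulary size, the token ids, and the toy objective (pulling the average embedding of a sentence toward a target vector) merely stand in for a real downstream task.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 10, 4   # assumed toy sizes

# Embedding table with small random initial values, one row per token.
embeddings = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_dim))
initial = embeddings.copy()

# Hypothetical task: the mean embedding of a sentence should match a target vector.
token_ids = np.array([1, 3, 3, 7])
target = np.ones(embedding_dim)
learning_rate = 0.5

for _ in range(100):
    sentence_vector = embeddings[token_ids].mean(axis=0)      # forward: lookup and average
    d_sentence = 2.0 * (sentence_vector - target)             # gradient of the squared error
    grad = np.zeros_like(embeddings)
    np.add.at(grad, token_ids, d_sentence / len(token_ids))   # route gradient to the rows used
    embeddings -= learning_rate * grad

changed_rows = np.where(np.any(embeddings != initial, axis=1))[0]
print(changed_rows)
```

Only the rows of the tokens that appear in the training data are moved away from their random starting values, which is how the embedding space gradually takes on a task-specific structure.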

Conclusion

In this blog, we saw how a neural network model optimizes its parameters using the stochastic gradient descent algorithm, why neural networks need random initialization, and which initialization methods Keras provides.

References:

[1] Deep Learning with Python by François Chollet

[2] https://en.wikipedia.org/wiki/Stochastic_gradient_descent

[3] https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/

[4] https://keras.io/api/layers/initializers/

[5] https://towardsdatascience.com/random-initialization-for-neural-networks-a-thing-of-the-past-bfcdd806bf
