Neural Network Guide: Everything You Need to Know

Let's talk about neural networks. We will explain complex ideas so simply that even a child can understand them. We start from scratch and cover the basics.

Machine learning, data science, neural networks: these areas are not only extremely interesting but also quite complex. Let’s start by explaining what they are and go over the basic concepts.

Neuron

... is the basic unit of a neural network. Each neuron has a number of inputs. It multiplies each incoming signal by the significance (weight) of that input, sums the results, and passes its output on to the inputs of other neurons. The weight of each such connection can be positive or negative. For example, if a neuron has four inputs, it has four weights that can be adjusted independently of each other.
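To make the weighted sum concrete, here is a minimal plain-Python sketch of a single neuron with four inputs. The function name `weighted_sum` and the input and weight values are hypothetical, chosen only for illustration; bias and activation are left out for now.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(x * w for x, w in zip(inputs, weights))

# Four inputs, so four independently adjustable weights (positive or negative).
inputs = [1.0, 0.5, -2.0, 3.0]
weights = [0.2, -0.4, 0.1, 0.5]
print(weighted_sum(inputs, weights))
```

Changing any single weight changes only that input's contribution to the sum, which is why the four weights can be tuned independently.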

An artificial neural network imitates the operation of a natural neural network (the human brain) and is used to create machines with artificial intelligence. As a rule, to train an AI you need a “teacher”: a training dataset with known parameters, values and indicators.

Connections

... link neurons to each other. Each connection has its own weight, and the goal of training is to update the weight of each connection so that the network makes as few errors as possible.

Bias

... is an additional input to a neuron that is always equal to 1 and has its own weight. It ensures that even when all other inputs are zero, the neuron can still produce a nonzero output.
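A minimal sketch of how the bias works, assuming a neuron without an activation function: the bias is treated as an extra input fixed at 1, multiplied by its own weight. The names `neuron_output` and `bias_weight` are hypothetical.

```python
def neuron_output(inputs, weights, bias_weight):
    """Weighted sum of the inputs plus a bias input that is always 1."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total + 1.0 * bias_weight  # the constant bias input (1) times its weight

# All regular inputs are zero, yet the output is not zero thanks to the bias.
print(neuron_output([0.0, 0.0, 0.0], [0.7, -1.2, 0.3], bias_weight=0.5))  # 0.5
```

With `bias_weight=0.0` the same call returns 0.0, so without a bias a neuron with all-zero inputs could never produce anything else.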

The activation function

... is used to introduce non-linearity into the neural network. It determines the output value of the neuron, which depends on the weighted sum of its inputs and on a threshold value.
This function also determines which neurons are activated and, therefore, what information is passed to the next layer. Thanks to non-linear activation functions, adding layers makes deep networks more expressive: without them, any stack of layers would be equivalent to a single linear layer.
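For illustration, here are a few common activation functions written in plain Python. The text does not name specific functions, so this particular set (a threshold step, sigmoid, ReLU and tanh) is our choice. The sigmoid is written in a numerically stable form so that very negative inputs do not overflow `math.exp`.

```python
import math

def step(z, threshold=0.0):
    """Threshold activation: the neuron 'fires' (1) only if z reaches the threshold."""
    return 1.0 if z >= threshold else 0.0

def sigmoid(z):
    """Squashes any real number into (0, 1); stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def relu(z):
    """Passes positive values through and zeroes out negative ones."""
    return max(0.0, z)

def tanh(z):
    """Squashes any real number into (-1, 1)."""
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(sigmoid(z), 3), relu(z), round(tanh(z), 3))
```

Each of these is non-linear, which is what keeps stacked layers from collapsing into one linear transformation.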

The input layer

... is the first layer in the neural network. It receives the incoming signals and passes them on to the subsequent layers.

The hidden (computational) layer

... applies various transformations to the input data. In a fully connected network, every neuron in a hidden layer is connected to every neuron in the next layer.

The output layer

... is the last layer in the network; it receives data from the last hidden layer. It produces the required number of output values in the desired range.
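One common way to get outputs "in the desired range" for a multi-class problem is a softmax output layer, which turns raw scores into probabilities between 0 and 1 that sum to 1. The text does not specify an output function, so softmax here is our example choice, and the scores are made up.

```python
import math

def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print([round(p, 3) for p in probs])
print(sum(probs))
```

For a single yes/no output, a sigmoid output neuron plays the same role.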

Weight

... represents the strength of the connection between neurons. For example, if the weight of the connection between nodes 1 and 3 is larger than the weight of the connection between nodes 2 and 3, neuron 1 has a greater effect on neuron 3. A zero weight means that changes in the input will not affect the output. A negative weight means that increasing the input will decrease the output. In short, the weight determines how strongly an input affects the output.
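A tiny sketch of these three cases, assuming a neuron with a single input and no bias or activation, so the output is simply `weight * input`. The function names are hypothetical.

```python
def output(x, w):
    """Single-input neuron without bias or activation."""
    return w * x

def effect_of_input_increase(w, x_old=1.0, x_new=1.5):
    """How much the output changes when the input grows from x_old to x_new."""
    return output(x_new, w) - output(x_old, w)

for w in (2.0, 0.5, 0.0, -2.0):
    print(f"weight {w:+.1f}: output changes by {effect_of_input_increase(w):+.2f}")
```

A larger positive weight produces a bigger change, a zero weight produces none, and a negative weight makes the output go down as the input goes up.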

Forward propagation

... is the process of feeding input values into a neural network and obtaining an output, which is called the predicted value. The input layer passes the values on without performing any operations. The second layer takes the values from the first layer, multiplies them by the weights, applies the activation function, and passes the results further. The same process repeats in deeper layers.
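Here is a minimal forward pass through a small fully connected network, assuming sigmoid activations in every layer. The 2-3-1 layer sizes, weights and biases are arbitrary illustration values, and `dense`/`forward` are hypothetical helper names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: weights[j][i] links input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """The input layer passes x on unchanged; each later layer multiplies and activates."""
    activations = x
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.6], [0.1, 0.8], [-0.3, 0.2]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.9]], [0.05]),
]
prediction = forward([1.0, 2.0], layers)
print(prediction)  # the predicted value
```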

Backpropagation

After forward propagation, we get the predicted value. To calculate the error, we compare the predicted value with the actual value using the loss function. Then we can calculate the derivative of the error with respect to each weight in the neural network.
Backpropagation relies on the chain rule of differential calculus. Gradients (derivatives of the error with respect to the weights) are first calculated for the weights of the last layer of the neural network. The error signals then propagate in the direction opposite to forward propagation and are used to calculate the gradients of the preceding layers.
This process is repeated until the gradient of every weight in the neural network has been calculated. Then each gradient, scaled by the learning rate, is subtracted from its weight to reduce the error. Repeating these steps gradually brings the loss down toward a minimum.
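The steps above can be sketched for a tiny network with 2 inputs, 2 hidden neurons and 1 output. This assumes sigmoid activations, a squared-error loss for a single training example, and plain gradient descent; all starting values and the learning rate are arbitrary. The gradients are computed with the chain rule, starting at the output layer and moving backward.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, p):
    h = [sigmoid(sum(w * xi for w, xi in zip(p["W1"][j], x)) + p["b1"][j]) for j in range(2)]
    y = sigmoid(sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"])
    return h, y

def loss(y, target):
    return (y - target) ** 2  # squared error for one example

def backward(x, target, p):
    """Gradient of the loss with respect to every parameter (chain rule)."""
    h, y = forward(x, p)
    # Output layer: dL/dy * dy/dz, where dy/dz = y * (1 - y) for a sigmoid.
    delta_out = 2.0 * (y - target) * y * (1.0 - y)
    grads = {"W2": [delta_out * hj for hj in h], "b2": delta_out}
    # Hidden layer: the error signal flows backward through the output weights W2.
    delta_hidden = [delta_out * p["W2"][j] * h[j] * (1.0 - h[j]) for j in range(2)]
    grads["W1"] = [[d * xi for xi in x] for d in delta_hidden]
    grads["b1"] = delta_hidden
    return grads

def update(p, grads, lr):
    """Gradient descent step: subtract each gradient scaled by the learning rate."""
    p["W1"] = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(p["W1"], grads["W1"])]
    p["b1"] = [b - lr * g for b, g in zip(p["b1"], grads["b1"])]
    p["W2"] = [w - lr * g for w, g in zip(p["W2"], grads["W2"])]
    p["b2"] = p["b2"] - lr * grads["b2"]

params = {"W1": [[0.15, 0.20], [0.25, 0.30]], "b1": [0.35, 0.35],
          "W2": [0.40, 0.45], "b2": 0.60}
x, target = [0.05, 0.10], 0.01

loss_before = loss(forward(x, params)[1], target)
for _ in range(500):
    update(params, backward(x, target, params), lr=0.5)
loss_after = loss(forward(x, params)[1], target)
print("loss before:", loss_before)
print("loss after: ", loss_after)
```

A quick way to check hand-written gradients like these is to compare them with finite differences, nudging one weight up and down by a tiny amount and measuring how the loss changes.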

Learning rate

... is a parameter used when training neural networks. It determines how large a step is taken each time the weights are updated during backpropagation. The learning rate should be fairly high, but not too high, otherwise the algorithm will diverge. If the learning rate is too low, the algorithm will take a very long time to converge and may get stuck in a local minimum.
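A small experiment shows the effect of the learning rate, assuming the toy loss f(w) = w², whose gradient is 2w and whose minimum is at w = 0. The three learning rates are arbitrary. Note that this function has no local minima, so it only illustrates slow convergence and divergence, not getting stuck.

```python
def gradient_descent(lr, steps=20, w=5.0):
    """Minimize f(w) = w**2 with plain gradient descent, starting from w."""
    for _ in range(steps):
        w = w - lr * 2.0 * w  # gradient of w**2 is 2*w
    return w

for lr in (0.01, 0.1, 1.1):
    print(f"learning rate {lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

With 0.01 the weight barely moves toward 0, with 0.1 it gets close, and with 1.1 each step overshoots further than the last, so the weight grows instead of converging.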

Convergence

... is a phenomenon where, over successive iterations, the output gets closer and closer to a certain value. To avoid overfitting (when a model fits the training data so closely that it performs poorly on new data), regularization is used: it reduces the effective complexity of the model while keeping its parameters. In this case, both the loss and the weight vector (the vector of the model's learned parameters) are taken into account, for example by adding a penalty on the size of the weights to the loss.
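As one concrete example, here is a sketch of L2 regularization (also called weight decay). The text does not say which regularizer it means, so L2 is our assumption, as are the function names and the strength `lam`. The penalty adds the sum of squared weights to the loss, which in turn adds `2 * lam * w` to each weight's gradient and pulls the weights toward zero.

```python
def l2_regularized_loss(data_loss, weights, lam):
    """Total objective = data loss + lam * (sum of squared weights)."""
    penalty = sum(w * w for w in weights)
    return data_loss + lam * penalty

def l2_step(weights, data_grads, lr, lam):
    """One gradient step; the penalty contributes 2 * lam * w to each gradient."""
    return [w - lr * (g + 2 * lam * w) for w, g in zip(weights, data_grads)]

weights = [0.5, -1.0, 2.0]
print(l2_regularized_loss(0.30, weights, lam=0.01))
print(l2_step(weights, data_grads=[0.0, 0.0, 0.0], lr=0.1, lam=0.5))
```

Even with zero data gradients, the second call shrinks every weight, which is how the penalty keeps the model simpler.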

Data normalization

... is the process of rescaling one or more features to the range from 0 to 1. This method is often used when you do not know how your data is distributed. It can also speed up training.
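A minimal sketch of rescaling a feature to the range from 0 to 1 (min-max normalization). The sample ages are made up. Returning zeros for a constant feature is our own choice to avoid dividing by zero.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: a constant feature maps to 0
    return [(v - lo) / (hi - lo) for v in values]

ages = [18, 35, 52, 70]  # hypothetical feature values
print(min_max_normalize(ages))
```

In practice, `lo` and `hi` are taken from the training data and then reused to scale validation and test data the same way.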

Fully connected layers

It is also worth mentioning fully connected layers. In a fully connected layer, the output of every node in one layer is passed to every node in the next layer.

The loss function measures the model's error on the training data, usually as the average over the training examples. Common loss functions include:

'mse' - mean squared error
'binary_crossentropy' - binary cross-entropy (log loss) for two classes
'categorical_crossentropy' - cross-entropy (log loss) for multiple classes
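The names above appear to be the string identifiers that Keras accepts for these losses. The following is a plain-Python sketch of the formulas they compute, not Keras's own implementation. The clipping constant `EPS` is our assumption, added to avoid taking the logarithm of zero, and the sample labels and predictions are invented.

```python
import math

EPS = 1e-7  # assumption: clip probabilities away from 0 and 1 to avoid log(0)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred):
    """Average log loss for 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, EPS), 1 - EPS)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def categorical_crossentropy(y_true, y_pred):
    """Average log loss for one-hot labels and probability vectors (e.g. softmax output)."""
    total = 0.0
    for t_vec, p_vec in zip(y_true, y_pred):
        total += -sum(t * math.log(min(max(p, EPS), 1.0)) for t, p in zip(t_vec, p_vec))
    return total / len(y_true)

print(mse([1.0, 0.0], [0.9, 0.2]))
print(binary_crossentropy([1, 0], [0.9, 0.2]))
print(categorical_crossentropy([[0, 1, 0]], [[0.2, 0.7, 0.1]]))
```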

To update the weights, the model uses optimizers:

SGD (stochastic gradient descent) - optionally with momentum.
RMSprop - adaptive learning rate optimization proposed by Geoffrey Hinton.
Adam (adaptive moment estimation) - also uses an adaptive learning rate.
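To show how these optimizers differ, here is a sketch of their textbook update rules applied to the toy loss f(w) = (w - 3)², which is minimized at w = 3. The toy loss, step counts and hyperparameter values are our assumptions, and libraries such as Keras may differ in details (for example, exactly where epsilon is added).

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

def sgd_momentum(steps=200, lr=0.05, momentum=0.9):
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)  # accumulate past steps
        w += velocity
    return w

def rmsprop(steps=1000, lr=0.01, rho=0.9, eps=1e-8):
    w, avg_sq = 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        avg_sq = rho * avg_sq + (1 - rho) * g * g  # running average of squared gradients
        w -= lr * g / (math.sqrt(avg_sq) + eps)    # step size adapts to gradient scale
    return w

def adam(steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g  # second moment (mean of squared gradients)
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, opt in [("SGD + momentum", sgd_momentum), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(f"{name}: w = {opt():.4f}")
```

All three end up near w = 3. RMSprop and Adam scale each step by a running estimate of the gradient's size, which is what "adaptive learning rate" refers to.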

Metrics are used to measure a neural network's performance. Accuracy, loss, and validation accuracy are just some of these indicators.
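Accuracy is the simplest of these metrics: the fraction of predictions that match the true labels. Validation accuracy is the same number computed on data held out from training. Below is a minimal sketch, assuming a binary classifier whose probability outputs are turned into labels at a 0.5 threshold; the values are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical classifier outputs (probabilities) converted to labels at 0.5.
probabilities = [0.92, 0.30, 0.45, 0.81]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]
true_labels = [1, 0, 1, 1]
print(accuracy(true_labels, predicted))  # 0.75
```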

Batch size

... is the number of training examples processed in one iteration. The larger the batch size, the more memory is needed.

The number of epochs

... shows how many times the model goes through the entire training dataset. One epoch is one forward and backward pass over all training examples.
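The relationship between batch size, iterations and epochs can be sketched as a bare training-loop skeleton. The dataset of ten placeholder examples, the batch size of 4, the 3 epochs and the helper `iterate_minibatches` are all hypothetical. The actual forward and backward passes are left out.

```python
import random

def iterate_minibatches(data, batch_size, shuffle=True, seed=0):
    """Yield consecutive batches that together cover every example once."""
    indices = list(range(len(data)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

dataset = list(range(10))  # ten hypothetical training examples
batch_size = 4
epochs = 3

iterations = 0
for epoch in range(epochs):
    for batch in iterate_minibatches(dataset, batch_size, seed=epoch):
        # A real training loop would run forward and backward passes on `batch` here.
        iterations += 1
print(iterations)  # 3 epochs * 3 batches per epoch (4 + 4 + 2 examples) = 9
```

Each epoch sees every example exactly once, split into batches of at most `batch_size`; the last batch is smaller when the dataset size is not a multiple of the batch size.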

Artificial neural network

So what is an artificial neural network? It is a system of neurons that interact with each other. Each neuron receives signals and sends them on to other processors (neurons). Connected into one large network and trained, neurons can perform complex tasks.