What is a Neural Network? by Alberto Pinedo
Do you remember when, in school, we had to learn the multiplication tables from one to ten? Well, that memorization exercise turned us into eight-year-old math geniuses… or not.
School taught us those multiplications, and since then we have held them in our minds, automatically. It was really easy: 8×8=64, 2×2=4, 3×5=15, and so on. But although that memorization helped us get through the semester, it didn’t help with the next one, that is, with solving 12×12=144, 315×212=66,780, or any other multiplication.
To complete the analogy: memorizing the ten multiplication tables doesn’t allow us to solve more complex multiplications by recall alone. What allows us to solve them is our ability to understand how multiplication actually works.
Overfitting is something similar: it happens when our model adjusts too closely to the training data (memorizes the tables from 1 to 10) and is not capable of generalizing enough to solve the more complex multiplications.
If we carry this reflection over to a classification model, as we see in the graphic, the model drawn by the green line adjusts too closely to the training data. Faced with new data, that model will very probably make many more mistakes than the model drawn by the black line.
Clearly, overfitting, just like underfitting, is a problem in neural network models: one because it over-adjusts, the other because it stays too general.
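In practice, overfitting shows up as a growing gap between training and validation metrics. Here is a minimal sketch of how to watch for it with Keras, assuming a compiled model and training data named x_train and y_train (hypothetical names, not from this article):

```python
# model, x_train and y_train are assumed to exist already, and the
# model is assumed to be compiled with metrics=["accuracy"].
history = model.fit(x_train, y_train, epochs=30, validation_split=0.2)

# Training accuracy that keeps climbing while validation accuracy
# stalls or drops is the classic sign of overfitting.
for epoch, (acc, val_acc) in enumerate(
        zip(history.history["accuracy"], history.history["val_accuracy"])):
    print(f"epoch {epoch}: train={acc:.3f}  val={val_acc:.3f}")
```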
How do we solve overfitting?
Several different techniques exist to solve this problem; in this post we will talk about four of them. Sometimes just one of them is enough, but in other cases it is necessary to apply more than one technique to solve “our problem”.
- Increase the training data
The first technique we can use is increasing the training data: the more data we provide, the better the results we will get. By enlarging our dataset, we also enlarge its diversity. As Satya Nadella, Microsoft’s CEO, has said: diversity enriches.
For example, if we train our model only with images of Siamese and Persian cats, it will be really good at identifying those breeds, but it will often be inefficient and make many mistakes when shown Sphynx cats.
- Data Augmentation
Sometimes it happens that we don’t have enough data, or the data isn’t diverse. With this technique we can improve it. How does it work?
This technique consists of modifying, in a reasonable way, the dataset we already have. We take all of our cat images and modify them by rotating them, turning them into black and white, or adding “noise” to them.
This will allow us to avoid overfitting; a sketch of such a pipeline is shown below.
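As an illustration only, here is a minimal augmentation pipeline built with Keras preprocessing layers; the specific transformations and their intensities are hypothetical choices, not values from this article:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical pipeline: every image that passes through is randomly
# flipped, slightly rotated, and has its contrast jittered, so the
# model rarely sees the exact same picture twice.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),  # mirror the cat left/right
    layers.RandomRotation(0.1),       # rotate up to ±10% of a full turn
    layers.RandomContrast(0.2),       # mild pixel-level "noise"
])

# Typically applied on the fly during training, for example as the
# first layers of the model itself:
# augmented = data_augmentation(images, training=True)
```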
- Reducing layers or neurons
Another technique that can be used to reduce overfitting is to lower the number of layers in our neural network, or the number of neurons per layer. In this case, we must be very clear about where the biggest error rate occurs and try to shrink that layer or those neurons, as sketched below.
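As a sketch, assuming a simple Keras classifier (the layer sizes here are illustrative, not taken from the article), reducing capacity can look like this:

```python
from tensorflow.keras import layers, models

# Over-parameterized model: plenty of capacity to memorize the
# training set (illustrative sizes).
big_model = models.Sequential([
    layers.Dense(512, activation="relu", input_shape=(784,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Slimmed-down version: fewer layers and fewer neurons per layer,
# so the network is pushed to learn general patterns instead of
# memorizing individual examples.
small_model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dense(10, activation="softmax"),
])
```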
- Dropout
With this technique we introduce dropout into our neural network, randomly deactivating neurons and their ability to contribute to the prediction. The difference from the previous technique is that dropout introduces randomness into the deactivation. With it, we get a neural network that generalizes much better.
Dropout removes the prediction “effect” of neurons at random, helping our model to generalize and avoid overfitting.
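A minimal sketch in Keras; the dropout rates (0.5 and 0.3) are common starting points chosen for illustration, not values from this article:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dropout(0.5),  # each training step, 50% of these activations are zeroed at random
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),  # a lighter rate deeper in the network
    layers.Dense(10, activation="softmax"),
])
```

Keras applies dropout only while training; at inference time every neuron is active again, so predictions stay deterministic.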
This article was written by Alberto Pinedo, Spain Digital Transformation Lead at Microsoft.