Gradients shrink or grow exponentially during back-propagation
| Problem | Weight gradients | Cause | Solution |
|---|---|---|---|
| Vanishing (converging) | shrink | Weights too small; deep networks | Weight initialization; weight scaling |
| Exploding (diverging) | grow | Weights too large; deep networks | Weight initialization; gradient clipping |
| Large loss due to target with a large range* | | | Target normalization |
* A target variable with a large spread of values can, in turn, produce large error-gradient values that cause the weights to change dramatically, making the learning process unstable.
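The mechanism behind the first two table rows can be sketched numerically: back-propagation multiplies the gradient through one Jacobian per layer, so a weight scale slightly below the stable value drives the gradient norm toward zero, while a scale above it blows the norm up. The sketch below (a minimal NumPy illustration, not any particular framework's implementation; the function names and the depth/width values are assumptions for the demo) also shows a simple norm-based gradient clip of the kind the "Solution" column refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_norm_after_backprop(weight_scale, depth=50, width=64):
    """Push a gradient vector through `depth` random weight matrices,
    mimicking the repeated Jacobian products of back-propagation."""
    grad = np.ones(width)
    for _ in range(depth):
        # Entries ~ N(0, (weight_scale / sqrt(width))^2), so weight_scale
        # controls how much each layer shrinks or grows the gradient norm.
        W = rng.normal(0.0, weight_scale / np.sqrt(width), size=(width, width))
        grad = W.T @ grad
    return np.linalg.norm(grad)

# Weights too small -> gradient norm collapses toward 0 (vanishing).
small = gradient_norm_after_backprop(weight_scale=0.5)
# Weights too large -> gradient norm blows up (exploding).
large = gradient_norm_after_backprop(weight_scale=2.0)
# Scale ~ 1 (a Glorot/Xavier-style choice) keeps the norm roughly stable,
# which is what careful weight initialization buys you.
stable = gradient_norm_after_backprop(weight_scale=1.0)

def clip_by_norm(grad, max_norm=1.0):
    """Gradient clipping: rescale any gradient whose norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```

Clipping does not remove the underlying instability, but it bounds each update, so an occasional exploding gradient cannot wreck the learned weights.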
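Target normalization, the fix in the last table row, can be sketched as standardizing the targets before training and undoing the transform on predictions (a minimal sketch; the example target values are made up for illustration):

```python
import numpy as np

# Hypothetical regression targets with a large spread of values.
y = np.array([120_000.0, 450_000.0, 2_300_000.0, 80_000.0])

# Standardize to zero mean and unit variance; keep mu and sigma
# so predictions can be mapped back to the original scale.
mu, sigma = y.mean(), y.std()
y_norm = (y - mu) / sigma

# The network is trained against y_norm, so the loss (and hence the
# error gradients) stays in a moderate range. A prediction on the
# normalized scale is converted back with the stored statistics:
y_back = y_norm * sigma + mu
```

Because the loss is computed on unit-scale targets, the error gradients no longer inherit the raw targets' magnitude, which is exactly why this stabilizes learning.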