Gradients shrink or grow exponentially during back-propagation
| Problem | Weight gradients | Cause | Solution |
|---|---|---|---|
| Vanishing (converging) | shrink | Weights too small; deep networks | Weight initialization; weight scaling |
| Exploding (diverging) | grow | Weights too large; deep networks | Weight initialization; gradient clipping |
| Large loss due to target with a large range* | | | Target normalization |
* A target variable with a large spread of values can, in turn, produce large error-gradient values that cause the weights to change dramatically, making the learning process unstable.
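The mechanism behind the first two table rows can be sketched numerically: back-propagation multiplies the gradient through one Jacobian per layer, so a weight scale slightly below the stable value drives the gradient norm toward zero, while a scale above it blows the norm up. The sketch below (a minimal NumPy illustration, not any particular framework's implementation; the function names and the depth/width values are assumptions for the demo) also shows a simple norm-based gradient clip of the kind the "Solution" column refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_norm_after_backprop(weight_scale, depth=50, width=64):
    """Push a gradient vector through `depth` random weight matrices,
    mimicking the repeated Jacobian products of back-propagation."""
    grad = np.ones(width)
    for _ in range(depth):
        # Entries ~ N(0, (weight_scale / sqrt(width))^2), so weight_scale
        # controls how much each layer shrinks or grows the gradient norm.
        W = rng.normal(0.0, weight_scale / np.sqrt(width), size=(width, width))
        grad = W.T @ grad
    return np.linalg.norm(grad)

# Weights too small -> gradient norm collapses toward 0 (vanishing).
small = gradient_norm_after_backprop(weight_scale=0.5)
# Weights too large -> gradient norm blows up (exploding).
large = gradient_norm_after_backprop(weight_scale=2.0)
# Scale ~ 1 (a Glorot/Xavier-style choice) keeps the norm roughly stable,
# which is what careful weight initialization buys you.
stable = gradient_norm_after_backprop(weight_scale=1.0)

def clip_by_norm(grad, max_norm=1.0):
    """Gradient clipping: rescale any gradient whose norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```

Clipping does not remove the underlying instability, but it bounds each update, so an occasional exploding gradient cannot wreck the learned weights.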
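Target normalization, the fix in the last table row, can be sketched as standardizing the targets before training and undoing the transform on predictions (a minimal sketch; the example target values are made up for illustration):

```python
import numpy as np

# Hypothetical regression targets with a large spread of values.
y = np.array([120_000.0, 450_000.0, 2_300_000.0, 80_000.0])

# Standardize to zero mean and unit variance; keep mu and sigma
# so predictions can be mapped back to the original scale.
mu, sigma = y.mean(), y.std()
y_norm = (y - mu) / sigma

# The network is trained against y_norm, so the loss (and hence the
# error gradients) stays in a moderate range. A prediction on the
# normalized scale is converted back with the stored statistics:
y_back = y_norm * sigma + mu
```

Because the loss is computed on unit-scale targets, the error gradients no longer inherit the raw targets' magnitude, which is exactly why this stabilizes learning.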