What not to do?
When initializing weights in a deep learning model, here are a few things to avoid:
Don't initialize all weights to zero: All neurons in a layer then compute the same outputs and receive identical gradients (the symmetry problem), so they can never learn different features. Use random initialization instead (see the sketch after this list).
Avoid large initial weights: Excessively large values blow up activations and gradients, leading to unstable training.
Don't mix initialization schemes arbitrarily: Use the same initialization method for comparable layers so that the scale of signals stays consistent across the network.
Don't ignore the activation functions: Different activations call for different initialization schemes; for example, Xavier initialization suits tanh/sigmoid layers, while He initialization suits ReLU layers.
Avoid overly complex schemes: Don't complicate weight initialization unnecessarily; simple methods like Xavier or He initialization work well in practice.
Avoid plain random initialization with weights that are too small or too large: Tiny weights make signals and gradients vanish, while oversized ones make them explode; the scaled schemes below address both.
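The symmetry problem behind the zero-initialization warning is easy to see in code. Below is a minimal NumPy sketch (a hypothetical two-layer network with made-up sizes) showing that when all weights start at zero, every hidden unit receives exactly the same gradient, so no unit can ever learn a different feature:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # 4 samples, 3 input features (made-up sizes)
y = rng.normal(size=(4, 1))    # dummy regression targets
W1 = np.zeros((3, 5))          # zero-initialized hidden layer
W2 = np.zeros((5, 1))          # zero-initialized output layer

h = np.tanh(x @ W1)            # hidden activations: all zeros
out = h @ W2                   # predictions: all zeros
grad_out = out - y             # gradient of a squared-error loss w.r.t. out
grad_W1 = x.T @ ((grad_out @ W2.T) * (1.0 - h ** 2))

# Every hidden unit gets an identical gradient (here, all zeros),
# so the units can never diverge and learn different features.
print(np.allclose(grad_W1, grad_W1[:, [0]]))  # True
```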
What to do?
Xavier init (Normal):
$$\mathrm{Random}(n, n) \cdot \sqrt{\frac{1}{n}}$$
where Random(n, n) draws an n × n matrix from a standard normal distribution and n is the number of units feeding into the layer (its fan-in).
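As a minimal sketch of the formula above (assuming a square n × n weight matrix and NumPy; the helper name is illustrative, not a library API):

```python
import numpy as np

def xavier_normal(n, seed=0):
    # Draw an n x n matrix from a standard normal and scale by sqrt(1 / n).
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, n)) * np.sqrt(1.0 / n)

W = xavier_normal(256)
print(W.std())  # roughly sqrt(1/256) = 0.0625
```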
He init (Normal):
$$\mathrm{Random}(n, n) \cdot \sqrt{\frac{2}{n}}$$
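The sketch is the same as for Xavier, with only the scaling factor changed to sqrt(2 / n), which suits ReLU activations:

```python
import numpy as np

def he_normal(n, seed=0):
    # Identical to the Xavier sketch except for the sqrt(2 / n) factor.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, n)) * np.sqrt(2.0 / n)
```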
Xavier init (Uniform):
$$\text{limit} = \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$$
Weights are sampled uniformly from the range [-limit, limit].
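A minimal sketch of this rule, assuming fan_in and fan_out are the layer's input and output sizes and the weight matrix is shaped (fan_in, fan_out); the helper name is illustrative:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, seed=0):
    # limit = sqrt(6 / (fan_in + fan_out)); sample uniformly in [-limit, limit].
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(784, 256)   # e.g. a 784 -> 256 fully connected layer
```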
He init (Uniform):
$$\text{limit} = \sqrt{\frac{6}{\text{fan\_in}}}$$
Weights are sampled uniformly from the range [-limit, limit].
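And the matching sketch for the He uniform rule, under the same assumptions as above:

```python
import numpy as np

def he_uniform(fan_in, fan_out, seed=0):
    # limit = sqrt(6 / fan_in); sample uniformly in [-limit, limit].
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = he_uniform(784, 256)
```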