Weight Initialization Techniques

What not to do?

When initializing weights in a deep learning model, here are a few simple things to avoid:

  1. Don't initialize all weights to zero: Every neuron then receives the same gradient, so all neurons learn the same features and model performance suffers. Use random initialization instead (see the sketch after this list).

  2. Avoid large initial weights: Large values saturate activations and make gradients explode, which destabilizes training.

  3. Be consistent: Use the same initialization method for all layers so that learning proceeds uniformly across the network.

  4. Consider the activation functions: Different activations call for different initializations (e.g., Xavier for tanh/sigmoid, He for ReLU), so don't ignore their impact when choosing a scheme.

  5. Avoid overly complex initialization: Don't overcomplicate things; simple methods like Xavier or He initialization work well in practice.

  6. Don't rely on randomness alone: Randomly drawn weights that are too small lead to vanishing gradients, while overly large ones lead to exploding gradients, so the scale matters as much as the randomness.
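
To make point 1 concrete, here is a minimal sketch (plain NumPy, toy data invented for the example) of a two-layer network trained from all-zero weights. Every hidden neuron receives an identical gradient, so the columns of `W1` stay exact copies of one another no matter how long training runs:

```python
import numpy as np

# Toy regression data (hypothetical, just for the demonstration).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))   # 8 samples, 3 features
y = rng.normal(size=(8, 1))   # regression targets

W1 = np.zeros((3, 5))         # zero-initialized hidden layer
W2 = np.zeros((5, 1))         # zero-initialized output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(100):          # plain gradient descent on squared error
    h = sigmoid(x @ W1)       # forward pass
    pred = h @ W2
    grad_pred = 2.0 * (pred - y) / len(x)
    grad_W2 = h.T @ grad_pred               # backprop through the output layer
    grad_h = grad_pred @ W2.T
    grad_W1 = x.T @ (grad_h * h * (1 - h))  # backprop through the sigmoid
    W1 -= 0.5 * grad_W1
    W2 -= 0.5 * grad_W2

print(W1)  # all 5 columns are identical: every neuron learned the same feature
```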

What to do?

Xavier init (Normal):

$$W = \text{Random}(n, n) \cdot \sqrt{\frac{1}{n}}$$

where $\text{Random}(n, n)$ is an $n \times n$ matrix of samples from a standard normal distribution and $n$ is the number of input units (fan-in) of the layer.
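
A minimal sketch of this formula in NumPy (the function name is mine, and the square $\text{Random}(n, n)$ is generalized to a rectangular fan-in $\times$ fan-out matrix):

```python
import numpy as np

def xavier_normal(fan_in, fan_out):
    """Standard-normal samples scaled by sqrt(1 / fan_in)."""
    rng = np.random.default_rng()
    return rng.normal(size=(fan_in, fan_out)) * np.sqrt(1.0 / fan_in)

W = xavier_normal(256, 128)
print(W.std())  # close to sqrt(1/256) ~= 0.0625
```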

He init (Normal):

$$W = \text{Random}(n, n) \cdot \sqrt{\frac{2}{n}}$$
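
The same sketch for He initialization; the factor of 2 compensates for ReLU zeroing out roughly half of its inputs, which is why He init is the usual choice for ReLU networks (again, the helper name is mine):

```python
import numpy as np

def he_normal(fan_in, fan_out):
    """Standard-normal samples scaled by sqrt(2 / fan_in)."""
    rng = np.random.default_rng()
    return rng.normal(size=(fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

W = he_normal(256, 128)
print(W.std())  # close to sqrt(2/256) ~= 0.088
```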

Xavier init (Uniform):

$$\text{limit} = \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$$

Weights are drawn uniformly from the range $[-\text{limit}, \text{limit}]$.
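
A sketch of the uniform variant (a hypothetical helper rather than any library's API):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out):
    """Uniform samples from [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng()
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(256, 128)
print(W.min(), W.max())  # both inside +-sqrt(6/384) ~= +-0.125
```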

He init (Uniform):

$$\text{limit} = \sqrt{\frac{6}{\text{fan\_in}}}$$

Weights are drawn uniformly from the range $[-\text{limit}, \text{limit}]$.
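
And the He uniform counterpart, under the same assumptions. A uniform distribution on $[-a, a]$ has variance $a^2/3$, so $\text{limit} = \sqrt{6/\text{fan\_in}}$ yields a weight variance of $2/\text{fan\_in}$, exactly matching He normal:

```python
import numpy as np

def he_uniform(fan_in, fan_out):
    """Uniform samples from [-limit, limit], limit = sqrt(6 / fan_in)."""
    limit = np.sqrt(6.0 / fan_in)
    rng = np.random.default_rng()
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = he_uniform(256, 128)
print(W.var())  # close to limit**2 / 3 == 2 / 256 ~= 0.0078
```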