Activation Functions

Why are activation functions needed?

Activation functions are essential in neural networks because they introduce non-linearity, enabling the network to model complex relationships in the data. Without them, any stack of layers collapses into a single linear transformation, no matter how deep the network is. Non-linear activations let the network learn and capture non-linear patterns, making it capable of solving a wide range of tasks, including classification, regression, and more.
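
As a minimal sketch of this point, the NumPy snippet below (with arbitrary, illustrative layer shapes) shows that two stacked linear layers without an activation in between are equivalent to one linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # first layer weights (illustrative shapes)
W2 = rng.standard_normal((2, 4))  # second layer weights
x = rng.standard_normal(3)        # an arbitrary input vector

# Two stacked linear layers with no activation in between...
y_stacked = W2 @ (W1 @ x)
# ...collapse into a single linear layer with weights W2 @ W1.
y_single = (W2 @ W1) @ x

print(np.allclose(y_stacked, y_single))  # True: no added expressive power
```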

Common activation functions:

Sigmoid activation function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

  • Advantages

    1. Output is bounded in (0, 1), so it can be read as a probability

    2. Non-linear

    3. Differentiable

  • Disadvantages

    1. Saturating function (vanishing gradient problem)

    2. Not zero-centered

    3. Computationally expensive (involves an exponential)
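
A minimal NumPy sketch of the sigmoid and its derivative (the input values are arbitrary, chosen to show saturation at both extremes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # The derivative can be written in terms of the output: s * (1 - s).
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))       # outputs lie strictly between 0 and 1
print(sigmoid_grad(x))  # near zero at the extremes: the function saturates
```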

Tanh activation function:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

  • Advantages

    1. Non-linear

    2. Differentiable

    3. Zero-centered

  • Disadvantages

    1. Saturating function (vanishing gradient problem)

    2. Computationally expensive (involves exponentials)
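
A comparable sketch for tanh (again with arbitrary inputs): outputs are zero-centered in (-1, 1), but the function still saturates for large |x|:

```python
import numpy as np

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(np.tanh(x))    # zero-centered outputs in (-1, 1)
print(tanh_grad(x))  # gradient vanishes at both extremes
```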

ReLU activation function:

$$\text{ReLU}(x) = \max(0, x)$$

  • Advantages

    1. Non-linear

    2. Does not saturate in the positive region

    3. Computationally inexpensive

    4. Converges faster than tanh and sigmoid

  • Disadvantages

    1. Not differentiable at x = 0

    2. Not zero-centered
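
A minimal ReLU sketch (inputs are illustrative): the function and its gradient are cheap to compute, and the gradient stays at 1 throughout the positive region; at x = 0 the code follows the common convention of using 0 as the subgradient:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # ReLU is not differentiable at x = 0; by convention, use 0 there.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # negative inputs are clipped to zero
print(relu_grad(x))  # 0 in the negative region, 1 in the positive region
```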