Note: Adam is currently one of the most widely used optimization algorithms in deep learning.
Adam (Adaptive Moment Estimation) is an optimization algorithm for deep learning that combines the benefits of momentum and RMSprop. It computes adaptive learning rates for each parameter from estimates of the first and second moments of the gradients: Adam maintains exponentially decaying averages of past gradients and of their squares, and uses them to adjust the step size for each parameter individually. This helps the algorithm converge quickly and robustly. Adam is known for handling large-scale datasets and noisy or sparse gradients well, and it typically requires less manual tuning than other optimization methods.
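As a rough illustration of the update rule described above, here is a minimal NumPy sketch of a single Adam step. The function name adam_step and its arguments are hypothetical (not from any particular library); the hyperparameter defaults (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8) follow the values commonly cited from the original Adam paper.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array.

    m and v are the running first- and second-moment estimates;
    t is the 1-based step count used for bias correction.
    """
    # Exponentially decaying averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero-initialized moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(w) = w^2, whose gradient is 2w
w = np.array([1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 101):
    grad = 2 * w
    w, m, v = adam_step(w, grad, m, v, t)
print(w)  # moves toward 0 as the steps accumulate
```

In practice you would use a framework's built-in Adam implementation rather than hand-rolling the update; the sketch only makes explicit how the two moment estimates and the bias correction combine into a per-parameter step.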