
The Transformer neural network architecture
The Transformer neural network, explained in detail. Continue reading The Transformer neural network architecture
Introduction to and comparison of Batch Norm, Weight Norm, Layer Norm, Instance Norm, and Group Norm. Continue reading Deep Learning normalization methods
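As a taste of what that comparison covers, here is a minimal NumPy sketch (my own illustration, not code from the post) contrasting Batch Norm and Layer Norm: the two differ mainly in the axis over which the statistics are computed.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (batch, features); normalize each feature across the batch dimension
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # x: (batch, features); normalize each sample across its own features
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(4, 8)
print(batch_norm(x).mean(axis=0))  # ~0 per feature
print(layer_norm(x).mean(axis=1))  # ~0 per sample
```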
Get your first intuition about Attention right with a minimal code example. Continue reading Attention in Deep Learning, your starting point (with code)
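For a flavour of what such a minimal example looks like (this sketch is mine, not the post's code), scaled dot-product attention fits in a few lines of NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

Q = np.random.randn(3, 4)  # 3 queries of dimension 4
K = np.random.randn(5, 4)  # 5 keys of dimension 4
V = np.random.randn(5, 6)  # 5 values of dimension 6
print(attention(Q, K, V).shape)  # (3, 6): one output per query
```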
Where and how should you start with LSTM? Continue reading LSTM: where to start?
Across various experiments, ELU has been accepted by many researchers as a good successor to the original ReLU. Continue reading ELU activation: A comprehensive analysis
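For reference, a minimal NumPy sketch of the ELU function (my own illustration, not taken from the post):

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU: identity for positive inputs, smooth exponential curve
    # saturating toward -alpha for negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-3.0, 3.0, 7)
print(elu(x))  # negative inputs are squashed toward -1 rather than cut to 0
```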
Batch Normalization (BatchNorm) is a very frequently used technique in Deep Learning; however, why it works is often explained ambiguously. Continue reading Batch Normalization and why it works
Despite its simplicity, ReLU has achieved top performance across various tasks of modern Machine Learning. Continue reading Rectifier Linear Unit (ReLU)
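That simplicity is easy to see in code. The sketch below (my own, not the post's) shows the function and its subgradient, which is zero for all negative inputs:

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x), a single comparison per element
    return np.maximum(0.0, x)

def relu_grad(x):
    # subgradient: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(x.dtype)

x = np.linspace(-2.0, 2.0, 9)
print(relu(x))
print(relu_grad(x))  # gradient is exactly 0 on the negative side
```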
The sigmoid and tanh activation functions were very frequently used in the past but have lost popularity in the era of Deep Learning. Continue reading Sigmoid, tanh activations and their loss of popularity
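A quick NumPy sketch (mine, not the post's) hints at one common reason for that loss of popularity: the gradients of both functions shrink rapidly once inputs move away from zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25, shrinks fast for large |x|

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # peaks at 1, also saturates

for x in (0.0, 2.0, 5.0):
    print(x, sigmoid_grad(x), tanh_grad(x))  # gradients vanish as |x| grows
```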