Optimization in PyTorch#

Optimization methods for neural networks are, for the most part, variations on gradient descent (GD). In this session, we will look at vanilla GD and some of its descendants.
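To make this concrete, here is a minimal sketch of vanilla gradient descent in PyTorch, hand-rolled without `torch.optim`; the toy objective, learning rate, and step count are arbitrary choices for illustration, not part of the session materials.

```python
import torch

# Toy objective: f(x) = (x - 3)^2, minimized at x = 3.
x = torch.tensor([0.0], requires_grad=True)
learning_rate = 0.1  # illustrative value

for step in range(50):
    loss = (x - 3.0) ** 2            # forward pass: compute the objective
    loss.backward()                  # backward pass: populate x.grad
    with torch.no_grad():
        x -= learning_rate * x.grad  # vanilla GD update
    x.grad.zero_()                   # reset the gradient for the next step

print(x.item())  # converges towards 3.0
```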

Learning goals for this session#

  1. Understand the basics of gradient descent.

  2. Get familiar with variations of GD (previewed in the sketch after this list).

  3. Learn how to use GD to optimize an RSA model.
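As a preview of goal 2, the sketch below contrasts two descendants of vanilla GD available in `torch.optim`; the linear model, random data, and hyperparameter values are placeholders for illustration only.

```python
import torch

# Placeholder model and data for illustration only.
model = torch.nn.Linear(2, 1)
inputs = torch.randn(8, 2)
targets = torch.randn(8, 1)
loss_fn = torch.nn.MSELoss()

# Swap the optimizer to try different GD variants
# (hyperparameter values here are illustrative):
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for step in range(100):
    optimizer.zero_grad()                   # clear accumulated gradients
    loss = loss_fn(model(inputs), targets)  # forward pass
    loss.backward()                         # backpropagate
    optimizer.step()                        # apply the variant's update rule
```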

Slides#

Here are the slides for this session.

Practical exercises#

There are two notebooks for exercises: one on vanilla gradient descent, the other on optimizing parameters for an RSA model. You can also find the extracted Python code for both notebooks on the GitHub repository for this web-book.
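The RSA notebook is the authoritative reference for that exercise; purely as an illustration of the general idea, here is a toy sketch of fitting a pragmatic speaker's rationality parameter `alpha` by gradient descent. The two-utterance lexicon and the "observed" data are invented for this sketch.

```python
import torch

# Invented lexicon: rows = meanings, columns = utterances;
# 1.0 means the utterance is true of the meaning.
lexicon = torch.tensor([[1.0, 1.0],
                        [0.0, 1.0]])

# Literal listener: P(meaning | utterance), normalizing over meanings.
literal_listener = lexicon / lexicon.sum(dim=0, keepdim=True)

# Made-up "observed" speaker probabilities to fit against.
observed = torch.tensor([[0.8, 0.2],
                         [0.0, 1.0]])

alpha = torch.tensor(1.0, requires_grad=True)  # rationality parameter
optimizer = torch.optim.SGD([alpha], lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    # Pragmatic speaker: softmax over utterances of alpha * log L0.
    utility = torch.log(literal_listener + 1e-9)
    speaker = torch.softmax(alpha * utility, dim=1)
    loss = ((speaker - observed) ** 2).sum()   # squared-error fit
    loss.backward()
    optimizer.step()

print(alpha.item())  # fitted rationality parameter
```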

Additional resources#

There are many great resources covering gradient descent methods. To single out one concise overview: this paper, which is accompanied by this blog post.