Gradient Descent and Batch-Processing for Generative Models in PyTorch | by Nikolaus Correll | Jan, 2025


Step-by-step from fundamental concepts to training a basic generative model

Towards Data Science

Torch models can get pretty complicated pretty quickly, making it hard to see the forest for the trees. This is particularly the case once you are interested in more than basic regression and classification examples, for instance generative models using Transformers. Even though Torch provides powerful abstractions, most models come with a lot of custom code and boilerplate. This tutorial addresses the machine learning and PyTorch basics that are necessary to understand generative models, such as models that generate random sequences of text: (1) backpropagation of error and (2) batch processing. We will first implement a simple bigram model like in Andrej Karpathy’s “makemore” series, then implement a simple model that is trained one example at a time, and finally introduce Torch’s DataLoader class, including padding. We will deliberately not use any of Torch’s neural network models, allowing you to focus on the tooling that goes around them. You can then build on this example to learn specific neural network models such as Transformers or LSTMs.
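To make the first of these two ideas concrete, here is a minimal sketch of a bigram model trained one example at a time with plain gradient descent. It is not the article's own code: the three-word corpus, the names `words`, `stoi`, `W` and `word_loss`, the learning rate, and the number of epochs are all assumptions chosen for illustration. The "model" is a single table of logits rather than one of Torch's neural network modules.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy corpus; "." marks both the start and the end of a word.
words = ["emma", "olivia", "ava"]
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0
vocab_size = len(stoi)

# The whole model: row i holds the logits for the character following character i.
W = torch.zeros((vocab_size, vocab_size), requires_grad=True)

def word_loss(word):
    """Average negative log-likelihood of all bigrams in one word."""
    ids = torch.tensor([stoi[c] for c in "." + word + "."])
    x, y = ids[:-1], ids[1:]          # current character -> next character
    logits = W[x]                     # shape: (number of bigrams, vocab_size)
    return F.cross_entropy(logits, y)

learning_rate = 1.0  # assumed value, not tuned
losses = []
for epoch in range(50):
    total = 0.0
    for word in words:                # one training example at a time
        loss = word_loss(word)
        loss.backward()               # backpropagation fills W.grad
        with torch.no_grad():
            W -= learning_rate * W.grad   # gradient descent step
        W.grad.zero_()                # gradients accumulate unless reset
        total += loss.item()
    losses.append(total / len(words))

print(f"mean loss, first epoch: {losses[0]:.3f}, last epoch: {losses[-1]:.3f}")
```

The loop shows the three steps that recur in every Torch training script: compute a loss, call `backward()`, and update the parameters while resetting the gradients.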

Illustration of learning a bigram model using batch processing and gradient descent. StableDiffusion via ChatGPT prompt. Image: own.
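The batch-processing side of the illustration can be sketched in the same spirit. The example below assumes the same hypothetical toy corpus and wraps it in a `Dataset`, then uses a custom `collate_fn` to pad words of different lengths to a common length. The class name `WordDataset`, the function `collate`, and the batch size are illustrative assumptions. Targets are padded with -100 because that is the default `ignore_index` of `F.cross_entropy`, so padded positions do not contribute to the loss.

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

# Hypothetical toy corpus; "." marks both the start and the end of a word.
words = ["emma", "olivia", "ava"]
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0
vocab_size = len(stoi)
IGNORE = -100  # default ignore_index of F.cross_entropy

class WordDataset(Dataset):
    """Yields (input ids, target ids) for one word at a time."""
    def __init__(self, words):
        self.items = [torch.tensor([stoi[c] for c in "." + w + "."]) for w in words]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        ids = self.items[i]
        return ids[:-1], ids[1:]

def collate(batch):
    """Pad variable-length sequences so they stack into one tensor."""
    xs, ys = zip(*batch)
    x = pad_sequence(list(xs), batch_first=True, padding_value=0)
    y = pad_sequence(list(ys), batch_first=True, padding_value=IGNORE)
    return x, y

loader = DataLoader(WordDataset(words), batch_size=2, shuffle=False, collate_fn=collate)
first_x, first_y = next(iter(loader))  # "emma" and "olivia", padded to length 7

W = torch.zeros((vocab_size, vocab_size), requires_grad=True)
losses = []
for epoch in range(50):
    total = 0.0
    for x, y in loader:
        logits = W[x]                                   # (batch, time, vocab)
        loss = F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))
        loss.backward()
        with torch.no_grad():
            W -= 1.0 * W.grad                           # assumed learning rate
        W.grad.zero_()
        total += loss.item()
    losses.append(total / len(loader))
```

Padding the inputs with index 0 is harmless here only because the matching targets are ignored; a model that looks at more than one previous character would typically also need a mask.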

Specifically, this walk-through will expose you to examples for the following concepts, helping both with fundamental understanding as well as the…
