torch.nn.Conv2d vs torch.nn.ConvTranspose2d in Machine Learning

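torch.nn.Conv2d and torch.nn.ConvTranspose2d are opposite operations with respect to spatial shape (ConvTranspose2d undoes the shape change of a matching Conv2d, but it is not a numerical inverse and does not recover the original values):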

Conv2d (Standard Convolution)

Reduces spatial dimensions (downsampling)
Used for feature extraction
Output size is smaller than input (unless padding compensates)
Formula (for dilation = 1): output_size = floor((input_size + 2*padding - kernel_size) / stride) + 1
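As a quick check of the formula above, the following sketch compares it with the shapes PyTorch actually produces. The helper name conv2d_output_size and the layer sizes (3 input channels, 16 output channels, 32x32 input) are illustrative assumptions, not part of the original text.

```python
import torch
import torch.nn as nn

def conv2d_output_size(input_size, kernel_size, stride=1, padding=0):
    # Hypothetical helper implementing the formula above (assumes dilation=1).
    return (input_size + 2 * padding - kernel_size) // stride + 1

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)

# stride=2 halves the resolution: (32 + 2 - 3) // 2 + 1 = 16
down = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
y_down = down(x)

# stride=1 with padding=1 keeps the resolution: (32 + 2 - 3) // 1 + 1 = 32
same = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
y_same = same(x)

print(y_down.shape)  # torch.Size([1, 16, 16, 16])
print(y_same.shape)  # torch.Size([1, 16, 32, 32])
```

The second layer illustrates the "unless padding compensates" case: a 3x3 kernel with padding=1 and stride=1 leaves the spatial size unchanged.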

ConvTranspose2d (Transposed Convolution)

Increases spatial dimensions (upsampling)
Used for reconstruction/generation (e.g., decoder in autoencoders, generators in GANs)
Output size is larger than input (unless padding compensates)
Formula (for dilation = 1): output_size = (input_size - 1) * stride - 2*padding + kernel_size + output_padding
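The same kind of check works for the transposed-convolution formula. This is a minimal sketch: the helper name conv_transpose2d_output_size and the tensor sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

def conv_transpose2d_output_size(input_size, kernel_size, stride=1,
                                 padding=0, output_padding=0):
    # Hypothetical helper implementing the formula above (assumes dilation=1).
    return (input_size - 1) * stride - 2 * padding + kernel_size + output_padding

z = torch.randn(1, 16, 16, 16)  # (batch, channels, height, width)

# kernel_size=4, stride=2, padding=1: (16 - 1) * 2 - 2 + 4 + 0 = 32
up_a = nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1)
y_a = up_a(z)

# kernel_size=3 needs output_padding=1 to reach 32: (16 - 1) * 2 - 2 + 3 + 1 = 32
up_b = nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2, padding=1,
                          output_padding=1)
y_b = up_b(z)

print(y_a.shape)  # torch.Size([1, 3, 32, 32])
print(y_b.shape)  # torch.Size([1, 3, 32, 32])
```

output_padding is needed because several input sizes map to the same output size under a strided Conv2d (for example, 31 and 32 both give 16 with kernel_size=3, stride=2, padding=1), so the transposed layer has to be told which one to produce.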

Purpose: Conv2d extracts features and reduces resolution; ConvTranspose2d reconstructs/generates and increases resolution
Data flow: Conv2d: many-to-one (multiple input positions → one output position); ConvTranspose2d: one-to-many (one input position → multiple output positions)
Common use cases:
- Conv2d: CNNs, encoders, feature extraction
- ConvTranspose2d: GANs, VAEs, semantic segmentation (upsampling path), super-resolution, diffusion model decoders
Checkerboard artifacts: ConvTranspose2d can produce checkerboard artifacts when kernel_size is not divisible by stride, because overlapping kernel footprints then contribute unevenly to neighboring output positions
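The one-to-many data flow and the divisibility condition for checkerboard artifacts can both be seen in a small 1D sketch in plain Python. It counts how many input positions write to each output position, assuming padding = 0; the function name overlap_counts_1d is hypothetical.

```python
def overlap_counts_1d(input_size, kernel_size, stride):
    """Count how many input positions contribute to each output position
    of a 1D transposed convolution (assumes padding=0, dilation=1)."""
    output_size = (input_size - 1) * stride + kernel_size
    counts = [0] * output_size
    for i in range(input_size):        # each input position ...
        for k in range(kernel_size):   # ... scatters to kernel_size outputs
            counts[i * stride + k] += 1
    return counts

# kernel_size=3 is not divisible by stride=2: interior counts alternate
print(overlap_counts_1d(4, kernel_size=3, stride=2))
# [1, 1, 2, 1, 2, 1, 2, 1, 1]

# kernel_size=4 is divisible by stride=2: interior counts are uniform
print(overlap_counts_1d(4, kernel_size=4, stride=2))
# [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

In the first case the alternating 2, 1, 2, 1 pattern means neighboring outputs receive different numbers of contributions, which in 2D shows up as a checkerboard. In the second case every interior output receives the same number of contributions.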