Pooling

Pooling is a standard operation in convolutional neural networks (CNNs) used to downsample feature maps. It reduces the spatial dimensions (height and width) while keeping the number of channels unchanged.

Pooling has no learnable parameters: it applies a fixed function (such as max or average) over small regions of the input.
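Both properties can be checked directly in PyTorch. The snippet below is a minimal sketch; the input shape (a batch of one image with 3 channels and \(8\times 8\) spatial size) is an arbitrary assumption chosen for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical input: batch of 1, 3 channels, 8x8 spatial size
x = torch.randn(1, 3, 8, 8)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
y = pool(x)

# Count learnable parameters; a pooling layer has none
num_params = sum(p.numel() for p in pool.parameters())

print(x.shape)     # torch.Size([1, 3, 8, 8])
print(y.shape)     # torch.Size([1, 3, 4, 4])
print(num_params)  # 0
```

Height and width are halved, the channel dimension stays at 3, and the layer contributes nothing to the model's parameter count.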

A pooling layer slides a small window, analogous to a convolution kernel, across the input and applies the pooling function at each window position, independently for each channel. Like convolution, pooling has:

  • Kernel size: window size (\(k\times k\))

  • Stride: step size (often equals the kernel size for downsampling)

  • Padding: rarely used in pooling, but available
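Together, these three settings determine the output size. Along one spatial axis of length \(n\), with kernel size \(k\), stride \(s\) and padding \(p\), the output length is \(\lfloor (n + 2p - k)/s \rfloor + 1\), assuming PyTorch's default floor rounding and no dilation. The helper `pool_output_size` below is a hypothetical name introduced for illustration; the sketch compares it against `nn.MaxPool2d` for a few settings.

```python
import torch
import torch.nn as nn

def pool_output_size(size: int, kernel_size: int, stride: int, padding: int = 0) -> int:
    """Output length along one spatial axis (floor rounding, no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Compare the formula against PyTorch for a few (size, kernel, stride, padding) settings
for size, k, s, p in [(4, 2, 2, 0), (5, 3, 2, 0), (7, 3, 2, 1)]:
    x = torch.zeros(1, 1, size, size)
    y = nn.MaxPool2d(kernel_size=k, stride=s, padding=p)(x)
    assert y.shape[-1] == pool_output_size(size, k, s, p)
    print(f"n={size}, k={k}, s={s}, p={p} -> {pool_output_size(size, k, s, p)}")
```

Note that when \(n + 2p - k\) is not a multiple of the stride, the last rows and columns that do not fill a complete window are dropped.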

Max Pooling

The example below shows max pooling with a \(3\times 3\) kernel, no padding, and a stride of 2.

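In PyTorch, we can apply a max pooling layer as follows. Note that this snippet uses a \(2\times 2\) kernel with a stride of 2, so each output value is the maximum of one non-overlapping \(2\times 2\) block.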

import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2., 3., 4.],
                    [5., 6., 7., 8.],
                    [9.,10.,11.,12.],
                    [13.,14.,15.,16.]]]])

pool = nn.MaxPool2d(kernel_size=2, stride=2)
y = pool(x)

print(y)  # shape: (1, 1, 2, 2)
# Output:
# tensor([[[[ 6.,  8.],
#           [14., 16.]]]])

Max pooling keeps only the strongest activation in each window, so it emphasizes the most prominent features and discards weaker responses. The output is a compressed summary of the previous feature map at a coarser spatial resolution.
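To make the sliding-window mechanics explicit, the following sketch implements max pooling with plain loops and compares it with PyTorch's built-in `F.max_pool2d`, using the \(3\times 3\) kernel and stride of 2 described above. The function name `max_pool2d_naive` and the \(5\times 5\) input are assumptions made for illustration; the loop version is meant for understanding, not for performance.

```python
import torch
import torch.nn.functional as F

def max_pool2d_naive(x: torch.Tensor, k: int, s: int) -> torch.Tensor:
    """Loop-based max pooling for an (N, C, H, W) tensor, without padding."""
    n, c, h, w = x.shape
    out_h = (h - k) // s + 1
    out_w = (w - k) // s + 1
    out = torch.empty(n, c, out_h, out_w, dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Window whose top-left corner sits at (i * s, j * s)
            window = x[:, :, i * s:i * s + k, j * s:j * s + k]
            # Maximum over the two spatial axes of the window
            out[:, :, i, j] = window.amax(dim=(-2, -1))
    return out

# Hypothetical 5x5 input holding the values 1..25 in row-major order
x = torch.arange(1., 26.).reshape(1, 1, 5, 5)

y_naive = max_pool2d_naive(x, k=3, s=2)
y_torch = F.max_pool2d(x, kernel_size=3, stride=2)

print(y_naive)                        # values: [[13, 15], [23, 25]]
print(torch.equal(y_naive, y_torch))  # True
```

Because the stride (2) is smaller than the kernel size (3), neighbouring windows overlap by one row or column.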

Average Pooling

Average pooling computes the mean of the values in each window. The example below shows the input on the left and the average pooling output on the right. Again, we use a \(3\times 3\) kernel, no padding, and a stride of 2.

Average pooling smooths the features in the input feature map, because each output value blends all values in its window rather than selecting a single one.
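For comparison with the max pooling snippet, here is a minimal sketch that applies `nn.AvgPool2d` to the same \(4\times 4\) input, again with a \(2\times 2\) kernel and a stride of 2 (chosen here to mirror that snippet, not the \(3\times 3\) setting of the figure).

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2., 3., 4.],
                    [5., 6., 7., 8.],
                    [9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
y_avg = avg_pool(x)

# Each output value is the mean of one non-overlapping 2x2 block,
# e.g. (1 + 2 + 5 + 6) / 4 = 3.5 for the top-left block
print(y_avg)  # values: [[3.5, 5.5], [11.5, 13.5]], shape (1, 1, 2, 2)
```

Where max pooling returned the largest entry of each block (6, 8, 14, 16), average pooling returns a value between the smallest and largest entry, which is what produces the smoothing effect.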