Introduction to PyTorch
Background
PyTorch is a widely used deep learning platform known for its flexibility and speed. Originally developed by Facebook AI (now Meta AI), it has grown into one of the most popular frameworks for deep learning research and applications. Researchers and industry professionals widely adopt it for its ease of use, dynamic computation graphs, and seamless GPU acceleration.
PyTorch evolved from the Torch library, an open-source machine learning framework written in Lua with a C backend. Although Torch is no longer actively developed, many of its libraries and functionalities have been incorporated into PyTorch. Today, PyTorch is used by corporations, laboratories, and universities to develop software like the Autopilot and Full Self-Driving (FSD) models (Tesla), ChatGPT (OpenAI), reinforcement learning models for robotics (Boston Dynamics), and much more.
Overview of PyTorch and Its Core Components
PyTorch provides a comprehensive set of tools and features that enable efficient deep learning model development and training. Below are some of its core components:
1. Torch Tensors
A tensor is a multi-dimensional array, similar to a NumPy array, but with the added benefits of GPU acceleration and automatic differentiation. Tensors are the core data structure in PyTorch, used to store and manipulate data for deep learning models. Every input in PyTorch is represented as a tensor: features, responses, parameters, etc. PyTorch does not accept NumPy arrays as input the way Keras/TensorFlow does, but NumPy arrays are easily converted to and from Torch tensors.
Creating Tensors
import torch
import numpy as np
# Creating a tensor from a list
x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float)
# Creating a random tensor
rand_tensor = torch.rand(3, 3)  # 3x3 matrix
# Creating a tensor filled with zeros or ones
zero_tensor = torch.zeros(5, 5)  # 5x5 matrix
one_tensor = torch.ones(5, 5)  # 5x5 matrix
# Reshape a row tensor into a column tensor
column = torch.tensor([1, 2, 3, 4, 5]).view(-1, 1)
# Convert a NumPy array to a tensor (copies the data)
array = np.array([1, 2, 3, 4])
tensor = torch.tensor(array)
# or (shares memory with the original array)
tensor = torch.from_numpy(array)
# Convert back to a NumPy array
array = tensor.numpy()
Tensor Operations
x = torch.tensor([2, 4, 6])
y = torch.tensor([1, 3, 5])
# Element-wise addition
z = x + y
# Element-wise multiplication
z = x * y
# Dot product of two 1-D tensors (matmul on 1-D inputs)
z = torch.matmul(x, y)
# Matrix multiplication of 2-D tensors
A = torch.rand(2, 3)
B = torch.rand(3, 4)
C = torch.matmul(A, B)  # 2x4 result
Automatic Differentiation
Automatic differentiation (autograd) is a technique used by PyTorch to compute derivatives (gradients) automatically, making it easier to train deep learning models. It dynamically builds a computational graph and efficiently applies the chain rule for differentiation.
# Create tensor with gradient tracking
x = torch.tensor(2.0, requires_grad=True)
If a tensor has requires_grad=True, PyTorch keeps track of all operations performed on it.
Computing Gradients
When .backward() is called on a scalar loss, PyTorch traverses the graph in reverse (backpropagation) and computes derivatives using the chain rule. The computed gradients are stored in the .grad attribute of each tensor and are then used to update model parameters (e.g., via gradient descent). We’ll talk more about backpropagation on the next page.
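Continuing the example above, a minimal sketch of computing a gradient (the function y = x^2 is illustrative):
y = x ** 2    # forward pass; PyTorch records the operation in the graph
y.backward()  # backpropagate from the scalar output
print(x.grad) # dy/dx = 2x = 4.0 at x = 2.0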
Key Features of Autograd
- Dynamic Computation Graph: Built at runtime, allowing flexibility in model design (see the sketch after this list).
- Efficient Backpropagation: Computes gradients only for required tensors.
- Automatic Chain Rule Application: Saves time and avoids manual derivative computation.
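Because the graph is rebuilt on every forward pass, ordinary Python control flow can shape the computation. A minimal sketch (the branch condition and values are illustrative):
x = torch.tensor(3.0, requires_grad=True)
if x > 2:  # plain Python branching decides what the graph contains
    y = x ** 2
else:
    y = x ** 3
y.backward()
print(x.grad)  # 2x = 6.0, since the first branch was taken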
2. Activation Functions
Activation functions introduce non-linearity into neural networks, enabling them to learn and model complex patterns.
Common Activation Functions in PyTorch
import torch.nn as nn
# Sigmoid Activation
sigmoid = nn.Sigmoid()
# ReLU Activation
relu = nn.ReLU()
# Tanh Activation
tanh = nn.Tanh()
# Leaky ReLU
lrelu = nn.LeakyReLU(negative_slope=0.01)
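These modules are applied like functions to tensors; a minimal sketch with arbitrary input values:
x = torch.tensor([-2.0, 0.0, 3.0])
print(relu(x))     # tensor([0., 0., 3.]): negatives are clipped to zero
print(sigmoid(x))  # each value squashed into the range (0, 1)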
More Activation Functions
3. Loss Functions
A loss function measures how well or poorly a model performs by comparing the model’s output to the true labels.
Common Loss Functions
mse_loss = nn.MSELoss() # Mean Squared Error
ce_loss = nn.CrossEntropyLoss() # Cross-Entropy Loss
bce_loss = nn.BCELoss() # Binary Cross-Entropy Loss
nll_loss = nn.NLLLoss() # Negative Log-Likelihood (NLL) Loss
hu_loss = nn.SmoothL1Loss() # Huber Loss (Smooth L1 Loss)
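Each loss module takes the model’s predictions and the true targets and returns a scalar; a minimal sketch with made-up values:
pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
loss = mse_loss(pred, target)
print(loss)  # tensor(0.1667): the mean of the squared differences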
More Loss Functions
4. Optimizers
Optimizers are algorithms used to update the weights of a neural network during training to minimize the loss function. They adjust model parameters based on gradients computed during backpropagation to improve the model’s performance.
Common PyTorch Optimizers
import torch.optim as optim
# Stochastic Gradient Descent (SGD)
sgd = optim.SGD(model.parameters(), lr=0.01)
# Adam Optimizer
adam = optim.Adam(model.parameters(), lr=0.001)
# RMSprop (Root Mean Square Propagation)
rmsprop = optim.RMSprop(model.parameters(), lr=0.001)
# Adagrad
adagrad = optim.Adagrad(model.parameters(), lr=0.01)
The model’s parameters are passed to the optimizer so it knows which tensors to update; this is essential for training.
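In a typical training step, the optimizer works hand in hand with autograd. A hedged sketch, assuming model, inputs, and targets already exist:
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
optimizer.zero_grad()  # clear gradients left over from the previous step
loss = criterion(model(inputs), targets)
loss.backward()        # compute gradients for all parameters
optimizer.step()       # update the parameters using those gradients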
More Optimizers
5. Transforms
PyTorch Transforms are operations used to preprocess and augment data before feeding it into a neural network. They are commonly used in computer vision tasks to prepare images for training; other data modalities (e.g., text, audio) have their own preprocessing and augmentation utilities.
Example Image Transformations
import torchvision.transforms as transforms
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])
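A transform pipeline is applied like a function; a minimal sketch using a random image as a stand-in for real data:
from PIL import Image
import numpy as np
img = Image.fromarray(np.uint8(np.random.rand(64, 64, 3) * 255))
out = transform(img)
print(out.shape)  # torch.Size([3, 128, 128])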
More Transforms
6. Data Handling with Dataset and DataLoader
PyTorch provides powerful tools for handling datasets efficiently using torch.utils.data.Dataset and torch.utils.data.DataLoader. These utilities make it easy to preprocess and load data in batches, which is essential for training deep learning models effectively.
Using Dataset to Create Custom Datasets
torch.utils.data.Dataset is an abstract class representing a dataset. PyTorch Datasets provide an easy way to load, preprocess, iterate through, and manage data. Your custom dataset should inherit from Dataset and override the following methods:
- __init__: set up the dataset, e.g., store or point to your own data
- __len__: ensure len(dataset) returns the correct length of the dataset
- __getitem__: return the sample at a given index, enabling indexing and iteration
from torch.utils.data import Dataset
class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]
Using DataLoader to Load Data Efficiently
The DataLoader class provides an efficient way to load and iterate through the dataset in mini-batches, enabling better performance and faster training.
from torch.utils.data import DataLoader
# Dummy tensors standing in for a real dataset
data = torch.randn(100, 10)           # 100 samples with 10 features each
labels = torch.randint(0, 2, (100,))  # 100 binary labels
dataloader = DataLoader(CustomDataset(data, labels), batch_size=32, shuffle=True)
# Example: Iterating through batches
for batch in dataloader:
    inputs, targets = batch
    print(inputs.shape, targets.shape)
Key Features of DataLoader
- Batching: Automatically divides data into batches to optimize training.
- Shuffling: Randomly shuffles data at each epoch to improve model generalization.
- Parallel Processing: Uses multiple workers to speed up data loading.
- Pin Memory: Optimizes memory transfer between CPU and GPU for better performance. The last two options are shown in the sketch below.
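The last two features map directly to DataLoader constructor arguments; a hedged sketch (the worker count is illustrative):
dataloader = DataLoader(CustomDataset(data, labels), batch_size=32, shuffle=True,
                        num_workers=4, pin_memory=True)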
Using Dataset and DataLoader correctly can significantly enhance the efficiency of deep learning pipelines, making it easier to work with large-scale datasets.
Advantages of PyTorch
PyTorch is a widely used deep learning framework known for its flexibility, ease of use, and strong ecosystem. Key advantages include:
- Dynamic Computation Graphs (Define-by-Run): Unlike TensorFlow’s older static graphs, PyTorch builds computation graphs dynamically, enabling intuitive debugging, flexible model modifications, and support for variable-length inputs, which is essential for NLP and reinforcement learning. Learn more about dynamic computation.
- Seamless GPU Acceleration: PyTorch easily utilizes GPUs with .to(device), supports mixed precision training (torch.cuda.amp), and scales efficiently with DataParallel and DistributedDataParallel for multi-GPU training (a device sketch follows this list).
- Integration with the Python Ecosystem: PyTorch works seamlessly with NumPy, SciPy, and Pandas, allowing easy data preprocessing and interoperability with scikit-learn, TensorBoard, and wandb.
- Strong Community & Ecosystem: With extensive documentation, active forums, and widespread industry adoption, PyTorch benefits from constant improvements and extensive open-source contributions.
- Excellent Support for Computer Vision & NLP: With TorchVision, TorchText, and TorchAudio, PyTorch simplifies deep learning applications in vision, language processing, and audio analysis.
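A minimal device-placement sketch, assuming a model and a batch of inputs already exist:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)    # move the model's parameters to the GPU if available
inputs = inputs.to(device)  # move each batch to the same device during training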