
Quick Start

An adaptation of the PyTorch Quickstart tutorial using Habana Gaudi AI processors.

This tutorial demonstrates how to migrate an existing PyTorch workload to Gaudi. The migration requires only loading the Habana PyTorch plugin library.

This section runs through the API for common tasks in machine learning.

Working with data

PyTorch has two primitives to work with data: torch.utils.data.DataLoader and torch.utils.data.Dataset.
Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset.

%matplotlib inline
!pip install ipywidgets
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt

Enable Habana

Let’s enable a single Gaudi device by loading the Habana PyTorch plugin library:

from habana_frameworks.torch.utils.library_loader import load_habana_module
load_habana_module()
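
As a quick, optional check (not part of the original tutorial), you can confirm that the hpu device is now usable by placing a small tensor on it:

# Optional sanity check: this should report an hpu device.
t = torch.ones(2, 2).to("hpu")
print(t.device)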

PyTorch offers domain-specific libraries such as TorchText, TorchVision, and TorchAudio, all of which include datasets. For this tutorial, we will be using a TorchVision dataset.

The torchvision.datasets module contains Dataset objects for many real-world vision datasets such as CIFAR and COCO (see the torchvision documentation for the full list). In this tutorial, we use the FashionMNIST dataset. Every TorchVision Dataset includes two arguments, transform and target_transform, to modify the samples and labels respectively.

# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw
Processing...
Done!
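
The download above uses only transform. As an illustrative sketch (not used elsewhere in this tutorial), target_transform could one-hot encode the integer labels using the Lambda transform imported earlier:

# Hypothetical variant: one-hot encode the labels via target_transform.
one_hot_training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1)),
)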

We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break
Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64
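
The loaders above use the DataLoader defaults. If you want the shuffling and multiprocess loading mentioned earlier, DataLoader accepts shuffle and num_workers arguments; this is a sketch, not part of the original run:

# Hypothetical variant: shuffle the training set and load it with two worker processes.
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True, num_workers=2)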

Creating Models

To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ function and specify how data will pass through the network in the forward function. To accelerate operations in the neural network, we move it to Gaudi.

# Use hpu device for training.
device = "hpu"
print("Using {} device".format(device))

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)
Using hpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
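
If you want to confirm that the model's parameters actually reside on the Gaudi device, a small optional check (not in the original output) is:

# Optional check: the first parameter tensor should report an hpu device.
print(next(model.parameters()).device)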

Optimizing the Model Parameters

To train a model, we need a loss function and an optimizer.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and backpropagates the prediction error to adjust the model’s parameters.

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

We also check the model’s performance against the test dataset to ensure it is learning.

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")
Epoch 1
-------------------------------
loss: 2.304275  [    0/60000]
loss: 2.292274  [ 6400/60000]
loss: 2.273886  [12800/60000]
loss: 2.261484  [19200/60000]
loss: 2.260275  [25600/60000]
loss: 2.249568  [32000/60000]
loss: 2.268266  [38400/60000]
loss: 2.249405  [44800/60000]
loss: 2.237456  [51200/60000]
loss: 2.224073  [57600/60000]
Test Error:
 Accuracy: 42.7%, Avg loss: 2.212126

Epoch 2
-------------------------------
loss: 2.236188  [    0/60000]
loss: 2.213333  [ 6400/60000]
loss: 2.162462  [12800/60000]
loss: 2.162725  [19200/60000]
loss: 2.134478  [25600/60000]
loss: 2.130509  [32000/60000]
loss: 2.175475  [38400/60000]
loss: 2.123774  [44800/60000]
loss: 2.108947  [51200/60000]
loss: 2.112280  [57600/60000]
Test Error:
 Accuracy: 47.9%, Avg loss: 2.066313

Epoch 3
-------------------------------
loss: 2.110157  [    0/60000]
loss: 2.057358  [ 6400/60000]
loss: 1.976155  [12800/60000]
loss: 2.007183  [19200/60000]
loss: 1.918031  [25600/60000]
loss: 1.948208  [32000/60000]
loss: 2.027580  [38400/60000]
loss: 1.937621  [44800/60000]
loss: 1.937256  [51200/60000]
loss: 1.967258  [57600/60000]
Test Error:
 Accuracy: 50.5%, Avg loss: 1.876117

Epoch 4
-------------------------------
loss: 1.949822  [    0/60000]
loss: 1.862419  [ 6400/60000]
loss: 1.751973  [12800/60000]
loss: 1.830420  [19200/60000]
loss: 1.685468  [25600/60000]
loss: 1.759455  [32000/60000]
loss: 1.885085  [38400/60000]
loss: 1.771499  [44800/60000]
loss: 1.786278  [51200/60000]
loss: 1.849400  [57600/60000]
Test Error:
 Accuracy: 53.6%, Avg loss: 1.718248

Epoch 5
-------------------------------
loss: 1.812151  [    0/60000]
loss: 1.708405  [ 6400/60000]
loss: 1.578452  [12800/60000]
loss: 1.701682  [19200/60000]
loss: 1.517465  [25600/60000]
loss: 1.625431  [32000/60000]
loss: 1.775850  [38400/60000]
loss: 1.655754  [44800/60000]
loss: 1.670564  [51200/60000]
loss: 1.766322  [57600/60000]
Test Error:
 Accuracy: 56.2%, Avg loss: 1.603294

Done!

Saving Models

A common way to save a model is to serialize the internal state dictionary (containing the model parameters).

# Move the model back to the CPU so the saved state_dict contains CPU tensors.
model = model.to("cpu")
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")
Saved PyTorch Model State to model.pth
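
If you also plan to resume training later, a common extension (a sketch, not part of this tutorial) is to save the optimizer state alongside the model state in a single checkpoint dictionary:

# Hypothetical checkpoint that also captures the optimizer state for resuming training.
torch.save(
    {
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "epochs": epochs,
    },
    "checkpoint.pth",
)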

Loading Models

The process for loading a model includes re-creating the model structure and loading the state dictionary into it.

model = NeuralNetwork()
model.load_state_dict(torch.load("model.pth"))
<All keys matched successfully>

This model can now be used to make predictions.

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')
Predicted: "Ankle boot", Actual: "Ankle boot"
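
The prediction above runs on the CPU, since the reloaded model was never moved back to the Gaudi device. To run inference on Gaudi instead, a sketch (assuming the Habana module is still loaded) is:

# Hypothetical: move the reloaded model and the input back to the hpu device for inference.
model = model.to(device)
x = test_data[0][0].to(device)
with torch.no_grad():
    pred = model(x)
    print(f'Predicted: "{classes[pred[0].argmax(0).item()]}"')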

BSD 3-Clause License

Copyright (c) 2021 Habana Labs, Ltd. an Intel Company.
Copyright (c) 2017, Pytorch contributors
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
