Introduction to Neural Networks on CCAST

A beginner's guide to running TensorFlow and PyTorch jobs on CCAST HPC clusters.

Introduction

This document provides instructions on how to run TensorFlow and PyTorch neural network jobs on CCAST clusters. In the following examples, we will use the CIFAR-10 dataset to train a convolutional neural network (CNN) to classify images. The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. For more information about the CIFAR-10 dataset, see the CIFAR-10 website.

Neural Networks

Machine learning is a subset of artificial intelligence in which algorithms improve models automatically through experience. A machine learning algorithm builds a mathematical model from training data, and the model is then evaluated on test data. The goal is a model that generalizes, so that it makes accurate predictions or decisions on new data.

Neural networks are machine learning models inspired by the structure and function of the human brain. They are composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems.

In this example, we compose a neural network which uses convolutional layers to extract features from images, and fully connected layers to classify the images. The following figure shows the structure of the CNN used in the following examples:

CNN structure used in the following examples: during forward pass of the network, data flows from left to right.
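
To make this structure concrete, the sketch below traces how the spatial size of a 32x32 CIFAR-10 image shrinks as it passes through convolution and pooling layers before it reaches the fully connected layers. The layer stack (3x3 convolutions with 32, 64, and 64 filters, separated by 2x2 max pooling) is an assumption chosen for illustration and may differ from the network in the example files.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer stack: (kind, kernel size, stride, output channels).
LAYERS = [
    ("conv", 3, 1, 32),
    ("pool", 2, 2, None),
    ("conv", 3, 1, 64),
    ("pool", 2, 2, None),
    ("conv", 3, 1, 64),
]


def trace_shapes(size=32, channels=3, layers=LAYERS):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(size, size, channels)]
    for kind, kernel, stride, out_channels in layers:
        size = conv_out(size, kernel, stride)
        if kind == "conv":
            # Only convolutions change the channel count; pooling keeps it.
            channels = out_channels
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)

# The last feature map is flattened before the fully connected layers.
height, width, channels = shapes[-1]
flat_features = height * width * channels
print("inputs to the first fully connected layer:", flat_features)
```

The flattened feature count is what determines the input size of the first fully connected layer, so changing kernel sizes or padding in the convolutional part changes the classifier part as well.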

In each example, the Python script performs the following steps:

Data Preparation - download and prepare the CIFAR-10 dataset, splitting it into training, validation, and test sets.
  - The training set (40,000 images) is used to train the model, that is, to update the model weights.
  - The validation set (10,000 images) is used to evaluate the model during training.
  - The test set (10,000 images) is used to evaluate the model after training is complete.
Model Definition - define a CNN model using TensorFlow or PyTorch, and compile the model.
Model Training - train the model using the training set, and evaluate the model accuracy using the validation set.
Model Testing - finally, test the model accuracy using the test set.
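
The data preparation step can be sketched as follows. CIFAR-10 is distributed as 50,000 training images and 10,000 test images, so this sketch assumes the 10,000 validation images are held out at random from the 50,000 training images; the example scripts may split the data differently. The function name and the fixed seed are hypothetical.

```python
import random


def split_indices(n_samples=50_000, n_val=10_000, seed=0):
    """Shuffle sample indices and hold out n_val of them for validation."""
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    val_idx = indices[:n_val]
    train_idx = indices[n_val:]
    return train_idx, val_idx


train_idx, val_idx = split_indices()
print("training images:", len(train_idx))
print("validation images:", len(val_idx))
```

The index lists can then be used to select images and labels from the arrays returned by the framework's CIFAR-10 loader. The test set is left untouched until training is complete.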

Python Machine Learning Frameworks

We will use the following machine learning frameworks in this tutorial, both of which are available on CCAST:

TensorFlow - an open-source software library for dataflow and differentiable programming across a range of tasks, developed by the Google Brain team.
PyTorch - an open-source machine learning library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab (FAIR).

Additional Python packages can be installed and managed by users using either virtual environments or conda. For more information about creating your own custom machine learning environments, see the related KB article.

Neural Network Workflows on CCAST

Example files

All the examples and job submission scripts discussed in this document can be found in the following compressed file on Thunder Prime: /mmfs1/projects/ccastest/examples/cnn_example.tar.gz.

# copy the example archive to your SCRATCH directory
$ cp /mmfs1/projects/ccastest/examples/cnn_example.tar.gz $SCRATCH
# change to your SCRATCH directory
$ cd $SCRATCH
# extract the example files
$ tar -xvf cnn_example.tar.gz
# change to the extracted example directory
$ cd cnn_example

TensorFlow Usage

On CCAST, TensorFlow is available as an anaconda environment. To use TensorFlow, activate the TensorFlow environment from the anaconda installation, as the job scripts below do. The example files demonstrate basic usage of TensorFlow on CCAST clusters.

Within the cnn_example directory, the TensorFlow directory contains the following files:

tf_cpu.py - a TensorFlow example script for CPU-based jobs.
tf_gpu.py - a TensorFlow example script for GPU-based jobs.
tf_cpu.pbs - a PBS job script for running the tf_cpu.py script on a single node with 4 CPU cores.
tf_gpu.pbs - a PBS job script for running the tf_gpu.py script on a single node with 4 CPU cores and 1 GPU.

CPU-based Jobs

For CPU-based jobs, the NCPUS environment variable can be used within Python to set the number of CPU cores to use. The following example shows how to run a TensorFlow job on a single node with 4 CPU cores:

#!/bin/bash
#PBS -q default
#PBS -N tf_cpu_test
#PBS -l select=1:mem=16gb:ncpus=4
#PBS -l walltime=08:00:00
## replace "x-ccast-prj-" below with your "x-ccast-prj-[your group name]"
#PBS -W group_list=x-ccast-prj-

cd ${PBS_O_WORKDIR}

## load anaconda TensorFlow environment
source /mmfs1/apps/pyenvs/anaconda3-2022.05/bin/activate tf-2.10

python tf_cpu.py

To submit the job, use the qsub command:

$ qsub tf_cpu.pbs
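
Inside the Python script, the NCPUS variable mentioned above can be read and passed to TensorFlow's threading settings. The following is a minimal sketch, assuming NCPUS is present in the job environment; the helper names are hypothetical, and tf_cpu.py may configure threads differently. The TensorFlow import is guarded so that the snippet also runs where TensorFlow is not installed.

```python
import os


def get_ncpus(default=1):
    """Return the core count from NCPUS, or `default` if unset or invalid."""
    value = os.environ.get("NCPUS", "")
    try:
        ncpus = int(value)
    except ValueError:
        return default
    return ncpus if ncpus > 0 else default


def configure_tensorflow_threads(ncpus):
    """Limit TensorFlow's thread pools to ncpus; return False if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return False
    # These settings must be applied before TensorFlow runs any operation.
    tf.config.threading.set_intra_op_parallelism_threads(ncpus)
    tf.config.threading.set_inter_op_parallelism_threads(ncpus)
    return True


ncpus = get_ncpus()
configured = configure_tensorflow_threads(ncpus)
print(f"NCPUS={ncpus}, TensorFlow threads configured: {configured}")
```

Matching the thread count to the cores requested with ncpus in the job script avoids oversubscribing the cores allocated to the job.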

GPU-accelerated Jobs

To run TensorFlow jobs on GPU nodes, you must first load the cuda and cudnn modules, and activate the TensorFlow environment. The following example shows how to run a TensorFlow job on a single node with 4 CPU cores and 1 GPU.

#!/bin/bash
#PBS -q gpus
#PBS -N tf_gpu_test
#PBS -l select=1:mem=16gb:ncpus=4:ngpus=1
#PBS -l walltime=08:00:00
## replace "x-ccast-prj-" below with your "x-ccast-prj-[your group name]"
#PBS -W group_list=x-ccast-prj-

cd ${PBS_O_WORKDIR}

## load cuda and cudnn modules
module load cuda/12.3
module load cudnn/8.9

## load anaconda TensorFlow environment
source /mmfs1/apps/pyenvs/anaconda3-2022.05/bin/activate tf-2.10

python tf_gpu.py

To submit the job, use the qsub command:

$ qsub tf_gpu.pbs
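
Once a GPU job is running, a quick check at the top of the Python script can confirm that TensorFlow actually sees the allocated GPU. This is a sketch only: the helper name is hypothetical, it is not taken from tf_gpu.py, and the import is guarded so the snippet also runs where TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus:
    print("TensorFlow sees these GPUs:", ", ".join(gpus))
else:
    print("No GPU visible to TensorFlow; training would fall back to the CPU.")
```

If the list is empty inside a job submitted to the gpus queue, the cuda and cudnn modules or the ngpus request in the job script are the first things to check.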

PyTorch Usage

PyTorch can detect at run time whether a GPU is available, so the same example script is used for both CPU and GPU jobs. The example files demonstrate basic usage of PyTorch on CCAST clusters. Within the cnn_example directory, the pytorch directory contains the following files:

pytorch.py - a PyTorch example script.
pytorch_cpu.pbs - a PBS job script for running the pytorch.py script on a single node with 4 CPU cores.
pytorch_gpu.pbs - a PBS job script for running the pytorch.py script on a single node with 4 CPU cores and 1 GPU.
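
A single script can serve both job types by choosing its device at run time. The sketch below shows a common pattern for this; it is an illustration under that assumption, not the contents of pytorch.py, and the helper name is hypothetical. The import is guarded so the snippet also runs where PyTorch is not installed.

```python
def pick_device_name(cuda_available):
    """Choose the device string from the result of a CUDA availability check."""
    return "cuda" if cuda_available else "cpu"


try:
    import torch
except ImportError:
    torch = None

if torch is not None:
    device = torch.device(pick_device_name(torch.cuda.is_available()))
    # Tensors and models must be placed on the chosen device explicitly.
    batch = torch.zeros(4, 3, 32, 32, device=device)
    print("running on:", batch.device)
else:
    print("PyTorch is not installed in this environment.")
```

In a training script, the model is moved with model.to(device) and each batch of images and labels is moved the same way before the forward pass.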

CPU-based Jobs

On CCAST, PyTorch is available as an anaconda environment. To use PyTorch, activate the PyTorch environment from the anaconda installation, as the job script below does. The following example shows how to run a PyTorch job on a single node with 4 CPU cores:

#!/bin/bash
#PBS -q default
#PBS -N pytorch_cpu_test
#PBS -l select=1:mem=16gb:ncpus=4
#PBS -l walltime=08:00:00
## replace "x-ccast-prj-" below with your "x-ccast-prj-[your group name]"
#PBS -W group_list=x-ccast-prj-

cd ${PBS_O_WORKDIR}

## load anaconda PyTorch environment
source /mmfs1/apps/pyenvs/anaconda3-2022.05/bin/activate pytorch

python pytorch.py

To submit the job, use the qsub command:

$ qsub pytorch_cpu.pbs

GPU-accelerated Jobs

To run PyTorch jobs on GPU nodes, you must first load the cuda and cudnn modules, and activate the PyTorch environment. The following example shows how to run a PyTorch job on a single node with 4 CPU cores and 1 GPU.

#!/bin/bash
#PBS -q gpus
#PBS -N pytorch_gpu_test
#PBS -l select=1:mem=16gb:ncpus=4:ngpus=1
#PBS -l walltime=08:00:00
## replace "x-ccast-prj-" below with your "x-ccast-prj-[your group name]"
#PBS -W group_list=x-ccast-prj-

cd ${PBS_O_WORKDIR}

## load cuda and cudnn modules
module load cuda/12.3
module load cudnn/8.9

## load anaconda PyTorch environment
source /mmfs1/apps/pyenvs/anaconda3-2022.05/bin/activate pytorch

python pytorch.py

To submit the job, use the qsub command:

$ qsub pytorch_gpu.pbs

Performance

Benchmarking was performed on a single node, with all times taken as the mean of 10 runs. The following tables show the results for the example scripts.

TensorFlow

Job Type	Cores	GPU	Mean Training Time (s)	Speedup*
CPU	1	-	4927	1.00
CPU	4	-	1745	2.82
CPU	6	-	1391	3.54
CPU	8	-	821.2	6.00
CPU	16	-	504.5	9.77
CPU	32	-	354.8	13.8
CPU	64	-	473.0	10.4
GPU	4	1x a10	26.45	186
GPU	4	1x a40	23.27	212
GPU	4	1x a100	19.75	249

PyTorch

Job Type	Cores	GPU	Mean Training Time (s)	Speedup*
CPU	1	-	2835	1.00
CPU	4	-	822.3	3.44
CPU	6	-	582.5	4.87
CPU	8	-	504.4	5.62
CPU	16	-	300.9	9.42
CPU	32	-	238.6	11.9
CPU	64	-	356.2	7.96
GPU	4	1x a10	56.6	50.1
GPU	4	1x a40	49.2	57.7
GPU	4	1x a100	47.2	60.1

*Note: Values for speedup are calculated within a single framework, with reference to the single-core CPU runtime, and should not be directly compared across frameworks.
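
The speedup column can be reproduced by dividing the single-core runtime of a framework by the runtime of the configuration in question. The short sketch below does this for a few TensorFlow rows from the table above; the function and variable names are only illustrative.

```python
def speedup(baseline_seconds, runtime_seconds):
    """Speedup relative to the single-core CPU runtime of the same framework."""
    return baseline_seconds / runtime_seconds


# Mean training times in seconds, copied from the TensorFlow table.
TF_BASELINE = 4927.0
tf_times = {
    "CPU, 4 cores": 1745.0,
    "CPU, 16 cores": 504.5,
    "GPU, 1x a100": 19.75,
}

for label, seconds in tf_times.items():
    print(f"{label}: {speedup(TF_BASELINE, seconds):.3g}x")
```

Because each framework has its own single-core baseline, a larger speedup in one table does not imply a shorter absolute runtime than in the other.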

Multi-node Jobs

The parallelization of deep learning jobs across multiple nodes is not trivial, and requires the creation of specific Python code for each framework. The following links provide examples of multi-node jobs for TensorFlow and PyTorch:

Keywords: How-to, Tutorial, CCAST, HPC, Machine Learning, Deep Learning, Neural Networks, Computer Science, Python, PyTorch, TensorFlow, Statistics
Doc ID: 133762
Owned by: Stephen S. in NDSU IT Knowledge Base
Created: 2023-12-29
Updated: 2024-05-29
Sites: NDSU IT Knowledge Base
