Running MATLAB on HPC Clusters
Introduction
A tutorial on running serial and parallel MATLAB jobs on CPUs and GPUs on CCAST clusters.
MATLAB is a high-level programming language and numerical computing environment developed by MathWorks. It is used for numerical computation, machine learning, and visualization. MATLAB is available to all CCAST users who are affiliated with North Dakota State University. There are multiple MATLAB versions installed on CCAST systems. On Thunder or Thunder Prime, check all available software modules by typing:
$ module avail matlab
There are two ways to run MATLAB: one uses batch mode on Thunder or Thunder Prime, and the other one is via Open OnDemand. That latter allows you to run MATLAB interactively and graphically. In this document, we only discuss running serial and parallel MATLAB jobs on CPUs and GPUs via batch mode.
A MATLAB job needs MATLAB scripts that you intend to run and a job submission script to submit it to the job scheduler (PBS Pro on Thunder or Thunder Prime). See the CCAST User Guide for more information on running jobs on CCAST systems in general.
Example files
All the source codes and job submission scripts discussed in this document can be found in the following compressed file: /mmfs1/thunder/projects/ccastest/examples/MATLAB_Tutorial_examples.tar.gz
(on Thunder) /mmfs1/projects/ccastest/examples/MATLAB_Tutorial_examples.tar.gz
(on Thunder Prime)
To copy the examples to your SCRATCH directory:
# on Thunder
$ cp /mmfs1/thunder/projects/ccastest/examples/MATLAB_Tutorial_examples.tar.gz $SCRATCH
# on Thunder Prime
$ cp /mmfs1/projects/ccastest/examples/MATLAB_Tutorial_examples.tar.gz $SCRATCH
Then, extract the files:
$ cd $SCRATCH
$ tar -xvf MATLAB_Tutorial_examples.tar.gz
Running serial jobs on CPUs
In the following, we examine a few MATLAB examples:
Example 1: Hello World !
In this simple example, the system prints the phrase “Hello World !” in the output file.
$ cd $SCRATCH/MATLAB_Tutorial_examples/MATLAB_example_1_HelloWorld
The MATLAB script: HelloWorld.m
% display information
fprintf(Hello World !\n\n)
The job submission script: matlab_job.pbs
#!/bin/bash
#PBS -q default
#PBS -N test
## serial jobs: only 1 processor core is requested
#PBS -l select=1:mem=2gb:ncpus=1
#PBS -l walltime=00:10:00
## replace 'x-ccast-prj' below with 'x-ccast-prj-[your project group name]'
#PBS -W group_list=x-ccast-prj
## if you require a specific version of MATLAB, replace "module load matlab" below
module load matlab
cd $PBS_O_WORKDIR
## change the input filename as needed
matlab -nodesktop -nodisplay -r "run HelloWorld.m"
exit 0
Note that the above job submission script requires the group_list
variable to be set to your project group name.
To submit the job:
$ qsub matlab_job.pbs
To check the status of all jobs you are running (It may show nothing if the job has been completed):
$ qstat -u $USER
The job will produce an output file named test.o<job_id>
, where <job_id>
is the job ID number. The expected output is:
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a Update 3 (9.8.0.1396136) 64-bit (glnxa64)
May 27, 2020
To get started, type doc.
For product information, visit www.mathworks.com.
Hello World !
Example 2: Summing the elements of a vector
In this example, the job creates a vector outside of a loop and uses the vector index to do the sum operation of a vector. Then, the system prints the sum value in the output file.
cd $SCRATCH/MATLAB_Tutorial_examples/MATLAB_example_2_Sum
The MATLAB script: sum.m
z = 0;
xAll = 0:0.1:10000; % create a vector outside of a loop
for i= 1:length(xAll)
x = xAll(i);
z = z + x;
end
fprintf('%f\n', z);
% Copyright 2010 - 2014 The MathWorks, Inc.
The job submission script: matlab_job.pbs
#!/bin/bash
#PBS -q default
#PBS -N test
## serial jobs: only 1 processor core is requested
#PBS -l select=1:mem=2gb:ncpus=1
#PBS -l walltime=00:10:00
## replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj
## if you require a specific version of MATLAB, replace "module load matlab" below
module load matlab
cd $PBS_O_WORKDIR
## change the input filename as needed
matlab -nodesktop -nodisplay -r "run sum.m"
exit 0
The above script requests a single node for at most 10 minutes, with 2 GB of RAM. After editing the job submission script as needed (editing the line “#PBS -W”), to submit this job:
$ qsub matlab_job.pbs
The expected output is:
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a Update 3 (9.8.0.1396136) 64-bit (glnxa64)
May 27, 2020
To get started, type doc.
For product information, visit www.mathworks.com.
500005000.000000
Running parallel jobs on CPUs
MATLAB parallelism includes implicit multi-threaded parallelism and explicit parallelism. In this section, we will discuss how to run parallel jobs on CPUs.
Implicit parallelism
MATLAB contains internal mechanisms that enable some code to run much faster by automatically parallelizing arithmetic and logical operations on data. This is called implicit parallelism (multi-threaded or built-in multi-threading) since we do not need to explicitly tell MATLAB to parallelize the operations. Implicit parallelization relies on the fact many operations are independent of each other and can therefore be processed in parallel.
For implicit parallel computation, vector operation is the necessary trigger, but it is not sufficient. The application or algorithm, and the amount of computation also help MATLAB to determine whether an application will be performed with multi-threads. This link provides more information on conditions of implicit parallelism. Such parallelism is implemented by multi-threaded computations that are executed within a single node. Therefore, on Thunder or Thunder Prime, you should only select one node to enable implicit parallelism.
Example 3: Matrix Multiplication
This job executes multiplication operations by multiplying a matrix by another matrix. In this example, implicit parallelism is triggered. This can be observed by log in to the compute node and check the CPU usage. On Thunder or Thunder Prime, to check the name of the compute node, you can execute:
$ cd $SCRATCH/MATLAB_Tutorial_examples/MATLAB_example_3_Multiplication
The MATLAB script: matrix_multiplication.m
n = 5000; % set matrix size
A = rand(n); % create random matrix
B = rand(n); % create another random matrix
tic % calculate the elapsed time using tic and toc
C = A \* B; % matrix multiplication
time=toc; % calculate the elapsed time using tic and toc
disp(['Processing time: ' num2str(time)] 's'); % display running time (unit: second)
The job submission script: matlab_job.pbs
#!/bin/bash
#PBS -q default
#PBS -N test
##change the ncpus number to run the job in implicit parallel
#PBS -l select=1:mem=4gb:ncpus=1
#PBS -l walltime=00:30:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj
## to run on Thunder Prime, replace "module load matlab/R2020a" below with "module load matlab/R2021a"
module load matlab/R2020a
cd $PBS_O_WORKDIR
##change the input filename as needed
matlab -nodesktop -nodisplay -r "run matrix_multiplication.m"
exit 0
To submit this job:
$ qsub matlab_job.pbs
The expected output is:
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a Update 3 (9.8.0.1396136) 64-bit (glnxa64)
May 27, 2020
To get started, type doc.
For product information, visit www.mathworks.com.
Processing time: 6.8422s
To observe implicit parallel computing performance, when different numbers of cores are selected, you might get results like:
Cores number (ncpus) | Processing time |
---|---|
1 | 6.84 s |
2 | 2.56 s |
3 | 1.88 s |
4 | 1.66 s |
5 | 1.33 s |
6 | 1.17 s |
7 | 0.98 s |
8 | 0.99 s |
The results show the relationship between the processing time and the number of cores allocated. The running time will not decrease monotonously with the increase of the number of cores. Users need to decide the number of selected cores according to the actual situation of the project.
Explicit parallelism
Explicit parallelism is characterized by the presence of explicit constructs in the programming language, aimed at describing the way in which the parallel computation will take place. In explicit parallelism, several instances of MATLAB simultaneously execute a single MATLAB command or function.
The Parallel Computing Toolbox describes the explicit parallelism and lets you use parallel-enabled functions in MATLAB and other toolboxes. For more details on running your code in explicit parallelism, see Choose a Parallel Computing Solution.
A common way to initiate a parallel computation in MATLAB is to use a parfor
loop: parfor
executes for-loop iterations in parallel on workers in a parallel pool. For more information, please see Decide When to Use parfor.
Example 4: Getting eigenvalues of square matrices
This job performs N trials of computing the largest eigenvalue for a M-by-M random matrix using parfor
and outputs its processing time.
$ cd $SCRATCH/MATLAB_Tutorial_examples/MATLAB_example_4_MatrixEigenvalue
The MATLAB script: parfor.m
% Performs N trials of computing the largest eigenvalue for an M-by-M random matrix
gcp; % open parallel pool if none is open
M=500; % number of rows and columns of each matrix
N=12000; % number of trials
tic; % calculate the elapsed time using tic and toc
a = zeros(N,1); % initialize output vector
parfor i = 1:N % use parallel processing by running parfor in a parallel pool
a(i) = max(eig(rand(M))); % vector of largest eigenvalues
end
time = toc; % calculate the elapsed time using tic and toc
disp(['Parallel processing time: ' num2str(time)] 's'); % display running time
poolobj = gcp('nocreate'); % Get current parallel pool
delete(poolobj); % Shutting down parallel pool
The job submission script: matlab_job.pbs
#!/bin/bash
#PBS -q default
#PBS -N test
#PBS -l select=1:mem=8gb:ncpus=5
#PBS -l walltime=00:30:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj
##to run on Thunder Prime, replace "module load matlab/R2020a" below with "module load matlab/R2021a"
module load matlab/R2020a
cd $PBS_O_WORKDIR
##change the input filename as needed
matlab -nodesktop -nosplash -nodisplay -r "run parfor.m"
exit 0
To submit this job:
$ qsub matlab_job.pbs
The expected output is:
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a Update 3 (9.8.0.1396136) 64-bit (glnxa64)
May 27, 2020
To get started, type doc.
For product information, visit www.mathworks.com.
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 5).
Parallel processing time: 33.9835s
Parallel pool using the 'local' profile is shutting down.
To observe explicit parallel computing performance, a different number of cores are selected, and you might get results like:
Number of workers (ncpus) | Processing time |
---|---|
1 | 192.5 s |
2 | 72.8 s |
3 | 65.5 s |
4 | 49.4 s |
5 | 34.0 s |
Example 5: for-loop and parfor-loop
$ cd $SCRATCH/MATLAB_Tutorial_examples/MATLAB_example_5_for_CompareLoops
This example job also performs N trials of computing the largest eigenvalue for an M-by-M random matrix, but it addresses another two problems. First, this job compares processing time between for-loop processing and parfor-loop processing. Second, when your code includes files that call other files, you need to use addpath
to add the working directory into your MATLAB search path.
The MATLAB script: ex_compare.m
% set pathToData with your current work directory
% pathToData = [CURRENT_WORKDIR];
addpath(pathToData);
tic; a1 = ex_for(500,1200); t1 = toc; % calculate a1 using for-loop
gcp; % open parallel pool if none is open
tic; a2 = ex_parfor (500,1200); t2 = toc; % calculate a2 using parfor-loop
% Compare processing times
disp(['For-loop processing time: ' num2str(t1) 's'])
disp(['Parfor-loop processing time: ' num2str(t2) 's'])
% Shutting down parallel pool
poolobj = gcp('nocreate'); % get current parallel pool
delete(poolobj); % shutting down parallel pool
% Copyright 2010 - 2014 The MathWorks, Inc.
Which calls two functions: ex_parfor.m
and ex_for.m
ex_parfor.m
function a = ex_parfor(M, N)
a = zeros(N,1);
parfor i = 1:N
vi a(i) = max(eig(rand(M)));
end
ex_for.m
function a = ex_for(M, N)
a = zeros(N,1);
for i = 1:N
a(i) = max(eig(rand(M)));
end
Job submission script: matlab_job.pbs
#!/bin/bash
#PBS -q default
#PBS -N test
#PBS -l select=1:mem=8gb:ncpus=2
#PBS -l walltime=00:30:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj
##to run on Thunder Prime, replace "module load matlab/R2020a" below with "module load matlab/R2021a"
module load matlab/R2020a
cd $PBS_O_WORKDIR
##pass $PBS_O_WORKDIR to MATLAB to add directories into MATLAB search path
sed -i "1c pathToData = '$PBS_O_WORKDIR';" compare.m
##change the input filename as needed
matlab -nodesktop -nosplash -nodisplay -r "run compare.m"
exit 0
To submit this job:
$ qsub matlab_job.pbs
The expected output is:
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a Update 3 (9.8.0.1396136) 64-bit (glnxa64)
May 27, 2020
To get started, type doc.
For product information, visit www.mathworks.com.
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 2).
For-loop processing time: 141.6204s
Parfor-loop processing time: 72.8218s
Parallel pool using the 'local' profile is shutting down.
This job compares processing time between for-loop and parfor-loop under the same workload. The results show that parfor-loop processing time is much less than for-loop processing time.
Running MATLAB on GPUs
A GPU, designed to quickly render high-resolution images and video concurrently, and can also be used for non-graphical tasks such as machine learning and scientific computation due to its ability to perform parallel operations on multiple sets of data. MATLAB provides tools to run functions on a GPU, including gpuArray
for transferring input data and calling gather
to retrieve output data from the GPU. Many functions in MATLAB and other toolboxes are automatically enabled on a GPU if they support GPU-enabled computing, including fast Fourier transform fft
, matrix multiplication mtimes
, left matrix division mldivide
, and hundreds of others. For more information, please see Accelerate Your Code by Running It on a GPU.
Example 6: Filter a matrix
$ cd $SCRATCH/MATLAB_Tutorial_examples/MATLAB_example_6_GPU_MatrixFilter
This job filters a matrix on GPU using the filter
function and returns the processing time on GPU.
The MATLAB script: matrix_filter.m
A = magic(5000); % magic(n) returns an n-by-n matrix
f = ones(1,20)/20; % create array of all ones
% Filter a matrix on GPU
AonGPU = gpuArray(A); % create an array stored on the GPU
tic; % calculate the elapsed time using tic and toc
BonGPU = filter(f, 1, AonGPU); % do filter operation on GPU
C=gather(BonGPU); % convert back to a numeric array on the CPU
wait(gpuDevice) % wait for GPU calculation to complete
tCompGpu = toc; % calculate the elapsed time using tic and toc
disp([' Processing time on GPU: ' num2str(tCompGpu) 's']) % display processing time
% Copyright 2014 The MathWorks, Inc.
The job submission script: matlab_job.pbs
#!/bin/bash
#PBS -q gpus
#PBS -N test
#PBS -l select=1:ncpus=1:mem=4gb:ngpus=1
#PBS -l walltime=00:20:00
##change “x-ccast-prj” to “x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj
##to run on Thunder Prime, replace "module load matlab/R2020a" below with "module load matlab/R2021a"
module load matlab/R2020a
cd $PBS_O_WORKDIR
##change the input filename as needed
matlab -nodesktop -nodisplay -r "run matrix_filter.m"
exit 0
To submit this job:
$ qsub matlab_job.pbs
The expected output is:
< M A T L A B (R) >
Copyright 1984-2020 The MathWorks, Inc.
R2020a Update 3 (9.8.0.1396136) 64-bit (glnxa64)
May 27, 2020
To get started, type doc.
For product information, visit www.mathworks.com.
Processing time on GPU: 3.9936s
From the output, you can see the computation time of filter function on GPU.
For more information on parallel computing, see the MATLAB website.