Using Intel Math Kernel Library (MKL) on HPC Clusters
This document explains the syntax and usage of Intel Math Kernel Library (MKL) included in Intel oneAPI Toolkits, with examples for each usage scenario.
- Introduction to Intel MKL
- Examples using Intel MKL: C
- Examples using Intel MKL: Fortran
- Miscellaneous
Conventions used in this document
• Terminal commands are denoted by inline code prefixed with $; output lines omit the $
$ echo You are the coolest programmer ever
You are the coolest programmer ever
• Code is denoted by code blocks
if (hacker) {
access_granted = True
}
• Variable inputs are denoted by capital letters in brackets
[PASSWORD]
1. Introduction to Intel MKL
Intel Math Kernel Library (Intel MKL) is a math library that exploits the core counts and architectures of Intel CPUs to reach a high degree of optimization and parallelization. It provides implementations of many standard math packages, such as BLAS and LAPACK. This means no code changes are required if these libraries are already being used; a developer merely needs to link against Intel MKL. Most routines in this library are parallelized behind the scenes, allowing programmers to take advantage of implicit parallelization when calling them from sequential programs. These routines are thread safe, but caution should be taken when using threaded MKL routines in applications that already use threading. Appropriate include files and compiler commands are required to use the libraries.
For more examples of how to use MKL, code samples for C/C++ and Fortran are available on Intel's developer reference site and on Intel's Get Started page for MKL.
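To make the "link and go" idea concrete, here is a minimal sketch (not one of the CCAST examples that follow; the matrix values are made up for illustration) of a standard CBLAS matrix-matrix multiply. Nothing in the source is MKL-specific apart from the mkl.h header; the unchanged cblas_dgemm call is simply linked against Intel MKL, and with a threaded MKL layer it is parallelized automatically.
#include <stdio.h>
// MKL exposes the standard CBLAS interface through mkl.h;
// a reference BLAS would provide the same routine through its own header
#include "mkl.h"
int main()
{
    // C = A * B for two small 2x2 row-major matrices
    double a[4] = { 1.0, 2.0,
                    3.0, 4.0 };
    double b[4] = { 5.0, 6.0,
                    7.0, 8.0 };
    double c[4] = { 0.0, 0.0,
                    0.0, 0.0 };
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,      // M, N, K
                1.0, a, 2,    // alpha, A, lda
                b, 2,         // B, ldb
                0.0, c, 2);   // beta, C, ldc
    // expected result: [ 19 22 ; 43 50 ]
    printf("C = [ %5.1f %5.1f ]\n    [ %5.1f %5.1f ]\n",
           c[0], c[1], c[2], c[3]);
    return 0;
}
This file compiles and links exactly like the C examples in section 2.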
1.1 Types of MKL libraries
• Basic Linear Algebra Subprograms (BLAS): vector, matrix-vector, and matrix-matrix operations.
• Sparse BLAS: Basic operations on sparse vectors and matrices.
• Sparse QR: A multifrontal sparse QR factorization method for solving a sparse system of linear equations.
• Linear Algebra PACKage (LAPACK): Solves systems of linear equations, least square problems, eigenvalue and singular value problems, and Sylvester’s equations.
• Statistical Functions: Implements commonly used pseudorandom number generators (RNGs) with continuous distributions.
• Direct and Iterative Sparse Solvers: Several options for solving sparse linear systems of equations and a direct sparse solver based on PARDISO*, called Intel MKL PARDISO.
• Vector Mathematics Functions: Computes core mathematical functions on vector arguments.
• Vector Statistics Functions: Generates vectors of pseudorandom numbers with different types of statistical distributions and performs convolution and correlation computations.
• Fourier Transform Functions: Several options for computing Fast Fourier Transforms (FFTs).
1.2 General usage
• Include appropriate headers
• Use appropriate MKL routines in C/C++ or Fortran
• Create compiler commands and link lines with the Intel MKL Link Line Advisor
• Execute the compiler using the output from the previous step
1.3 Linking with Intel MKL
For a quick guide on using the Link Line Advisor to generate link lines and compiler options for the Thunder/Thunder Prime clusters, follow these steps (a representative compiler command combining the results is shown after this list):
• Go to Intel Math Kernel Library Link Line Advisor
• Set "Select Intel product:" to "oneMKL 2021"
• Set "Select OS:" to "Linux*"
• Set "Select compiler:" to "Intel(R) [LANGUAGE] Classic"
• Set "Select
architecture:" to "Intel(R) 64"
• Set "Select
dynamic or static linking:" to
– "Static" to include libraries in executable
– "Dynamic" to use libraries separately from
executable (recommended)
• Set "Select
interface layer:" to
– "32-bit
integer" if
your integer values do not exceed 2^31 (i.e., 2 billion)
– "64-bit
integer"
otherwise or when in doubt
• Set "Select
threading layer:" to
– "OpenMP
threading" if using OpenMP
– "TBB
threading" if using Intel TBB (TBB
is not covered in this tutorial)
– "Sequential" otherwise
• If
applicable, set "Select OpenMP library" to appropriate library: "Intel(R) (libiomp5)"
• If
applicable, set "Select cluster library" to appropriate libraries (not commonly used)
• If
applicable, set "Select MPI library" to appropriate library (not commonly used)
• If
applicable, set "Select the Fortran 95 interfaces" to appropriate interfaces (not commonly used)
• Select "Link
with Intel(R) MKL libraries explicitly:" if explicitly loading libraries
with pointers (not recommended)
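Putting the pieces together: the link line and compiler options produced by the Advisor are simply appended to the usual compiler command. For example, with the dynamic, 64-bit-integer, sequential selections described above, a representative Intel C compiler command might look like the following (my_program.c is a placeholder file name; the concrete commands used in this document appear in sections 2 and 3):
$ icc my_program.c -o my_program -L${MKLROOT}/lib/intel64 -lpthread -lm -ldl -DMKL_ILP64 -mkl=sequential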
2. Examples using Intel MKL: C
2.1 Dot product (sequential)
For our first C example, we will be using the cblas_dsdot routine, which computes a vector-vector dot product of single-precision data with double-precision accumulation.
With the knowledge that we require a dot product operation, we could find our MKL routine with the following procedure:
• Navigate to the C Developer Reference
• Noticing our problem domain is basic linear algebra, we click on the BLAS link.
• Noticing our problem involves vector-vector operations, we click on the BLAS Level 1 Routines link.
• Searching for dot product on this page, we find four routines. We choose cblas_?sdot because we want double-precision accumulation and we are not using conjugated vectors.
This routine uses one-dimensional arrays as inputs, where the elements can optionally be offset (strided) in memory. In our example, we will have an x array with an offset of 2 and a y array with an offset of 1. Using ? to represent values we do not care about, our arrays will look like the following:
x in memory (offset of 2)
------------------------------------
| 2 | ? | 2 | ? | 2 | ? | 2 | ? | 2

x logically
--------------------
| 2 | 2 | 2 | 2 | 2

y in memory and logically (offset of 1)
--------------------
| 1 | 1 | 1 | 1 | 1
As a refresher, the dot product is an operation that takes two vectors of equal size and outputs a scalar.
x • y
2*1 + 2*1 + 2*1 + 2*1 + 2*1 = 10
With this in mind, our C code will look like the following. We are using dynamically allocated arrays here, with malloc and free. These allocate and deallocate memory, respectively.
mkl_sdot.c
#include <stdio.h>
#include <stdlib.h>

// MKL interface
#include "mkl.h"

int main()
{
    // MKL_INT variables for MKL routine
    MKL_INT num_elements, inc_x, inc_y, x_len, y_len;
    int i;
    double result; // cblas_dsdot returns a double
    float *x, *y;

    num_elements = 5;
    inc_x = 2;
    inc_y = 1;

    // Include space for offsets of array elements
    x_len = (1 + ((num_elements - 1) * abs(inc_x)));
    y_len = (1 + ((num_elements - 1) * abs(inc_y)));

    // Allocate memory for arrays
    x = (float*)malloc(x_len * sizeof(float));
    y = (float*)malloc(y_len * sizeof(float));

    // Initialize arrays
    for (i = 0; i < num_elements; i++) {
        x[i * abs(inc_x)] = 2.0;
        y[i * abs(inc_y)] = 1.0;
    }

    // Run MKL routine
    result = cblas_dsdot(num_elements, x, inc_x, y, inc_y);

    // Print result
    printf("DOT PRODUCT = %7.3f\n", result);

    // Free memory
    free(x);
    free(y);

    return 0;
}
Using the Link Line Advisor guide above and selecting "Sequential" for the threading layer (no additional libraries or interfaces need to be selected), we get the link line and compiler options below. This example could instead be compiled and linked with OpenMP threading for implicit parallelization (see section 2.2), but sequential execution is a perfectly good option to start with.
Link line:
-L${MKLROOT}/lib/intel64 -lpthread -lm -ldl
Compiler options:
-DMKL_ILP64 -mkl=sequential
Appending these to our compiler command, we get:
On Thunder:
$ module load intel/2021.1.1
$ icc mkl_sdot.c -o exe_mkl_sdot_c -L${MKLROOT}/lib/intel64 -lpthread -lm -ldl -DMKL_ILP64 -mkl=sequential
On Thunder Prime:
$ module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
$ icc mkl_sdot.c -o exe_mkl_sdot_c -L${MKLROOT}/lib/intel64 -lpthread -lm -ldl -DMKL_ILP64 -mkl=sequential
Here is our job script:
mkl_sdot_c.pbs
#!/bin/bash
#PBS -q default
#PBS -N job_mkl_sdot_c
#PBS -l select=1:mem=1gb:ncpus=1
#PBS -l walltime=08:00:00
#PBS -W group_list=x-ccast-prj-[GROUP_NAME]

## load Intel to link with Intel MKL
## On Thunder, use: module load intel/2021.1.1
## On Thunder Prime, replace the line below with: module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
module load intel/2021.1.1

cd $PBS_O_WORKDIR

./exe_mkl_sdot_c

exit 0
Submit the job:
$ qsub mkl_sdot_c.pbs
Output the results:
$ cat job_mkl_sdot_c.o[JOB_ID]
DOT PRODUCT = 10.000
2.2 Dot product (implicit parallelism)
Use the Intel Math Kernel Library Link Line Advisor again, but this time set "Select threading layer:" to "OpenMP threading" and "Select OpenMP library:" to "Intel(R) (libiomp5)"; see below. The link line is now:
-L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl
and the compiler options are:
-DMKL_ILP64 -mkl=parallel
Now, let’s look at all of our commands again with this change.
On Thunder:
$ module load intel/2021.1.1
$ icc mkl_sdot.c -o exe_mkl_sdot_c -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl -DMKL_ILP64 -mkl=parallel
On Thunder Prime:
$ module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
$ icc mkl_sdot.c -o exe_mkl_sdot_c -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl -DMKL_ILP64 -mkl=parallel
We can now run the executable on more than one CPU core (on a single compute node!). Modify the "#PBS -l" line in the job submission script as follows:
#PBS -l select=1:mem=2gb:ncpus=4:ompthreads=4
Submit the job and view the output:
$ qsub mkl_sdot_c.pbs
$ cat job_mkl_sdot_c.o[JOB_ID]
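The number of threads used by the threaded MKL layer is normally inherited from the OpenMP settings that the ompthreads=4 resource provides. If a program needs to control this itself, MKL also offers runtime service functions; below is a minimal sketch (not part of the original example) that caps MKL at 4 threads and reports the limit. It compiles and links exactly like the OpenMP-threaded example above.
#include <stdio.h>
#include "mkl.h"
int main()
{
    // ask MKL to use at most 4 threads for subsequent MKL calls
    mkl_set_num_threads(4);

    // report the current MKL thread limit
    printf("MKL max threads: %d\n", mkl_get_max_threads());
    return 0;
}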
2.3 Eigenvalue problem
In this next example, we solve an eigenproblem. This is done with Intel’s syev routine. Using the same logic that we used in the dot product example, here is how we would find this routine:
• Navigate to the C Developer Reference
• We see eigenvalues under the LAPACK heading, so we click on the corresponding link.
• Looking under the LAPACK routines, we see the Least Squares and Eigenvalue Problems LAPACK Routines link, and click on it.
• Understanding that we want a complete solution to a problem and not a distinct computational task, we click on Driver Routines.
• Realizing we are dealing with symmetric matrices, we click on Symmetric Eigenproblems.
• We click on syev because it computes all eigenvalues and eigenvectors of a real symmetric matrix.
From Intel’s website, we have the following description for this routine: The routine computes all eigenvalues and, optionally, eigenvectors of an n-by-n real symmetric matrix A. The eigenvector v(j) of A satisfies
A*v(j) = lambda(j)*v(j)
where lambda(j) is its eigenvalue. The computed eigenvectors are orthonormal.
Input Matrix
----------------------------------------
|  1.96 | -6.49 | -0.47 | -7.20 | -0.65
----------------------------------------
| -6.49 |  3.80 | -6.39 |  1.50 | -6.34
----------------------------------------
| -0.47 | -6.39 |  4.17 | -1.51 |  2.67
----------------------------------------
| -7.20 |  1.50 | -1.51 |  5.70 |  1.80
----------------------------------------
| -0.65 | -6.34 |  2.67 |  1.80 | -7.10

Eigenvalues
------------------------------------------
| -11.07 | -6.23 | 0.86 | 8.87 | 16.09

Eigenvectors (stored columnwise)
-----------------------------------------
|  0.30 | -0.61 | -0.40 | -0.37 | -0.49
-----------------------------------------
|  0.51 | -0.29 |  0.41 | -0.36 |  0.61
-----------------------------------------
|  0.08 | -0.38 |  0.66 |  0.50 | -0.40
-----------------------------------------
|  0.00 | -0.45 | -0.46 |  0.62 |  0.46
-----------------------------------------
|  0.80 |  0.45 | -0.17 |  0.31 | -0.16
Now here's the code. We use #define macros so that we can declare fixed-length arrays instead of dynamically allocating them; because the sizes are known at compile time, the run-time allocation step is avoided. This also lets us use list initialization to easily visualize the matrix we are working with.
mkl_ssyev.c
#include <stdlib.h>
#include <stdio.h>

#include "mkl.h"

#define ROWS 5
#define COLUMNS 5

int main() {
    int i, j;
    // integer identifier to specify row or column major
    int matrix_layout;
    lapack_int rows, columns, info;
    // characters to specify job information
    char jobz, uplo;

    // use MKL macro
    matrix_layout = LAPACK_ROW_MAJOR;

    // macros for fixed length arrays, row major
    rows = ROWS;
    columns = COLUMNS;

    // describes returned results
    // 'N' to compute eigenvalues only
    // 'V' to compute eigenvalues and eigenvectors
    jobz = 'V';

    // describes the matrix A
    // 'U' for upper triangular matrix
    // 'L' for lower triangular matrix
    uplo = 'L';

    float eigenvalues[COLUMNS];

    // list initialization, row major
    float a[ROWS * COLUMNS] = {
         1.96,  0.00,  0.00,  0.00,  0.00,
        -6.49,  3.80,  0.00,  0.00,  0.00,
        -0.47, -6.39,  4.17,  0.00,  0.00,
        -7.20,  1.50, -1.51,  5.70,  0.00,
        -0.65, -6.34,  2.67,  1.80, -7.10
    };

    // run MKL routine
    info = LAPACKE_ssyev(matrix_layout, jobz, uplo, columns, a, rows, eigenvalues);

    // if (info == 0), the execution was successful
    // if (info == -i), the i-th parameter had an illegal value
    // if (info == i), the algorithm failed to converge; i elements did not converge to zero
    if (info != 0) {
        printf("The algorithm failed to compute eigenvalues.\n");
        return 1;
    }

    // print eigenvalues
    printf("Eigenvalues\n");
    printf("----------------------------------------\n");
    for (i = 0; i < columns; i++) {
        printf("|%6.2f ", eigenvalues[i]);
    }
    printf("\n");
    printf("\n");

    // print eigenvectors
    printf("Eigenvectors (stored columnwise)\n");
    for (i = 0; i < rows; i++) {
        printf("----------------------------------------\n");
        for (j = 0; j < columns; j++) {
            printf("|%6.2f ", a[i + (j * rows)]);
        }
        printf("\n");
    }

    return 0;
}
This time when we use the Link Line Advisor, the only difference is that we select OpenMP threading for our threading layer. Notice the -mkl=parallel in our compiler options.
Link line:
-L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl
Compiler options:
-DMKL_ILP64 -mkl=parallel
Compilation and linking:
On Thunder:
$ module load intel/2021.1.1
$ icc mkl_ssyev.c -o exe_mkl_ssyev_c -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl -DMKL_ILP64 -mkl=parallel
On Thunder Prime:
$ module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
$ icc mkl_ssyev.c -o exe_mkl_ssyev_c -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl -DMKL_ILP64 -mkl=parallel
Next, our job script:
mkl_ssyev_c.pbs
#!/bin/bash
#PBS -q default
#PBS -N job_mkl_ssyev_c
#PBS -l select=1:mem=2gb:ncpus=4:ompthreads=4
#PBS -l walltime=08:00:00
#PBS -W group_list=x-ccast-prj-[GROUP_NAME]

## On Thunder, use: module load intel/2021.1.1
## On Thunder Prime, replace the line below with: module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
module load intel/2021.1.1

cd $PBS_O_WORKDIR

## "time" is added to time the calculation
echo "Executed at: $(date)"
time ./exe_mkl_ssyev_c
echo "Finished at: $(date)"

exit 0
Submit the job:
$ qsub mkl_ssyev_c.pbs
Output the results:
$ cat job_mkl_ssyev_c.o[JOB_ID]
Eigenvalues
----------------------------------------
|-11.07 | -6.23 |  0.86 |  8.87 | 16.09

Eigenvectors (stored columnwise)
----------------------------------------
|  0.30 |  0.51 |  0.08 |  0.00 |  0.80
----------------------------------------
| -0.61 | -0.29 | -0.38 | -0.45 |  0.45
----------------------------------------
| -0.40 |  0.41 |  0.66 | -0.46 | -0.17
----------------------------------------
| -0.37 | -0.36 |  0.50 |  0.62 |  0.31
----------------------------------------
| -0.49 |  0.61 | -0.40 |  0.46 | -0.16
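As an optional sanity check on the relation A*v(j) = lambda(j)*v(j) quoted above, the following standalone sketch (not part of the original CCAST examples; the 2-by-2 matrix and file name are made up for illustration) calls LAPACKE_ssyev on a small symmetric matrix and verifies each eigenpair directly. It compiles and links exactly like mkl_ssyev.c.
check_ssyev.c
#include <stdio.h>
#include <math.h>
#include "mkl.h"

int main()
{
    // symmetric 2x2 matrix A = [[2, 1], [1, 2]]; with uplo = 'L',
    // only the lower triangle of a is referenced
    float a[4] = { 2.0, 0.0,
                   1.0, 2.0 };
    // full copy of A kept for the check, since ssyev overwrites a
    float a_full[4] = { 2.0, 1.0,
                        1.0, 2.0 };
    float eigenvalues[2];
    lapack_int n = 2, info, i, j, k;

    info = LAPACKE_ssyev(LAPACK_ROW_MAJOR, 'V', 'L', n, a, n, eigenvalues);
    if (info != 0) {
        printf("ssyev failed, info = %d\n", (int)info);
        return 1;
    }

    // after the call, column j of the row-major matrix a (i.e., a[i*n + j])
    // holds eigenvector j; check A*v(j) = lambda(j)*v(j) componentwise
    for (j = 0; j < n; j++) {
        float max_err = 0.0;
        for (i = 0; i < n; i++) {
            float av = 0.0;
            for (k = 0; k < n; k++) {
                av += a_full[i * n + k] * a[k * n + j];
            }
            float err = fabsf(av - eigenvalues[j] * a[i * n + j]);
            if (err > max_err) {
                max_err = err;
            }
        }
        printf("lambda(%d) = %6.2f, max |A*v - lambda*v| = %g\n",
               (int)j, eigenvalues[j], max_err);
    }
    return 0;
}
For this matrix the eigenvalues are 1 and 3, and the reported errors should be at the level of single-precision round-off.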
3. Examples using Intel MKL: Fortran
3.1 Dot product (sequential)
We would follow the same path in the Fortran documentation to find our routine: Fortran Developer Reference -> BLAS -> BLAS Level 1 Routines -> ?sdot
Here are our arrays and logical computation again:
x in memory (offset of 2)
------------------------------------
| 2 | ? | 2 | ? | 2 | ? | 2 | ? | 2

x logically
--------------------
| 2 | 2 | 2 | 2 | 2

y in memory and logically (offset of 1)
--------------------
| 1 | 1 | 1 | 1 | 1
x • y
2*1 + 2*1 + 2*1 + 2*1 + 2*1 = 10
Here is the same program in Fortran, again using dynamically allocated arrays, this time with allocate and deallocate.
mkl_sdot.f90
program mkl_sdot
    !! MKL interface
    include "mkl.fi"

    integer :: num_elements, incx, incy, len_x, len_y, i
    real :: result
    !! dynamically allocated arrays of 1 dimension
    real, dimension(:), allocatable :: x, y

    num_elements = 5
    incx = 2
    incy = 1

    !! Include space for offsets of array elements
    len_x = 1 + ((num_elements - 1) * abs(incx))
    len_y = 1 + ((num_elements - 1) * abs(incy))

    !! allocate memory for arrays
    allocate( x(len_x) )
    allocate( y(len_y) )

    !! initialize arrays
    do i = 0, num_elements - 1
        x(1 + (i * abs(incx))) = 2.0e0
        y(1 + (i * abs(incy))) = 1.0e0
    end do

    !! run MKL routine
    result = sdot(num_elements, x, incx, y, incy)

    !! print result
    print '(a14, f7.3)', 'DOT PRODUCT = ', result

    !! free memory
    deallocate(x)
    deallocate(y)

end program mkl_sdot
Using the link line advisor in a similar fashion, we obtain:
Link line:
-L${MKLROOT}/lib/intel64 -lpthread -lm -ldl
Compiler options:
-i8 -mkl=sequential
Putting it all together, we have:
On Thunder:
$ module load intel/2021.1.1
$ ifort mkl_sdot.f90 -o exe_mkl_sdot_f90 -L${MKLROOT}/lib/intel64 -lpthread -lm -ldl -i8 -mkl=sequential
On Thunder Prime:
$ module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
$ ifort mkl_sdot.f90 -o exe_mkl_sdot_f90 -L${MKLROOT}/lib/intel64 -lpthread -lm -ldl -i8 -mkl=sequential
Our job script:
mkl_sdot_f90.pbs
#!/bin/bash
#PBS -q default
#PBS -N job_mkl_sdot_f90
#PBS -l select=1:mem=1gb:ncpus=1
#PBS -l walltime=08:00:00
#PBS -W group_list=x-ccast-prj-[GROUP_NAME]

## load Intel to link with Intel MKL
## On Thunder, use: module load intel/2021.1.1
## On Thunder Prime, replace the line below with: module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
module load intel/2021.1.1

cd $PBS_O_WORKDIR

./exe_mkl_sdot_f90

exit 0
Submit the job:
$ qsub mkl_sdot_f90.pbs
Output the results:
$ cat job_mkl_sdot_f90.o[JOB_ID]
DOT PRODUCT = 10.000
3.2 Dot product (implicit parallelism)
Use the Intel Math Kernel Library Link Line Advisor again, but this time set "Select threading layer:" to "OpenMP threading" and "Select OpenMP library:" to "Intel(R) (libiomp5)"; see below. The link line is now:
-L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl
and the compiler options are:
-i8 -mkl=parallel
So we have:
On Thunder:
$ module load intel/2021.1.1
$ ifort mkl_sdot.f90 -o exe_mkl_sdot_f90 -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl -i8 -mkl=parallel
On Thunder Prime:
$ module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
$ ifort mkl_sdot.f90 -o exe_mkl_sdot_f90 -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl -i8 -mkl=parallel
We can now run the executable on more than one CPU core (on a single compute node!). Modify the "#PBS -l" line in the job submission script as follows:
#PBS -l select=1:mem=2gb:ncpus=4:ompthreads=4
Submit the job and view the output:
$ qsub mkl_sdot_f90.pbs
$ cat job_mkl_sdot_f90.o[JOB_ID]
3.3 Eigenvalue problem
Steps for finding our routine in Fortran: Fortran Developer Reference -> LAPACK -> Least Squares and Eigenvalue Problems LAPACK Routines -> Driver Routines -> Symmetric EigenProblems -> syev
Here are our inputs and outputs again for reference:
Input Matrix
----------------------------------------
|  1.96 | -6.49 | -0.47 | -7.20 | -0.65
----------------------------------------
| -6.49 |  3.80 | -6.39 |  1.50 | -6.34
----------------------------------------
| -0.47 | -6.39 |  4.17 | -1.51 |  2.67
----------------------------------------
| -7.20 |  1.50 | -1.51 |  5.70 |  1.80
----------------------------------------
| -0.65 | -6.34 |  2.67 |  1.80 | -7.10

Eigenvalues
------------------------------------------
| -11.07 | -6.23 | 0.86 | 8.87 | 16.09

Eigenvectors (stored columnwise)
-----------------------------------------
|  0.30 | -0.61 | -0.40 | -0.37 | -0.49
-----------------------------------------
|  0.51 | -0.29 |  0.41 | -0.36 |  0.61
-----------------------------------------
|  0.08 | -0.38 |  0.66 |  0.50 | -0.40
-----------------------------------------
|  0.00 | -0.45 | -0.46 |  0.62 |  0.46
-----------------------------------------
|  0.80 |  0.45 | -0.17 |  0.31 | -0.16
The Fortran routine requires us to provide the workspace array explicitly. We arbitrarily cap it at 0.5 GB (work_len_max in the code below). To get the optimal workspace size, according to the documentation, we first call the syev routine with a work length of -1; the optimal work length is then returned in the first entry of the work array, and we take the minimum of that value and our maximum work length.
One last note: we initialize our array with reshape, and Fortran is column major by default, so for the array to match the visual layout of the initialization in the source, we must transpose it with transpose.
mkl_ssyev.f90
program mkl_ssyev
    include "mkl.fi"

    integer :: columns, rows, work_len_max, info, work_len, i, j
    !! characters to specify job information
    character*1 :: jobz, uplo
    !! dynamically allocated array of 2 dimensions
    real, dimension(:, :), allocatable :: a
    !! dynamically allocated arrays of 1 dimension
    real, dimension(:), allocatable :: eigenvalues, work

    !! column major
    columns = 5
    rows = columns

    !! 0.5 GB of 4-byte floating-point numbers
    work_len_max = floor(.5 * 1024 * 1024 * 1024 / 4)

    !! describes returned results
    !! 'N' to compute eigenvalues only
    !! 'V' to compute eigenvalues and eigenvectors
    jobz = 'V'

    !! describes the matrix a
    !! 'U' for upper triangular matrix
    !! 'L' for lower triangular matrix
    uplo = 'L'

    !! allocate memory for arrays
    allocate( a(rows, columns) )
    allocate( eigenvalues(columns) )
    allocate( work(work_len_max) )

    !! initialize array; Fortran is column major
    !! transpose to match visual representation
    a = transpose(reshape((/ &
         1.96,  0.00,  0.00,  0.00,  0.00, &
        -6.49,  3.80,  0.00,  0.00,  0.00, &
        -0.47, -6.39,  4.17,  0.00,  0.00, &
        -7.20,  1.50, -1.51,  5.70,  0.00, &
        -0.65, -6.34,  2.67,  1.80, -7.10 &
        /), (/rows, columns/)))

    !! query MKL for the optimal workspace
    !! the optimal workspace size is placed in the first index of the work array
    work_len = -1
    call ssyev( jobz, uplo, rows, a, columns, eigenvalues, work, work_len, info )
    work_len = min( work_len_max, int( work( 1 ) ) )

    !! run MKL routine
    call ssyev( jobz, uplo, rows, a, columns, eigenvalues, work, work_len, info )

    !! if (info == 0), the execution was successful
    !! if (info == -i), the i-th parameter had an illegal value
    !! if (info == i), the algorithm failed to converge; i elements did not converge to zero
    if ( info /= 0 ) then
        print '(A43)', 'The algorithm failed to compute eigenvalues'
        !! Terminate program
        stop
    end if

    !! print eigenvalues
    print '(A11)', 'Eigenvalues'
    print '(a40)', '----------------------------------------'
    write(*, '(*(a, F6.2, 1x))') ( '|', eigenvalues( i ), i = 1, columns )
    print *

    !! print eigenvectors
    print '(A32)', 'Eigenvectors (stored columnwise)'
    do i = 1, rows
        print '(a40)', '----------------------------------------'
        write(*, '(*(a, F6.2, 1x))') ( '|', a( i, j ), j = 1, columns )
    end do

    !! free memory
    deallocate(a)
    deallocate(eigenvalues)
    deallocate(work)

end program mkl_ssyev
Using OpenMP threading for our threading layer again, the link line and compiler options are:
Link line:
-L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl
Compiler options:
-i8 -mkl=parallel
Compilation and linking:
On Thunder:
$ module load intel/2021.1.1
$ ifort mkl_ssyev.f90 -o exe_mkl_ssyev_f90 -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl -i8 -mkl=parallel
On Thunder Prime:
$ module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
$ ifort mkl_ssyev.f90 -o exe_mkl_ssyev_f90 -L${MKLROOT}/lib/intel64 -liomp5 -lpthread -lm -ldl -i8 -mkl=parallel
Our job script:
mkl_ssyev_f90.pbs
#!/bin/bash
#PBS -q default
#PBS -N job_mkl_ssyev_f90
#PBS -l select=1:mem=3gb:ncpus=4:ompthreads=4
#PBS -l walltime=08:00:00
#PBS -W group_list=x-ccast-prj-[GROUP_NAME]

## On Thunder, use: module load intel/2021.1.1
## On Thunder Prime, replace the line below with: module load intel-parallel-studio/cluster.2020.4-gcc-vcxt
module load intel/2021.1.1

cd $PBS_O_WORKDIR

## "time" is added to time the calculation
echo "Executed at: $(date)"
time ./exe_mkl_ssyev_f90
echo "Finished at: $(date)"

exit 0
Submit the job:
$ qsub mkl_ssyev_f90.pbs
Output the results:
$ cat job_mkl_ssyev_f90.o[JOB_ID]
Eigenvalues
----------------------------------------
|-11.07 | -6.23 |  0.86 |  8.87 | 16.09

Eigenvectors (stored columnwise)
----------------------------------------
|  0.30 | -0.61 | -0.40 | -0.37 | -0.49
----------------------------------------
|  0.51 | -0.29 |  0.41 | -0.36 |  0.61
----------------------------------------
|  0.08 | -0.38 |  0.66 |  0.50 | -0.40
----------------------------------------
|  0.00 | -0.45 | -0.46 |  0.62 |  0.46
----------------------------------------
|  0.80 |  0.45 | -0.17 |  0.31 | -0.16
4. Miscellaneous
4.1 Helpful Commands
• List modules: module avail
• Load module: module load [MODULE_NAME]
• Unload module: module unload [MODULE_NAME]
• Submit job: qsub [PBS_FILE]
• Check the status of your jobs: qstat -u [USER_NAME]
• Execute the last command starting with [COMMAND]: ![COMMAND]
4.2 Why Isn’t My Program Working?
Things to check if your program isn’t running:
• Did you load the Intel module?
• Have you compiled and linked the code properly?
• Is the syntax correct?