Using Singularity on HPC Clusters

A tutorial on using Singularity, with examples in bioinformatics and machine learning

Singularity is an open-source program that performs operating-system-level virtualization, also known as containerization. Singularity lets users develop and customize their workflows without the need for admin intervention. It allows users to package a complete environment into a single file known as a container image, which includes system libraries, custom user software, configuration files, and most other dependencies. A Singularity image file can be easily copied and runs on any Linux-based computing platform.

Multiple versions of Singularity are installed on CCAST systems. On Thunder, check all available software modules by typing:
$ module avail

NOTE: Terminal commands are denoted by inline code prefixed with $, such as $ module avail in the above example. Variable inputs are denoted by capital letters in brackets, e.g., [JOB ID].

Before using Singularity on Thunder, you need to check the available version. Currently, this is 3.4.0 on Thunder and 3.8.0 on Thunder Prime. Then, load Singularity using the module load command:
$ module load singularity/[VERSION]
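
For example, to load the version currently available on Thunder Prime:
$ module load singularity/3.8.0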

To use Singularity on Thunder, you need to either build a container image with Singularity or download a Singularity container image from an existing container library.

Build a container image

To build a Singularity container, you need root access to a system with Singularity installed (Thunder does NOT give root access to users!). When you have a Linux system to which you have root (admin) access, the first step is to install Singularity on it by following the official installation steps. Once Singularity is installed, you can build container images from a Singularity definition file. For detailed information on writing Singularity definition files, please see the Container Definition docs.

Note that building a new container image from a container definition file is not currently possible on Thunder because it requires elevated privileges ("root access"). If you see an error message such as:
FATAL: Unable to build XXX: you must be the root user to build from a definition file
you need to build the container image on a Linux system to which you have root (admin) access, for example on your personal computer. For details on building images from a Singularity definition file, please refer to Building containers from Singularity definition files.
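
As an illustration, here is a minimal sketch of a definition file, saved as hello.def (the file name, base image, and contents are only an example, not the tutorial's own recipe). It bootstraps from an Ubuntu image on Docker Hub and defines a runscript:

Bootstrap: docker
From: ubuntu:20.04

%post
    apt-get update -y

%runscript
    echo "HelloWorld!"

On a Linux system where you have root (admin) access, you would then build an image from this definition file with:
$ sudo singularity build hello.sif hello.def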

Download Singularity container images from existing container libraries 

You can download Singularity container images from existing container libraries. Some useful container libraries are listed below:

1. Docker Based Container Libraries: Docker Hub, Nvidia GPU-Accelerated Containers, Quay (bioinformatics), and BioContainers (bioinformatics). 

2. Singularity Libraries: Singularity Library and Singularity Hub

The pull command of Singularity allows you to download or build a container from a given URI. To download an image using pull:
$ singularity pull [OPTION] [OUTPUT FILE NAME] [URI]

For example, on a system with Singularity installed, you can download an image named singularity-images.sif from Singularity Hub:
$ singularity pull singularity-images.sif shub://vsoch/singularity-images
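
Similarly, you can pull images from Docker Hub using the docker:// URI scheme; Singularity automatically converts the Docker image into a Singularity image file. For example, to pull the official Ubuntu 20.04 image:
$ singularity pull ubuntu_20.04.sif docker://ubuntu:20.04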

If you prepare container images on your local machine, you need to transfer them to Thunder. See the CCAST User Guide for more details on transferring files to Thunder.
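
For example, you can use scp from your local machine (the hostname below is a placeholder; see the CCAST User Guide for the actual login node address):
$ scp my_container.sif [USERNAME]@[THUNDER HOSTNAME]:/mmfs1/thunder/scratch/[USERNAME]/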

Running Singularity on Thunder 

A Singularity job needs a Singularity container image and the job files you intend to run, as well as a job submission script to submit the job to the scheduler (PBS Pro on Thunder/Thunder Prime). See the CCAST User Guide for more information on running jobs on CCAST systems in general.

Example files

On Thunder:
All the source code and job submission scripts discussed in this document can be found in the following compressed file: /mmfs1/thunder/projects/ccastest/training/examples/Singularity_Tutorial_examples.tar.gz.
 
To copy the examples to your SCRATCH directory (/mmfs1/thunder/scratch/[USERNAME]):
$ cp /mmfs1/thunder/projects/ccastest/training/examples/Singularity_Tutorial_examples.tar.gz $SCRATCH

On Thunder Prime:

All the source code and job submission scripts discussed in this document can be found in the following compressed file: /mmfs1/projects/ccastest/examples/Singularity_Tutorial_examples.tar.gz.
 
To copy the examples to your SCRATCH directory (/mmfs1/scratch/[USERNAME]):
$ cp /mmfs1/projects/ccastest/examples/Singularity_Tutorial_examples.tar.gz $SCRATCH


To uncompress the *.tar.gz file:
$ cd $SCRATCH
$ tar -xvf Singularity_Tutorial_examples.tar.gz

In the following, we examine a few Singularity examples in detail:

Example 1: “HelloWorld!”

Singularity container images contain runscripts: user-defined scripts that define the actions a container performs when someone runs it. The runscript can be triggered with the run command. In this simple example, when we run the container image hello-world_latest.sif, the system prints a string to the output file.
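
The image hello-world_latest.sif is included with the example files. If you wanted to pull a similar image yourself, you would use the pull command introduced earlier with the appropriate URI, e.g.:
$ singularity pull hello-world_latest.sif shub://[USER]/[IMAGE NAME]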

Singularity_job.pbs

#!/bin/bash
#PBS -q default
#PBS -N Singularity_test
##serial jobs: only 1 processor core is requested
#PBS -l select=1:mem=2gb:ncpus=1
#PBS -l walltime=00:10:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj 

module load singularity/[VERSION]

cd $PBS_O_WORKDIR

singularity run hello-world_latest.sif

exit 0

On Thunder, you need to open this file in a UNIX/Linux text editor and edit the "#PBS -W" line to make sure your project group name is correct. If you do not remember your project group name, execute the command "id" or "groups" while on Thunder.

To submit the job:
$ qsub Singularity_job.pbs

To check the status of the job (it may show nothing if the job has already completed):
$ qstat -u $USER

To view the error file:
$ cat Singularity_test.e[JOB ID]

To view the output file:
$ cat Singularity_test.o[JOB ID]

The expected output is a string, for example:
HelloWorld! 

This example uses the run command to launch a Singularity container and execute its runscript. You can check the available commands by executing:
$ singularity --help

Usage:
  singularity [global options...] <command>

Here, some of the commands are listed:
exec      Run a command within a container
pull      Pull an image from a URI
run       Run the user-defined default command within a container
run-help  Show the user-defined help for an image
search    Search a Container Library for images
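
In general, the run and exec commands used throughout this tutorial follow these patterns:
$ singularity run [IMAGE FILE]
$ singularity exec [IMAGE FILE] [COMMAND] [ARGUMENTS]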

Example 2: “OBITools for bioinformatics”

One of the most common advantages of using Singularity is the ability to use pre-built containers for specific applications that may be difficult to install and maintain yourself. This example job uses a pre-built container image named obitools.simg from Singularity Hub. The image was created to run programs specifically designed for analyzing NGS data in a DNA metabarcoding context; it integrates the OBITools package 1.0, ecoPrimers 1.0.1, ecoPCR 0.5, and EMBOSS. In this example, obicount is one command that can be executed within the container. By running this container, the obicount command of OBITools can be executed to count the number of sequence records in the input file.

Singularity_job.pbs

#!/bin/bash
#PBS -q default
#PBS -N Singularity_OBITools_test
##serial jobs: only 1 processor core is requested
#PBS -l select=1:mem=2gb:ncpus=1
#PBS -l walltime=00:10:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj 

module load singularity/[VERSION]

cd $PBS_O_WORKDIR

##obicount command prints the number of sequence records contained in the fasta file.
echo "The number of sequence records is :"
singularity exec obitools.simg obicount -a $PBS_O_WORKDIR/rfam-5.8s-database-id98.fasta

exit 0

To submit the job:
$ qsub Singularity_job.pbs

To check the status of the job (it may show nothing if the job has already completed):
$ qstat -u $USER

To view the error file:
$ cat Singularity_OBITools_test.e[JOB ID]

To view the output file:
$ cat Singularity_OBITools_test.o[JOB ID]

The expected output is:
The number of sequence records is :
13034

Example 3: “eDNA analysis for bioinformatics”

This example job uses a pre-built container image named ednatools.simg from Singularity Hub. This image integrates useful programs for eDNA analysis, including vsearch 2.13.4, pear 0.9.11, fastq-join 1.3.1, pandaseq 2.11, jellyfish 2.2.6, casper 0.8.2, FLASH 1.2.11, fastq-multx, cutadapt 2.3, SWARM 2.2.2, Reaper 13.274, TAGcleaner 0.16, Flexbar 3.0.3, usearch 11.0.667, deML 1.0, NGmerge, and FASTP. In this example, fastq-join is one command that can be executed within the container. By running this container, the fastq-join command joins paired-end reads based on the percent similarity and the length of the overlap.

Singularity_job.pbs

#!/bin/bash
#PBS -q default
#PBS -N Singularity_eDNA_test
##serial jobs: only 1 processor core is requested
#PBS -l select=1:mem=2gb:ncpus=1
#PBS -l walltime=00:10:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj 

module load singularity/[VERSION]

cd $PBS_O_WORKDIR

##the runscript within the container is triggered with the run command
singularity run ednatools.simg

##the fastq-join command joins two paired-end reads on their overlapping ends
singularity exec ednatools.simg fastq-join -p 2 -m 200 $PBS_O_WORKDIR/EGL245_S19_L001_R1_001.fastq $PBS_O_WORKDIR/EGL245_S19_L001_R2_001.fastq -o test_%.fastq

exit 0

To submit the job:
$ qsub Singularity_job.pbs

To check the status of the job (it may show nothing if the job has already completed):
$ qstat -u $USER

To view the error file:
$ cat Singularity_eDNA_test.e[JOB ID]

To view the output file:
$ cat Singularity_eDNA_test.o[JOB ID]

The expected output is:

Opening container...ubuntu beaver: vsearch, PEAR, fastq-join, pandaseq, jellyfish, casper, FLASH, fastq-multx, cutadapt, SWARM, REAPER, tally, minion, swan, tagCleaner, flexbar, usearch, deML, trimmomatic, prinseq, NGmerge, FASTP 
Total reads: 211364
Total joined: 86
Average join len: 208.66
Stdev join len: 8.16
Version: 1.3.1

Example 4: “PyTorch for machine learning”

Singularity containers allow you to package the environment that your code depends on into a single portable unit. This is also useful for installing software, packages, libraries, etc. in environments where you do not have root privileges. This example job runs a simple PyTorch program that trains a neural network and prints the loss, using the container pytorch_latest.sif.
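
This document does not state where pytorch_latest.sif was obtained; as an assumption, two plausible sources are the official PyTorch repository on Docker Hub and, for the pytorch_20.12-py3.sif image used on Thunder Prime, NVIDIA's NGC container registry:
$ singularity pull pytorch_latest.sif docker://pytorch/pytorch
$ singularity pull pytorch_20.12-py3.sif docker://nvcr.io/nvidia/pytorch:20.12-py3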

Singularity_job.pbs

#!/bin/bash
#PBS -q default
#PBS -N Singularity_pytorch_test
#PBS -l select=1:mem=8gb:ncpus=4
#PBS -l walltime=01:00:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj 

module load singularity/[VERSION]

cd $PBS_O_WORKDIR

#on Thunder
singularity exec pytorch_latest.sif python3 $PBS_O_WORKDIR/pytorch.py

#change the above line on Thunder Prime into
#singularity exec pytorch_20.12-py3.sif python3 $PBS_O_WORKDIR/pytorch.py

exit 0

To submit the job:
$ qsub Singularity_job.pbs

To check the status of the job (it may show nothing if the job has already completed):
$ qstat -u $USER

To view the error file:
$ cat Singularity_pytorch_test.e[JOB ID]

To view the output file:
$ cat Singularity_pytorch_test.o[JOB ID]

The expected output is:

Input #      LOSS 
199991 5.0943803842073976e-08
199992 5.116343970712478e-08
199993 5.2332254085740715e-08
199994 5.175438388960174e-08
199995 5.150118198571363e-08
199996 5.1882821594517736e-08
199997 5.066658559371717e-08
199998 5.026544158681645e-08
199999 4.989855995063408e-08

Example 5: “TensorFlow for machine learning”

TensorFlow is commonly used for machine learning projects but can be difficult to install on older systems, and it is updated frequently. Running TensorFlow from a container removes installation problems and makes trying out new versions easy. The official TensorFlow repository on Docker Hub contains TensorFlow-based containers. This example job runs a TensorFlow program within the container tensorflow_latest.sif to perform a machine learning task.
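
For example, you can pull a TensorFlow image from the official Docker Hub repository (the exact tag behind the image provided with the examples is an assumption here):
$ singularity pull tensorflow_latest.sif docker://tensorflow/tensorflow:latest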

Singularity_job.pbs

#!/bin/bash
#PBS -q default
#PBS -N Singularity_tf_test
#PBS -l select=1:mem=4gb:ncpus=2
#PBS -l walltime=01:00:00
##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj 

module load singularity/[VERSION]

cd $PBS_O_WORKDIR

singularity exec tensorflow_latest.sif python $PBS_O_WORKDIR/tf.py

exit 0

To submit the job:
$ qsub Singularity_job.pbs

To check the status of the job (it may show nothing if the job has already completed):
$ qstat -u $USER

To view the error file:
$ cat Singularity_tf_test.e[JOB ID]

To view the output file:
$ cat Singularity_tf_test.o[JOB ID]

The expected output is:

Epoch 1/5
1875/1875 - 2s - loss: 0.2990 - accuracy: 0.9125
Epoch 2/5
1875/1875 - 2s - loss: 0.1429 - accuracy: 0.9570
Epoch 3/5
1875/1875 - 2s - loss: 0.1102 - accuracy: 0.9665
Epoch 4/5
1875/1875 - 2s - loss: 0.0873 - accuracy: 0.9729
Epoch 5/5
1875/1875 - 2s - loss: 0.0759 - accuracy: 0.9758

See Also: