Running Bioinformatics Software on HPC Clusters

Essential information for HPC users who run bioinformatics software.

  1. Introduction
  2. Available bioinformatics software
  3. Running bioinformatics software
  4. How to install bioinformatics software?
  5. How can you get help?
  6. List of bioinformatics software

1. Introduction

All users running bioinformatics software on CCAST’s HPC systems are required to carefully read the CCAST User Guide. This document as well as the User Guide will be updated often.

In general, users should have basic knowledge of the Linux environment, HPC systems, and job scheduling and workload management systems (specifically PBS Pro, which is used on Thunder), along with some shell scripting experience. The essential information can be found in the CCAST User Guide. Users are also strongly encouraged to attend the Advanced Research Computing Training Program as well as other special training events offered by CCAST.

2. Available bioinformatics software

Several dozen bioinformatics software tools are currently available for system-wide use on Thunder; see the list in Section 6. On Thunder, users can check the available software modules by issuing the command “module avail”. Users can also install their own version of a tool in their HOME or PROJECTS directory; see Section 4.
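As a minimal sketch of that check, the helper below filters the module list for a tool name. The function name find_module and the search term are illustrative only. On many systems “module avail” writes its listing to standard error, which is why the sketch merges the two streams; that behavior is an assumption about Thunder's configuration.

```shell
# Filter the output of "module avail" for a tool name (case-insensitive).
# Assumption: the listing may go to stderr, so merge it into stdout.
find_module() {
    if command -v module >/dev/null 2>&1; then
        module avail 2>&1 | grep -i -- "$1"
    else
        echo "module command not found; run this on Thunder" >&2
        return 1
    fi
}

# Example: look for any BLAST modules.
find_module blast || true
```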

3. Running bioinformatics software

As noted in the CCAST User Guide, “To be able to run your jobs and run them efficiently, you need to have some basic knowledge of the application you are using. This includes whether the application is serial (i.e., runs on only one core) or parallel (i.e., can run on multiple cores). If it is parallel, what is the underlying parallel programming model: shared-memory (e.g., using OpenMP, Pthreads, etc.), distributed-memory (e.g., using MPI), or hybrid? You need such information to determine how you would like to request resources for your jobs.” Please carefully read the documentation for the application you want to run, as well as the installation, running, and testing notes provided in the program-specific articles in this series (click on the name of the software to read its article).
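To make the distinction concrete, the sketch below prints the kind of PBS Pro resource request that might suit each model. The helper name pbs_select and all core and node counts are placeholders for illustration, not recommendations; a real job script would contain only the one line that matches the application.

```shell
# Print a PBS Pro resource request line for a given parallel model.
# Usage: pbs_select MODEL [CORES_PER_NODE] [NODES]
# The numbers passed in are illustrative, not tuned for any application.
pbs_select() {
    model=$1
    cores=${2:-1}
    nodes=${3:-1}
    case "$model" in
        # Serial: one core on one node.
        serial)  echo "#PBS -l select=1:ncpus=1" ;;
        # Shared-memory (OpenMP, Pthreads): several cores on ONE node.
        threads) echo "#PBS -l select=1:ncpus=${cores}" ;;
        # Distributed-memory (MPI): one MPI rank per core, possibly many nodes.
        mpi)     echo "#PBS -l select=${nodes}:ncpus=${cores}:mpiprocs=${cores}" ;;
        *)       echo "unknown model: $model" >&2; return 1 ;;
    esac
}

pbs_select serial
pbs_select threads 8
pbs_select mpi 4 2
```

The three calls print “#PBS -l select=1:ncpus=1”, “#PBS -l select=1:ncpus=8”, and “#PBS -l select=2:ncpus=4:mpiprocs=4”. Note that a shared-memory program cannot use cores on a second node, so requesting more than one node for it only wastes resources.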

Example jobs for bioinformatics applications are available in /mmfs1/thunder/projects/ccastest/training/examples on Thunder. The best way to get started is to copy an example job from that directory to your SCRATCH directory, modify the PBS job script as needed, and test run it a few times to become familiar with running jobs on Thunder before running your own jobs. Note that the requested resources (ncpus, mem, etc.) in the example PBS scripts are not optimized for your jobs. Also, as a reminder, do NOT run jobs on the login node and do NOT run jobs from your HOME directory.
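The workflow just described might look like the sketch below. Only the examples path comes from this article. The helper stage_example, the subdirectory name “blast”, the $SCRATCH variable, and the file name job.pbs are hypothetical; list the examples directory to see what it actually contains.

```shell
# Copy one example job into a working directory and list its files.
# Usage: stage_example EXAMPLE_DIR DEST_PARENT
stage_example() {
    src=$1
    dest=$2/$(basename "$src")
    [ -d "$src" ] || { echo "no such example: $src" >&2; return 1; }
    mkdir -p "$2" && cp -r "$src" "$dest" && ls "$dest"
}

EXAMPLES=/mmfs1/thunder/projects/ccastest/training/examples
# On Thunder (hypothetical names: "blast", $SCRATCH, job.pbs):
#   stage_example "$EXAMPLES/blast" "$SCRATCH"
#   cd "$SCRATCH/blast"
#   qsub job.pbs         # submit after editing ncpus, mem, walltime, etc.
#   qstat -u "$USER"     # check the status of your jobs
```

Working from a copy keeps the shared examples untouched and ensures the job runs from SCRATCH rather than from HOME.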

4. How to install bioinformatics software?

The software tools listed in Section 6 have been installed on Thunder for all users. This means you do NOT need to install them yourself. If you want to install and test your own version of a tool in your HOME or PROJECTS directory, you can do so by following the instructions in the corresponding article. We encourage you to try this and become familiar with such tasks, since your research will very likely require you to install new software that is not currently available on the Thunder cluster, or a newer version of a tool that is.

The articles also provide you with essential information about the applications. Look for keywords such as “MPI”, “threads”, etc. as they indicate whether the software is parallel and thus can run on multiple cores. Such information will help you decide how to request resources for your jobs.
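If you have a tool's documentation or README as a text file, the same keyword check can be automated. The sketch below is one possible approach; the helper name parallel_hints and the exact keyword list are assumptions, and a match is only a hint that should be confirmed in the documentation itself.

```shell
# Report which parallelism keywords appear in a documentation file.
# Usage: parallel_hints FILE
# -w matches whole words only, so "compile" does not count as "mpi".
parallel_hints() {
    grep -o -i -w -E 'mpi|openmpi|mpich|openmp|pthreads?|threads?|threaded' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sort -u
}
```

For a file containing the sentence “Supports MPI and multi-threaded runs.”, the function prints “mpi” and “threaded” on separate lines, suggesting the tool can use both distributed-memory and shared-memory parallelism.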

5. How can you get help?

Please read the CCAST User Guide and this document carefully and check the CCAST website and e-mails before contacting us. If you still cannot find answers to your questions, send an e-mail to ndsu.ccast.support@ndsu.edu.

6. List of bioinformatics software


#   Name              Module name[1]          Example?[2]  Install notes?[3]
--  ----------------  ----------------------  -----------  -----------------
1   ABySS             ABySS/2.0.2-gcc         Yes          Yes
2   BCFtools          BCFtools/1.9            Yes          Yes
3   BLAST+            BLAST+/2.6.0-gcc        Yes          Yes
                      BLAST+/2.7.1
                      BLAST+/2.8.1
4   Bowtie            Bowtie/1.2-gcc
5   Bowtie 2          Bowtie2/2.3.4.1-gcc
6   BUSCO             BUSCO/3.0.1-gcc
7   BWA               BWA/0.7.17-gcc
8   Canu              Canu/1.8                Yes          Yes
9   ClustalW-MPI      ClustalW-MPI/0.13-gcc   Yes          Yes
10  CLUSTALW          CLUSTALW/2.1-gcc
11  Cufflinks         Cufflinks/2.2.1-gcc
12  DIAMOND           DIAMOND/0.9.17
13  eggNOG-mapper     eggNOG-mapper/1.0.3
14  EnTAP             EnTAP/0.8.0-beta
                      EnTAP/0.8.2-beta
                      EnTAP/0.8.3-beta
15  FastQC            FastQC/0.11.8           Yes          Yes
16  FastUniq          FastUniq/1.1-gcc
17  FSA               FSA/1.15.9-gcc          Yes          Yes
18  GeneMarkS-T       GeneMarkS-T/5.1
19  HISAT2            HISAT2/2.1.0-gcc
20  HMMER             HMMER/3.1b2-gcc
21  InterProScan      InterProScan/5.27-66.0
22  Jellyfish         Jellyfish/2.2.10        Yes          see Trinity
23  Maq               Maq/0.7.1               Yes          Yes
24  MaSuRCA           MaSuRCA/3.3.2           Yes          Yes
25  MEGAHIT           MEGAHIT/1.2.6           Yes          Yes
26  Meraculous-2D     Meraculous-2D/2.2.6     Yes          Yes
27  Ragout            Ragout/2.2              Yes          Yes
28  RepeatMasker      RepeatMasker/4.0.9      Yes          Yes
29  RMAP              RMAP/2.1                Yes          Yes
30  RSEM              RSEM/1.3.0-gcc
31  Salmon            Salmon/0.13.1           Yes          see Trinity
32  SAMtools          SAMtools/1.6-gcc        Yes          see BCFtools
33  SOAPdenovo-Trans  SOAPdenovo-Trans/1.04   Yes          Yes
34  SOAPdenovo2       SOAPdenovo2/r241        Yes          Yes
35  SortMeRNA         SortMeRNA/3.0.3-gcc     Yes          Yes
36  STAR              STAR/2.5.3a-gcc
37  StringTie         StringTie/1.3.6         Yes          Yes
38  TopHat            TopHat/2.1.1-gcc
39  Transrate         Transrate/1.0.3         Yes          Yes
40  Trimmomatic       Trimmomatic/0.39        Yes          Yes
41  Trinity           trinity/2.6.6-gcc       Yes          Yes
                      trinity/2.8.4
42  Trinotate         Trinotate/3.1.1         Yes          Yes
43  Velvet            Velvet/1.2.10           Yes          Yes


[1] To use a certain application on Thunder, load appropriate environment variables by executing the command “module load <module name>” (usually within a PBS job script); e.g., “module load ABySS/2.0.2-gcc”. Be mindful that Linux commands are case-sensitive.

[2] Example jobs for many applications are available on Thunder in the following directory: /mmfs1/thunder/projects/ccastest/training/examples

[3] The running, installation, and testing notes are available in each program-specific Knowledge Base article.

