Running Bioinformatics Software on HPC Clusters
Essential information for HPC users who run bioinformatics software.
- Introduction
- Available bioinformatics software
- Running bioinformatics software
- How to install bioinformatics software?
- How can you get help?
- List of bioinformatics software
1. Introduction
All users running bioinformatics software on CCAST’s HPC systems are required to carefully read the CCAST User Guide. This document as well as the User Guide will be updated often.
2. Available bioinformatics software
Several dozen bioinformatics software tools are currently available for system-wide use on Thunder; see the list below. On Thunder, users can check the available software modules by issuing the command “module avail”. Users can also install their own version of a software tool in their HOME or PROJECTS directory; see below.
3. Running bioinformatics software
As noted in the CCAST User Guide, “To be able to run your jobs and run them efficiently, you need to have some basic knowledge of the application you are using. This includes whether the application is serial (i.e., runs on only one core) or parallel (i.e., can run on multiple cores). If it is parallel, what is the underlying parallel programming model: shared-memory (e.g., using OpenMP, Pthreads, etc.), distributed-memory (e.g., using MPI), or hybrid? You need such information to determine how you would like to request resources for your jobs.” Please carefully read the documentation for the application you want to run, as well as the installation, running, and testing notes provided in the articles in this series (click the name of the software to read its article).
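As a rough illustration of how the parallel model maps to a resource request, the sketch below writes a PBS job script for a multithreaded (shared-memory) run on a single node. It is a sketch under assumptions, not a CCAST-provided example: the job name, queue name, walltime, resource values, and input file names are hypothetical, and BWA is used only because its module appears in the list in this article.

```shell
#!/bin/bash
# Write a hypothetical PBS job script for a shared-memory (multithreaded) run.
# Assumptions (not from the CCAST User Guide): queue name, walltime, resource
# sizes, and the input files ref.fa and reads.fq.
cat > bwa_threads.pbs <<'EOF'
#!/bin/bash
#PBS -N bwa_threads
#PBS -q default
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=02:00:00

# Threaded programs share memory, so all cores are requested on one node (select=1).
THREADS=8   # keep equal to ncpus in the select line above

# Run from the directory the job was submitted from (e.g., under SCRATCH).
cd "$PBS_O_WORKDIR" || exit 1

module load BWA/0.7.17-gcc

bwa index ref.fa
bwa mem -t "$THREADS" ref.fa reads.fq > aligned.sam
EOF

# Syntax-check the generated script without running it.
bash -n bwa_threads.pbs && echo "bwa_threads.pbs written"
```

The generated script would then be submitted with “qsub bwa_threads.pbs” from a directory under SCRATCH. A serial application would request ncpus=1, while an MPI application would typically spread its processes over several chunks (a larger select count), so the request should always follow the application's own documentation.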
Example jobs for bioinformatics applications are available in /gpfs1/projects/ccastest/training/examples on Thunder. The best way to get started is to copy an example job from that directory to your SCRATCH directory, modify the PBS job script as needed, and test run it a few times to become familiar with running jobs on Thunder before running your own jobs. Note that the requested resources (ncpus, mem, etc.) in the example PBS scripts are not optimized for your jobs. Also, as a reminder, do NOT run jobs on the login node and do NOT run jobs from your HOME directory.
4. How to install bioinformatics software?
The software tools listed below have been installed on Thunder for all users. This means you do NOT need to install them yourself. If you want to install and test your own version of a software tool in your HOME or PROJECTS directory, you can do so by following the instructions in its article. We encourage you to try such tasks and become familiar with them, since your research will very likely require you to install software that is not currently available on the Thunder cluster or to have a newer version of a tool installed.
The articles also provide you with essential information about the applications. Look for keywords such as “MPI”, “threads”, etc. as they indicate whether the software is parallel and thus can run on multiple cores. Such information will help you decide how to request resources for your jobs.
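One rough way to apply this advice is to scan a tool's documentation or help text for such keywords. The sketch below is a heuristic only: the function name guess_parallel_model and the keyword lists are assumptions made for illustration, and any guess should be confirmed against the tool's actual documentation.

```shell
#!/bin/bash
# guess_parallel_model FILE
# Hypothetical helper: prints a rough guess at the parallel model based on
# keywords found in a plain-text documentation or help file.
guess_parallel_model() {
    local doc="$1" mpi=no threads=no
    # -i: ignore case, -q: quiet, -w: whole-word match, -E: extended regex
    grep -iqwE 'mpi|openmpi|mpich|mpirun|mpiexec' "$doc" && mpi=yes
    grep -iqwE 'openmp|pthreads?|threads?|multi-?threaded' "$doc" && threads=yes

    if [ "$mpi" = yes ] && [ "$threads" = yes ]; then
        echo "hybrid (MPI + threads)"
    elif [ "$mpi" = yes ]; then
        echo "distributed-memory (MPI)"
    elif [ "$threads" = yes ]; then
        echo "shared-memory (threads)"
    else
        echo "no keywords found; possibly serial"
    fi
}

# Example: capture a tool's help text and scan it (the tool name is a placeholder).
# mytool --help > mytool_help.txt 2>&1
# guess_parallel_model mytool_help.txt
```

Whole-word matching is used so that words such as “compile” do not count as a match for “mpi”.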
5. How can you get help?
Please read the CCAST User Guide and this document carefully, and check the CCAST website and e-mails before contacting us. If you still cannot find answers to your questions, send an e-mail to ndsu.ccast.support@ndsu.edu.
6. List of bioinformatics software
# | Name | Module name[1] | Example?[2] | Install notes?[3] |
1 | ABySS | ABySS/2.0.2-gcc | Yes | Yes |
2 | BCFtools | BCFtools/1.9 | Yes | Yes |
3 | BLAST+ | BLAST+/2.6.0-gcc, BLAST+/2.7.1, BLAST+/2.8.1 | Yes | Yes |
4 | Bowtie | Bowtie/1.2-gcc | | |
5 | Bowtie 2 | Bowtie2/2.3.4.1-gcc | | |
6 | BUSCO | BUSCO/3.0.1-gcc | | |
7 | BWA | BWA/0.7.17-gcc | | |
8 | Canu | Canu/1.8 | Yes | Yes |
9 | ClustalW-MPI | ClustalW-MPI/0.13-gcc | Yes | Yes |
10 | CLUSTALW | CLUSTALW/2.1-gcc | | |
11 | Cufflinks | Cufflinks/2.2.1-gcc | | |
12 | DIAMOND | DIAMOND/0.9.17 | | |
13 | eggNOG-mapper | eggNOG-mapper/1.0.3 | | |
14 | EnTAP | EnTAP/0.8.0-beta, EnTAP/0.8.2-beta, EnTAP/0.8.3-beta | | |
15 | FastQC | FastQC/0.11.8 | Yes | Yes |
16 | FastUniq | FastUniq/1.1-gcc | | |
17 | FSA | FSA/1.15.9-gcc | Yes | Yes |
18 | GeneMarkS-T | GeneMarkS-T/5.1 | | |
19 | HISAT2 | HISAT2/2.1.0-gcc | | |
20 | HMMER | HMMER/3.1b2-gcc | | |
21 | InterProScan | InterProScan/5.27-66.0 | | |
22 | Jellyfish | Jellyfish/2.2.10 | Yes | see Trinity |
23 | Maq | Maq/0.7.1 | Yes | Yes |
24 | MaSuRCA | MaSuRCA/3.3.2 | Yes | Yes |
25 | MEGAHIT | MEGAHIT/1.2.6 | Yes | Yes |
26 | Meraculous-2D | Meraculous-2D/2.2.6 | Yes | Yes |
27 | Ragout | Ragout/2.2 | Yes | Yes |
28 | RepeatMasker | RepeatMasker/4.0.9 | Yes | Yes |
29 | RMAP | RMAP/2.1 | Yes | Yes |
30 | RSEM | RSEM/1.3.0-gcc | | |
31 | Salmon | Salmon/0.13.1 | Yes | see Trinity |
32 | SAMtools | SAMtools/1.6-gcc | Yes | see BCFtools |
33 | SOAPdenovo-Trans | SOAPdenovo-Trans/1.04 | Yes | Yes |
34 | SOAPdenovo2 | SOAPdenovo2/r241 | Yes | Yes |
35 | SortMeRNA | SortMeRNA/3.0.3-gcc | Yes | Yes |
36 | STAR | STAR/2.5.3a-gcc | | |
37 | StringTie | StringTie/1.3.6 | Yes | Yes |
38 | TopHat | TopHat/2.1.1-gcc | | |
39 | Transrate | Transrate/1.0.3 | Yes | Yes |
40 | Trimmomatic | Trimmomatic/0.39 | Yes | Yes |
41 | Trinity | trinity/2.6.6-gcc, trinity/2.8.4 | Yes | Yes |
42 | Trinotate | Trinotate/3.1.1 | Yes | Yes |
43 | Velvet | Velvet/1.2.10 | Yes | Yes |
[1] To use a certain application on Thunder, load appropriate environment variables by executing the command “module load <module name>” (usually within a PBS job script); e.g., “module load ABySS/2.0.2-gcc”. Be mindful that Linux commands are case-sensitive.
[2] Example jobs for many applications are available on Thunder in the following directory: /gpfs1/projects/ccastest/training/examples
[3] The running, installation, and testing notes are available in each program-specific Knowledge Base article.
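Because module names are case-sensitive, as note [1] points out, a case-insensitive search of the module list can help recover the exact spelling before loading. The following is a minimal sketch: find_module is a hypothetical helper name, and the sample list simply reuses a few module names from the table above. On the cluster, the input would come from the terse form of “module avail” (module -t avail), which in common module systems prints to standard error.

```shell
#!/bin/bash
# find_module PATTERN
# Hypothetical helper: reads module names from standard input (one per line)
# and prints those that start with PATTERN, ignoring case.
find_module() {
    grep -i -- "^$1"
}

# Sample input standing in for the output of "module -t avail".
sample_list='ABySS/2.0.2-gcc
BCFtools/1.9
BLAST+/2.8.1
Bowtie2/2.3.4.1-gcc'

# A lowercase query still finds the correctly cased module name.
match=$(printf '%s\n' "$sample_list" | find_module abyss)
echo "$match"    # prints: ABySS/2.0.2-gcc

# On the cluster (assuming the module command is available):
#   module -t avail 2>&1 | find_module abyss
#   module load ABySS/2.0.2-gcc
```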