Bioinformatics - BLAST+

Instructions on how to run (and, if needed, install a customized version of) BLAST+

BLAST+ is a utility for running BLAST searches on your own server without size, volume, and database restrictions.

  1. Running BLAST+ on Thunder
  2. Install customized BLAST+ on Thunder
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running BLAST+ on Thunder


Example: Extract a sequence from a pre-built database and blast this sequence against the database


Location: /gpfs1/projects/ccastest/training/examples/BLAST+_example


File list

· blast+_job.pbs: job submission script 

· db (directory): a directory containing a pre-built BLAST database


Steps

· Copy example directory to your SCRATCH directory

o    cp -r /gpfs1/projects/ccastest/training/examples/BLAST+_example $SCRATCH

· Go to the copied directory

o    cd  $SCRATCH/BLAST+_example

· Edit the job submission script as needed, then submit the job

o    qsub blast+_job.pbs
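
The copy step above fails in confusing ways if $SCRATCH is unset or the example path is mistyped. The following is a minimal sketch of a guard around it; stage_example is a hypothetical helper name and is not part of the CCAST example.

```shell
#!/bin/bash
# Sketch only: stage_example is a hypothetical helper, not part of the CCAST example.
# It copies an example directory into $SCRATCH and prints the path of the copy.
stage_example() {
    local src="$1"
    if [ -z "${SCRATCH:-}" ]; then
        echo "SCRATCH is not set" >&2
        return 1
    fi
    if [ ! -d "$src" ]; then
        echo "example directory not found: $src" >&2
        return 1
    fi
    # Copy the whole directory tree into $SCRATCH
    cp -r "$src" "$SCRATCH"/ || return 1
    echo "$SCRATCH/$(basename "$src")"
}
```

One possible use: dir=$(stage_example /gpfs1/projects/ccastest/training/examples/BLAST+_example) && cd "$dir" && qsub blast+_job.pbs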


2. Install Customized BLAST+ on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.

Summary

(a)    Compiler: GCC C++ (the system default is adequate; no module load is necessary)

(b)    Parallelization: uses Pthreads for shared-memory parallelism on a single node. Multi-node calculations are NOT supported.
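
Because all BLAST+ threads run on one node, the thread count is bounded by the ncpus value requested for that node. The job script later in this article uses ncpus minus one. The sketch below adds a floor of one thread so that a single-CPU request does not yield zero; blast_threads is a hypothetical helper name, and the floor is an assumption rather than a CCAST requirement.

```shell
# Sketch only: blast_threads is a hypothetical helper name.
# Prints a thread count for -num_threads: one fewer than the CPUs requested,
# but never less than 1.
blast_threads() {
    local ncpus="${1:-1}"
    if [ "$ncpus" -gt 1 ]; then
        echo $(( ncpus - 1 ))
    else
        echo 1
    fi
}
```

Inside a PBS job this could be called as blast_threads "$NCPUS", assuming PBS sets NCPUS as the job script in this article expects.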

Details

In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.


(a) Install


· Navigate to your software directory

o    cd /gpfs1/home/USERNAME/SOFTWARE

· Download BLAST+ source (More versions available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ ) 

o    wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.9.0+-src.tar.gz

· Extract files

o    tar zxvf ncbi-blast-2.9.0+-src.tar.gz

· Configure & build software

o    cd ncbi-blast-2.9.0+-src

o    cd c++

o    ./configure

o    cd ReleaseMT/build

o    make all_r
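
The extracted directory carries the tarball's name without the .tar.gz suffix, and the job script in this article expects the binaries under c++/ReleaseMT/bin. The sketch below derives the directory name and checks that the build produced blastn; both function names are hypothetical helpers introduced for illustration.

```shell
# Sketch only: both helpers are hypothetical names, not part of BLAST+.

# Print the directory that tar creates: the tarball name minus ".tar.gz".
src_dir_from_tarball() {
    local tarball="$1"
    echo "${tarball%.tar.gz}"
}

# Succeed if an executable blastn exists under the source tree's ReleaseMT/bin.
build_ok() {
    local src_dir="$1"
    if [ -x "$src_dir/c++/ReleaseMT/bin/blastn" ]; then
        return 0
    fi
    echo "blastn not found under $src_dir/c++/ReleaseMT/bin" >&2
    return 1
}
```

For example, after the build finishes: build_ok "/gpfs1/home/USERNAME/SOFTWARE/$(src_dir_from_tarball ncbi-blast-2.9.0+-src.tar.gz)"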


(b) Test


· Navigate to your scratch directory

o    cd /gpfs1/scratch/USERNAME

· Create a directory for testing BLAST+

o    mkdir blast.job

o    cd blast.job

· Download a sample blast database

o    mkdir db

o    cd db

o    wget https://ftp.ncbi.nlm.nih.gov/blast/db/refseq_rna.00.tar.gz

· Extract database

o    tar zxvf refseq_rna.00.tar.gz

o    cd ..
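
Before submitting a job, it can help to confirm that the archive actually unpacked into the db directory. The sketch below checks for the three core files of a nucleotide database volume (.nhr, .nin, .nsq). The helper name db_volume_present is hypothetical, and the archive may contain additional files that this check ignores.

```shell
# Sketch only: db_volume_present is a hypothetical helper name.
# Takes a database prefix such as db/refseq_rna.00 and succeeds only if the
# header (.nhr), index (.nin), and sequence (.nsq) files are all present.
db_volume_present() {
    local prefix="$1"
    local ext
    for ext in nhr nin nsq; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

From the blast.job directory this could be run as db_volume_present db/refseq_rna.00. If the BLAST+ binaries are on your PATH, blastdbcmd -db db/refseq_rna.00 -info should also print a summary of the database.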

· Write the job script blast+_job.pbs (listed below) in this directory, then submit it

o    qsub blast+_job.pbs


 ------------------------------------------- file blast+_job.pbs -------------------------------------------

#!/bin/bash 

#PBS -q default 

#PBS -N BLAST+_test 

#PBS -j oe 

##change the values as needed; always set "select=1" (BLAST+ does not support multi-node runs).

#PBS -l select=1:mem=3gb:ncpus=3

#PBS -l walltime=08:00:00 

##replace "x-ccast-prj" below with "x-ccast-prj-[your project group name]"

#PBS -W group_list=x-ccast-prj 

 

# Add your BLAST+ binaries to $PATH

export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/ncbi-blast-2.9.0+-src/c++/ReleaseMT/bin

 

cd ${PBS_O_WORKDIR} 

##note:num_threads=ncpus-1

export OMP_NUM_THREADS=$(($NCPUS-1)) 

 

##call blastdbcmd to extract the sequence of "nm_000122" from the installed database ("refseq_rna.00") to a file

blastdbcmd -db ${PBS_O_WORKDIR}/db/refseq_rna.00 -entry nm_000122 -out test_query_$PBS_JOBID.fa

##run a test blastn search using the sequence in "test_query_$PBS_JOBID.fa" as query against the "refseq_rna.00" database

##general search option: -task <string>; permissible values: 'blastn', 'blastn-short', 'dc-megablast', 'megablast', or 'rmblastn'

##query filtering option: -dust <string>; (format: 'yes', 'level window linker', or 'no' to disable) 

##formatting options: -outfmt <string> 

##-max_target_seqs: maximum number of aligned sequences to keep (value of 5 or more is recommended)

blastn -num_threads $OMP_NUM_THREADS -query test_query_$PBS_JOBID.fa -db ${PBS_O_WORKDIR}/db/refseq_rna.00 -task blastn -dust no -outfmt "7 qseqid sseqid evalue bitscore" -max_target_seqs 20 -out blastn_search_$PBS_JOBID.log
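
With -outfmt "7 qseqid sseqid evalue bitscore", the result file is tab-separated with four columns per hit, plus comment lines that begin with "#". The sketch below summarizes such a file with awk; count_hits and best_hit are hypothetical helper names, and the column positions assume exactly the four fields requested in the script above.

```shell
# Sketch only: count_hits and best_hit are hypothetical helper names.

# Print the number of hit rows (lines that are neither comments nor empty).
count_hits() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Print the subject id and bit score of the row with the highest bit score.
# Assumes columns: 1=qseqid, 2=sseqid, 3=evalue, 4=bitscore.
best_hit() {
    awk -F '\t' '
        !/^#/ && NF >= 4 && (!seen || $4 + 0 > best) {
            best = $4 + 0; score = $4; id = $2; seen = 1
        }
        END { if (seen) print id "\t" score }
    ' "$1"
}
```

After the job completes, these could be applied to the blastn_search_*.log file written by the job, for example count_hits blastn_search_12345.log, where 12345 stands in for the actual job ID.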

