Bioinformatics - ClustalW-MPI

Instructions on how to run (and, if needed, install a customized version of) ClustalW-MPI

ClustalW-MPI is an MPI-based parallel implementation of ClustalW, a multiple sequence alignment program.

  1. Running ClustalW-MPI on Thunder
  2. Install customized ClustalW-MPI on Thunder
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running ClustalW-MPI on Thunder


Example: Conduct a multiple sequence alignment on a set of sequences


Location: /gpfs1/projects/ccastest/training/examples/ClustalW_MPI_example


File list

· clustalw-mpi_job.pbs: job submission script 

· 16s.fasta: a set of sequences in fasta format


Steps

· Copy the example directory to your SCRATCH directory

o    cp -r /gpfs1/projects/ccastest/training/examples/ClustalW_MPI_example $SCRATCH

· Go to the copied directory

o    cd $SCRATCH/ClustalW_MPI_example

· Edit the job submission script as needed, then submit the job

o    qsub clustalw-mpi_job.pbs
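The three steps above can also be wrapped in one small shell function. This is a sketch, not an official CCAST script. It assumes that $SCRATCH is set in your login environment and that "qsub" is on your PATH. "run_clustalw_example" and the SUBMIT variable are hypothetical names introduced here. Setting SUBMIT=echo turns the final step into a dry run.

```shell
#!/bin/bash
# Sketch: copy the ClustalW-MPI example to scratch space and submit the job.
# Assumes $SCRATCH is set by the cluster environment and qsub is on PATH.
# run_clustalw_example and SUBMIT are hypothetical names; SUBMIT=echo is a dry run.
EXAMPLE_SRC=/gpfs1/projects/ccastest/training/examples/ClustalW_MPI_example

# Usage: run_clustalw_example [SOURCE_DIR] [DEST_DIR]
# The body runs in a subshell, so "set -e" and "cd" do not leak into your shell.
run_clustalw_example() (
    set -eu
    src="${1:-$EXAMPLE_SRC}"
    dest="${2:-${SCRATCH:?SCRATCH is not set}}"
    submit="${SUBMIT:-qsub}"

    cp -r "$src" "$dest"              # copy the example directory
    cd "$dest/$(basename "$src")"     # go to the copied directory
    "$submit" clustalw-mpi_job.pbs    # submit the job
)
```

After sourcing the file, "SUBMIT=echo run_clustalw_example" copies the files but only prints the submission command. Plain "run_clustalw_example" queues the job.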


2. Install Customized ClustalW-MPI on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.


Summary


· A C compiler: the system GCC (4.8.5) is sufficient, so "module load" is not necessary.
· MPI (Message Passing Interface) is required to build ClustalW-MPI (available via "module load mpich" or "module load openmpi").
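Before building, you can confirm that a C compiler and an MPI compiler wrapper are available. This is a minimal sketch. It assumes that the mpich (or openmpi) module provides "mpicc", which the ClustalW-MPI Makefile is assumed to use. "check_prereqs" is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: report whether the tools needed to build ClustalW-MPI are on PATH.
# Run it after "module load mpich" (or openmpi), which provides the mpicc wrapper.
# check_prereqs is hypothetical; pass other command names to check those instead.
check_prereqs() (
    if [ "$#" -eq 0 ]; then
        set -- gcc mpicc          # default: C compiler and MPI compiler wrapper
    fi
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd ($(command -v "$cmd"))"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    exit "$missing"               # non-zero if any command was not found
)
```

For example, "module load mpich && check_prereqs" prints one line per tool and returns a non-zero status if either is missing.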

Details

In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST's Thunder cluster. “USERNAME” is your username on Thunder.


(a) Install

· Go to SOFTWARE inside your HOME directory (create it if it does not exist):

o    "mkdir -p /gpfs1/home/USERNAME/SOFTWARE"

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

· Download ClustalW-MPI, extract the archive, and go to the extracted directory:

o    "wget http://www.bii.a-star.edu.sg/docs/software/clustalw-mpi-0.13.tar.gz" 

o    "tar -xzvf clustalw-mpi-0.13.tar.gz"  

o    "cd clustalw-mpi-0.13"

· Load mpich:

o    "module load mpich"

· Build:

o    "make"
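Step (a) can be collected into one function that stops at the first failing command. This is a sketch under several assumptions: your HOME is /gpfs1/home/USERNAME, so $HOME/SOFTWARE is the install area; the download URL above is still reachable; and the "module" command is available in the shell that runs it. "install_clustalw_mpi", PREFIX and DRY_RUN are hypothetical names. With DRY_RUN=1 the function only prints the commands.

```shell
#!/bin/bash
# Sketch of step (a). Assumptions (not from CCAST): $HOME is /gpfs1/home/USERNAME,
# the download URL is still live, and "module" is defined in this shell.
# install_clustalw_mpi, PREFIX and DRY_RUN are hypothetical names.
VERSION=0.13
URL="http://www.bii.a-star.edu.sg/docs/software/clustalw-mpi-$VERSION.tar.gz"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Subshell body: "set -e" and the "cd" steps stay local to this function.
install_clustalw_mpi() (
    set -e                                    # stop at the first failing step
    prefix="${PREFIX:-$HOME/SOFTWARE}"
    run mkdir -p "$prefix"                    # create SOFTWARE if needed
    run cd "$prefix"
    run wget "$URL"                           # download the source archive
    run tar -xzvf "clustalw-mpi-$VERSION.tar.gz"
    run cd "clustalw-mpi-$VERSION"
    run module load mpich                     # MPI compiler wrappers for make
    run make                                  # build the clustalw-mpi executable
)
```

Running "DRY_RUN=1 install_clustalw_mpi" first lets you review the exact commands. Then run it without DRY_RUN to perform the install.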


(b) Test

· Make a test directory and go into it: 

o    "cd /gpfs1/scratch/USERNAME" 

o    "mkdir clustalw-mpi_example"

o    "cd clustalw-mpi_example"

· Download the sample test data to the current directory:

o    "wget https://github.com/mmatschiner/tutorials/raw/master/multiple_sequence_alignment/data/16s.fasta"

· Write the job submission script clustalw-mpi_job.pbs (an example is given below) and submit it:

o    "qsub clustalw-mpi_job.pbs"
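The test steps can be scripted with a quick check that the downloaded file actually holds several FASTA records before anything is queued. This is a sketch. It assumes that /gpfs1/scratch/$USER is your scratch area and that clustalw-mpi_job.pbs has already been written into the test directory. "count_fasta_seqs" and "prepare_test" are hypothetical helper names.

```shell
#!/bin/bash
# Sketch of step (b). Assumptions: /gpfs1/scratch/$USER is your scratch area and
# clustalw-mpi_job.pbs is already in the test directory.
# count_fasta_seqs and prepare_test are hypothetical helper names.

# Count records in a FASTA file (lines that start with ">").
count_fasta_seqs() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" hides that.
    grep -c '^>' "$1" || true
}

prepare_test() (
    set -e
    workdir="${1:-/gpfs1/scratch/$USER/clustalw-mpi_example}"
    mkdir -p "$workdir"
    cd "$workdir"
    wget https://github.com/mmatschiner/tutorials/raw/master/multiple_sequence_alignment/data/16s.fasta
    n=$(count_fasta_seqs 16s.fasta)
    # A multiple alignment needs at least two sequences.
    if [ "${n:-0}" -lt 2 ]; then
        echo "16s.fasta contains ${n:-0} sequence(s); nothing to align" >&2
        exit 1
    fi
    echo "16s.fasta contains $n sequences"
    qsub clustalw-mpi_job.pbs
)
```

Because of the record count check, a failed or truncated download (for example an HTML error page saved as 16s.fasta) is caught before the job enters the queue.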


------------------------------------------- file clustalw-mpi_job.pbs -------------------------------------------

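#!/bin/bash
#PBS -q default
#PBS -N ClustalW-MPI_test
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
#PBS -W group_list=x-ccast-prj-[your project group name here]

# Run in the directory the job was submitted from (where 16s.fasta is)
cd "$PBS_O_WORKDIR"

# Set the path to your executable location (replace USERNAME with your username)
export MY_CLUSTALW_MPI=/gpfs1/home/USERNAME/SOFTWARE/clustalw-mpi-0.13

# Load the same MPI library that was used to build ClustalW-MPI
module load mpich

# Start one MPI process per CPU requested above ($NCPUS is set by PBS from ncpus=4)
mpirun -np "$NCPUS" "$MY_CLUSTALW_MPI/clustalw-mpi" -infile=16s.fasta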

exit 0
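When the job finishes, ClustalW by default writes the alignment to a file named after the input (here 16s.aln), together with a guide tree (16s.dnd), in the working directory. The sketch below is one way to check the result. It assumes those default names and that the .aln file begins with a "CLUSTAL" header line. "check_alignment" is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: sanity-check the alignment written by the job.
# Assumes ClustalW's default output name (<input>.aln, e.g. 16s.aln) and that
# the file begins with a "CLUSTAL" header line. check_alignment is hypothetical.
check_alignment() (
    aln="${1:-16s.aln}"
    if [ ! -s "$aln" ]; then
        echo "no alignment at $aln; check the job's PBS output/error files" >&2
        exit 1
    fi
    IFS= read -r header < "$aln"      # first line of the alignment file
    case "$header" in
        CLUSTAL*) echo "ok: $aln ($header)" ;;
        *) echo "unexpected first line in $aln: $header" >&2; exit 1 ;;
    esac
)
```

Run "check_alignment" from the job's working directory after the job leaves the queue, or pass a path such as "check_alignment other.aln".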

See Also: