Bioinformatics - Maq

Instructions on how to run (and, if needed, install a customized version of) Maq

Maq builds assemblies by mapping short reads to reference sequences.

  1. Running Maq on Thunder
  2. Install Customized Maq on Thunder
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running Maq on Thunder


Example: Map reads to the reference sequence and assemble the reads based on the mapping result


Location: /gpfs1/projects/ccastest/training/examples/Maq_example


File list:

· maq_job.pbs: job submission script  

· chr8.fa: a reference sequence in fasta format

· test_reads.fq: a set of sequences in fastq format 


Steps:

· Copy the example directory to your SCRATCH directory

o   cp -r /gpfs1/projects/ccastest/training/examples/Maq_example $SCRATCH

· Go to the copied directory

o   cd  $SCRATCH/Maq_example

· Edit the job submission script as needed, then submit the job

o    qsub maq_job.pbs
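
The copy step above can be made a little safer with a small wrapper. The following is a minimal sketch, not a CCAST-provided tool: stage_example is a hypothetical helper name, and it simply refuses to copy over an existing directory so that an earlier run's edited job script and results are not silently overwritten.

```shell
#!/bin/bash
# Sketch: copy an example directory into scratch space, refusing to overwrite
# an existing copy. stage_example is a hypothetical helper, not a CCAST tool.
stage_example() {
    local src="$1" dest_root="$2"
    local target="$dest_root/$(basename "$src")"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        echo "already exists, not overwriting: $target" >&2
        return 1
    fi
    # Copy the directory and print the path of the new copy on success.
    cp -r "$src" "$dest_root"/ && echo "$target"
}

# On Thunder (paths taken from this article):
#   workdir=$(stage_example /gpfs1/projects/ccastest/training/examples/Maq_example "$SCRATCH") \
#     && cd "$workdir" && qsub maq_job.pbs
#   qstat -u "$USER"    # check the state of your queued and running jobs
```

After submission, qstat -u "$USER" lists your jobs; once the job has finished, the scheduler's output and error files should appear in the directory the job was submitted from.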


2. Install Customized Maq on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.

Summary

(a) bzip2 is required for building (available via ‘module load bzip2’).

(b) Option “--threads INT”: Number of output compression threads to use in addition to the main thread. Only used when --output-type is b or z. Default: 0.


Details

In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.


(a) Install

· Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE" 

· Download Maq, extract the archive, and go to the extracted directory:

o    "wget https://sourceforge.net/projects/maq/files/maq/0.7.1/maq-0.7.1.tar.bz2" 

o    "tar -xjvf maq-0.7.1.tar.bz2"  

o    "cd maq-0.7.1"

· The file "stdhash.hh" has an issue that must be fixed before building. We need to change all the declarations "int ret = direct_insert_aux..." to "int ret = this->direct_insert_aux...". Use the command below to do so:

o    "sed -i -E 's/(int ret = )(direct_insert_aux)/\1this->\2/' stdhash.hh"

· Configure and specify the install location:

o    "./configure --prefix=/gpfs1/home/USERNAME/SOFTWARE/maq_install_here"

· Build and install:

o    "make && make install"
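
To see what the stdhash.hh fix in the steps above does before touching the real source, the same substitution can be tried on a throwaway file. This is a minimal sketch: the sample lines are made up for illustration and are not copied from the Maq source, and it assumes GNU sed (for -i and -E). The likely reason the change is needed is that newer C++ compilers require this-> when a template calls a member inherited from a dependent base class, but that is an inference, not something stated by the Maq authors.

```shell
#!/bin/bash
# Sketch: apply the stdhash.hh fix to a throwaway sample file to see its effect.
# The sample lines are illustrative only, not copied from the Maq source.
sample=$(mktemp)
cat > "$sample" <<'EOF'
    int ret = direct_insert_aux(key, m, n_b, &i);
    int other = unrelated_call(key);
EOF

# Same substitution as in the install step (GNU sed; -i edits the file in place).
sed -i -E 's/(int ret = )(direct_insert_aux)/\1this->\2/' "$sample"

# Count patched lines and any lines that still have the old form.
patched=$(grep -c 'this->direct_insert_aux' "$sample" || true)
remaining=$(grep -c 'int ret = direct_insert_aux' "$sample" || true)
echo "patched lines: $patched, unpatched lines: $remaining"
cat "$sample"
```

On the real file, running grep -n 'direct_insert_aux' stdhash.hh before and after the sed command is a quick way to confirm that every matching declaration was changed and nothing else was.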


(b) Test

· Make a test directory and go into it: 

o    "cd /gpfs1/scratch/USERNAME"

o    "mkdir MAQ_example"

o    "cd MAQ_example"

· Download the test data to the current location (this uses the test data from RMAP):

o    "wget https://github.com/smithlabcode/rmap/raw/master/test/input/{chr8.fa,test_reads.fq}"

· Write the job script maq_job.pbs (listed below) in this directory and submit it:

o    "qsub maq_job.pbs"


------------------------------------------- file maq_job.pbs -------------------------------------------

#!/bin/bash

#PBS -q default

#PBS -N Maq_Test

#PBS -l select=1:mem=10gb:ncpus=1

#PBS -l walltime=02:00:00

## Replace “x-ccast-prj” with “x-ccast-prj-[your project group name here]”

#PBS -W group_list=x-ccast-prj

cd $PBS_O_WORKDIR

# Set path to your maq binaries

export MY_MAQ=/gpfs1/home/USERNAME/SOFTWARE/maq_install_here/bin

$MY_MAQ/maq.pl easyrun -d outdir chr8.fa test_reads.fq

exit 0
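
The maq.pl easyrun call in the script above wraps several individual Maq commands. The sketch below only prints an approximate manual equivalent and does not run anything. The subcommands (fasta2bfa, fastq2bfq, map, assemble, cns2fq, cns2snp) come from the Maq command set, but the intermediate file names are assumptions chosen for illustration, print_maq_steps is a hypothetical helper, and easyrun itself performs additional steps and may use different options.

```shell
#!/bin/bash
# Sketch: print the individual Maq commands that roughly correspond to
# "maq.pl easyrun". File names under the output directory are assumptions.
print_maq_steps() {
    local maq="$1" ref="$2" reads="$3" out="$4"
    cat <<EOF
$maq fasta2bfa $ref $out/ref.bfa
$maq fastq2bfq $reads $out/reads.bfq
$maq map $out/all.map $out/ref.bfa $out/reads.bfq
$maq assemble $out/consensus.cns $out/ref.bfa $out/all.map
$maq cns2fq $out/consensus.cns > $out/cns.fq
$maq cns2snp $out/consensus.cns > $out/cns.snp
EOF
}

# Dry run for the example data used in this article.
print_maq_steps maq chr8.fa test_reads.fq outdir
```

Reading the steps in order: the reference and reads are converted to Maq's binary formats, the reads are mapped, a consensus is assembled from the mapping, and the consensus sequence and SNP calls are extracted from it.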

See Also: