Bioinformatics - MaSuRCA
Instructions on how to run (and, if needed, install a customized version of) MaSuRCA
MaSuRCA is a whole-genome assembly package. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches.
1. Running MaSuRCA on Thunder
Example: Assemble single- or paired-end reads
Location: /mmfs1/thunder/projects/ccastest/training/examples/MaSuRCA_example
File list:
· masurca_job.pbs: job submission script
· frag_1.fastq: a set of sequence reads in FASTQ format
· frag_2.fastq: a set of sequence reads in FASTQ format
· config.txt: the configuration file from which MaSuRCA generates the assembly script (assemble.sh)
Steps:
· Copy the example directory to your SCRATCH directory
o “cp -r /mmfs1/thunder/projects/ccastest/training/examples/MaSuRCA_example $SCRATCH”
· Go to the copied directory
o “cd $SCRATCH/MaSuRCA_example”
· Edit the job submission script as needed, then submit the job
o “qsub masurca_job.pbs”
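The three steps above can be wrapped in one shell function, shown here as a minimal sketch rather than an official CCAST script. It assumes $SCRATCH is set in your login environment and that qsub is on your PATH; the function name submit_masurca_example is hypothetical, and before submitting it checks only that the job script and config file were copied.

```shell
#!/bin/bash
# Hypothetical helper: copy the MaSuRCA example to $SCRATCH and submit it.
# Assumes $SCRATCH is set and qsub (PBS) is available.

EXAMPLE_SRC=/mmfs1/thunder/projects/ccastest/training/examples/MaSuRCA_example

submit_masurca_example() {
    # Optional first argument overrides the source directory.
    local src="${1:-$EXAMPLE_SRC}"

    if [ -z "${SCRATCH:-}" ] || [ ! -d "$SCRATCH" ]; then
        echo "error: \$SCRATCH is unset or not a directory" >&2
        return 1
    fi

    # Step 1: copy the example directory into scratch.
    cp -r "$src" "$SCRATCH" || return 1
    local dest="$SCRATCH/$(basename "$src")"

    # Confirm the job script and config file arrived before submitting.
    local f
    for f in masurca_job.pbs config.txt; do
        if [ ! -f "$dest/$f" ]; then
            echo "error: missing $dest/$f" >&2
            return 1
        fi
    done

    # Steps 2 and 3: submit from inside the copy. The subshell keeps the
    # caller's working directory unchanged; qsub prints the job ID.
    (cd "$dest" && qsub masurca_job.pbs)
}

# Usage on Thunder (after editing masurca_job.pbs if needed):
#   submit_masurca_example
```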
2. Install Customized MaSuRCA on Thunder
Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.
Summary
(a) GCC 4.7 or higher is required. The system GCC is 4.8.5, so no module needs to be loaded.
(b) bzip2-devel is required for building; it is available via 'module load bzip2'.
(c) The other tools MaSuRCA needs (such as Jellyfish) are built by its own installer, so no modules need to be loaded for them.
(d) Set NUM_THREADS in the configuration file used in the first step of a run, matching the number of CPUs requested in the job script.
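Requirement (a) can be checked before building. In the minimal sketch below, the helpers version_ge and check_gcc are hypothetical; they compare the output of "gcc -dumpversion" against 4.7 using the version ordering of GNU sort (sort -V), so GNU coreutils is assumed, as on typical Linux clusters.

```shell
#!/bin/bash
# Hypothetical helpers: verify that gcc meets the >= 4.7 requirement.
# Relies on GNU sort's -V (version sort) option.

# Succeeds (exit status 0) if version $1 >= version $2.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_gcc() {
    local required=4.7 found
    # GCC 4.8.5 prints its full version here; newer releases may print only
    # the major number (e.g. 11), which still compares correctly.
    found=$(gcc -dumpversion 2>/dev/null) || { echo "gcc not found" >&2; return 1; }
    if version_ge "$found" "$required"; then
        echo "gcc $found OK (>= $required)"
    else
        echo "gcc $found is older than $required" >&2
        return 1
    fi
}

# Usage: check_gcc
```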
Details
In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST's Thunder cluster, and that “USERNAME” is your username on Thunder.
(a) Install
· Go to the SOFTWARE directory:
o “cd /mmfs1/home/USERNAME/SOFTWARE”
· Download and extract the release archive:
o "wget https://github.com/alekseyzimin/masurca/releases/download/3.3.2/MaSuRCA-3.3.2.tar.gz"
o "tar -zxvf MaSuRCA-3.3.2.tar.gz"
· Go to the MaSuRCA directory:
o “cd /mmfs1/home/USERNAME/SOFTWARE/MaSuRCA-3.3.2”
· Load bzip2 module
o “module load bzip2”
· Install MaSuRCA
o “./install.sh”
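Once ./install.sh finishes, a quick check that the masurca launcher exists in the bin directory can save a failed job later, since the test job script adds that directory to PATH. The helper check_masurca_install is hypothetical; by default it looks in $HOME/SOFTWARE/MaSuRCA-3.3.2, assuming $HOME is /mmfs1/home/USERNAME.

```shell
#!/bin/bash
# Hypothetical helper: confirm that install.sh produced bin/masurca.
# Default prefix assumes $HOME is /mmfs1/home/USERNAME.

check_masurca_install() {
    local prefix="${1:-$HOME/SOFTWARE/MaSuRCA-3.3.2}"
    if [ -x "$prefix/bin/masurca" ]; then
        echo "found $prefix/bin/masurca"
    else
        echo "masurca not found in $prefix/bin; check the install.sh output" >&2
        return 1
    fi
}

# Usage: check_masurca_install
```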
(b) Test
Running MaSuRCA is a two-step process. First, the masurca command reads a configuration file and generates a shell script called assemble.sh. Second, executing assemble.sh performs the actual assembly. The easiest approach is to copy the sample configuration file into the directory where the assembly will run and then modify it.
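The two steps can be sketched as a single shell function. run_masurca is a hypothetical helper; it assumes masurca is already on PATH and that it is called from the assembly directory. It deletes any old assemble.sh first, so a script generated from a previous configuration is never run by mistake.

```shell
#!/bin/bash
# Hypothetical helper: the two-step MaSuRCA run.
# Assumes masurca is on PATH and the current directory is the run directory.

run_masurca() {
    local config="$1"
    [ -f "$config" ] || { echo "config file $config not found" >&2; return 1; }

    # Remove a stale assemble.sh left over from an earlier configuration.
    rm -f assemble.sh

    # Step 1: masurca reads the config file and writes assemble.sh.
    masurca "$config" || return 1
    [ -f assemble.sh ] || { echo "masurca did not create assemble.sh" >&2; return 1; }

    # Step 2: the generated script performs the assembly.
    ./assemble.sh
}

# Usage: run_masurca sr_config_example.txt
```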
· Go to your scratch directory:
o "cd /mmfs1/thunder/scratch/USERNAME"
· Make a directory for it:
o "mkdir MaSuRCA_test"
· Go into it:
o "cd MaSuRCA_test"
· Download the data and decompress it:
o "wget http://gage.cbcb.umd.edu/data/Staphylococcus_aureus/Data.original/frag_1.fastq.gz"
o "wget http://gage.cbcb.umd.edu/data/Staphylococcus_aureus/Data.original/frag_2.fastq.gz"
o "gunzip frag_1.fastq.gz frag_2.fastq.gz"
· Write the config file:
o Copy the template config file sr_config_example.txt into the test directory:
o "cp /mmfs1/home/USERNAME/SOFTWARE/MaSuRCA-3.3.2/sr_config_example.txt ."
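· Modify sr_config_example.txt:
o Specify the paired-end input (two-letter prefix, mean insert size, standard deviation, forward reads, reverse reads):
o "PE= pe 180 27 /mmfs1/thunder/scratch/USERNAME/MaSuRCA_test/frag_1.fastq /mmfs1/thunder/scratch/USERNAME/MaSuRCA_test/frag_2.fastq"
· Comment out the JUMP line, since this test has no jumping (mate-pair) library:
o "#JUMP......"
· Set the number of threads to match the ncpus requested in the job script:
o "NUM_THREADS = 4"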
· Write the job script below and submit it:
o "qsub masurca_test.pbs"
---------------masurca_test.pbs-----------------
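#!/bin/bash
#PBS -q default
#PBS -N MaSuRCA_test
# ncpus should match NUM_THREADS in sr_config_example.txt
#PBS -l select=1:mem=20gb:ncpus=4
#PBS -l walltime=10:00:00
#PBS -W group_list=x-ccast-prj-[your project group name here]
# Run in the directory the job was submitted from (MaSuRCA_test)
cd "$PBS_O_WORKDIR"
# Set path to your MaSuRCA binaries
export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/MaSuRCA-3.3.2/bin
# Step 1: generate assemble.sh from the config file
masurca sr_config_example.txt
# Step 2: run the generated script to perform the assembly
./assemble.sh
exit 0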
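The config edits in the test steps can also be scripted, which makes it easier to keep NUM_THREADS in line with the ncpus value in masurca_test.pbs (item (d) of the Summary). This is a minimal sketch, not part of MaSuRCA: edit_sr_config and check_threads are hypothetical helpers, and they assume GNU sed (for -i) and a template with single uncommented lines starting with "PE=", "JUMP=" and "NUM_THREADS".

```shell
#!/bin/bash
# Hypothetical helpers: apply the sr_config_example.txt edits with sed and
# check that NUM_THREADS matches ncpus in the PBS job script.
# Assumes GNU sed and single uncommented PE=, JUMP= and NUM_THREADS lines.

edit_sr_config() {
    local config="$1" read_dir="$2" threads="${3:-4}"
    [ -f "$config" ] || { echo "config file $config not found" >&2; return 1; }
    sed -i \
        -e "s|^PE=.*|PE= pe 180 27 $read_dir/frag_1.fastq $read_dir/frag_2.fastq|" \
        -e 's|^JUMP=|#JUMP=|' \
        -e "s|^NUM_THREADS *=.*|NUM_THREADS = $threads|" \
        "$config"
}

# Compare NUM_THREADS in the config with ncpus= on the "#PBS -l select" line.
check_threads() {
    local config="$1" job="$2" threads ncpus
    threads=$(sed -n 's/^NUM_THREADS *= *\([0-9]*\).*/\1/p' "$config")
    ncpus=$(sed -n 's/^#PBS .*ncpus=\([0-9]*\).*/\1/p' "$job")
    if [ -n "$threads" ] && [ "$threads" = "$ncpus" ]; then
        echo "NUM_THREADS=$threads matches ncpus=$ncpus"
    else
        echo "mismatch: NUM_THREADS=${threads:-unset}, ncpus=${ncpus:-unset}" >&2
        return 1
    fi
}

# Usage, from /mmfs1/thunder/scratch/USERNAME/MaSuRCA_test:
#   edit_sr_config sr_config_example.txt "$PWD" 4
#   check_threads sr_config_example.txt masurca_test.pbs && qsub masurca_test.pbs
```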