Bioinformatics - SOAPdenovo2
Instructions on how to run (and, if needed, install a customized version of) SOAPdenovo2
SOAPdenovo2 is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads.
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.
1. Running SOAPdenovo2 on Thunder
Example: Assemble short reads into genomes
Location: /mmfs1/thunder/projects/ccastest/training/examples/SOAPdenovo2_example
File list
· soapdenovo2_job.pbs: job submission script
· config.txt: configuration file
· frag_1.cor.fastq: paired-end reads in fastq format
· frag_2.cor.fastq: paired-end reads in fastq format
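The two FASTQ files hold the two mates of each read pair, so they should contain the same number of records. The following is a minimal sketch of a sanity check, assuming the standard four-line FASTQ record layout; the function names count_reads and check_pairs are hypothetical and are not part of the CCAST example.

```shell
# Hypothetical helper: count reads in a FASTQ file,
# assuming every record occupies exactly four lines.
count_reads() {
    fq_lines=$(wc -l < "$1")
    echo $(($fq_lines / 4))
}

# Hypothetical helper: report both read counts and succeed
# only when the two mate files contain the same number of reads.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

For the example data, this could be called as: check_pairs frag_1.cor.fastq frag_2.cor.fastq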
Steps
· Copy the example directory to your SCRATCH directory
o “cp -r /mmfs1/thunder/projects/ccastest/training/examples/SOAPdenovo2_example $SCRATCH”
· Go to the copied directory
o “cd $SCRATCH/SOAPdenovo2_example”
· Edit the job submission script as needed, then submit the job
o “qsub soapdenovo2_job.pbs”
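Before submitting, it can help to confirm that the copied directory contains the four files from the file list. The following is a minimal sketch; the function name check_example_files is hypothetical and is not provided by CCAST.

```shell
# Hypothetical helper: verify that the four example files are present.
# Usage: check_example_files DIRECTORY
check_example_files() {
    ex_dir="$1"
    ex_missing=0
    for ex_file in soapdenovo2_job.pbs config.txt frag_1.cor.fastq frag_2.cor.fastq; do
        if [ ! -f "$ex_dir/$ex_file" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $ex_file" >&2
            ex_missing=1
        fi
    done
    return "$ex_missing"
}
```

One possible use is to submit only when the check passes: check_example_files "$SCRATCH/SOAPdenovo2_example" && qsub soapdenovo2_job.pbs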
2. Install Customized SOAPdenovo2 on Thunder
Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.
Summary
(a) "-p" option: number of CPUs (threads) to use; the default is 8.
(b) No software dependencies
Details
In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.
(a) Install
· Go to the SOFTWARE directory:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
· Clone the SOAPdenovo2 repository:
o "git clone https://github.com/aquaskyline/SOAPdenovo2.git"
· Build:
o "cd SOAPdenovo2"
o "make"
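After the build, the job script in the Test part expects an executable named SOAPdenovo-63mer in the SOAPdenovo2 directory. A minimal sketch of a check follows; the function name check_soap_binary is hypothetical, and the default binary name is taken from that job script.

```shell
# Hypothetical helper: confirm that a SOAPdenovo2 binary exists and is executable.
# Usage: check_soap_binary INSTALL_DIR [BINARY_NAME]
check_soap_binary() {
    soap_dir="$1"
    # Default to the binary name used in the job script.
    soap_bin="${2:-SOAPdenovo-63mer}"
    if [ -x "$soap_dir/$soap_bin" ]; then
        echo "found: $soap_dir/$soap_bin"
    else
        echo "not found or not executable: $soap_dir/$soap_bin" >&2
        return 1
    fi
}
```

For the installation described here, this could be called as: check_soap_binary /mmfs1/home/USERNAME/SOFTWARE/SOAPdenovo2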
(b) Test
· Make a test directory and go into it:
o "cd /mmfs1/thunder/scratch/USERNAME"
o "mkdir SOAPdenovo2_example"
o "cd SOAPdenovo2_example"
· Download the test data to the current location:
o "wget http://rcs.bu.edu/examples/bioinformatics/soapdenovo/bambus2/frag_{1,2}.cor.fastq"
· Write the config file:
------------------------------------------- file config.txt -------------------------------------------
[LIB]
avg_ins=180
reverse_seq=0
asm_flags=1
rank=1
q1=frag_1.cor.fastq
q2=frag_2.cor.fastq
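As an alternative to typing the file in an editor, config.txt can be generated from the command line. The following is a minimal sketch that reproduces the fields shown above; the function name write_soap_config and its argument order are hypothetical. The field descriptions in the comments follow the SOAPdenovo2 manual as we understand it and should be checked against the version you installed.

```shell
# Hypothetical helper: write a single-library SOAPdenovo2 config file.
# Usage: write_soap_config OUTFILE READ1_FASTQ READ2_FASTQ [AVG_INSERT_SIZE]
#   avg_ins      average insert size of the library (defaults to 180, as above)
#   reverse_seq  0 for a forward-reverse (short-insert) library
#   asm_flags    which assembly stages use these reads (1 = contig assembly only)
#   rank         order in which libraries are used for scaffolding
#   q1, q2       the two FASTQ files of a paired-end library
write_soap_config() {
    cfg_out="$1"
    cat > "$cfg_out" <<EOF
[LIB]
avg_ins=${4:-180}
reverse_seq=0
asm_flags=1
rank=1
q1=$2
q2=$3
EOF
}
```

For the test data, this could be called as: write_soap_config config.txt frag_1.cor.fastq frag_2.cor.fastq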
· Write the job submission script soapdenovo2_job.pbs (contents below) and submit it:
o "qsub soapdenovo2_job.pbs"
------------------------------------------- file soapdenovo2_job.pbs -------------------------------------------
#!/bin/bash
#PBS -q default
#PBS -N test
## SOAPdenovo2 does not work across multiple nodes, so keep select=1
##change mem, ncpus, and walltime as needed:
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=1:00:00
## Replace "x-ccast-prj" with "x-ccast-prj-[your project group name here]"
#PBS -W group_list=x-ccast-prj
# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Set path of your SOAPdenovo2 Binaries
export MY_SOAPDENOVO2=/mmfs1/home/USERNAME/SOFTWARE/SOAPdenovo2
$MY_SOAPDENOVO2/SOAPdenovo-63mer all -K 31 -p $NCPUS -s config.txt -o output
exit 0
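Once the job finishes, SOAPdenovo2 writes its results to files that share the prefix given with -o (here, output). As an assumption based on the SOAPdenovo2 documentation rather than on this article, contigs are expected in output.contig and scaffolds in output.scafSeq, both in FASTA format. The following minimal sketch counts the sequences in such a file; the function name count_fasta_seqs is hypothetical.

```shell
# Hypothetical helper: count sequences in a FASTA file by counting header lines.
# Usage: count_fasta_seqs FASTA_FILE
count_fasta_seqs() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c '^>' "$1" || true
}
```

Assuming the file names above, this could be called as: count_fasta_seqs output.contig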