
Bioinformatics - SOAPdenovo2

Instructions on how to run (and, if needed, install a customized version of) SOAPdenovo2

SOAPdenovo2 is a short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads.

  1. Running SOAPdenovo2 on Thunder
  2. Install customized SOAPdenovo2 on Thunder
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running SOAPdenovo2 on Thunder


Example: Assemble short reads into genomes


Location: /gpfs1/projects/ccastest/training/examples/SOAPdenovo2_example


File list

· soapdenovo2_job.pbs: job submission script  

· config.txt: configuration file

· frag_1.cor.fastq: paired-end reads in fastq format

· frag_2.cor.fastq: paired-end reads in fastq format


Steps

· Copy example directory to your SCRATCH directory

o    cp -r /gpfs1/projects/ccastest/training/examples/SOAPdenovo2_example $SCRATCH

· Go to the copied directory

o    cd  $SCRATCH/SOAPdenovo2_example

· Edit the job submission script as needed, then submit the job

o    qsub soapdenovo2_job.pbs
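
The three steps above can be wrapped in a small helper that fails early if the destination is unset or the example has already been copied. This is only a sketch: "stage_example" is a hypothetical function name, not part of any CCAST tooling, and the qsub call is left commented out so that sourcing the snippet has no side effects.

```shell
#!/bin/bash
# Sketch: copy the example into a working area and print the new path.
# stage_example is a hypothetical helper, not a CCAST-provided command.
stage_example() {
    local src="$1" dest_root="$2"
    if [ -z "$dest_root" ]; then
        echo "error: destination root is empty (is \$SCRATCH set?)" >&2
        return 1
    fi
    if [ ! -d "$src" ]; then
        echo "error: example directory not found: $src" >&2
        return 1
    fi
    local dest
    dest="$dest_root/$(basename "$src")"
    if [ -e "$dest" ]; then
        echo "error: $dest already exists; remove or rename it first" >&2
        return 1
    fi
    # Copy the whole example directory into the destination root.
    cp -r "$src" "$dest_root"/ || return 1
    echo "$dest"
}

# Usage on Thunder (commented out on purpose):
# workdir=$(stage_example /gpfs1/projects/ccastest/training/examples/SOAPdenovo2_example "$SCRATCH") \
#     && cd "$workdir" && qsub soapdenovo2_job.pbs
```

Refusing to overwrite an existing copy is a deliberate choice here, so that an edited job script from an earlier run is not silently mixed with fresh files.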


2. Install customized SOAPdenovo2 on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.

Summary

(a)    "-p" option: number of CPUs to use; 8 by default.

(b)    No software dependencies

Details

In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.


(a) Install

·       Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

·       Clone the SOAPdenovo2 repository: 

o    "git clone https://github.com/aquaskyline/SOAPdenovo2.git"

·       Build:

o    "cd SOAPdenovo2"

o    "make"
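
Before moving on to the test, it can help to confirm that the build actually produced an executable. The sketch below checks for "SOAPdenovo-63mer", the binary used by the job script later in this article; "check_soap_build" is a hypothetical helper name, and the optional second argument is an assumption that lets you check a differently named binary.

```shell
# Sketch: confirm that "make" left an executable assembler binary behind.
# check_soap_build is a hypothetical helper, not part of SOAPdenovo2.
check_soap_build() {
    local dir="$1" bin="${2:-SOAPdenovo-63mer}"
    # -x is also true for directories, so exclude them explicitly.
    if [ -x "$dir/$bin" ] && [ ! -d "$dir/$bin" ]; then
        echo "ok: $dir/$bin"
    else
        echo "missing or not executable: $dir/$bin" >&2
        return 1
    fi
}

# Usage (replace USERNAME with your Thunder username):
# check_soap_build /gpfs1/home/USERNAME/SOFTWARE/SOAPdenovo2
```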

(b) Test

·       Make a test directory and go into it: 

o    "cd /gpfs1/scratch/USERNAME"

o    "mkdir SOAPdenovo2_example"

o    "cd SOAPdenovo2_example"

·       Download the test data to the current directory:  

o    "wget http://rcs.bu.edu/examples/bioinformatics/soapdenovo/bambus2/frag_{1,2}.cor.fastq"

·       Write the config file: 

------------------------------------------- file config.txt -------------------------------------------

[LIB]

avg_ins=180

reverse_seq=0

asm_flags=1

rank=1

q1=frag_1.cor.fastq

q2=frag_2.cor.fastq
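
A mistyped read file name in config.txt is easy to miss until the job has already waited in the queue. The sketch below reads the "q1=" and "q2=" entries from a config file and checks that each named file exists and is non-empty. "check_soap_config" is a hypothetical helper name, and the sketch assumes relative paths are resolved from the current directory, which matches a job that runs from the directory holding config.txt.

```shell
# Sketch: verify that the read files named in a SOAPdenovo2 config exist.
# check_soap_config is a hypothetical helper, not part of SOAPdenovo2.
check_soap_config() {
    local cfg="$1" missing=0 key value
    if [ ! -f "$cfg" ]; then
        echo "config not found: $cfg" >&2
        return 1
    fi
    # Split each line at the first "="; lines such as [LIB] are ignored.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            q1|q2)
                if [ -s "$value" ]; then
                    echo "found $key: $value"
                else
                    echo "missing or empty $key: $value" >&2
                    missing=1
                fi
                ;;
        esac
    done < "$cfg"
    return "$missing"
}

# Usage, from the directory that holds config.txt:
# check_soap_config config.txt
```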


·       Write and submit the job 

o    "qsub soapdenovo2_job.pbs"

------------------------------------------- file soapdenovo2_job.pbs -------------------------------------------

#!/bin/bash

#PBS -q default

#PBS -N test

## SOAPdenovo2 does not run across multiple nodes, so keep select=1

##change mem, ncpus, and walltime as needed:

#PBS -l select=1:mem=10gb:ncpus=4

#PBS -l walltime=1:00:00

## Replace "x-ccast-prj" with "x-ccast-prj-[your project group name here]"

#PBS -W group_list=x-ccast-prj

cd "$PBS_O_WORKDIR"

 

# Set path of your SOAPdenovo2 Binaries

export MY_SOAPDENOVO2=/gpfs1/home/USERNAME/SOFTWARE/SOAPdenovo2

 

$MY_SOAPDENOVO2/SOAPdenovo-63mer all -K 31 -p $NCPUS -s config.txt -o output

 

exit 0
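
Once the job finishes, a quick sanity check is to count the assembled sequences. The sketch below assumes the run used "-o output" as in the job script above, so that contigs are written to output.contig and scaffolds to output.scafSeq in FASTA format; "count_fasta_records" is a hypothetical helper name.

```shell
# Sketch: count FASTA records (header lines starting with ">") in a file.
# count_fasta_records is a hypothetical helper, not part of SOAPdenovo2.
count_fasta_records() {
    local fasta="$1"
    if [ ! -f "$fasta" ]; then
        echo "file not found: $fasta" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0; that is not an error here.
    grep -c '^>' "$fasta" || true
}

# Usage, from the job's working directory:
# echo "contigs:   $(count_fasta_records output.contig)"
# echo "scaffolds: $(count_fasta_records output.scafSeq)"
```

A count of 0, or a missing file, usually means the run stopped early; the PBS output and error files for the job are the first place to look in that case.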





Keywords: ccast, hpc, thunder, bioinformatics, soapdenovo2
Doc ID: 108081
Owner: Liu Y.
Group: IT Knowledge Base
Created: 2020-12-25 11:54 CDT
Updated: 2020-12-29 02:10 CDT
Sites: IT Knowledge Base
CleanURL: https://kb.ndsu.edu/soapdenovo2