
Bioinformatics - Canu

Instructions on how to run (and, if needed, install a customized version of) Canu

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).

  1. Running Canu on Thunder
  2. Install Customized Canu on Thunder

Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running Canu on Thunder


Example: assemble PacBio reads into contigs


Location: /gpfs1/projects/ccastest/training/examples/Canu_example


File list

· canu_job.pbs: job submission script 

· pacbio.fastq: PacBio reads in FASTQ format


Steps

· Copy example directory to your SCRATCH directory

o    cp -r /gpfs1/projects/ccastest/training/examples/Canu_example $SCRATCH

· Go to the copied directory

o    cd $SCRATCH/Canu_example

· Edit the job submission script as needed, then submit the job

o    qsub canu_job.pbs


2. Install Customized Canu on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.

Summary


(a) Canu specializes in assembling PacBio or Oxford Nanopore sequences.


(b) There are two modes that Canu runs in: locally, using just one machine, or grid-enabled, using multiple hosts managed by a grid engine like PBS Pro. In both cases, Canu will auto-detect available resources and configure job sizes based on the resources and genome size you’re assembling. Thus, most users should be able to run the command without modifying the defaults.


(c) After the initial job is submitted to PBS, Canu submits additional PBS jobs on its own. CCAST accepts a job only if it specifies a group name, so in grid mode you must pass the group name to those jobs by setting gridOptions in the Canu command.
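
As a rough illustration of point (c), a grid-mode command line might be assembled as in the sketch below. useGrid and gridOptions are Canu parameters. Passing "-W group_list=..." through gridOptions, the helper name build_canu_grid_cmd, and the group name are assumptions made for illustration, so confirm the exact string with CCAST support before relying on it. The sketch only prints the command; it does not run Canu.

```shell
#!/bin/bash
# Sketch only: print (not run) a grid-mode Canu command.
# PROJECT_GROUP is a placeholder; replace it with your own CCAST project group.
PROJECT_GROUP="${PROJECT_GROUP:-x-ccast-prj}"

build_canu_grid_cmd() {
    local group="$1" reads="$2"
    # gridOptions is handed to every job that Canu submits on its own,
    # so the group list should reach those jobs as well (assumed PBS syntax).
    printf 'canu -p ecoli -d ecoli-pacbio genomeSize=4.8m useGrid=true gridOptions="-W group_list=%s" -pacbio-raw %s\n' "$group" "$reads"
}

build_canu_grid_cmd "$PROJECT_GROUP" pacbio.fastq
```

Printing the command first makes it easy to inspect the quoting around gridOptions before placing the line in a job script.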


Details


In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.


(a) Install

· Go to the SOFTWARE directory, creating it first if it does not exist:

o    mkdir -p /gpfs1/home/USERNAME/SOFTWARE

o    cd /gpfs1/home/USERNAME/SOFTWARE

· Clone the Canu repository with Git:

o    git clone https://github.com/marbl/canu.git

· Go to the Canu source code directory:

o    cd canu/src

· Build: the executables will be placed in /gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin.

o    make
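
After the build finishes, a quick check along the following lines can confirm that a runnable canu executable exists. This is a minimal sketch: the helper name check_canu and the CANU_BIN variable are illustrative, and the default path simply repeats the install location given above. If your checkout builds into a different directory, set CANU_BIN accordingly.

```shell
#!/bin/bash
# Sketch only: confirm that the Canu build produced a runnable executable.
# CANU_BIN defaults to the install path named above; USERNAME is a placeholder.
CANU_BIN="${CANU_BIN:-/gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin}"

check_canu() {
    local bin_dir="$1"
    if [ -x "$bin_dir/canu" ]; then
        # Print the version string reported by the executable.
        "$bin_dir/canu" -version
    else
        echo "canu not found or not executable in $bin_dir" >&2
        return 1
    fi
}

check_canu "$CANU_BIN" || echo "Build may have failed; re-run make and check its output."
```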


(b) Test

· Make a test directory and go into it:

o    cd /gpfs1/scratch/USERNAME

o    mkdir Canu_example

o    cd Canu_example

· Download the test data to the current directory:

o    curl -L -o pacbio.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq

· Write the job script canu_job.pbs (shown below) and submit it:

o    qsub canu_job.pbs
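
Before submitting the job, it can help to confirm that the download completed. The sketch below counts the reads in pacbio.fastq, assuming the usual four lines per FASTQ record. The helper name count_fastq_reads is illustrative, and no particular read count is implied for the test data.

```shell
#!/bin/bash
# Sketch only: count reads in a FASTQ file, assuming 4 lines per record.
count_fastq_reads() {
    local fastq="$1"
    awk 'END { print int(NR / 4) }' "$fastq"
}

if [ -s pacbio.fastq ]; then
    echo "pacbio.fastq contains $(count_fastq_reads pacbio.fastq) reads"
else
    echo "pacbio.fastq is missing or empty; re-run the download" >&2
fi
```

An empty or very small file usually means the download was interrupted or returned an error page instead of the data.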


------------------------------------------- file canu_job.pbs -------------------------------------------

#!/bin/bash
#PBS -q default
#PBS -N test
## Works for multiple nodes (i.e., select>=1).
## Change select, mem, ncpus, and walltime as needed:
#PBS -l select=2:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
## Change "x-ccast-prj" to "x-ccast-prj-[your project group name]":
#PBS -W group_list=x-ccast-prj

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Add the path to your installation directory here
export MY_CANU=/gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin

# Assemble the reads locally; output goes to the ecoli-pacbio directory with the prefix "ecoli"
"$MY_CANU/canu" -p ecoli -d ecoli-pacbio useGrid=false genomeSize=4.8m -pacbio-raw pacbio.fastq

exit 0
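
Once the job has finished, the assembled contigs can be inspected with a short check such as the one below. It is a sketch under the assumption that Canu wrote its contigs to ecoli-pacbio/ecoli.contigs.fasta, which follows from the -p ecoli and -d ecoli-pacbio options in the job script. The helper name count_fasta_records is illustrative.

```shell
#!/bin/bash
# Sketch only: count sequences in a FASTA file by counting header lines.
count_fasta_records() {
    local fasta="$1"
    grep -c '^>' "$fasta"
}

# Path assumed from the -p ecoli and -d ecoli-pacbio options in the job script.
CONTIGS="ecoli-pacbio/ecoli.contigs.fasta"
if [ -s "$CONTIGS" ]; then
    echo "$CONTIGS: $(count_fasta_records "$CONTIGS") contigs"
else
    echo "$CONTIGS not found yet; check the job with qstat and the Canu logs" >&2
fi
```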


Keywords: ccast, hpc, thunder, bioinformatics, canu
Doc ID: 108033
Owner: Liu Y.
Group: IT Knowledge Base
Created: 2020-12-22 11:30 CDT
Updated: 2020-12-29 02:16 CDT
Sites: IT Knowledge Base
CleanURL: https://kb.ndsu.edu/canu