
Bioinformatics - Canu

Instructions for running Canu on CCAST's Thunder cluster and, if needed, installing a customized version of it

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).

  1. Running Canu on Thunder
  2. Install customized Canu on Thunder
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running Canu on Thunder

Example: assemble PacBio reads into a genome assembly

Location: /gpfs1/projects/ccastest/training/examples/Canu_example

File list

· canu_job.pbs: job submission script

· pacbio.fastq: PacBio reads in FASTQ format
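Before submitting, it can help to know how much data pacbio.fastq holds. A FASTQ record is four lines (header, sequence, "+" separator, qualities), so counting reads and bases takes a single awk pass. The sketch below assumes uncompressed records whose sequences are not wrapped over several lines; fastq_stats is a hypothetical helper, not something shipped in the example directory. The optional second argument is a genome size in base pairs, used only to estimate coverage.

```shell
#!/bin/bash
# Hypothetical helper: summarize a FASTQ file before assembly.
# Assumes uncompressed, four-line records (header, sequence, "+", qualities).
# $1 = FASTQ file, $2 = optional genome size in bp (for a coverage estimate)
fastq_stats() {
  local fastq=$1 genome_bp=${2:-0}
  # Line 2 of every 4-line record is the sequence.
  awk -v g="$genome_bp" '
    NR % 4 == 2 { reads++; bases += length($0) }
    END {
      printf "reads=%d bases=%.0f", reads, bases
      if (g > 0) printf " coverage=%.1fx", bases / g
      printf "\n"
    }' "$fastq"
}

# Usage (4800000 mirrors genomeSize=4.8m in the section 2 job script):
#   fastq_stats pacbio.fastq 4800000
```

Substitute your own genome size estimate; the coverage figure is only as good as that estimate.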


· Copy the example directory to your SCRATCH directory:

o    cp -r /gpfs1/projects/ccastest/training/examples/Canu_example $SCRATCH

· Go to the copied directory:

o    cd $SCRATCH/Canu_example

· Edit the job submission script as needed, then submit the job:

o    qsub canu_job.pbs
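If the copy step was incomplete, the job fails only after it has waited in the queue. A minimal pre-submission check, assuming the two files in the file list above are all the job needs, might look like this; check_canu_example is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical pre-submission check for the copied Canu example.
# Verifies that the files from the file list exist and are non-empty.
check_canu_example() {
  local dir=${1:-.} f
  for f in canu_job.pbs pacbio.fastq; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}

# Usage on Thunder:
#   cd "$SCRATCH/Canu_example" && check_canu_example . && qsub canu_job.pbs
```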

2. Install Customized Canu on Thunder

Warning: This part is intended ONLY for users who want to install and test their own version of Canu in their HOME directory.


(a) Canu specializes in assembling PacBio or Oxford Nanopore sequences.

(b) Canu runs in one of two modes: locally, on a single machine, or grid-enabled, across multiple hosts managed by a grid engine such as PBS Pro. In both cases, Canu auto-detects the available resources and sizes its jobs based on those resources and the size of the genome being assembled, so most users can run it without changing the defaults.

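(c) In grid mode, once the initial job is running under PBS, Canu submits additional PBS jobs on its own. CCAST accepts a job only if it specifies a project group, so you must pass the group to those jobs through Canu's gridOptions parameter, using the same -W group_list=... value as in the job script.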
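A grid-mode job script differs from a local one mainly in useGrid=true and the gridOptions value, which Canu applies to each PBS job it submits. The sketch below writes such a script; write_grid_job, the output name canu_grid_job.pbs, and the job name canu-grid are hypothetical, and the resource values are copied from the example job script and should be adjusted. It also changes to $PBS_O_WORKDIR, because PBS starts batch jobs in your home directory rather than in the directory you submitted from.

```shell
#!/bin/bash
# Hypothetical generator for a grid-mode Canu job script on Thunder.
# $1 = project group suffix (the part after "x-ccast-prj-")
# $2 = output file (default: canu_grid_job.pbs)
write_grid_job() {
  local group=$1 out=${2:-canu_grid_job.pbs}
  if [ -z "$group" ]; then
    echo "usage: write_grid_job GROUP [OUTFILE]" >&2
    return 1
  fi
  # Unquoted EOF: ${group} expands now; \$ and \\ are written out literally
  # so they are evaluated when the job runs.
  cat > "$out" <<EOF
#!/bin/bash
#PBS -q default
#PBS -N canu-grid
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
#PBS -W group_list=x-ccast-prj-${group}

cd "\$PBS_O_WORKDIR"
export MY_CANU=/gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin

# Every job Canu submits receives the group through gridOptions.
\$MY_CANU/canu -p ecoli -d ecoli-pacbio genomeSize=4.8m \\
  useGrid=true gridOptions="-W group_list=x-ccast-prj-${group}" \\
  -pacbio-raw pacbio.fastq
EOF
}
```

For example, running `write_grid_job mygroup` in the test directory and then `qsub canu_grid_job.pbs` would submit it, where mygroup stands for your project group name.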


In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. Replace “USERNAME” with your username on Thunder.

(a) Install

· Create (if needed) and go to the SOFTWARE directory:

o    mkdir -p /gpfs1/home/USERNAME/SOFTWARE

o    cd /gpfs1/home/USERNAME/SOFTWARE

· Clone the Canu Git repository:

o    git clone

· Go to the Canu source code directory:

o    cd canu/src

· Build Canu; the executables are placed in /gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin:

o    make
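After make finishes, a quick check that the canu executable exists can save a failed job later. The sketch below is a hypothetical helper that defaults to the install path given above; some Canu releases may place the executables in a different directory (for example canu/build/bin), in which case pass that directory as the argument.

```shell
#!/bin/bash
# Hypothetical post-build check: is the canu executable where expected?
# $1 = bin directory (default: the install path from the step above)
check_canu_build() {
  local bindir=${1:-/gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin}
  if [ -x "$bindir/canu" ]; then
    echo "found: $bindir/canu"
  else
    echo "canu not found in $bindir; check the make output" >&2
    return 1
  fi
}

# Usage on Thunder (replace USERNAME), then print the installed version:
#   check_canu_build && /gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin/canu -version
```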

(b) Test

· Go to your scratch directory, then create and enter a test directory:

o    cd /gpfs1/scratch/USERNAME

o    mkdir Canu_example

o    cd Canu_example

· Download the test data to the current directory:

o    curl -L -o pacbio.fastq

· Save the job script below as canu_job.pbs in this directory, then submit it:

o    qsub canu_job.pbs

------------------------------------------- file canu_job.pbs -------------------------------------------

#!/bin/bash
#PBS -q default
#PBS -N test
## works for multiple nodes (i.e., select>=1)
## change select, mem, ncpus, and walltime as needed:
#PBS -l select=2:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
## change "x-ccast-prj" to "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj

# PBS starts jobs in your HOME directory; move to the directory the job was
# submitted from, where pacbio.fastq is located
cd $PBS_O_WORKDIR

# Add the path to your installation directory here
export MY_CANU=/gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin

$MY_CANU/canu -p ecoli -d ecoli-pacbio useGrid=false genomeSize=4.8m -pacbio-raw pacbio.fastq

exit 0

---------------------------------------------------------------------------------------------------------
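With -p ecoli -d ecoli-pacbio, Canu writes its final contigs to ecoli-pacbio/ecoli.contigs.fasta. A rough summary of that file once the job completes can be sketched as follows, assuming standard FASTA (header lines start with ">", sequences may wrap over several lines); contig_summary is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: summarize Canu's contigs after the job finishes.
# Assumes the output name <dir>/<prefix>.contigs.fasta.
contig_summary() {
  local fasta=$1
  if [ ! -s "$fasta" ]; then
    echo "no contigs yet: $fasta" >&2
    return 1
  fi
  # Count FASTA headers and sum sequence lengths (sequences may wrap).
  awk '/^>/ { n++; next } { len += length($0) }
       END { printf "contigs=%d total_bp=%.0f\n", n, len }' "$fasta"
}

# Usage, from the job directory:
#   contig_summary ecoli-pacbio/ecoli.contigs.fasta
```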


Keywords: ccast, hpc, thunder, bioinformatics, canu
Doc ID: 108033
Owner: Liu Y.
Group: IT Knowledge Base
Created: 2020-12-22 10:30 CST
Updated: 2020-12-29 01:16 CST
Sites: IT Knowledge Base