Bioinformatics - Canu
Instructions on how to run (and, if needed, install a customized version of) Canu
Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).
1. Running Canu on Thunder
Example: assemble PacBio reads into a genome assembly. The example directory contains:
· canu_job.pbs: job submission script
· pacbio.fastq: PacBio reads in FASTQ format
· Copy the example directory to your SCRATCH directory:
o "cp -r /gpfs1/projects/ccastest/training/examples/Canu_example $SCRATCH"
· Go to the copied directory:
o "cd $SCRATCH/Canu_example"
· Edit the job submission script as needed, then submit the job:
o "qsub canu_job.pbs"
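The three steps above can also be wrapped in two small shell functions, for example in your ~/.bashrc. This is a minimal sketch, assuming SCRATCH is already set in your login environment (as the commands above imply) and qsub is on your PATH; the names stage_canu_example, submit_canu_example and CANU_EXAMPLE_SRC are hypothetical and not tools provided by CCAST.

```shell
#!/bin/bash
# Hypothetical helpers (not provided by CCAST) wrapping the steps above.

# Shared example directory, as given on this page.
CANU_EXAMPLE_SRC=/gpfs1/projects/ccastest/training/examples/Canu_example

# Copy an example directory into an EXISTING destination directory
# (e.g. $SCRATCH) and print the path of the new copy.
stage_canu_example() {
  local src=$1 dest=$2
  cp -r "$src" "$dest" || return 1
  printf '%s\n' "$dest/$(basename "$src")"
}

# Submit canu_job.pbs from inside the given directory. The subshell keeps
# the caller's working directory unchanged; qsub prints the job ID.
submit_canu_example() {
  local workdir=$1
  ( cd "$workdir" && qsub canu_job.pbs )
}
```

Typical use on Thunder: "workdir=$(stage_canu_example "$CANU_EXAMPLE_SRC" "$SCRATCH")", then edit "$workdir/canu_job.pbs" as needed, then "submit_canu_example "$workdir"".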
2. Install Customized Canu on Thunder
Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.
(a) Canu specializes in assembling PacBio or Oxford Nanopore sequences.
(b) Canu runs in one of two modes: locally, on a single machine, or grid-enabled, across multiple hosts managed by a grid engine such as PBS Pro. In both modes, Canu auto-detects the available resources and sizes its jobs based on those resources and the size of the genome you are assembling, so most users should be able to run it without changing the defaults.
(c) After you submit the initial job to PBS, Canu submits additional PBS jobs on its own (in grid-enabled mode). On CCAST, every job must specify your project group to be accepted, so you must pass the group to those jobs through Canu's gridOptions parameter on the command line.
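The choice between the two modes in (b), and the group requirement in (c), comes down to a few Canu command-line options. In the sketch below, useGrid, maxMemory, maxThreads and gridOptions are Canu parameters; the helper names canu_local_opts and canu_grid_opts are hypothetical, and passing the group as "-W group_list=x-ccast-prj-<group>" is an assumption mirroring the group_list line in the job script on this page. Capping maxMemory/maxThreads to your PBS request (mem=10gb:ncpus=4 in the example) is a precaution, since Canu's auto-detection may otherwise size jobs for the whole node rather than your allocation.

```shell
#!/bin/bash
# Hypothetical helpers that print extra Canu options, one per line.

# Local mode (one machine): cap memory (GB) and threads to the PBS request,
# e.g. for "mem=10gb:ncpus=4" call: canu_local_opts 10 4
canu_local_opts() {
  local mem_gb=$1 threads=$2
  printf '%s\n' "useGrid=false" "maxMemory=${mem_gb}g" "maxThreads=${threads}"
}

# Grid mode: Canu submits its own PBS jobs, so each one must carry the
# CCAST project group (assumed to use the same group_list form as the job script).
canu_grid_opts() {
  local group=$1
  printf '%s\n' "useGrid=true" "gridOptions=-W group_list=x-ccast-prj-${group}"
}
```

Inside a job script, collect the options into an array and pass them through unchanged, for example "mapfile -t opts < <(canu_grid_opts mygroup)" followed by "$MY_CANU/canu -p ecoli -d ecoli-pacbio genomeSize=4.8m "${opts[@]}" -pacbio-raw pacbio.fastq", where "mygroup" is a placeholder for your project group name. Printing one option per line keeps the space inside the gridOptions value from being split into separate arguments.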
In the following steps, we assume that you want to install the software in a directory named "SOFTWARE" inside your HOME directory on CCAST's Thunder cluster, and that "USERNAME" is your username on Thunder.
· Create the SOFTWARE directory (if it does not exist yet) and go to it:
o "mkdir -p /gpfs1/home/USERNAME/SOFTWARE"
o "cd /gpfs1/home/USERNAME/SOFTWARE"
· Clone the Canu repository:
o "git clone https://github.com/marbl/canu.git"
· Go to the Canu source code directory:
o "cd canu/src"
· Build Canu with make; the executables are installed to "/gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin":
o "make"
· Make a test directory and go into it:
o "cd /gpfs1/scratch/USERNAME"
o "mkdir Canu_example"
o "cd Canu_example"
· Download the test data into the current directory:
o "curl -L -o pacbio.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq"
· Write the job script canu_job.pbs (shown below) and submit it:
o "qsub canu_job.pbs"
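Before submitting, it can help to confirm that the build produced a canu launcher and that the downloaded file is a complete FASTQ file. This is a minimal sketch; the helper names are hypothetical, build_canu simply runs make in the src directory as in the build step above, and fastq_record_count assumes plain four-line FASTQ records and a file that ends with a newline.

```shell
#!/bin/bash
# Hypothetical helpers for the build and test-data steps above.

# Build Canu from a cloned repository, e.g. /gpfs1/home/USERNAME/SOFTWARE/canu
build_canu() {
  local repo=$1
  ( cd "$repo/src" && make ) || return 1
}

# Succeed only if an executable canu launcher exists in the given bin
# directory, e.g. /gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin
check_canu_bin() {
  local bindir=$1
  [ -x "$bindir/canu" ]
}

# Print the number of FASTQ records (4 lines each); fail if the line count
# is not a multiple of 4, which usually means a truncated download.
fastq_record_count() {
  local fq=$1 lines
  lines=$(wc -l < "$fq") || return 1
  if (( lines % 4 != 0 )); then
    echo "truncated FASTQ: $lines lines" >&2
    return 1
  fi
  echo $(( lines / 4 ))
}
```

For example, "check_canu_bin /gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin && fastq_record_count pacbio.fastq" prints the number of reads only if both checks pass.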
------------------------------------------- file canu_job.pbs -------------------------------------------
#PBS -q default
#PBS -N test
##useGrid=false (see the canu command below) runs Canu on a single machine, so request one node (select=1)
##change select, mem, ncpus, and walltime as needed:
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
##change "x-ccast-prj" to "x-ccast-prj-[your project group name]"
#PBS -W group_list=x-ccast-prj
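##PBS starts jobs in your HOME directory, so move to the directory you ran qsub from
cd $PBS_O_WORKDIR
##set MY_CANU to the bin directory of your installation (see the build step above)
MY_CANU=/gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin
$MY_CANU/canu -p ecoli -d ecoli-pacbio useGrid=false genomeSize=4.8m -pacbio-raw pacbio.fastq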
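The job script above has two values you must fill in by hand: your project group and the path to your Canu installation. A small generator can write a filled-in copy. This is a sketch; write_canu_job is a hypothetical helper, the group and path you pass are placeholders for your own values, and the generated script requests a single node because, with useGrid=false, Canu runs on one machine (mode (b) above). It also changes to $PBS_O_WORKDIR so Canu finds pacbio.fastq in the directory you submitted from.

```shell
#!/bin/bash
# Hypothetical generator: writes a canu_job.pbs with the project group and
# Canu bin directory filled in. Usage: write_canu_job GROUP CANU_BIN [OUTFILE]
write_canu_job() {
  local group=$1 canu_bin=$2 out=${3:-canu_job.pbs}
  # Unquoted EOF: ${group} and ${canu_bin} expand now; \$ keeps PBS_O_WORKDIR
  # and MY_CANU literal so they expand when the job runs.
  cat > "$out" <<EOF
#!/bin/bash
#PBS -q default
#PBS -N test
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
#PBS -W group_list=x-ccast-prj-${group}

cd "\$PBS_O_WORKDIR"
MY_CANU="${canu_bin}"
"\$MY_CANU/canu" -p ecoli -d ecoli-pacbio useGrid=false genomeSize=4.8m -pacbio-raw pacbio.fastq
EOF
}
```

For example, "write_canu_job mygroup /gpfs1/home/USERNAME/SOFTWARE/canu/Linux-amd64/bin" followed by "qsub canu_job.pbs", with "mygroup" and USERNAME replaced by your own values.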