Instructions on how to run (and, if needed, install a customized version of) ABySS
ABySS is a de novo, parallel, paired-end sequence assembler designed for short reads. The single-processor (i.e., serial) version is useful for assembling genomes up to 100 Mbases. The parallel version is implemented using OpenMP and MPI and is capable of assembling larger genomes.
Example: Assemble paired-end sequences into contigs.
Location: /gpfs1/projects/ccastest/training/examples/ABySS_example
File list:
· abyss_job.pbs: job submission script
· frag_1.fastq: paired-end sequence file 1
· frag_2.fastq: paired-end sequence file 2
Steps:
· Copy the example directory to your SCRATCH directory
o "cp -r /gpfs1/projects/ccastest/training/examples/ABySS_example $SCRATCH"
· Go to the copied directory
o "cd $SCRATCH/ABySS_example"
· Edit the job submission script as needed, then submit the job
o "qsub abyss_job.pbs"
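Before submitting, it can help to confirm that the two paired-end files contain the same number of reads, since mismatched files are a common cause of failed assemblies. The following is a minimal sketch, not part of ABySS: the helper names count_fastq_reads and check_pairs are hypothetical, and the code assumes the common FASTQ layout of exactly four lines per record (no wrapped sequence lines).

```shell
#!/bin/bash
# Hypothetical helper (not part of ABySS): print the number of records in a
# FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    # Fail if the line count is not a multiple of 4 (truncated or wrapped file).
    awk 'END { if (NR % 4 != 0) exit 1; print NR / 4 }' "$1"
}

# Hypothetical helper: succeed only if both files hold the same number of records.
check_pairs() {
    local n1 n2
    n1=$(count_fastq_reads "$1") || return 1
    n2=$(count_fastq_reads "$2") || return 1
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $1 has $n1 reads, $2 has $n2 reads" >&2
        return 1
    fi
}
```

For the example directory above, this could be used as "check_pairs frag_1.fastq frag_2.fastq" after sourcing the functions in your shell.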
Warning: The remainder of this guide is intended ONLY for those who want to install and test their own version of ABySS in their HOME directory.
Details
In the following steps, we assume that you want to install the software in a directory named "SOFTWARE" inside your HOME directory on CCAST's Thunder cluster. Replace "USERNAME" with your username on Thunder.
(a) Install Google sparsehash
· Go to where you want to install Google sparsehash:
o "cd /gpfs1/home/USERNAME/SOFTWARE"
· Download Google sparsehash using git:
o "git clone https://github.com/sparsehash/sparsehash.git"
· Go to the cloned directory, which contains the configure script (git names it after the repository, "sparsehash"):
o "cd sparsehash"
· Configure and specify where Google sparsehash will be installed:
o "./configure --prefix=/gpfs1/home/USERNAME/SOFTWARE/google_sparsehash_install_here"
· Build:
o "make"
· Install:
o "make install"
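After "make install", a quick check that the headers landed under the chosen prefix can save a failed ABySS configure later. The following is a minimal sketch: check_sparsehash is a hypothetical helper, and the two header locations it probes (include/sparsehash and include/google) are an assumption about where the sparsehash install places the sparse_hash_map header.

```shell
#!/bin/bash
# Hypothetical helper: report whether the sparsehash headers exist under a prefix.
# Assumption: "make install" places sparse_hash_map in PREFIX/include/sparsehash
# and/or PREFIX/include/google.
check_sparsehash() {
    local prefix=$1 dir
    for dir in "$prefix/include/sparsehash" "$prefix/include/google"; do
        if [ -f "$dir/sparse_hash_map" ]; then
            echo "found: $dir/sparse_hash_map"
            return 0
        fi
    done
    echo "sparse_hash_map not found under $prefix/include" >&2
    return 1
}
```

With the prefix used above, the call would be "check_sparsehash /gpfs1/home/USERNAME/SOFTWARE/google_sparsehash_install_here".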
(b) Install ABySS
· Go to where you want to install ABySS:
o "cd /gpfs1/home/USERNAME/SOFTWARE"
· Download ABySS from the author's website:
o "wget http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases/2.1.5/abyss-2.1.5.tar.gz"
· Extract the downloaded archive:
o "tar -zxvf abyss-2.1.5.tar.gz"
· Go to ABySS source directory:
o "cd /gpfs1/home/USERNAME/SOFTWARE/abyss-2.1.5"
· Load the OpenMPI module so that configure can find MPI:
o "module load openmpi"
· Configure, specifying the installation location and the Google sparsehash path:
o "./configure --prefix=/gpfs1/home/USERNAME/SOFTWARE/abyss_install_here --with-sparsehash=/gpfs1/home/USERNAME/SOFTWARE/google_sparsehash_install_here"
· Build:
o "make"
· Install:
o "make install"
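Once the install finishes, it is worth confirming that the main executables are present before writing a job script. The sketch below uses a hypothetical helper, check_abyss_install, that looks for abyss-pe (the driver script), ABYSS (the serial assembler), and ABYSS-P (the MPI assembler) in the bin directory of the prefix. If only ABYSS-P is reported missing, a likely cause is that configure did not detect MPI, for example because the openmpi module was not loaded.

```shell
#!/bin/bash
# Hypothetical helper: check that key ABySS executables exist under PREFIX/bin.
# Returns 0 if all are present and executable, 1 otherwise.
check_abyss_install() {
    local bindir=$1/bin exe missing=0
    for exe in abyss-pe ABYSS ABYSS-P; do
        if [ -x "$bindir/$exe" ]; then
            echo "ok: $exe"
        else
            echo "missing: $exe" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With the prefix used above, the call would be "check_abyss_install /gpfs1/home/USERNAME/SOFTWARE/abyss_install_here".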
(c) Test abyss-pe
· Go to your scratch directory, create a test directory, and enter it:
o "cd /gpfs1/scratch/USERNAME"
o "mkdir ABySS_test"
o "cd ABySS_test"
· Download and extract the sample data:
o "wget http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases/1.3.4/test-data.tar.gz"
o "tar xzvf test-data.tar.gz"
· Write a PBS job script named abyss_job.pbs in the test directory (replace USERNAME with your username on Thunder):
o PBS script:
----------------------------------- abyss_job.pbs ---------------------------------
#!/bin/bash
#PBS -q default
#PBS -N ABySS_test
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
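## Replace "x-ccast-prj" with "x-ccast-prj-[your project group name here]"
#PBS -W group_list=x-ccast-prj
# Run from the directory where the job was submitted
cd "$PBS_O_WORKDIR"
# Add the ABySS executables to the PATH
export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/abyss_install_here/bin
# Load OpenMPI, which the parallel assembly stage needs at run time
module load openmpi
# Assemble with k-mer size 25 and one MPI process per requested CPU; output files are prefixed "test"
abyss-pe np=$NCPUS k=25 name=test in='test-data/reads1.fastq test-data/reads2.fastq'
exit 0
-----------------------------------------------------------------------------------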
· Submit the job:
o "qsub abyss_job.pbs"
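When the job completes, abyss-pe writes its results using the name= prefix, so with name=test the assembled contigs are expected in test-contigs.fa. As a quick sanity check, the sketch below counts the sequences in a FASTA file by counting header lines; count_fasta_records is a hypothetical helper, not an ABySS tool.

```shell
#!/bin/bash
# Hypothetical helper: print the number of sequences in a FASTA file,
# counted as the number of lines that start with ">".
count_fasta_records() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}
```

For the test run above, this could be used as "count_fasta_records test-contigs.fa". A missing or empty contigs file usually means the job failed, in which case the PBS output and error files in the submission directory are the first place to look.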