Bioinformatics - ABySS

Instructions on how to run (and, if needed, install a customized version of) ABySS

ABySS is a de novo, parallel, paired-end sequence assembler designed for short reads. The single-processor (i.e., serial) version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using OpenMP and MPI and is capable of assembling larger genomes.

  1. Running ABySS on Thunder
  2. Install Customized ABySS on Thunder
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running ABySS on Thunder

Example: Assemble paired-end sequences into contigs.

Location: /gpfs1/projects/ccastest/training/examples/ABySS_example

File list:

· abyss_job.pbs: job submission script

· frag_1.fastq: paired-end sequence file 1

· frag_2.fastq: paired-end sequence file 2


· Copy the example directory to your SCRATCH directory

cp -r /gpfs1/projects/ccastest/training/examples/ABySS_example $SCRATCH

· Go to the copied directory

cd $SCRATCH/ABySS_example

· Edit the job submission script as needed, then submit the job

qsub abyss_job.pbs
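
Before submitting, it can help to confirm that the two paired-end files hold the same number of reads. The following is a minimal sketch, not part of the CCAST example: the helper names count_reads and check_pairs are hypothetical, and it assumes plain, uncompressed FASTQ files with four lines per record.

```shell
# Count reads in a FASTQ file, assuming four lines per record.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Return success only if both mate files hold the same number of reads.
check_pairs() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Usage in the copied example directory (file names from the file list above).
if [ -f frag_1.fastq ] && [ -f frag_2.fastq ]; then
    if check_pairs frag_1.fastq frag_2.fastq; then
        echo "Read counts match"
    else
        echo "Read counts differ; check the input files" >&2
    fi
fi
```

A mismatch usually points to a truncated copy or to files that do not belong to the same sequencing run.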

2. Install Customized ABySS on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.


Requirements:

  • GCC 4.2 or greater (the system GCC is 4.8.5, so “module load” is not necessary).
  • Boost C++ libraries (available via “module load boost”).
  • Google sparsehash (recommended for building).
  • MPI (Message Passing Interface), needed to build the MPI-enabled version of ABySS (available via “module load openmpi” or “module load mpich”).
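
The compiler requirement above can be checked from a login shell before building. This is a sketch under stated assumptions: version_ge is a hypothetical helper, "sort -V" requires GNU coreutils, and the module commands run only where the "module" command is available.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check the system compiler against the GCC 4.2 minimum listed above.
if command -v gcc >/dev/null 2>&1; then
    gcc_version=$(gcc -dumpversion)
    if version_ge "$gcc_version" 4.2; then
        echo "GCC $gcc_version is new enough"
    else
        echo "GCC $gcc_version is older than 4.2" >&2
    fi
fi

# Load the libraries named above; `module` exists only on the cluster.
if command -v module >/dev/null 2>&1; then
    module load boost
    module load openmpi
fi
```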


In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.

(a) Install Google sparsehash

· Go to where you want to install Google sparsehash: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

· Download Google sparsehash using git:

o    "git clone

· Go to the cloned directory having the configure file: 

o    "cd google-sparsehash"

· Configure and specify where Google sparsehash will be installed:

o    "./configure --prefix=/gpfs1/home/USERNAME/SOFTWARE/google_sparsehash_install_here"

· Build: 

o    "make"

· Install: 

o    "make install"
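
After "make install", a quick check that headers landed under the chosen prefix can save a failed ABySS configure later. The sketch below is illustrative only: has_sparsehash is a hypothetical helper, and the two header paths it looks for are assumptions about how sparsehash lays out its install tree, so adjust them if your copy differs.

```shell
# Return success if a sparsehash header is present under install prefix $1.
# The two candidate header paths are assumptions about the install layout.
has_sparsehash() {
    [ -f "$1/include/sparsehash/sparse_hash_map" ] ||
    [ -f "$1/include/google/sparse_hash_map" ]
}

# Prefix used in the configure step above; replace USERNAME with your own.
prefix=/gpfs1/home/USERNAME/SOFTWARE/google_sparsehash_install_here
if has_sparsehash "$prefix"; then
    echo "sparsehash headers found under $prefix"
else
    echo "sparsehash headers not found under $prefix" >&2
fi
```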

(b) Install ABySS

· Go to where you want to install ABySS: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

· Download ABySS from the author's website: 

o    "wget

· Extract the downloaded archive:

o    "tar -zxvf abyss-2.1.5.tar.gz"

· Go to the ABySS source directory:

o    "cd /gpfs1/home/USERNAME/SOFTWARE/abyss-2.1.5"

· Load the OpenMPI module:

o    "module load openmpi"

· Configure, specify the installation location and Google sparsehash path:   

o    "./configure --prefix=/gpfs1/home/USERNAME/SOFTWARE/abyss_install_here --with-sparsehash=/gpfs1/home/USERNAME/SOFTWARE/google_sparsehash_install_here"

· Build:

o    "make"

· Install:

o    "make install"
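
Once the install finishes, it may be worth confirming that the expected programs exist in the bin directory before writing a job script. This is a rough sketch: missing_programs is a hypothetical helper, and the program list (abyss-pe, ABYSS, and the MPI-enabled ABYSS-P) is an assumption about what this build installs.

```shell
# Print the names of expected programs that are missing or not executable
# in directory $1. The program list is an assumption about this build:
# abyss-pe is the driver, ABYSS the serial assembler, ABYSS-P the MPI one.
missing_programs() {
    for prog in abyss-pe ABYSS ABYSS-P; do
        [ -x "$1/$prog" ] || echo "$prog"
    done
}

# Install location from the configure step above; replace USERNAME.
bin_dir=/gpfs1/home/USERNAME/SOFTWARE/abyss_install_here/bin
missing=$(missing_programs "$bin_dir")
if [ -z "$missing" ]; then
    echo "All expected ABySS programs are installed"
else
    echo "Missing from $bin_dir:" $missing >&2
fi
```

If only ABYSS-P is reported missing, the likely cause is that configure did not find an MPI installation, for example because "module load openmpi" was not run first.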

(c) Test abyss-pe

· Go to your scratch directory, create a test directory, and change into it:

o    "cd /gpfs1/scratch/USERNAME" 

o    "mkdir ABySS_test"

o    "cd ABySS_test"

· Download and extract the sample data:

o    "wget"

o    "tar xzvf test-data.tar.gz"

· Write a PBS job script and submit the job 

o    PBS script:

----------------------------------- abyss_job.pbs ---------------------------------


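#!/bin/bash
#PBS -q default
#PBS -N ABySS_test
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
## Replace “x-ccast-prj” with “x-ccast-prj-[your project group name here]”
#PBS -W group_list=x-ccast-prj

# Run from the directory the job was submitted from, so that the
# relative test-data/ paths below resolve
cd "$PBS_O_WORKDIR"

# Add the executable to the path
export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/abyss_install_here/bin

module load openmpi

# Assemble the paired-end test reads with a k-mer size of 25
abyss-pe np=$NCPUS k=25 name=test in='test-data/reads1.fastq test-data/reads2.fastq'

exit 0

-----------------------------------------------------------------------------------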

· Submit the job:

o    "qsub abyss_job.pbs"
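
When the job finishes, the assembly can be inspected from the test directory. The sketch below assumes that abyss-pe with name=test writes its contigs to a file named test-contigs.fa, and that abyss-fac (the assembly statistics tool shipped with ABySS) is on your PATH; count_sequences is a hypothetical helper.

```shell
# Count the sequences in a FASTA file by counting header lines.
count_sequences() {
    grep -c '^>' "$1" || true   # grep exits non-zero when the count is 0
}

# Assumed output name for a run with name=test.
contigs=test-contigs.fa
if [ -f "$contigs" ]; then
    echo "$contigs holds $(count_sequences "$contigs") sequences"
    # Print assembly statistics such as N50 if abyss-fac is available.
    if command -v abyss-fac >/dev/null 2>&1; then
        abyss-fac "$contigs"
    fi
else
    echo "$contigs not found; check the job's output and error files" >&2
fi
```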
