Instructions on how to run (and, if needed, install a customized version of) MEGAHIT
MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well for generic single-genome assembly (small or mammalian-sized genomes) and for single-cell assembly.
File list:
· megahit_job.pbs: job submission script
· r3_1.fa: first reads of a paired-end test data set, in FASTA format
· r3_2.fa: second (mate) reads of the same paired-end data set, in FASTA format
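r3_1.fa and r3_2.fa are the two halves of a paired-end data set, so the i-th record in one file is expected to be the mate of the i-th record in the other, and both files should contain the same number of records. The sketch below is a quick check of that, assuming standard FASTA headers that start with ">"; the helper names fasta_count and check_pairs are hypothetical.

```shell
#!/usr/bin/env bash
# fasta_count FILE: print the number of FASTA records (lines starting with ">").
# grep -c exits non-zero if the file is missing or has no records.
fasta_count() {
  grep -c '^>' "$1"
}

# check_pairs FILE1 FILE2: succeed if both mate files hold the same number of
# records; fails if either file is missing or empty.
check_pairs() {
  local n1 n2
  n1=$(fasta_count "$1") || return 1
  n2=$(fasta_count "$2") || return 1
  echo "$1: $n1 records, $2: $n2 records"
  [ "$n1" -eq "$n2" ]
}

# Usage: check_pairs r3_1.fa r3_2.fa
```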
Steps:
· Copy the example directory to your SCRATCH directory
o "cp -r /gpfs1/projects/ccastest/training/examples/MEGAHIT_example $SCRATCH"
· Go to the copied directory
o "cd $SCRATCH/MEGAHIT_example"
· Edit the job submission script as needed, then submit the job
o "qsub megahit_job.pbs"
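If you prefer to make the usual edits (project group and CPU count) from the command line instead of in an editor, they can be scripted with GNU sed. This is a minimal sketch that assumes the copied megahit_job.pbs contains a "#PBS -W group_list=x-ccast-prj" line and an "ncpus=" resource request like the job script listed at the end of this document; set_pbs_group, set_pbs_ncpus and the group name "mygroup" are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that edit a PBS job script in place (GNU sed -i).

# set_pbs_group FILE GROUP
# Rewrites "#PBS -W group_list=x-ccast-prj" to "#PBS -W group_list=x-ccast-prj-GROUP".
set_pbs_group() {
  local file=$1 group=$2
  sed -i "s/^#PBS -W group_list=x-ccast-prj\$/#PBS -W group_list=x-ccast-prj-${group}/" "$file"
}

# set_pbs_ncpus FILE N
# Replaces the number after "ncpus=" in the resource request line.
set_pbs_ncpus() {
  local file=$1 n=$2
  sed -i -E "s/ncpus=[0-9]+/ncpus=${n}/" "$file"
}

# Example ("mygroup" stands for your project group name):
#   set_pbs_group megahit_job.pbs mygroup
#   set_pbs_ncpus megahit_job.pbs 8
#   qsub megahit_job.pbs
```

Because the job script passes $NCPUS to MEGAHIT's "-t" option, changing ncpus this way also changes the number of threads.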
Warning: The following part is intended ONLY for those who want to install and test their own version of MEGAHIT in their HOME directory.
Summary
(a) For building: zlib (installed; check with "ldconfig -p | grep libz"), CMake >= 2.8 (CCAST provides 2.8.12.2), and g++ >= 4.8.4 (CCAST provides 4.8.5).
(b) For running: gzip (installed) and bzip2 (installed).
(c) For the self-test: Python 3 (load it with "module load", as shown below).
(d) Use the "-t" option to set the number of threads; match it to the ncpus value requested in the job script.
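The minimum versions in (a) can be checked before building. This is a minimal sketch, assuming GNU coreutils (for "sort -V") and that cmake, g++ and ldconfig are on your PATH; version_ge and check_build_prereqs are hypothetical helper names.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if version string A >= B (relies on GNU "sort -V").
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_build_prereqs: report whether cmake, g++ and zlib meet the minimums.
check_build_prereqs() {
  local cmake_v gxx_v
  # First line of "cmake --version" looks like "cmake version 2.8.12.2".
  cmake_v=$(cmake --version | head -n 1 | awk '{print $3}')
  # Newer GCC releases may print only the major version here; it still compares correctly.
  gxx_v=$(g++ -dumpversion)
  version_ge "$cmake_v" 2.8  || { echo "cmake $cmake_v is older than 2.8" >&2; return 1; }
  version_ge "$gxx_v" 4.8.4  || { echo "g++ $gxx_v is older than 4.8.4" >&2; return 1; }
  ldconfig -p | grep -q libz || { echo "zlib not found by ldconfig" >&2; return 1; }
  echo "cmake $cmake_v, g++ $gxx_v, zlib found: OK"
}

# Usage: check_build_prereqs
```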
Details
In the following steps, we assume that you want to install the software in a directory named "SOFTWARE" inside your HOME directory on CCAST's Thunder cluster, and that "USERNAME" is your username on Thunder.
(a) Install
· Go to your software directory:
o "cd /gpfs1/home/USERNAME/SOFTWARE"
· Clone the MEGAHIT repository with git:
o "git clone https://github.com/voutcn/megahit.git"
· Go to the MEGAHIT directory and initialize its submodules:
o "cd megahit"
o "git submodule update --init"
· Create a build directory and go into it:
o "mkdir build && cd build"
· Configure and build MEGAHIT (Release mode, 4 parallel make jobs):
o "cmake .. -DCMAKE_BUILD_TYPE=Release"
o "make -j4"
· Run the self-test (requires Python 3):
o "module load python/3.4.3-gcc"
o "make simple_test"
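The clone-and-build steps above can also be chained in one function that stops at the first failing command. This is a sketch, not an official install script: build_megahit and megahit_ok are hypothetical names, and the self-test is left out because "module load" depends on your login environment, so run it interactively from the build directory afterwards.

```shell
#!/usr/bin/env bash
# build_megahit DIR: clone and build MEGAHIT under DIR
# (e.g. /gpfs1/home/USERNAME/SOFTWARE). The body runs in a subshell, so the
# caller's working directory is unchanged; "&&" stops at the first failure.
build_megahit() (
  cd "$1" &&
    git clone https://github.com/voutcn/megahit.git &&
    cd megahit &&
    git submodule update --init &&
    mkdir build && cd build &&
    cmake .. -DCMAKE_BUILD_TYPE=Release &&
    make -j4
)

# megahit_ok DIR: succeed if the build left an executable DIR/megahit/build/megahit.
megahit_ok() {
  [ -x "$1/megahit/build/megahit" ]
}

# Usage:
#   build_megahit /gpfs1/home/USERNAME/SOFTWARE && megahit_ok /gpfs1/home/USERNAME/SOFTWARE
```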
(b) Test
· Make a test directory and go into it:
o "cd /gpfs1/scratch/USERNAME"
o "mkdir Megahit_example"
o "cd Megahit_example"
· Copy the two paired-end read files (r3_1.fa and r3_2.fa) from the MEGAHIT test data to the current directory:
o "cp /gpfs1/home/USERNAME/SOFTWARE/megahit/test_data/r3* ."
· Write the job submission script megahit_job.pbs (listed below) and submit it:
o "qsub megahit_job.pbs"
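MEGAHIT refuses to overwrite an existing output directory, so resubmitting the job in the same directory fails unless the old OUTPUT_DIR is removed or renamed first. A minimal pre-submission check, assuming the file names used in this example; ready_to_submit is a hypothetical helper.

```shell
#!/usr/bin/env bash
# ready_to_submit: run in the test directory before "qsub megahit_job.pbs".
# Succeeds only if the job script and both read files exist and are non-empty,
# and the output directory used by the job script (OUTPUT_DIR) does not exist yet.
ready_to_submit() {
  local f
  for f in megahit_job.pbs r3_1.fa r3_2.fa; do
    [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
  done
  if [ -e OUTPUT_DIR ]; then
    echo "OUTPUT_DIR already exists; remove or rename it first" >&2
    return 1
  fi
  echo "ready to submit"
}

# Usage: ready_to_submit && qsub megahit_job.pbs
```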
------------------------------------------- file megahit_job.pbs -------------------------------------------
#!/bin/bash
#PBS -q default
#PBS -N test
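## MEGAHIT runs on a single node only, so keep select=1
## Change mem, ncpus, and walltime as needed:
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
## Replace "x-ccast-prj" with "x-ccast-prj-[your project group name here]"
#PBS -W group_list=x-ccast-prj
cd "$PBS_O_WORKDIR" || exit 1
# Path to the MEGAHIT build directory (replace USERNAME with your username)
export MY_MEGAHIT=/gpfs1/home/USERNAME/SOFTWARE/megahit/build
# -t: number of threads, taken from the ncpus requested above (PBS sets $NCPUS)
# -o: output directory; it must not exist before the run
"$MY_MEGAHIT"/megahit -1 r3_1.fa -2 r3_2.fa -t "$NCPUS" -o OUTPUT_DIR
# Return MEGAHIT's exit status instead of always reporting success
exit $?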
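When the job finishes, MEGAHIT writes the assembled contigs to OUTPUT_DIR/final.contigs.fa. A quick summary of that file can be sketched with awk; contig_stats is a hypothetical helper, and it counts every non-header line as sequence, so wrapped FASTA is handled too.

```shell
#!/usr/bin/env bash
# contig_stats FASTA: print "<number of contigs> <total length in bp>".
# Header lines start with ">"; all other lines are summed as sequence.
contig_stats() {
  awk '/^>/ { n++; next } { len += length($0) } END { print n + 0, len + 0 }' "$1"
}

# Usage: contig_stats OUTPUT_DIR/final.contigs.fa
```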