Bioinformatics - Meraculous-2D
Instructions on how to run (and, if needed, install a customized version of) Meraculous-2D
Meraculous-2D is a whole genome assembler for NGS reads (Illumina) that is capable of assembling large, diploid genomes with modest computational requirements.
1. Running Meraculous-2D on Thunder
Example: Assemble sequencing reads into contigs/genomes
Location: /mmfs1/thunder/projects/ccastest/training/examples/Meraculous-2D_example
File list:
· meraculous-2d_job.pbs: job submission script
· frags.fastq.25K: sequences in fastq format
· jumps.fastq.25K: sequences in fastq format
· meraculous.config: configuration file for this job
Steps:
· Copy the example directory to your SCRATCH directory
o “cp -r /mmfs1/thunder/projects/ccastest/training/examples/Meraculous-2D_example $SCRATCH”
· Go to the copied directory
o “cd $SCRATCH/Meraculous-2D_example”
· Edit the job submission script as needed, then submit the job
o “qsub meraculous-2d_job.pbs”
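The three steps above can be collected into one small helper. This is a minimal sketch, not part of the CCAST example: the function name stage_and_submit and the SUBMIT override are hypothetical, introduced only so that the submission command can be swapped for "echo" during a dry run.

```shell
#!/bin/bash
# Sketch only: stage_and_submit and SUBMIT are illustrative names, not part of the example.
# Usage: stage_and_submit <example_source_dir> <destination_root>
stage_and_submit() {
    local src="$1" dest_root="$2"
    local submit="${SUBMIT:-qsub}"          # set SUBMIT=echo for a dry run
    local job_dir

    # Refuse to continue if either directory is missing.
    [ -d "$src" ] || { echo "source not found: $src" >&2; return 1; }
    [ -d "$dest_root" ] || { echo "destination not found: $dest_root" >&2; return 1; }

    job_dir="$dest_root/$(basename "$src")"
    cp -r "$src" "$dest_root" || return 1   # copy the example directory
    cd "$job_dir" || return 1               # go to the copied directory
    "$submit" meraculous-2d_job.pbs         # submit the job script
}
```

On Thunder this would be called as: stage_and_submit /mmfs1/thunder/projects/ccastest/training/examples/Meraculous-2D_example "$SCRATCH". Edit the job script in the copied directory first if the defaults do not suit your project.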
2. Install Customized Meraculous-2D on Thunder
Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.
Summary
(a) GCC C++ (system default is adequate, module load not necessary)
(b) Utilizes Pthreads for shared memory parallelization. Does NOT support multi-node calculations on Thunder.
(a) Meraculous can execute distributed/parallel jobs on either a single multi-core system or on a cluster (cluster parameters include "use_cluster", "cluster_num_nodes", "cluster_slots_per_task", etc.). Supported cluster software: SGE/UGE and SLURM. PBS is not supported.
(b) libgd 2.0+ (system default. Can be checked by "ldconfig -p | grep libgd");
(c) cmake 2.8+ (system default is 2.8.12.2);
(d) GCC g++ 4.7 or newer, up to 7.1 (available via "module load gcc/7.1.0");
(e) GNU make 3.81 (system default 3.82);
(f) Boost 1.57.0+ (available via “module load boost/1.66.0-gcc”);
(g) Perl 5.10+ (install described below);
(h) Log4perl 1.31 (install described below)
(i) gnuplot 4.2 (system default 4.6);
(j) qqacct (optional but recommended for Grid Engine cluster environments. Not installed due to poor documentation).
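To compare installed tools against the minimum versions listed above, a small version check can help. This is a minimal sketch, assuming a "sort" that supports -V version ordering (as GNU sort does); the helper name version_ge is hypothetical.

```shell
#!/bin/bash
# Sketch only: version_ge is an illustrative helper, not part of Meraculous.
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare the system defaults quoted in the summary with the minimums.
version_ge 2.8.12.2 2.8 && echo "cmake OK"
version_ge 3.82 3.81    && echo "make OK"
version_ge 4.6 4.2      && echo "gnuplot OK"
```

The actual versions on a node can be read with "cmake --version", "make --version", "gnuplot --version", and "g++ --version" and passed to the helper as the first argument.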
Details
In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.
(a) Install Perl locally (It's inconvenient to install modules and manage the path using default Perl without root.)
· Go to the SOFTWARE directory:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
· Download Perl, unzip and go into the uncompressed directory:
o "wget https://www.cpan.org/src/5.0/perl-5.30.0.tar.gz"
o "tar xzvf perl-5.30.0.tar.gz"
o "cd perl-5.30.0"
· Load gcc and cmake in case the default is outdated:
o "module load gcc/7.3.0-gcc"
o "module load cmake/3.10.2-gcc"
· Install Perl locally with threads support:
o "./Configure -des -Dprefix=/mmfs1/home/USERNAME/SOFTWARE/perl -Dusethreads"
o "make"
o "make test"
o "make install"
· Add Perl bin directory to $PATH for remainder of install:
o "export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/perl/bin"
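Before moving on, it can be worth confirming that the new Perl bin directory is actually visible in the current shell. A minimal sketch: the helper name path_has_dir is hypothetical, and the directory is the install prefix used above.

```shell
#!/bin/bash
# Sketch only: path_has_dir is an illustrative helper.
# Succeeds when directory $1 appears as an entry in $PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

PERL_BIN=/mmfs1/home/USERNAME/SOFTWARE/perl/bin   # replace USERNAME with your own
if path_has_dir "$PERL_BIN"; then
    echo "Perl bin directory is in PATH"
    command -v perl        # shows which perl the shell will run first
else
    echo "Perl bin directory is NOT in PATH; re-check the export line" >&2
fi
```

Because the directory is appended to PATH, a system Perl found earlier in PATH still takes precedence. If "command -v perl" prints a different location, calling the new interpreter by its full path is one option. Running "perl -V:usethreads" with the new interpreter reports whether thread support was enabled.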
(b) Install Perl Log4perl module
(Version 1.44, the version given in the Meraculous manual, failed "make test"; version 1.49 passes.)
· Go to the SOFTWARE directory:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
· Download, unzip, and go into it:
o "wget https://cpan.metacpan.org/authors/id/M/MS/MSCHILLI/Log-Log4perl-1.49.tar.gz"
o "tar xzvf Log-Log4perl-1.49.tar.gz"
o "cd Log-Log4perl-1.49"
· Install Log4perl:
o "perl Makefile.PL"
o "make"
o "make test"
o "make install"
(c) Install Meraculous-2D
· Go to the SOFTWARE directory:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
· Download, unzip Meraculous-2D and go to the uncompressed directory:
o "wget https://sourceforge.net/home/meraculous20/files/Meraculous-v2.2.6.tar.gz"
o "tar xvzf Meraculous-v2.2.6.tar.gz"
o "cd Meraculous-v2.2.6"
· Set the desired install directory for Meraculous-2D and build:
o "sh install.sh /mmfs1/home/USERNAME/SOFTWARE/Meraculous-2D_install_here"
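A simple way to confirm that the build produced a usable install is to look for the launcher script that the job script calls later. A minimal sketch: the helper name check_meraculous_install is hypothetical; the path bin/run_meraculous.sh is the one used in the job script in this document.

```shell
#!/bin/bash
# Sketch only: check_meraculous_install is an illustrative helper.
# Succeeds when the launcher script exists and is executable under root $1.
check_meraculous_install() {
    local root="$1"
    if [ -x "$root/bin/run_meraculous.sh" ]; then
        echo "found $root/bin/run_meraculous.sh"
    else
        echo "missing or not executable: $root/bin/run_meraculous.sh" >&2
        return 1
    fi
}
```

For the install directory used above, the call would be: check_meraculous_install /mmfs1/home/USERNAME/SOFTWARE/Meraculous-2D_install_here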
(d) Test
Use the provided test data.
· Copy data from test directory:
o "cd /mmfs1/thunder/scratch/USERNAME"
o "mkdir Meraculous-2D_test"
o "cd Meraculous-2D_test"
o "cp -r /mmfs1/home/USERNAME/SOFTWARE/Meraculous-2D_install_here/etc/meraculous/pipeline ."
· In the "meraculous.config" file, set "local_num_procs" to the number of CPUs (ncpus) requested in the job script.
------------------------------------------- file meraculous.config -------------------------------------------
...
use_cluster 0
local_num_procs 4
...
· Write and submit the job
o "qsub meraculous-2d_job.pbs"
------------------------------------------- file meraculous-2d_job.pbs -------------------------------------------
#!/bin/bash
#PBS -q default
#PBS -N test
##Meraculous cluster mode does not support PBS, so always set select=1
##change select, mem, ncpus, and walltime as needed:
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
##change "x-ccast-prj" to "x-ccast-prj-[your project group name here]"
#PBS -W group_list=x-ccast-prj
cd $PBS_O_WORKDIR
# Add path of Perl binaries to $PATH
export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/perl/bin
# Set path of Meraculous2d root directory
export MERACULOUS_ROOT=/mmfs1/home/USERNAME/SOFTWARE/Meraculous-2D_install_here
##set "local_num_procs" in the "meraculous.config" file to match the ncpus requested above.
$MERACULOUS_ROOT/bin/run_meraculous.sh -c meraculous.config
exit 0
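
Because "local_num_procs" in "meraculous.config" has to match the ncpus value requested from PBS, the edit can be scripted rather than done by hand. A minimal sketch, assuming the config uses the "local_num_procs <number>" line format shown in the excerpt above; the helper name set_local_num_procs is hypothetical and is not part of Meraculous.

```shell
#!/bin/bash
# Sketch only: set_local_num_procs is an illustrative helper, not part of Meraculous.
# Rewrites the "local_num_procs <n>" line of config file $1 to use value $2.
set_local_num_procs() {
    local config="$1" ncpus="$2" tmp
    [ -f "$config" ] || { echo "config not found: $config" >&2; return 1; }
    case "$ncpus" in
        ''|*[!0-9]*) echo "ncpus must be a whole number" >&2; return 1 ;;
    esac
    tmp="$(mktemp)" || return 1
    # Replace only the number; all other lines pass through unchanged.
    sed -E "s/^(local_num_procs[[:space:]]+)[0-9]+/\\1${ncpus}/" "$config" > "$tmp" \
        && cat "$tmp" > "$config"
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, "set_local_num_procs meraculous.config 4" run in the test directory before "qsub" keeps the config in step with a job that requests ncpus=4.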