
Bioinformatics - Meraculous-2D

Instructions on how to run (and, if needed, install a customized version of) Meraculous-2D

Meraculous-2D is a whole genome assembler for NGS reads (Illumina) that is capable of assembling large, diploid genomes with modest computational requirements.

  1. Running Meraculous-2D on Thunder
  2. Install customized Meraculous-2D on Thunder
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running Meraculous-2D on Thunder


Example: Assemble sequencing reads into contigs/genomes


Location: /gpfs1/projects/ccastest/training/examples/Meraculous-2D_example


File list:

·  meraculous-2d_job.pbs: job submission script  

·  frags.fastq.25K: sequences in fastq format

·  jumps.fastq.25K: sequences in fastq format

·  meraculous.config: configuration file of this job


Steps:

·  Copy the example directory to your SCRATCH directory

o   cp -r /gpfs1/projects/ccastest/training/examples/Meraculous-2D_example $SCRATCH

·  Go to the copied directory

o   cd  $SCRATCH/Meraculous-2D_example

·  Edit the job submission script as needed, then submit the job

o    qsub meraculous-2d_job.pbs
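The copy step above can be wrapped in a small helper that refuses to overwrite an earlier copy. This is only a sketch: the function name stage_example is hypothetical (it is not part of any CCAST tooling), and the Thunder-specific call is left commented out so the snippet can be sourced on any machine.

```shell
#!/bin/bash
# stage_example SRC DEST_ROOT: copy the example directory SRC into DEST_ROOT
# and print the path of the new copy. Hypothetical helper, for illustration only.
stage_example() {
    local src="$1" dest_root="$2"
    local dest="$dest_root/$(basename "$src")"
    if [ ! -d "$src" ]; then
        echo "Example directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "Refusing to overwrite existing copy: $dest" >&2
        return 1
    fi
    cp -r "$src" "$dest" || return 1
    echo "$dest"
}

# Typical use on Thunder (commented out; paths and file name are taken from the steps above):
# workdir=$(stage_example /gpfs1/projects/ccastest/training/examples/Meraculous-2D_example "$SCRATCH") \
#     && cd "$workdir" && qsub meraculous-2d_job.pbs
```

Printing the destination path lets the caller change into the copy and submit the job in one chained command.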


2. Install Customized Meraculous-2D on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.


Summary

(a)    GCC C++ (system default is adequate, module load not necessary)

(b)    Utilizes Pthreads for shared memory parallelization. Does NOT support multi-node calculations.

(a)    Meraculous can execute distributed/parallel jobs on either a single multi-core system or on a cluster.

      (e.g. cluster parameters include "use_cluster", "cluster_num_nodes", "cluster_slots_per_task" etc.; 

              Supported cluster software: SGE/UGE, SLURM. However, it does not support PBS)

(b)    libgd 2.0+ (system default. Can be checked by "ldconfig -p | grep libgd"); 

(c)    cmake 2.8+ (system default is 2.8.12.2); 

(d)    GCC g++ 4.7+ & 7.1- (available via ‘module load gcc/7.1.0’);

(e)    GNU make 3.81 (system default 3.82); 

(f)     Boost 1.57.0+ (available via “module load boost/1.66.0-gcc”);

(g)    Perl 5.10+ (install described below);

(h)    Log4perl 1.31 (install described below)

(i)     gnuplot 4.2 (system default 4.6);

(j)     qqacct (optional but recommended for Grid Engine cluster environments. Not installed due to poor documentation).
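Several of the minimum versions listed above can be checked from the command line before starting the install. The sketch below is an assumption-laden convenience, not part of the Meraculous distribution: the function names version_ge and check_tool are hypothetical, the version number is parsed heuristically from each tool's "--version" output, and GNU "sort -V" is assumed to be available. It does not check the upper bound on g++, and libgd and Boost are not covered because they have no "--version" command.

```shell
#!/bin/bash
# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MIN: report whether NAME is on PATH and at least version MIN.
check_tool() {
    local name="$1" min="$2" found
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "$name: not found on PATH"
        return 1
    fi
    # Heuristic: take the first dotted number printed by "NAME --version".
    found=$("$name" --version 2>/dev/null </dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
    if [ -z "$found" ]; then
        echo "$name: could not parse a version number"
        return 1
    fi
    if version_ge "$found" "$min"; then
        echo "$name: $found (>= $min) OK"
    else
        echo "$name: $found is older than $min"
        return 1
    fi
}

# Minimum versions copied from the summary list above.
while read -r tool min; do
    check_tool "$tool" "$min" || true
done <<'EOF'
cmake 2.8
g++ 4.7
make 3.81
perl 5.10
gnuplot 4.2
EOF
```

Run it once with the system defaults and again after the relevant "module load" commands to see which versions the build will pick up.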


Details


In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.


(a) Install Perl locally (It's inconvenient to install modules and manage the path using default Perl without root.)

· Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

· Download Perl, unzip and go into the uncompressed directory:

o    "wget https://www.cpan.org/src/5.0/perl-5.30.0.tar.gz"

o    "tar xzvf perl-5.30.0.tar.gz"

o    "cd perl-5.30.0"

· Load gcc and cmake in case the default is outdated: 

o    "module load gcc/7.3.0-gcc"

o    "module load cmake/3.10.2-gcc"

· Install Perl locally with threads support: 

o    "./Configure -des -Dprefix=/gpfs1/home/USERNAME/SOFTWARE/perl -Dusethreads"

o    "make"

o    "make test"

o    "make install"

· Add Perl bin directory to $PATH for remainder of install:

o    export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/perl/bin
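Because shell variable names are case-sensitive, the variable must be spelled PATH. If the export line is run more than once in a session, the directory is appended repeatedly; a small helper avoids that. This is a sketch only: path_append is a hypothetical function name, and the commented commands at the end are suggestions for confirming which Perl the shell will run and whether it was built with threads.

```shell
#!/bin/bash
# path_append DIR: append DIR to PATH unless it is already listed there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already on PATH, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;    # append and export
    esac
}

# Usage on Thunder (commented out; path taken from the step above):
# path_append /gpfs1/home/USERNAME/SOFTWARE/perl/bin
# command -v perl        # shows which perl the shell will actually run
# perl -V:usethreads     # prints usethreads='define'; for a threaded build
```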


(b) Install Perl Log4perl module 

(Version 1.44, the version referenced in the Meraculous manual, failed "make test"; version 1.49 passes the test.)

· Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

· Download, unzip, and go into it: 

o    "wget https://cpan.metacpan.org/authors/id/M/MS/MSCHILLI/Log-Log4perl-1.49.tar.gz"

o    "tar xzvf Log-Log4perl-1.49.tar.gz"

o    "cd Log-Log4perl-1.49"

· Install Log4perl:

o    "perl Makefile.PL"

o    "make"

o    "make test"

o    "make install"
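After "make install", it can be useful to confirm that the Perl found on PATH can actually load the module. The following is a minimal sketch, assuming the module exposes a version through Perl's standard VERSION method; perl_module_version is a hypothetical helper name.

```shell
#!/bin/bash
# perl_module_version MODULE: print MODULE's version using the perl on PATH.
# Returns non-zero if perl is missing or the module cannot be loaded.
perl_module_version() {
    local module="$1"
    if ! command -v perl >/dev/null 2>&1; then
        echo "perl not found on PATH" >&2
        return 1
    fi
    perl -M"$module" -e 'print $ARGV[0]->VERSION, "\n"' "$module" 2>/dev/null </dev/null
}

# Usage (commented out): should print 1.49 if the install above succeeded.
# perl_module_version Log::Log4perl
```

If the command fails even though the install succeeded, a different Perl is probably being picked up first on PATH.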


(c) Install Meraculous-2D

· Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

· Download, unzip Meraculous-2D and go to the uncompressed directory: 

o "wget https://sourceforge.net/home/meraculous20/files/Meraculous-v2.2.6.tar.gz"

o "tar xvzf Meraculous-v2.2.6.tar.gz"

o "cd Meraculous-v2.2.6"

· Set the desired install directory for Meraculous-2D and build:

o    "sh install.sh /gpfs1/home/USERNAME/SOFTWARE/Meraculous-2D_install_here"
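Once install.sh finishes, a quick sanity check is to look for the two paths that the later steps rely on: the launcher bin/run_meraculous.sh and the test data under etc/meraculous/pipeline. The sketch below assumes exactly that layout, as used elsewhere in this article; check_meraculous_install is a hypothetical helper name.

```shell
#!/bin/bash
# check_meraculous_install PREFIX: verify that PREFIX looks like a finished install.
check_meraculous_install() {
    local prefix="$1" status=0
    if [ ! -x "$prefix/bin/run_meraculous.sh" ]; then
        echo "Missing or non-executable: $prefix/bin/run_meraculous.sh" >&2
        status=1
    fi
    if [ ! -d "$prefix/etc/meraculous/pipeline" ]; then
        echo "Missing test data directory: $prefix/etc/meraculous/pipeline" >&2
        status=1
    fi
    [ "$status" -eq 0 ] && echo "Install at $prefix looks complete"
    return "$status"
}

# Usage on Thunder (commented out):
# check_meraculous_install /gpfs1/home/USERNAME/SOFTWARE/Meraculous-2D_install_here
```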


(d) Test

Use the provided test data.

· Copy data from test directory: 

o    "cd /gpfs1/scratch/USERNAME"

o    "mkdir Meraculous-2D_test"

o    "cd Meraculous-2D_test"

o    "cp -r /gpfs1/home/USERNAME/SOFTWARE/Meraculous-2D_install_here/etc/meraculous/pipeline ."

· In the "meraculous.config" file, set "local_num_procs" to the number of CPUs (ncpus) requested in the job script. 

 

------------------------------------------- file meraculous.config -------------------------------------------

...
use_cluster 0
local_num_procs 4
...
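The edit to "local_num_procs" can also be scripted instead of done by hand. This is a rough sketch that assumes the parameter sits on its own line as "local_num_procs N", as in the excerpt above; set_local_num_procs is a hypothetical helper name.

```shell
#!/bin/bash
# set_local_num_procs CONFIG N: rewrite the local_num_procs line in CONFIG to N.
set_local_num_procs() {
    local config="$1" procs="$2" tmp
    case "$procs" in
        ''|*[!0-9]*) echo "procs must be a whole number" >&2; return 1 ;;
    esac
    if ! grep -qsE '^[[:space:]]*local_num_procs[[:space:]]' "$config"; then
        echo "local_num_procs not found in $config" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Replace only the numeric value; every other line is copied unchanged.
    sed -E "s/^([[:space:]]*local_num_procs[[:space:]]+)[0-9]+/\1${procs}/" "$config" > "$tmp" \
        && cat "$tmp" > "$config"
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage (commented out): match a job that requests ncpus=4.
# set_local_num_procs meraculous.config 4
```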

 

· Write and submit the job 

o    "qsub meraculous-2d_job.pbs"

------------------------------------------- file meraculous-2d_job.pbs -------------------------------------------

#!/bin/bash
#PBS -q default
#PBS -N test
## Meraculous cluster mode does not support PBS, so always set select=1
## Change mem, ncpus, and walltime as needed:
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
## Change "x-ccast-prj" to "x-ccast-prj-[your project group name here]"
#PBS -W group_list=x-ccast-prj

cd "$PBS_O_WORKDIR"

# Add path of Perl binaries to $PATH
export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/perl/bin

# Set path of Meraculous-2D root directory
export MERACULOUS_ROOT=/gpfs1/home/USERNAME/SOFTWARE/Meraculous-2D_install_here

## Set "local_num_procs" in the "meraculous.config" file to the ncpus value requested above.
$MERACULOUS_ROOT/bin/run_meraculous.sh -c meraculous.config

exit 0
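Since the ncpus value in the job script and "local_num_procs" in the configuration file have to agree, a short pre-submission check can catch a mismatch. The sketch below assumes the two file formats shown in this article (a "#PBS -l ...ncpus=N" directive and a "local_num_procs N" line); the function names pbs_ncpus, config_procs, and check_cpu_match are hypothetical.

```shell
#!/bin/bash
# pbs_ncpus JOBSCRIPT: print N from the first "#PBS -l ...ncpus=N" directive.
# Lines starting with "##" are comments and are not matched.
pbs_ncpus() {
    grep -E '^#PBS[[:space:]]+-l[[:space:]].*ncpus=[0-9]+' "$1" | head -n 1 \
        | sed -E 's/.*ncpus=([0-9]+).*/\1/'
}

# config_procs CONFIG: print the value of local_num_procs.
config_procs() {
    awk '$1 == "local_num_procs" { print $2; exit }' "$1"
}

# check_cpu_match JOBSCRIPT CONFIG: succeed only when both values agree.
check_cpu_match() {
    local n p
    n=$(pbs_ncpus "$1")
    p=$(config_procs "$2")
    if [ -z "$n" ] || [ -z "$p" ]; then
        echo "Could not read ncpus or local_num_procs" >&2
        return 2
    fi
    if [ "$n" != "$p" ]; then
        echo "Mismatch: ncpus=$n but local_num_procs=$p" >&2
        return 1
    fi
    echo "OK: ncpus and local_num_procs are both $n"
}

# Usage (commented out): submit only when the two files agree.
# check_cpu_match meraculous-2d_job.pbs meraculous.config && qsub meraculous-2d_job.pbs
```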

 

Keywords: ccast, hpc, thunder, bioinformatics, meraculous-2d, meraculous2d
Doc ID: 108076
Owner: Liu Y.
Group: IT Knowledge Base
Created: 2020-12-23 20:16 CST
Updated: 2020-12-29 01:13 CST
Sites: IT Knowledge Base
CleanURL: https://kb.ndsu.edu/meraculous-2d