
Bioinformatics - Trinotate

Instructions on how to run (and, if needed, install a customized version of) Trinotate

Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.

  1. Running Trinotate on Thunder
  2. Install customized Trinotate on Thunder
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running Trinotate on Thunder


Example: Run Trinotate pipeline analysis of transcriptomic sequences


Location: /gpfs1/projects/ccastest/training/examples/Trinotate_example


File list

·  trinotate_job.pbs: job submission script   

· data (directory): files involved in the pipeline


Steps

· Copy example directory to your SCRATCH directory

o    cp -r /gpfs1/projects/ccastest/training/examples/Trinotate_example $SCRATCH

· Go to the copied directory

o    cd  $SCRATCH/Trinotate_example

· Edit the job submission script as needed, then submit the job

o    qsub trinotate_job.pbs
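The copy step above can be wrapped in a small helper that checks both locations first. This is only a sketch: the function name stage_example is hypothetical and not part of the CCAST example, and it assumes $SCRATCH is set on Thunder, as the steps above imply.

```shell
#!/bin/bash
# Hypothetical helper: copy an example directory into a work area and
# print the path of the copy. Nothing is submitted here.
stage_example() {
    local src="$1" dest_root="$2"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -z "$dest_root" ] || [ ! -d "$dest_root" ]; then
        echo "destination root not found: $dest_root" >&2
        return 1
    fi
    cp -r "$src" "$dest_root"/ || return 1
    echo "$dest_root/$(basename "$src")"
}

# Example on Thunder (paths from this article):
# workdir=$(stage_example /gpfs1/projects/ccastest/training/examples/Trinotate_example "$SCRATCH") \
#     && cd "$workdir" && qsub trinotate_job.pbs
```

Printing the destination path keeps the copy and the submission as separate, visible steps, so the job script can still be edited before running qsub.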


2. Install Customized Trinotate on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.

Summary

(a) Trinity (available via “module load Trinity/2.8.4”)

(b) NCBI BLAST+ (available via “module load BLAST+/2.8.1”)

(c) HMMER (available via “module load HMMER/3.1b2-gcc”)

(d) TransDecoder (installation described below)

(e) SQLite (installation described below)

(f) Perl-DBI (installation described below)

(g) signalP v4 (optional - installation described below)

(h) tmhmm v2 (optional - installation described below)

(i)  RNAMMER (optional - installation described below).

(j)  Trinotate itself has no multithreading option, but other tools in the pipeline do: TransDecoder.Predict (--cpu); blastx/blastp (-num_threads); hmmscan (--cpu).
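To make item (j) concrete, the sketch below prints the BLAST and hmmscan search commands with a thread count filled in. It is an illustration only: the function name, the input names Trinity.fasta and transdecoder.pep, and the output file names are placeholders rather than files from this article. The flags -num_threads and --cpu are the ones named above.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print (do not run) the threaded search
# commands for a given thread count. File names are placeholders.
print_threaded_cmds() {
    local threads="${1:-1}"
    echo "blastx -query Trinity.fasta -db uniprot_sprot.pep -num_threads $threads -outfmt 6 > blastx.outfmt6"
    echo "blastp -query transdecoder.pep -db uniprot_sprot.pep -num_threads $threads -outfmt 6 > blastp.outfmt6"
    echo "hmmscan --cpu $threads --domtblout TrinotatePFAM.out Pfam-A.hmm transdecoder.pep > pfam.log"
}

# Inside a PBS job, $NCPUS holds the CPU count requested with ncpus=...;
# outside a job this falls back to 1.
print_threaded_cmds "${NCPUS:-1}"
```

Tying the thread count to $NCPUS keeps the tools from using more cores than the job requested in its "#PBS -l select" line.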

Details

In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.


(a) Install TransDecoder

·       Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

·       Download TransDecoder and unzip (it is Perl code, so there is nothing to build):

o    "wget https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.tar.gz"

o    "tar -xzvf TransDecoder-v5.5.0.tar.gz"
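Because nothing is built, a quick way to confirm the download is to check that the two TransDecoder entry-point scripts were unpacked. The sketch below assumes they are named TransDecoder.LongOrfs and TransDecoder.Predict and sit at the top of the unpacked directory; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: succeed only if both TransDecoder entry-point
# scripts are present and executable in the given directory.
check_transdecoder() {
    local dir="$1" prog
    for prog in TransDecoder.LongOrfs TransDecoder.Predict; do
        if [ ! -x "$dir/$prog" ]; then
            echo "missing or not executable: $dir/$prog" >&2
            return 1
        fi
    done
    echo "TransDecoder looks complete in $dir"
}

# Example (path from this article):
# check_transdecoder /gpfs1/home/USERNAME/SOFTWARE/TransDecoder-TransDecoder-v5.5.0
```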

(b) Test TransDecoder

There are several test examples in "TransDecoder-TransDecoder-v5.5.0/sample_data", most of which can be run through the provided bash scripts that the make command invokes. The tests also call BLAST+ and HMMER commands, so those modules must be loaded.

·       Go to the top directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE/TransDecoder-TransDecoder-v5.5.0"

·       Write the job script (listed below) and submit the test job:

o    "qsub transdecoder_job.pbs"

------------------------------------------transdecoder_job.pbs------------------------------------------

#!/bin/bash

#PBS -q default

#PBS -N TransDecoder_test

#PBS -l select=1:mem=5gb:ncpus=2

#PBS -l walltime=1:00:00

## Replace “x-ccast-prj” with “x-ccast-prj-[your project group name here]”

#PBS -W group_list=x-ccast-prj

cd $PBS_O_WORKDIR

module load BLAST+

module load HMMER

# Add TransDecoder root directory to $PATH

export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/TransDecoder-TransDecoder-v5.5.0

make test

exit 0
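Once the job has finished, its standard output is the place to look for test failures. The sketch below assumes the default PBS naming, where the output file is called <job name>.o<job number> and is written to the directory the job was submitted from. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the newest PBS stdout file for a job name.
# Assumes the default PBS naming <jobname>.o<job number> in the directory
# the job was submitted from (or in the directory given as argument 2).
latest_job_log() {
    local name="$1" dir="${2:-.}" newest
    newest=$(ls -t "$dir/$name".o* 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    echo "$newest"
}

# Example, after the job has finished:
# log=$(latest_job_log TransDecoder_test) && grep -i -n "error" "$log"
```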

(c) Install SQLite

·       Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

·       Download SQLite, unzip, and go into the uncompressed directory:

o    "wget https://www.sqlite.org/2019/sqlite-autoconf-3280000.tar.gz"

o    "tar -xzvf sqlite-autoconf-3280000.tar.gz"

o    "cd sqlite-autoconf-3280000"

·       Configure and specify the install location, build from source, and install:

o    "./configure --prefix=/gpfs1/home/USERNAME/SOFTWARE/sqlite_install_here"

o    "make"

o    "make install"
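After the install step, the sqlite3 binary should be in the "bin" directory under the chosen prefix. A minimal check, assuming the prefix used above (the function name is hypothetical):

```shell
#!/bin/bash
# Hypothetical check: print the version reported by the sqlite3 binary
# under the given install prefix, or fail if the binary is absent.
sqlite_version() {
    local prefix="$1"
    if [ ! -x "$prefix/bin/sqlite3" ]; then
        echo "sqlite3 not found under $prefix/bin" >&2
        return 1
    fi
    "$prefix/bin/sqlite3" -version
}

# Example (prefix from this article):
# sqlite_version /gpfs1/home/USERNAME/SOFTWARE/sqlite_install_here
```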

(d) Install Perl locally (a local Perl makes it convenient to install modules without root access)

·       Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

·       Download Perl, unzip and go into the uncompressed directory:

o    "wget https://www.cpan.org/src/5.0/perl-5.30.0.tar.gz"

o    "tar xzvf perl-5.30.0.tar.gz"

o    "cd perl-5.30.0"

·       Load gcc and cmake in case the default is outdated: 

o    "module load gcc/7.3.0-gcc"

o    "module load cmake/3.10.2-gcc"

·       Install Perl locally with threads support: 

o    "./Configure -des -Dprefix=/gpfs1/home/USERNAME/SOFTWARE/perl -Dusethreads"

o    "make"

o    "make test"

o    "make install"

·       Add Perl to the front of PATH for the remainder of the installation steps, so that it is found before any other Perl: 

o    "export PATH=/gpfs1/home/USERNAME/SOFTWARE/perl/bin:$PATH"
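After changing PATH, it is worth confirming which perl the shell actually resolves, because a Perl that appears earlier on PATH takes precedence over one that appears later. The sketch below is a hypothetical helper, not a CCAST tool.

```shell
#!/bin/bash
# Hypothetical check: succeed if the first "perl" on $PATH lives in the
# expected bin directory.
perl_is_from() {
    local expected_bin="$1" found
    found=$(command -v perl) || { echo "no perl on PATH" >&2; return 1; }
    case "$found" in
        "$expected_bin"/perl)
            echo "using $found"
            ;;
        *)
            echo "perl resolves to $found, not $expected_bin/perl" >&2
            return 1
            ;;
    esac
}

# Example (path from this article):
# perl_is_from /gpfs1/home/USERNAME/SOFTWARE/perl/bin
```

If the check fails, the later "perl Makefile.PL" steps would build modules for a different Perl than the one installed here.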

(e) Install Perl DBI (database interfaces) module

·       Download perl-DBI module, unzip, and go into it: 

o    "wget https://cpan.metacpan.org/authors/id/T/TI/TIMB/DBI-1.642.tar.gz"

o    "tar xzvf DBI-1.642.tar.gz"

o    "cd DBI-1.642"

·       Install DBI module:

o    "perl Makefile.PL"

o    "make"

o    "make test"

o    "make install"

(e) Install Perl DBD::SQLite (SQLite drivers for DBI) module

·       Download SQLite driver for DBI, unzip, and go into it: 

o    "wget https://cpan.metacpan.org/authors/id/I/IS/ISHIGAKI/DBD-SQLite-1.62.tar.gz"

o    "tar xzvf DBD-SQLite-1.62.tar.gz"

o    "cd DBD-SQLite-1.62"

·       Install DBD-SQLite:

o    "perl Makefile.PL"

o    "make"

o    "make test"

o    "make install"
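A quick way to confirm that both modules are visible to the Perl on your PATH is to ask Perl to load them. This is a minimal sketch; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: succeed if the perl found on $PATH can load the
# named module.
perl_has_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report on the two modules installed in the steps above.
for mod in DBI DBD::SQLite; do
    if perl_has_module "$mod"; then
        echo "$mod: ok"
    else
        echo "$mod: NOT found"
    fi
done
```

A "NOT found" result usually means that either the module install failed or a different perl is first on PATH.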

(f) Install Perl CGI (needed for TrinotateWeb) module

The CPAN shell is another way to install Perl modules; it also installs their dependencies automatically.

·       Open the CPAN Shell: 

o    "perl -MCPAN -e shell"

·       Install along with the dependencies: 

o    "install CGI"

·       Quit CPAN Shell: 

o    "q"
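The same installation can also be run without opening the interactive shell, which is convenient inside scripts. The sketch below uses the CPAN module's programmer interface, CPAN::Shell->install. The wrapper function and the DRY_RUN variable are hypothetical, and the first CPAN run may still ask configuration questions.

```shell
#!/bin/bash
# Hypothetical wrapper: install CPAN modules without opening the
# interactive shell. With DRY_RUN=1 it only prints the command.
cpan_install() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "perl -MCPAN -e 'CPAN::Shell->install(@ARGV)' $*"
    else
        perl -MCPAN -e 'CPAN::Shell->install(@ARGV)' "$@"
    fi
}

# Example: preview the command for the CGI module
# DRY_RUN=1 cpan_install CGI
```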

(g) Install Trinotate

·       Download Trinotate, unzip, and go into it (it is Perl code, so there is nothing to build):

o    "wget https://github.com/Trinotate/Trinotate/archive/Trinotate-v3.1.1.tar.gz"

o    "tar xzvf Trinotate-v3.1.1.tar.gz"

o    "cd Trinotate-Trinotate-v3.1.1"

·       Set an environment variable that records the location of your Trinotate installation for the remainder of the install and test steps. A command can then be invoked as "$TRINOTATE_HOME/name of the command", which avoids conflicts with other versions of the same command:

o    "export TRINOTATE_HOME=/gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1"

·       Download the data resources, including the latest versions of SwissProt, Pfam, and other companion resources. The script below creates and populates a Trinotate boilerplate SQLite database (Trinotate.sqlite) and yields the uniprot_sprot.pep file to be used with BLAST and the Pfam-A.hmm.gz file to be used for Pfam searches:

o    "$TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl  Trinotate"

·       Load BLAST+ and HMMER:

o    "module load BLAST+"

o    "module load HMMER"

·       Prepare the protein database for BLAST searches:

o    "makeblastdb -in uniprot_sprot.pep -dbtype prot"

·       Unzip and prepare the Pfam database for use with 'hmmscan':

o    "gunzip Pfam-A.hmm.gz"

o    "hmmpress Pfam-A.hmm"
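Both preparation commands write index files next to their input. The sketch below checks for them, assuming the usual single-volume protein database extensions from makeblastdb (.phr, .pin, .psq) and the four files written by hmmpress (.h3f, .h3i, .h3m, .h3p). A very large BLAST database may be split into numbered volumes, which this simple check does not handle. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: report whether the index files written by
# makeblastdb (-dbtype prot) and hmmpress exist next to the inputs.
check_indexes() {
    local pep="$1" hmm="$2" ext missing=0
    for ext in phr pin psq; do
        [ -f "$pep.$ext" ] || { echo "missing $pep.$ext" >&2; missing=1; }
    done
    for ext in h3f h3i h3m h3p; do
        [ -f "$hmm.$ext" ] || { echo "missing $hmm.$ext" >&2; missing=1; }
    done
    return "$missing"
}

# Example, run in the directory that holds the two databases:
# check_indexes uniprot_sprot.pep Pfam-A.hmm
```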

(h) Test Trinotate (built-in test script)

There is a built-in test in "/gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/sample_data". It focuses on testing Trinotate itself and skips some pipeline steps by providing the intermediate data directly. See runMe.sh for the command details.

·       Go to your scratch directory: 

o    "cd /gpfs1/scratch/USERNAME"

·       Copy the sample data to your scratch directory:

o    "cp -r /gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/sample_data ."

·       Write and submit the job:

o    "cd sample_data"

o    "qsub trinotate_job.pbs"

------------------------------------------------- trinotate_job.pbs---------------------------------------------------------

#!/bin/bash

#PBS -q default

#PBS -N Trinotate_Test

#PBS -l select=1:mem=5gb:ncpus=4

#PBS -l walltime=02:00:00

#PBS -W group_list=x-ccast-prj-[your project group name here]

cd $PBS_O_WORKDIR

 

module load Trinity/2.8.4

module load BLAST+/2.8.1

module load HMMER/3.1b2-gcc

 

# Add dependency programs to $PATH

export PATH=/gpfs1/home/USERNAME/SOFTWARE/perl/bin:$PATH

export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/sqlite_install_here/bin

export TRINOTATE_HOME=/gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1

 

./runMe.sh

exit 0

(i) Test Trinotate automatic pipeline script (For Advanced Users) 

Test the automatic pipeline tool called "autoTrinotate.pl" in "/gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/auto".

·       To automate the pipeline, you need local copies of signalp-4.1, tmhmm-2.0, and rnammer-1.2.

·       Download binaries:

o    Request the signalp-4.1 at "http://www.cbs.dtu.dk/cgi-bin/sw_request?signalp+4.1".

o    Request the tmhmm-2.0 at "http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?tmhmm".

o    Request the rnammer-1.2 at "http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer".

·       Save the downloaded archives in "/gpfs1/home/USERNAME/SOFTWARE".

·       Unzip the downloaded files:

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

o    "tar xzvf signalp-4.1g.Linux.tar.gz"

o    "tar xzvf tmhmm-2.0c.Linux.tar.gz"

o    "mkdir rnammer-1.2 && tar xzvf rnammer-1.2.src.tar.Z -C rnammer-1.2"

·       Modify signalp-4.1:

o    Edit the file "signalp" in "/gpfs1/home/USERNAME/SOFTWARE/signalp-4.1" to set your version of Perl in the first line, the path of the signalp-4.1 directory, and the directory for temporary files:

---------partial file "/gpfs1/home/USERNAME/SOFTWARE/signalp-4.1/signalp"------------

#!/gpfs1/home/USERNAME/SOFTWARE/perl/bin/perl

...

###############################################################################

#               GENERAL SETTINGS: CUSTOMIZE TO YOUR SITE

###############################################################################

 

# full path to the signalp-4.1 directory on your system (mandatory)

BEGIN {

    $ENV{SIGNALP} = '/gpfs1/home/USERNAME/SOFTWARE/signalp-4.1/';

}

 

# determine where to store temporary files (must be writable to all users)

my $outputDir = "/gpfs1/home/USERNAME/SOFTWARE/tmp";

 

# max number of sequences per run (any number can be handled)

my $MAX_ALLOWED_ENTRIES=2000000;

 

###############################################################################

...

·       Modify tmhmm-2.0c:

o    Edit the file "tmhmm" in "/gpfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin" to set your version of Perl in the first line:

---------partial file "/gpfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin/tmhmm"------------

#!/gpfs1/home/USERNAME/SOFTWARE/perl/bin/perl

 

# This is version 2.0c of tmhmm

...

o    Modify file "tmhmmformat.pl" in "/gpfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin" with your version of Perl in the first line:

---------partial file "/gpfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin/tmhmmformat.pl"------------

#!/gpfs1/home/USERNAME/SOFTWARE/perl/bin/perl -w

 

# This is version 2.0c of tmhmmformat.pl

...

·       Modify rnammer-1.2:

o    Set your version of Perl in the first line of the files in "/gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2" (using sed this time):

o    "cd /gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2"

o    "sed -i 's_#!/usr/bin/perl_#!/gpfs1/home/USERNAME/SOFTWARE/perl/bin/perl_g' core-rnammer rnammer xml2fsa xml2gff"
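The same first-line edit applies to signalp, tmhmm, and the rnammer scripts, so it can be wrapped in one helper that also prints the result for inspection. This is a sketch under stated assumptions: the function name is hypothetical, GNU sed is assumed for the -i option, and only a first line of the form "#!...perl" is changed.

```shell
#!/bin/bash
# Hypothetical helper: point the first line of each script at a given
# perl. Only line 1 is touched, and only if it is a "#!...perl" line,
# so trailing options such as " -w" are kept. Assumes GNU sed (-i).
set_perl_shebang() {
    local perl_path="$1" f
    shift
    for f in "$@"; do
        sed -i "1s|^#!.*perl|#!$perl_path|" "$f" || return 1
        head -n 1 "$f"
    done
}

# Example (paths from this article):
# cd /gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2
# set_perl_shebang /gpfs1/home/USERNAME/SOFTWARE/perl/bin/perl core-rnammer rnammer xml2fsa xml2gff
```

Restricting the substitution to line 1 avoids rewriting any later line that happens to mention a Perl path.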

·       RNAMMER requires the older hmmsearch from HMMER version 2.

·       Download and unzip:

o    "wget http://eddylab.org/software/hmmer/hmmer-2.3.2.tar.gz"

o    "tar xzvf hmmer-2.3.2.tar.gz"

·       Install:

o    "cd hmmer-2.3.2"

o    "./configure --prefix=/gpfs1/home/USERNAME/SOFTWARE/hmmer2"

o    "make"

o    "make install"

·       Rename hmmsearch as hmmsearch2:

o    "cd /gpfs1/home/USERNAME/SOFTWARE/hmmer2/bin"

o    "mv hmmsearch hmmsearch2"

·       Edit the rnammer file in "/gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2" to set the paths of your RNAMMER, hmmsearch2, and Perl installations:

---------partial file "/gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2/rnammer"------------

...

 

# the path of the program

my $INSTALL_PATH = "/gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2";

 

...

 

if ( $uname eq "Linux" ) {

        $HMMSEARCH_BINARY = "/gpfs1/home/USERNAME/SOFTWARE/hmmer2/bin/hmmsearch2";

        $PERL = "/gpfs1/home/USERNAME/SOFTWARE/perl/bin/perl";

} elsif ( $uname eq "IRIX64" ) {

        $HMMSEARCH_BINARY = "/gpfs1/home/USERNAME/SOFTWARE/hmmer2/bin/hmmsearch2";

        $PERL = "/gpfs1/home/USERNAME/SOFTWARE/perl/bin/perl";

} else {

        die "unknown platform\n";

}

 

...

 

·       Modify the core-rnammer file in "/gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2" by removing " --cpu 1":

o    "cd /gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2"

o    "sed -i 's/ --cpu 1//g' core-rnammer"

·       Install Perl XML::Simple and Getopt::Long modules (needed for rnammer):

·       Open the CPAN Shell: 

o    "perl -MCPAN -e shell"

·       Install along with the dependencies: 

o    "install XML::Simple"

o    "install Getopt::Long"

·       Quit CPAN Shell: 

o    "q"

·       Install the Perl URI::Escape module (needed by TransDecoder, although the TransDecoder test above did not reveal it):

·       Open the CPAN Shell: 

o    "perl -MCPAN -e shell"

·       Install along with the dependencies: 

o    "install URI::Escape"

·       Quit CPAN Shell: 

o    "q"

·       Make a test directory for Trinotate test and go into it: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE"

o    "mkdir Trinotate_example && cd Trinotate_example"

·       Copy the test data and unzip:  

o    "cp /gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/auto/testing/{*.gz,conf.txt} ."

o    "gunzip *.gz"

·       Customize the paths to your resources in the file "conf.txt": 

------------------------------------------- partial conf.txt -------------------------------------------

...

[GLOBALS]

 

#  ** edit the progs and dbs section to point to your local resources.

 

# progs

TRANSDECODER_DIR=/gpfs1/home/USERNAME/SOFTWARE/TransDecoder-TransDecoder-v5.5.0

BLASTX_PROG=blastx

BLASTP_PROG=blastp

SIGNALP_PROG=/gpfs1/home/USERNAME/SOFTWARE/signalp-4.1/signalp

TMHMM_PROG=/gpfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin/tmhmm

RNAMMER_TRANS_PROG=/gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/util/rnammer_support/RnammerTranscriptome.pl

RNAMMER=/gpfs1/home/USERNAME/SOFTWARE/rnammer-1.2/rnammer

HMMSCAN_PROG=hmmscan

 

# dbs

SWISSPROT_PEP=mini_sprot.pep

PFAM_DB=/gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/Pfam-A.hmm

...
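A typo in one of these paths only shows up when the pipeline reaches that step, so it can help to check them before submitting. The sketch below reads KEY=VALUE lines and reports absolute paths that do not exist. It assumes a simple one-setting-per-line layout as in the excerpt above, and the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: for every KEY=/absolute/path line in a conf file,
# report the keys whose path does not exist. Bare program names such as
# "blastx" are skipped because they are resolved through $PATH.
check_conf_paths() {
    local conf="$1" key value status=0
    while IFS='=' read -r key value; do
        case "$value" in
            /*)
                [ -e "$value" ] || { echo "$key -> $value (not found)"; status=1; }
                ;;
        esac
    done < "$conf"
    return "$status"
}

# Example, run in the directory that holds conf.txt:
# check_conf_paths conf.txt
```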

 

·       Write and submit the job: 

o    "qsub trinotate_job.pbs"

------------------------------------------- file trinotate_job.pbs -------------------------------------------

#!/bin/bash

#PBS -q default

#PBS -N test

#PBS -l select=1:mem=10gb:ncpus=4

#PBS -l walltime=02:00:00

##change "x-ccast-prj" to "x-ccast-prj-[your project group name here]"

#PBS -W group_list=x-ccast-prj

cd $PBS_O_WORKDIR

 

# Load required modules

module load Trinity/2.8.4

module load BLAST+/2.8.1

module load HMMER/3.1b2-gcc

 

# Add dependency programs to $PATH

export PATH=/gpfs1/home/USERNAME/SOFTWARE/perl/bin:$PATH

export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/sqlite_install_here/bin

export TRINOTATE_HOME=/gpfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1

 

# Index the database for BLAST searches

makeblastdb -in mini_sprot.pep -dbtype prot

 

# Download the Trinotate boilerplate SQLite database

wget "https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/Trinotate_v3.sqlite.gz" -O my.sqlite.gz

 

# Unzip

gunzip -c my.sqlite.gz > my.sqlite

 

# Include the FASTA.pm for the signalp-4.1

export PERL5LIB=/gpfs1/home/USERNAME/SOFTWARE/signalp-4.1/lib:$PERL5LIB

 

# Run the automation script

$TRINOTATE_HOME/auto/autoTrinotate.pl --Trinotate_sqlite my.sqlite --transcripts myTrinity.fasta --gene_to_trans_map myTrinity.fasta.gene_to_trans_map --conf conf.txt --CPU $NCPUS

 

exit 0


Keywords: ccast, hpc, thunder, bioinformatics, trinotate
Doc ID: 108088
Owner: Liu Y.
Group: IT Knowledge Base
Created: 2020-12-27 17:39 CDT
Updated: 2020-12-29 02:07 CDT
Sites: IT Knowledge Base
CleanURL: https://kb.ndsu.edu/trinotate