Bioinformatics - Trinotate
Instructions on how to run (and, if needed, install a customized version of) Trinotate
Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.
Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.
1. Running Trinotate on Thunder
Example: Run Trinotate pipeline analysis of transcriptomic sequences
Location: /mmfs1/thunder/projects/ccastest/training/examples/Trinotate_example
File list
· trinotate_job.pbs: job submission script
· data (directory): files involved in the pipeline
Steps
· Copy example directory to your SCRATCH directory
o “cp -r /mmfs1/thunder/projects/ccastest/training/examples/Trinotate_example $SCRATCH”
· Go to the copied directory
o “cd $SCRATCH/Trinotate_example”
· Edit the job submission script as needed, then submit the job
o “qsub trinotate_job.pbs”
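The copy and change-directory steps above can also be wrapped in a small helper. This is a minimal sketch, not part of the CCAST example: the function name stage_example is hypothetical, and it assumes $SCRATCH is already set in your login environment, as the steps above imply.

```shell
# Hypothetical helper: copy an example directory into a destination root
# (for instance $SCRATCH) and print the path of the copy.
stage_example() {
    src=$1
    dest_root=$2
    if [ -z "$dest_root" ] || [ ! -d "$dest_root" ]; then
        echo "destination '$dest_root' is not a directory (is \$SCRATCH set?)" >&2
        return 1
    fi
    if [ ! -d "$src" ]; then
        echo "example directory '$src' not found" >&2
        return 1
    fi
    cp -r "$src" "$dest_root"/ || return 1
    printf '%s\n' "$dest_root/$(basename "$src")"
}
```

On Thunder it might be used as: cd "$(stage_example /mmfs1/thunder/projects/ccastest/training/examples/Trinotate_example "$SCRATCH")" && qsub trinotate_job.pbs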
2. Install Customized Trinotate on Thunder
Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.
Summary
(a) Trinity (available via “module load Trinity/2.8.4”)
(b) NCBI BLAST+ (available via “module load BLAST+/2.8.1”)
(c) HMMER (available via “module load HMMER/3.1b2-gcc”)
(d) TransDecoder (installation described below)
(e) SQLite (installation described below)
(f) Perl-DBI (installation described below)
(g) signalP v4 (optional - installation described below)
(h) tmhmm v2 (optional - installation described below)
(i) RNAMMER (optional - installation described below).
(j) Note: Trinotate itself has no multithreading option, but other tools in the pipeline do: TransDecoder.Predict (--cpu); blastx/blastp (-num_threads); hmmscan (--cpu).
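As a rough illustration of the threading note above, the CPU count requested from PBS could be forwarded to each tool's own flag. The helper name thread_flag is hypothetical; the flags are the ones listed above, and $NCPUS is assumed to be set by PBS inside a job.

```shell
# Hypothetical helper: print the multithreading option for a pipeline tool,
# using the flags listed in the summary above.
thread_flag() {
    tool=$1
    n=${2:-${NCPUS:-1}}   # default to PBS's $NCPUS, or 1 outside a job
    case $tool in
        blastx|blastp)                echo "-num_threads $n" ;;
        hmmscan|TransDecoder.Predict) echo "--cpu $n" ;;
        *)                            return 1 ;;  # no known thread option (e.g. Trinotate)
    esac
}
```

Inside a job script, a command line could then include $(thread_flag hmmscan) in place of a hard-coded "--cpu 4".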
Details
In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.
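The SOFTWARE directory has to exist before the first "cd" into it. The sketch below creates it if needed and records its location in a variable; it assumes $HOME is /mmfs1/home/USERNAME on Thunder, and both the function name set_install_root and the variable name SOFTWARE are illustrative only.

```shell
# Hypothetical helper: create the install root (default: $HOME/SOFTWARE),
# change into it, and export its location as $SOFTWARE.
set_install_root() {
    SOFTWARE=${1:-$HOME/SOFTWARE}
    mkdir -p "$SOFTWARE" || return 1
    cd "$SOFTWARE" || return 1
    export SOFTWARE
    echo "Install root: $SOFTWARE"
}
```

Calling set_install_root with no argument at the start of a session would stand in for the repeated "cd /mmfs1/home/USERNAME/SOFTWARE" steps.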
(a) Install TransDecoder
· Go to the SOFTWARE directory:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
· Download TransDecoder and unzip (Perl code; no build needed):
o "wget https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.tar.gz"
o "tar -xzvf TransDecoder-v5.5.0.tar.gz"
(b) Test TransDecoder
Several test examples are provided in "TransDecoder-TransDecoder-v5.5.0/sample_data"; most of them can be run through the bash scripts invoked by the make command. The tests also call BLAST+ and HMMER commands, so both modules need to be loaded.
· Go to the top directory:
o "cd /mmfs1/home/USERNAME/SOFTWARE/TransDecoder-TransDecoder-v5.5.0"
· Create the job script transdecoder_job.pbs shown below in this directory, then submit it:
o "qsub transdecoder_job.pbs"
------------------------------------------transdecoder_job.pbs------------------------------------------
#!/bin/bash
#PBS -q default
#PBS -N TransDecoder_test
#PBS -l select=1:mem=5gb:ncpus=2
#PBS -l walltime=1:00:00
## Replace “x-ccast-prj” with “x-ccast-prj-[your project group name here]”
#PBS -W group_list=x-ccast-prj
cd $PBS_O_WORKDIR
module load BLAST+
module load HMMER
# Add TransDecoder root directory to $PATH
export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/TransDecoder-TransDecoder-v5.5.0
make test
exit 0
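Because the tests call BLAST+ and HMMER programs, a short pre-flight check in the job script can fail early with a clear message if a module did not load. This is a minimal sketch; the function name require_cmds is hypothetical, and blastp and hmmscan are used only as representative commands from those two packages.

```shell
# Hypothetical helper: report every command that is not on $PATH.
# Returns non-zero if at least one is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In the job script, a line such as "require_cmds blastp hmmscan perl make || exit 1" could be placed just before "make test".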
(c) Install SQLite
· Go to the SOFTWARE directory:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
· Download SQLite, unzip, and go into the uncompressed directory:
o "wget https://www.sqlite.org/2019/sqlite-autoconf-3280000.tar.gz"
o "tar -xzvf sqlite-autoconf-3280000.tar.gz"
o "cd sqlite-autoconf-3280000"
· Configure and specify install location; Build from source; and Install:
o "./configure --prefix=/mmfs1/home/USERNAME/SOFTWARE/sqlite_install_here"
o "make"
o “make install”
(d) Install Perl locally (convenient for installing modules without root access)
· Go to the SOFTWARE directory:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
· Download Perl, unzip and go into the uncompressed directory:
o "wget https://www.cpan.org/src/5.0/perl-5.30.0.tar.gz"
o "tar xzvf perl-5.30.0.tar.gz"
o "cd perl-5.30.0"
· Load gcc and cmake in case the default is outdated:
o "module load gcc/7.3.0-gcc"
o "module load cmake/3.10.2-gcc"
· Install Perl locally with threads support:
o "./Configure -des -Dprefix=/mmfs1/home/USERNAME/SOFTWARE/perl -Dusethreads"
o "make"
o "make test"
o "make install"
· Add Perl to path for remainder of installation steps:
o "export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/perl/bin"
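One caveat, offered as an assumption rather than something these instructions state: because the directory is appended to $PATH, a system-wide perl found earlier on the path would still be the one that runs. If "command -v perl" does not print the local installation, putting the local directory first is a common remedy. The helper name prepend_path is hypothetical.

```shell
# Hypothetical helper: put a directory at the front of $PATH
# unless it is already the first entry.
prepend_path() {
    dir=$1
    case ":$PATH:" in
        ":$dir:"*) ;;                         # already first: nothing to do
        *) PATH="$dir:$PATH"; export PATH ;;  # otherwise move it to the front
    esac
}
```

For example: prepend_path /mmfs1/home/USERNAME/SOFTWARE/perl/bin, followed by "command -v perl" to confirm which perl will be used.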
(e) Install Perl DBI (database interfaces) module
· Download perl-DBI module, unzip, and go into it:
o "wget https://cpan.metacpan.org/authors/id/T/TI/TIMB/DBI-1.642.tar.gz"
o "tar xzvf DBI-1.642.tar.gz"
o "cd DBI-1.642"
· Install DBI module:
o "perl Makefile.PL"
o "make"
o "make test"
o "make install"
(e, continued) Install Perl DBD::SQLite (SQLite driver for DBI) module
· Download SQLite driver for DBI, unzip, and go into it:
o "wget https://cpan.metacpan.org/authors/id/I/IS/ISHIGAKI/DBD-SQLite-1.62.tar.gz"
o "tar xzvf DBD-SQLite-1.62.tar.gz"
o "cd DBD-SQLite-1.62"
· Install DBD-SQLite:
o "perl Makefile.PL"
o "make"
o "make test"
o “make install”
(f) Install Perl CGI (needed for TrinotateWeb) module
The CPAN shell is another way to install Perl modules; it installs the dependencies automatically.
· Open the CPAN Shell:
o "perl -MCPAN -e shell"
· Install along with the dependencies:
o "install CGI"
· Quit CPAN Shell:
o "q"
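After installing modules by either method, it can be worth confirming that the perl on your $PATH actually finds them. A minimal sketch; perl_has_module is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the perl on $PATH can load the given module.
perl_has_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report on the modules installed in the steps above.
for mod in DBI DBD::SQLite CGI; do
    if perl_has_module "$mod"; then
        echo "$mod: found"
    else
        echo "$mod: NOT found"
    fi
done
```

A "NOT found" line usually means either the install failed or a different perl is first on $PATH.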
(g) Install Trinotate
· Download Trinotate, unzip, and go into it (Perl code; no build needed):
o "wget https://github.com/Trinotate/Trinotate/archive/Trinotate-v3.1.1.tar.gz"
o "tar xzvf Trinotate-v3.1.1.tar.gz"
o "cd Trinotate-Trinotate-v3.1.1"
· Set an environment variable that holds the location of your Trinotate installation for the remainder of the install and test. Commands can then be invoked as "$TRINOTATE_HOME/name of the command", which avoids conflicts with other versions of the same command:
o "export TRINOTATE_HOME=/mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1"
· Download the data resources (the latest versions of SwissProt, Pfam, and other companion resources) and build the Trinotate boilerplate SQLite database. This step creates and populates Trinotate.sqlite and yields the uniprot_sprot.pep file to be used with BLAST and the Pfam-A.hmm.gz file to be used for Pfam searches:
o "$TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate"
· Load BLAST+ and HMMER:
o "module load BLAST+"
o "module load HMMER"
· Prepare the protein database for BLAST searches:
o "makeblastdb -in uniprot_sprot.pep -dbtype prot"
· Unzip and prepare the Pfam database for use with 'hmmscan':
o "gunzip Pfam-A.hmm.gz"
o "hmmpress Pfam-A.hmm"
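A quick way to confirm that both indexing steps finished is to look for the files they write. The sketch below assumes the usual outputs: makeblastdb creates .phr, .pin, and .psq files for a protein database, and hmmpress creates .h3f, .h3i, .h3m, and .h3p files. The function name check_indexes is hypothetical.

```shell
# Hypothetical helper: list expected index files that are missing or empty.
# Prints nothing and returns 0 when everything is in place.
check_indexes() {
    dir=${1:-.}
    status=0
    for f in uniprot_sprot.pep.phr uniprot_sprot.pep.pin uniprot_sprot.pep.psq \
             Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            status=1
        fi
    done
    return "$status"
}
```

Run it as "check_indexes ." in the directory where makeblastdb and hmmpress were executed.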
(h) Test Trinotate (built-in test script)
There is a built-in test in "/mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/sample_data". It focuses on testing Trinotate itself and skips some steps in the pipeline by providing the intermediate data directly. See runMe.sh for the individual commands.
· Go to your scratch directory:
o "cd /mmfs1/thunder/scratch/USERNAME"
· Copy sample data to your scratch directory
o "cp -r /mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/sample_data ."
· Go into the copied directory, create the job script trinotate_job.pbs shown below, and submit it:
o "cd sample_data"
o "qsub trinotate_job.pbs"
------------------------------------------------- trinotate_job.pbs---------------------------------------------------------
#!/bin/bash
#PBS -q default
#PBS -N Trinotate_Test
#PBS -l select=1:mem=5gb:ncpus=4
#PBS -l walltime=02:00:00
#PBS -W group_list=x-ccast-prj-[your project group name here]
cd $PBS_O_WORKDIR
module load Trinity/2.8.4
module load BLAST+/2.8.1
module load HMMER/3.1b2-gcc
# Add dependency programs to $PATH
export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/perl/bin
export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/sqlite_install_here/bin
export TRINOTATE_HOME=/mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1
./runMe.sh
exit 0
(i) Test Trinotate automatic pipeline script (For Advanced Users)
Test the automatic pipeline tool called "autoTrinotate.pl" in "/mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/auto".
· The automated pipeline needs local copies of signalp-4.1, tmhmm-2.0, and rnammer-1.2.
· Request the software:
o signalp-4.1 at "http://www.cbs.dtu.dk/cgi-bin/sw_request?signalp+4.1".
o tmhmm-2.0 at "http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?tmhmm".
o rnammer-1.2 at "http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?rnammer".
· Save the downloaded archives in "/mmfs1/home/USERNAME/SOFTWARE".
· Unzip the downloaded files:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
o "tar xzvf signalp-4.1g.Linux.tar.gz"
o "tar xzvf tmhmm-2.0c.Linux.tar.gz"
o "mkdir rnammer-1.2 && tar xzvf rnammer-1.2.src.tar.Z -C rnammer-1.2"
· Modify signalp-4.1:
o Edit the file "signalp" in "/mmfs1/home/USERNAME/SOFTWARE/signalp-4.1": put your version of Perl in the first line, and set the path to signalp-4.1 and the directory for temporary files:
---------partial file "/mmfs1/home/USERNAME/SOFTWARE/signalp-4.1/signalp"------------
#!/mmfs1/home/USERNAME/SOFTWARE/perl/bin/perl
...
###############################################################################
# GENERAL SETTINGS: CUSTOMIZE TO YOUR SITE
###############################################################################
# full path to the signalp-4.1 directory on your system (mandatory)
BEGIN {
$ENV{SIGNALP} = '/mmfs1/home/USERNAME/SOFTWARE/signalp-4.1/';
}
# determine where to store temporary files (must be writable to all users)
my $outputDir = "/mmfs1/home/USERNAME/SOFTWARE/tmp";
# max number of sequences per run (any number can be handled)
my $MAX_ALLOWED_ENTRIES=2000000;
###############################################################################
...
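The directory named in $outputDir presumably has to exist and be writable before signalp runs, and the steps above do not create it. A minimal sketch, with ensure_writable_dir as a hypothetical helper name:

```shell
# Hypothetical helper: create a directory if needed and confirm it is writable.
ensure_writable_dir() {
    mkdir -p "$1" && [ -w "$1" ]
}
```

For the settings shown above this would be: ensure_writable_dir /mmfs1/home/USERNAME/SOFTWARE/tmp || echo "cannot use temporary directory" >&2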
· Modify tmhmm-2.0c:
o Modify file "tmhmm" in "/mmfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin" with your version of Perl in the first line:
---------partial file "/mmfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin/tmhmm"------------
#!/mmfs1/home/USERNAME/SOFTWARE/perl/bin/perl
# This is version 2.0c of tmhmm
...
o Modify file "tmhmmformat.pl" in "/mmfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin" with your version of Perl in the first line:
---------partial file "/mmfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin/tmhmmformat.pl"------------
#!/mmfs1/home/USERNAME/SOFTWARE/perl/bin/perl -w
# This is version 2.0c of tmhmmformat.pl
...
· Modify rnammer-1.2:
o Modify the files in "/mmfs1/home/USERNAME/SOFTWARE/rnammer-1.2" with your version of Perl in the first line (using sed this time):
o "cd /mmfs1/home/USERNAME/SOFTWARE/rnammer-1.2"
o "sed -i 's_#!/usr/bin/perl_#!/mmfs1/home/USERNAME/SOFTWARE/perl/bin/perl_g' core-rnammer rnammer xml2fsa xml2gff"
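The same sed approach could replace the manual shebang edits for signalp and tmhmm as well. The variant below is a sketch that restricts the substitution to line 1, so only the interpreter line can change. It assumes GNU sed (for -i) and that line 1 of each script is a shebang ending in "perl", optionally followed by switches. The function name set_perl_shebang is hypothetical.

```shell
# Hypothetical helper: point the shebang (line 1 only) of each script at a given perl.
# Any switches after the interpreter, such as -w, are preserved.
set_perl_shebang() {
    new_perl=$1
    shift
    for script in "$@"; do
        sed -i "1s|^#!.*perl|#!$new_perl|" "$script" || return 1
    done
}
```

For example: set_perl_shebang /mmfs1/home/USERNAME/SOFTWARE/perl/bin/perl core-rnammer rnammer xml2fsa xml2gff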
· Rnammer requires the older HMMER 2 version of hmmsearch, so install HMMER 2.3.2 locally:
· Download and unzip:
o "wget http://eddylab.org/software/hmmer/hmmer-2.3.2.tar.gz"
o "tar xzvf hmmer-2.3.2.tar.gz"
· Install:
o "cd hmmer-2.3.2"
o "./configure --prefix=/mmfs1/home/USERNAME/SOFTWARE/hmmer2"
o “make”
o “make install”
· Rename hmmsearch as hmmsearch2:
o "cd /mmfs1/home/USERNAME/SOFTWARE/hmmer2/bin"
o "mv hmmsearch hmmsearch2"
· Modify the rnammer file in "/mmfs1/home/USERNAME/SOFTWARE/rnammer-1.2" with your paths to Rnammer, hmmsearch, and Perl:
---------partial file "/mmfs1/home/USERNAME/SOFTWARE/rnammer-1.2/rnammer"------------
...
# the path of the program
my $INSTALL_PATH = "/mmfs1/home/USERNAME/SOFTWARE/rnammer-1.2";
...
if ( $uname eq "Linux" ) {
$HMMSEARCH_BINARY = "/mmfs1/home/USERNAME/SOFTWARE/hmmer2/bin/hmmsearch2";
$PERL = "/mmfs1/home/USERNAME/SOFTWARE/perl/bin/perl";
} elsif ( $uname eq "IRIX64" ) {
$HMMSEARCH_BINARY = "/mmfs1/home/USERNAME/SOFTWARE/hmmer2/bin/hmmsearch2";
$PERL = "/mmfs1/home/USERNAME/SOFTWARE/perl/bin/perl";
} else {
die "unknown platform\n";
}
...
· Modify the core-rnammer file in "/mmfs1/home/USERNAME/SOFTWARE/rnammer-1.2" by removing " --cpu 1":
o "cd /mmfs1/home/USERNAME/SOFTWARE/rnammer-1.2"
o "sed -i 's/ --cpu 1//g' core-rnammer"
· Install Perl XML::Simple and Getopt::Long modules (needed for rnammer):
· Open the CPAN Shell:
o "perl -MCPAN -e shell"
· Install along with the dependencies:
o "install XML::Simple"
o "install Getopt::Long"
· Quit CPAN Shell:
o "q"
· Install Perl URI::Escape module (needed by TransDecoder, although the TransDecoder test above does not reveal it):
· Open the CPAN Shell:
o "perl -MCPAN -e shell"
· Install along with the dependencies:
o "install URI::Escape"
· Quit CPAN Shell:
o "q"
· Make a directory for the Trinotate test and go into it:
o "cd /mmfs1/home/USERNAME/SOFTWARE"
o "mkdir Trinotate_example && cd Trinotate_example"
· Copy the test data and unzip:
o "cp /mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/auto/testing/{*.gz,conf.txt} ."
o "gunzip *.gz"
· Customize the path for your resources in the file "conf.txt":
------------------------------------------- partial conf.txt -------------------------------------------
...
[GLOBALS]
# ** edit the progs and dbs section to point to your local resources.
# progs
TRANSDECODER_DIR=/mmfs1/home/USERNAME/SOFTWARE/TransDecoder-TransDecoder-v5.5.0
BLASTX_PROG=blastx
BLASTP_PROG=blastp
SIGNALP_PROG=/mmfs1/home/USERNAME/SOFTWARE/signalp-4.1/signalp
TMHMM_PROG=/mmfs1/home/USERNAME/SOFTWARE/tmhmm-2.0c/bin/tmhmm
RNAMMER_TRANS_PROG=/mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/util/rnammer_support/RnammerTranscriptome.pl
RNAMMER=/mmfs1/home/USERNAME/SOFTWARE/rnammer-1.2/rnammer
HMMSCAN_PROG=hmmscan
# dbs
SWISSPROT_PEP=mini_sprot.pep
PFAM_DB=/mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1/Pfam-A.hmm
...
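A mistyped path in conf.txt tends to surface only partway through a run, so it may help to check the absolute paths first. The sketch below reads KEY=/absolute/path lines and reports those that do not exist; bare program names such as blastx are skipped because they are resolved through $PATH. check_conf_paths is a hypothetical helper, and the parsing assumes the simple KEY=VALUE layout shown above.

```shell
# Hypothetical helper: report KEY=/absolute/path entries whose path does not exist.
# Returns non-zero if any path is missing.
check_conf_paths() {
    conf=$1
    status=0
    while IFS='=' read -r key value; do
        # Skip blank lines, comments, and [SECTION] headers.
        case $key in ''|\#*|\[*) continue ;; esac
        case $value in
            /*) if [ ! -e "$value" ]; then
                    echo "$key: not found: $value"
                    status=1
                fi ;;
        esac
    done < "$conf"
    return "$status"
}
```

Running "check_conf_paths conf.txt" in the test directory before submitting the job prints one line per missing path and nothing when all of them exist.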
· Create the job script trinotate_job.pbs shown below in the test directory, then submit it:
o "qsub trinotate_job.pbs"
------------------------------------------- file trinotate_job.pbs -------------------------------------------
#!/bin/bash
#PBS -q default
#PBS -N test
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=02:00:00
##change "x-ccast-prj" to "x-ccast-prj-[your project group name here]"
#PBS -W group_list=x-ccast-prj
cd $PBS_O_WORKDIR
# Load required modules
module load Trinity/2.8.4
module load BLAST+/2.8.1
module load HMMER/3.1b2-gcc
# Add dependency programs to $PATH
export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/perl/bin
export PATH=$PATH:/mmfs1/home/USERNAME/SOFTWARE/sqlite_install_here/bin
export TRINOTATE_HOME=/mmfs1/home/USERNAME/SOFTWARE/Trinotate-Trinotate-v3.1.1
# Index the database for BLAST searches
makeblastdb -in mini_sprot.pep -dbtype prot
# Download the Trinotate boilerplate SQLite database
wget "https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/Trinotate_v3.sqlite.gz" -O my.sqlite.gz
# Unzip
gunzip -c my.sqlite.gz > my.sqlite
# Include the FASTA.pm for the signalp-4.1
export PERL5LIB=/mmfs1/home/USERNAME/SOFTWARE/signalp-4.1/lib:$PERL5LIB
# Run the automation script
$TRINOTATE_HOME/auto/autoTrinotate.pl --Trinotate_sqlite my.sqlite --transcripts myTrinity.fasta --gene_to_trans_map myTrinity.fasta.gene_to_trans_map --conf conf.txt --CPU $NCPUS
exit 0
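If the download in this script is interrupted, gunzip may leave a truncated my.sqlite behind. One possible safeguard is to test the archive before unpacking it; "gzip -t" checks integrity without writing any output. The function name unpack_if_valid is hypothetical.

```shell
# Hypothetical helper: decompress archive ($1) to target ($2)
# only if the gzip file passes an integrity test.
unpack_if_valid() {
    if gzip -t "$1" 2>/dev/null; then
        gunzip -c "$1" > "$2"
    else
        echo "corrupt or incomplete download: $1" >&2
        return 1
    fi
}
```

In the job script, the gunzip line could then become: unpack_if_valid my.sqlite.gz my.sqlite || exit 1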