Bioinformatics - FastQC

Instructions on how to run (and, if needed, install a customized version of) FastQC

FastQC is a quality control tool for high throughput sequence data.

  1. Running FastQC on Thunder
  2. Install Customized FastQC on Thunder

Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running FastQC on Thunder


Example: Examine the sequencing quality of fastq files


Location: /gpfs1/projects/ccastest/training/examples/FastQC_example


File list:


·  fastqc_job.pbs: job submission script

·  EGL245_S19_L001_R1_001.fastq: a sequence file in fastq format

·  EGL245_S19_L001_R2_001.fastq: a sequence file in fastq format

·  EGL290_S65_L001_R1_001.fastq: a sequence file in fastq format

·  EGL290_S65_L001_R2_001.fastq: a sequence file in fastq format


Steps:


·  Copy the example directory to your SCRATCH directory

o   cp -r /gpfs1/projects/ccastest/training/examples/FastQC_example $SCRATCH

·  Go to the copied directory

o   cd $SCRATCH/FastQC_example 

·  Edit the job submission script as needed, then submit the job

o    qsub fastqc_job.pbs
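Once the job has finished, it can be useful to confirm that every input file produced a report. The sketch below assumes FastQC's default output naming, where sample.fastq yields sample_fastqc.html next to the input file; the function name missing_reports is hypothetical and not part of FastQC or the example directory.

```shell
#!/bin/bash
# Print each input file whose FastQC HTML report is missing.
# Assumes default naming: sample.fastq -> sample_fastqc.html,
# written in the same directory as the input file.
missing_reports() {
    local fq report
    for fq in "$@"; do
        report="${fq%.fastq}_fastqc.html"
        [ -f "$report" ] || echo "$fq"
    done
}
```

Running "missing_reports *.fastq" inside $SCRATCH/FastQC_example prints nothing when all four reports exist.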


2. Install Customized FastQC on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.

Summary  

(a)    FastQC requires a suitable Java Runtime Environment (available via ‘module load java/jdk/1.8u141-b15’).

(b)    Option “-t”: number of files processed simultaneously. Each thread is allocated 250 MB of memory.
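The 250 MB per thread figure gives a quick way to sanity-check the memory requested for a job. The helper below is a minimal sketch of that arithmetic; the name fastqc_mem_mb is hypothetical, and the result is only a rough estimate.

```shell
#!/bin/bash
# Estimate the memory (in MB) FastQC allocates for a given -t value,
# using the 250 MB per thread figure from the summary above.
fastqc_mem_mb() {
    local threads="$1"
    echo $(( threads * 250 ))
}

fastqc_mem_mb 4   # prints 1000
```

With four threads the estimate is 1000 MB, comfortably below the mem=10gb requested in the example job script.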

Details

In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.


(a) Install

·       Go to the SOFTWARE directory (create it first with "mkdir -p" if it does not exist): 

o    "cd /gpfs1/home/USERNAME/SOFTWARE" 

·       Download fastqc: 

o    "wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip" 

·       Unzip to current location: 

o    "unzip fastqc_v0.11.8.zip"

·       Make the fastqc script executable: 

o    "cd FastQC"

o    "chmod +x fastqc"
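Before moving on to the test, a quick check can confirm that the install steps above worked. This is a sketch only: check_fastqc_install is a hypothetical helper, and it checks just two things, that the fastqc script is executable and that a java command is on the PATH.

```shell
#!/bin/bash
# Verify a FastQC install directory: the fastqc wrapper must be
# executable and a Java runtime must be available on the PATH.
check_fastqc_install() {
    local dir="$1"
    if [ ! -x "$dir/fastqc" ]; then
        echo "fastqc is missing or not executable in $dir"
        return 1
    fi
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH; load a Java module first"
        return 1
    fi
    echo "ok"
}
```

For example, "check_fastqc_install /gpfs1/home/USERNAME/SOFTWARE/FastQC" after loading the Java module. Running "./fastqc --version" in the FastQC directory is a further check that the script actually starts.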

(b) Test

·       Go to your scratch directory: 

o    "cd /gpfs1/scratch/USERNAME" 

·       Download and extract the sample data (only some of the files are used): 

o    "wget http://de.cyverse.org/dl/d/C4416643-CA5C-4CDD-9FB0-86520AB61059/example_reads.tar.gz"

o    "tar -zxvf example_reads.tar.gz"

·       Create a test directory, go to it, and copy four sample files: 

o    "mkdir fastqc_example"         

o    "cd fastqc_example"

o    "cp /gpfs1/scratch/USERNAME/example_reads/*_S{19,65}_* ."

·       Write the job script (shown below) and submit the job: 

o    "qsub fastqc_job.pbs"


--------------- fastqc_job.pbs -----------------

#!/bin/bash

#PBS -q default

#PBS -N fastqc_test

#PBS -l select=1:mem=10gb:ncpus=4

#PBS -l walltime=1:00:00

## Replace “x-ccast-prj” with “x-ccast-prj-[your project group name here]”

#PBS -W group_list=x-ccast-prj

cd $PBS_O_WORKDIR

# load the Java runtime that FastQC requires (see Summary (a))

module load java/jdk/1.8u141-b15

# add the FastQC directory to $PATH

export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/FastQC

fastqc -t $NCPUS EGL245_S19_L001_R1_001.fastq EGL245_S19_L001_R2_001.fastq EGL290_S65_L001_R1_001.fastq EGL290_S65_L001_R2_001.fastq

exit 0
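The script above passes $NCPUS to -t, which matches its four input files. Because -t sets the number of files processed simultaneously, threads beyond the number of input files are not used. The sketch below picks the smaller of the two values; pick_threads is a hypothetical helper, not a FastQC or PBS feature.

```shell
#!/bin/bash
# Choose a -t value: no more threads than CPUs requested,
# and no more threads than input files.
pick_threads() {
    local ncpus="$1" nfiles="$2"
    if [ "$nfiles" -lt "$ncpus" ]; then
        echo "$nfiles"
    else
        echo "$ncpus"
    fi
}

# Possible use inside a job script (assumes the inputs are *.fastq
# files in the current directory):
#   files=(*.fastq)
#   fastqc -t "$(pick_threads "$NCPUS" "${#files[@]}")" "${files[@]}"
```

If the number of files changes, the ncpus and mem values in the "#PBS -l select" line may need adjusting to match.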
