Bioinformatics - FastQC

Instructions on how to run (and, if needed, install a customized version of) FastQC

FastQC is a quality control tool for high-throughput sequence data.

  1. Running FastQC on Thunder
  2. Install Customized FastQC on Thunder

Please refer to the CCAST User Guide and the article Running Bioinformatics Software on HPC Clusters for general information about using CCAST resources and running bioinformatics software on CCAST's HPC clusters.

1. Running FastQC on Thunder

Example: Examine the sequencing quality of fastq files

Location: /gpfs1/projects/ccastest/training/examples/FastQC_example

File list:

·  fastqc_job.pbs: job submission script

·  EGL245_S19_L001_R1_001.fastq: a sequence file in fastq format

·  EGL245_S19_L001_R2_001.fastq: a sequence file in fastq format

·  EGL290_S65_L001_R1_001.fastq: a sequence file in fastq format

·  EGL290_S65_L001_R2_001.fastq: a sequence file in fastq format


·  Copy the example directory to your SCRATCH directory

o   cp -r /gpfs1/projects/ccastest/training/examples/FastQC_example $SCRATCH

·  Go to the copied directory

o   cd $SCRATCH/FastQC_example 

·  Edit the job submission script as needed, then submit the job

o    qsub fastqc_job.pbs
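
Once the job has finished, each input file should have a matching report. The sketch below is one way to check for that; it assumes FastQC's default behavior of writing <name>_fastqc.html next to each <name>.fastq input, and the function name missing_reports is hypothetical, introduced only for illustration.

```shell
#!/bin/bash
# Print the FastQC HTML reports that are still missing for the given FASTQ files.
# Assumes default naming: <name>.fastq -> <name>_fastqc.html in the same directory.
missing_reports() {
    local fq report
    for fq in "$@"; do
        report="${fq%.fastq}_fastqc.html"
        [ -e "$report" ] || echo "$report"
    done
}

# Example usage inside $SCRATCH/FastQC_example after the job completes:
# missing_reports *.fastq
```

If the function prints nothing, every input already has a report; any name it prints points to an input that was not processed.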

2. Install Customized FastQC on Thunder

Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.


(a)    FastQC requires a suitable Java Runtime Environment (available via ‘module load java/jdk/1.8u141-b15’).

(b)    Option “-t”: the number of files processed simultaneously. 250 MB of memory is allocated for each thread.
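
Because memory scales with the thread count, the value passed to “-t” can be checked against the memory requested for a job. The sketch below only does the arithmetic; it takes the 250 MB per thread figure from the note above, and the function name fastqc_mem_mb is hypothetical.

```shell
#!/bin/bash
# Estimate the memory (in MB) FastQC allocates for a given number of threads,
# assuming 250 MB per thread as stated in the note above.
fastqc_mem_mb() {
    local threads=$1
    echo $(( threads * 250 ))
}

# Example: 4 threads
fastqc_mem_mb 4    # prints 1000
```

With 4 threads the estimate is 1000 MB, so a job that requests 4 CPUs should request at least that much memory.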


In the following steps, we assume that you want to install the software in a directory named “SOFTWARE” inside your HOME directory on CCAST’s Thunder cluster. “USERNAME” is your username on Thunder.

(a) Install

·       Go to the SOFTWARE directory: 

o    "cd /gpfs1/home/USERNAME/SOFTWARE" 

·       Download fastqc: 

o    "wget" 

·       Unzip to current location: 

o    "unzip"

·       Give permission to the fastqc script: 

o    "cd FastQC"

o    "chmod +x fastqc"
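
After these steps, it can be worth confirming that the wrapper script is in place and executable before writing a job script. The sketch below is one possible check; the function name check_fastqc_dir is hypothetical, and the example path simply follows the install location assumed above.

```shell
#!/bin/bash
# Report whether a FastQC directory looks ready to use:
# the fastqc wrapper script must exist and be executable.
check_fastqc_dir() {
    local dir=$1
    if [ -x "$dir/fastqc" ]; then
        echo "ok: $dir/fastqc is executable"
    else
        echo "problem: $dir/fastqc is missing or not executable"
        return 1
    fi
}

# Example (replace USERNAME with your username on Thunder):
# check_fastqc_dir /gpfs1/home/USERNAME/SOFTWARE/FastQC
```

With the Java module loaded, running ./fastqc --version inside the FastQC directory should also print the installed version.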

(b) Test

·       Go to your scratch directory: 

o    "cd /gpfs1/scratch/USERNAME" 

·       Download and unpack the sample data (only some of the files are used): 

o    "wget"

o    "tar -zxvf example_reads.tar.gz"

·       Create a test directory, go to it, and copy four data samples: 

o    "mkdir fastqc_example"         

o    "cd fastqc_example"

o    "cp /gpfs1/scratch/USERNAME/example_reads/*_S{19,65}_* ."

·       Write the job script (shown below) and submit the job: 

o    "qsub fastqc_job.pbs"

--------------- fastqc_job.pbs -----------------


#PBS -q default

#PBS -N fastqc_test

#PBS -l select=1:mem=10gb:ncpus=4

#PBS -l walltime=1:00:00

## Replace “x-ccast-prj” with “x-ccast-prj-[your project group name here]”

#PBS -W group_list=x-ccast-prj


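# Load the Java runtime that FastQC requires (module name from the requirements above)

module load java/jdk/1.8u141-b15

# PBS starts the job in your HOME directory; move to the directory the job was submitted from

cd "$PBS_O_WORKDIR"

# Add the FastQC install location to $PATH

export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/FastQC

fastqc -t $NCPUS EGL245_S19_L001_R1_001.fastq EGL245_S19_L001_R2_001.fastq EGL290_S65_L001_R1_001.fastq EGL290_S65_L001_R2_001.fastq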

exit 0
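
The four files named on the fastqc command line are the ones selected by the pattern *_S{19,65}_* in the copy step. The sketch below illustrates how bash expands that pattern, using empty placeholder files in a temporary directory; the EGL300_S70 file name is made up to show a non-matching case, and brace expansion assumes the shell is bash.

```shell
#!/bin/bash
# Show which names the pattern *_S{19,65}_* selects.
# Bash expands the braces first, producing the two globs *_S19_* and *_S65_*.
demo_dir=$(mktemp -d)

# Empty placeholder files; EGL300_S70_* is a made-up name that should NOT match.
touch "$demo_dir"/EGL245_S19_L001_R{1,2}_001.fastq \
      "$demo_dir"/EGL290_S65_L001_R{1,2}_001.fastq \
      "$demo_dir"/EGL300_S70_L001_R1_001.fastq

# Expand the pattern inside the demo directory and list one match per line.
matched=$(cd "$demo_dir" && printf '%s\n' *_S{19,65}_*)
echo "$matched"

rm -r "$demo_dir"
```

The listing contains the two S19 files and the two S65 files, and leaves out the S70 placeholder.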

Keywords: ccast, hpc, thunder, bioinformatics, fastqc, "sequence analysis"
Doc ID: 108024
Owner: Liu Y.
Group: IT Knowledge Base
Created: 2020-12-21 16:27 CST
Updated: 2020-12-29 01:15 CST
Sites: IT Knowledge Base