Bioinformatics - FastQC
Instructions on how to run (and, if needed, install a customized version of) FastQC
FastQC is a quality control tool for high-throughput sequence data.
1. Running FastQC on Thunder
Example: Examine the sequencing quality of fastq files
Location: /gpfs1/projects/ccastest/training/examples/FastQC_example
File list:
· fastqc_job.pbs: job submission script
· EGL245_S19_L001_R1_001.fastq: a sequence file in fastq format
· EGL245_S19_L001_R2_001.fastq: a sequence file in fastq format
· EGL290_S65_L001_R1_001.fastq: a sequence file in fastq format
· EGL290_S65_L001_R2_001.fastq: a sequence file in fastq format
Steps:
· Copy the example directory to your SCRATCH directory
o "cp -r /gpfs1/projects/ccastest/training/examples/FastQC_example $SCRATCH"
· Go to the copied directory
o "cd $SCRATCH/FastQC_example"
· Edit the job submission script as needed, then submit the job
o "qsub fastqc_job.pbs"
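Once the job has finished, each input file should have a matching report. The sketch below lists any reports that are missing. It assumes FastQC's default output naming (sample.fastq produces sample_fastqc.html in the same directory when no output directory is given). The function name missing_reports is a hypothetical helper, not part of FastQC or the example directory.

```shell
#!/bin/bash
# Print the name of every expected FastQC report that does not exist yet.
# Assumption: default naming, sample.fastq -> sample_fastqc.html
missing_reports() {
    local dir="${1:-.}"
    local fq report
    for fq in "$dir"/*.fastq; do
        [ -e "$fq" ] || continue              # glob matched no .fastq files
        report="${fq%.fastq}_fastqc.html"     # strip .fastq, add report suffix
        [ -f "$report" ] || echo "$report"
    done
}

# Run from the job directory; no output means every report is present.
missing_reports .
```

If the function prints any file names, check the job's output and error files before resubmitting.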
2. Installing a Customized FastQC on Thunder
Warning: This part is intended ONLY for those who want to install and test their own version in their HOME directory.
Summary
(a) FastQC requires a suitable Java Runtime Environment, available via "module load java/jdk/1.8u141-b15".
(b) The "-t" option sets the number of files processed simultaneously. Each thread is allocated 250 MB of memory.
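As a rough worked example of item (b), the sketch below multiplies a thread count by the 250 MB figure. The variable names are illustrative, and the estimate covers only the per-thread allocation quoted above, so it is probably best read as a lower bound when choosing the mem value for a job.

```shell
#!/bin/bash
# Estimate FastQC memory from the thread count, using the
# 250 MB-per-thread figure from the summary above.
threads=4                   # value that would be passed to "fastqc -t"
mem_per_thread_mb=250       # per-thread allocation quoted in the summary
needed_mb=$((threads * mem_per_thread_mb))
echo "fastqc -t $threads needs about ${needed_mb} MB for its threads"
```

With four threads this gives about 1000 MB, well inside the 10 GB requested by the job script in the Test section.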
Details
The steps below assume that you want to install the software in a directory named "SOFTWARE" inside your HOME directory on CCAST's Thunder cluster. "USERNAME" is your username on Thunder.
(a) Install
· Go to the SOFTWARE directory:
o "cd /gpfs1/home/USERNAME/SOFTWARE"
· Download fastqc:
o "wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip"
· Unzip to current location:
o "unzip fastqc_v0.11.8.zip"
· Give permission to the fastqc script:
o "cd FastQC"
o "chmod +x fastqc"
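A quick check after these steps can confirm that the fastqc script is present and executable. This is a minimal sketch: check_fastqc_install and FASTQC_DIR are hypothetical names introduced here, and the default path reuses the USERNAME placeholder from the steps above.

```shell
#!/bin/bash
# Check that a FastQC directory contains an executable "fastqc" script.
# check_fastqc_install and FASTQC_DIR are illustrative names only.
check_fastqc_install() {
    local dir="$1"
    if [ -x "$dir/fastqc" ]; then
        echo "ok: $dir/fastqc is executable"
    else
        echo "problem: $dir/fastqc is missing or not executable" >&2
        return 1
    fi
}

# USERNAME is a placeholder; replace it with your Thunder username.
FASTQC_DIR="${FASTQC_DIR:-/gpfs1/home/USERNAME/SOFTWARE/FastQC}"
check_fastqc_install "$FASTQC_DIR" || echo "try: chmod +x $FASTQC_DIR/fastqc"
```

With the Java module loaded, running "fastqc --version" from that directory should then print the installed version.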
(b) Test
· Go to your scratch directory:
o "cd /gpfs1/scratch/USERNAME"
· Download and extract the sample data (only some of the files are used):
o "wget http://de.cyverse.org/dl/d/C4416643-CA5C-4CDD-9FB0-86520AB61059/example_reads.tar.gz"
o "tar -zxvf example_reads.tar.gz"
· Make a test directory, go to it, and copy four data samples:
o "mkdir fastqc_example"
o "cd fastqc_example"
o "cp /gpfs1/scratch/USERNAME/example_reads/*_S{19,65}_* ."
· Write the job script (shown below) and submit the job:
o "qsub fastqc_job.pbs"
--------------- fastqc_job.pbs -----------------
#!/bin/bash
#PBS -q default
#PBS -N fastqc_test
#PBS -l select=1:mem=10gb:ncpus=4
#PBS -l walltime=1:00:00
## Replace "x-ccast-prj" with "x-ccast-prj-[your project group name here]"
#PBS -W group_list=x-ccast-prj
cd $PBS_O_WORKDIR
# load the Java runtime that FastQC requires
module load java/jdk/1.8u141-b15
# add the FastQC install directory to PATH
export PATH=$PATH:/gpfs1/home/USERNAME/SOFTWARE/FastQC
fastqc -t $NCPUS EGL245_S19_L001_R1_001.fastq EGL245_S19_L001_R2_001.fastq EGL290_S65_L001_R1_001.fastq EGL290_S65_L001_R2_001.fastq
exit 0
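
The copy step in (b) uses the pattern *_S{19,65}_* to pick the four sample files. The sketch below shows how bash handles that pattern, using empty stand-in files in a temporary directory. The EGL999_S20 file name is made up for illustration, and demo_dir and matches are illustrative variable names.

```shell
#!/bin/bash
# Show which files the pattern *_S{19,65}_* selects, using empty
# stand-in files (the EGL999_S20 sample is a made-up non-matching name).
demo_dir=$(mktemp -d)
touch "$demo_dir"/EGL245_S19_L001_R{1,2}_001.fastq \
      "$demo_dir"/EGL290_S65_L001_R{1,2}_001.fastq \
      "$demo_dir"/EGL999_S20_L001_R1_001.fastq

# bash first expands the braces into *_S19_* and *_S65_*,
# then matches each resulting pattern against the file names.
matches=("$demo_dir"/*_S{19,65}_*)
printf '%s\n' "${matches[@]##*/}"     # print file names without the path
echo "matched ${#matches[@]} files"
rm -rf "$demo_dir"
```

Running "ls" with the same pattern inside the real example_reads directory is a simple way to preview what the cp command will copy.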