R scripts for RNA-Seq at HCI

This guide describes how to run alignments and quality metrics on the CHPC clusters and then run multiQC and other R scripts on the output files. The scripts are located in /home/BioApps/hciR on uinta, so add that directory to your path to run the *.R files below. The basic steps to align 50 bp single-end reads to mouse are listed below.

setup_jobs.R -c kingspeak -s single -l 50 -d mouse -i <fastq directory> -e <hci email>
pstart 14176X*
module load multiqc
multiqc .
add_readme.R -d mouse -r 14176R -f 14176X1_170425.fastq.gz
render.R -f README.Rmd
read_featureCounts.R -d .
add_deseq.R -c counts.txt -s samples.txt -d mouse
render.R -f DESeq.Rmd

1. Setup jobs

Run setup_jobs.R to create a cmd.txt file and link the Fastq files in the Fastq directory to create 12 separate sample directories (14176X1 to 14176X12).

setup_jobs.R -c kingspeak -s single -l 50 -d mouse -i Fastq -e chris.stubben
# Created cmd.txt file for mouse star50
# Linked 12 Fastq files

Review the cmd.txt file and check the commands to the run cutadapt, FastQC, STAR, featureCounts, RSEM, CollectRnaSeqMetrics, and samtools idxstats. Use pysano to start the jobs by matching the sample directory names.

pstart 14176X*

The default is to align 50 bp single-end HiSeq reads to the Ensembl 96 human reference. The cmd.txt file can be configured with the -s option to align HiSeq paired-end, NovaSeq paired-end with an optical duplicate removal step, Qiagen or NEB small RNA libraries. Reference databases are available for even-numbered releases in Ensembl from human, mouse, fly, pig, rabbit, elephant, vervet, rat, sheep, worm, yeast or zebrafish with splice junctions optimized for 50 or 125 bp reads. Use the -h option to view the help pages for more details.

2. Run multiQC

MultiQC searches the current working directory for analysis logs and compiles a report that includes interactive summary tables and plots for all seven commands in the cmd.txt files.

module load multiqc
multiqc .

3. Add README

Add a markdown file with details on the STAR alignments, quality metrics and viewing BAM and BigWig files in IGV. Check the file and make any changes and then render the HTML report.

add_readme.R -d mouse -r 14176R -f 14176X1_170425.fastq.gz
# Created README.Rmd markdown file for mouse star50
# Run 'render.R -f README.Rmd' to create an html report
render.R -f README.Rmd
# processing file: README.Rmd
# ...
# Output created: README.html

4. Combine featureCount output

Combine the featureCounts output files into a single count matrix in counts.txt.

read_featureCounts.R -d .
# Reading ./14176X1/14176X1.counts
# Reading ./14176X2/14176X2.counts
# ...
# Saved 12 samples to counts.txt

5. Run DESeq2

Create a tab-delimited file with sample IDs in the first column matching the count column names and a treatment column with sample groups to compare (and optionally add sample names and other metadata). Add a markdown file with code to run DESeq2 using all possible contrasts in the treatment column and render the markdown file to save an Excel file and html report with sample visualizations.

cat samples.txt
# id trt
# 14176X1  WT
# 14176X2  WT
# 14176X3  TG
# 14176X4  TG
# ...
add_deseq.R  -c counts.txt -s samples.txt -d mouse
# Created DESeq.Rmd markdown file with mouse annotations from Ensembl version 96
# Run 'render.R -f DESeq.Rmd' to create an html report
render.R -f DESeq.Rmd