This guide describes how to run alignments and quality metrics on the CHPC clusters and then run multiQC and other R
scripts on the output files. The scripts are located in /home/BioApps/hciR
on uinta
, so add that directory to your path to run the *.R
files below. The basic steps to align 50 bp single-end reads to mouse are listed below.
setup_jobs.R -c kingspeak -s single -l 50 -d mouse -i <fastq directory> -e <hci email>
pstart 14176X*
module load multiqc
multiqc .
add_readme.R -d mouse -r 14176R -f 14176X1_170425.fastq.gz
render.R -f README.Rmd
read_featureCounts.R -d .
add_deseq.R -c counts.txt -s samples.txt -d mouse
render.R -f DESeq.Rmd
Run setup_jobs.R
to create a cmd.txt
file and link the Fastq files in the Fastq
directory to create 12 separate sample directories (14176X1 to 14176X12).
setup_jobs.R -c kingspeak -s single -l 50 -d mouse -i Fastq -e chris.stubben
# Created cmd.txt file for mouse star50
# Linked 12 Fastq files
Review the cmd.txt
file and check the commands to the run cutadapt, FastQC, STAR, featureCounts, RSEM, CollectRnaSeqMetrics, and samtools idxstats. Use pysano to start the jobs by matching the sample directory names.
pstart 14176X*
The default is to align 50 bp single-end HiSeq reads to the Ensembl 96 human reference. The cmd.txt
file can be configured with the -s option to align HiSeq paired-end, NovaSeq paired-end with an optical duplicate removal step, Qiagen or NEB small RNA libraries. Reference databases are available for even-numbered releases in Ensembl from human, mouse, fly, pig, rabbit, elephant, vervet, rat, sheep, worm, yeast or zebrafish with splice junctions optimized for 50 or 125 bp reads. Use the -h
option to view the help pages for more details.
MultiQC searches the current working directory for analysis logs and compiles a report that includes interactive summary tables and plots for all seven commands in the cmd.txt
files.
module load multiqc
multiqc .
Add a markdown file with details on the STAR alignments, quality metrics and viewing BAM and BigWig files in IGV. Check the file and make any changes and then render the HTML report.
add_readme.R -d mouse -r 14176R -f 14176X1_170425.fastq.gz
# Created README.Rmd markdown file for mouse star50
# Run 'render.R -f README.Rmd' to create an html report
render.R -f README.Rmd
# processing file: README.Rmd
# ...
# Output created: README.html
Combine the featureCounts output files into a single count matrix in counts.txt
.
read_featureCounts.R -d .
# Reading ./14176X1/14176X1.counts
# Reading ./14176X2/14176X2.counts
# ...
# Saved 12 samples to counts.txt
Create a tab-delimited file with sample IDs in the first column matching the count column names and a treatment column with sample groups to compare (and optionally add sample names and other metadata). Add a markdown file with code to run DESeq2 using all possible contrasts in the treatment column and render the markdown file to save an Excel file and html report with sample visualizations.
cat samples.txt
# id trt
# 14176X1 WT
# 14176X2 WT
# 14176X3 TG
# 14176X4 TG
# ...
add_deseq.R -c counts.txt -s samples.txt -d mouse
# Created DESeq.Rmd markdown file with mouse annotations from Ensembl version 96
# Run 'render.R -f DESeq.Rmd' to create an html report
render.R -f DESeq.Rmd