MultiRepMacsChIPSeq - multirep_macs2_pipeline
multirep_macs2_pipeline.pl
This is a wrapper for calling and/or comparing peaks in ChIPSeq or ATACSeq data with single or multiple replicates using the Macs2 ChIPSeq caller. It uses BioToolBox applications to normalize duplicate levels and read depths between samples and replicates.
Multiple ChIP samples (experiments) may be provided by repeating the --chip option as necessary for every experiment, factor, or antibody sample. Provide a separate name for each sample with --name, in the same order.
Provide ChIP sample replicas as comma-delimited values to the --chip option. Each sample may have one or more replicas. Replicas are averaged together in a depth-controlled manner. If for some reason you don’t want to merge replicas, treat them as individual samples.
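As a rough illustration of what depth-controlled averaging means, the sketch below scales each replicate's coverage to a common depth before taking the mean, so that a deeply sequenced replicate does not dominate the result. The function name, the per-million target depth, and the toy numbers are assumptions made for this example; the pipeline performs this step with BioToolBox applications and may differ in detail.

```python
def depth_scaled_mean(replicates, target_depth=1_000_000):
    """Average coverage tracks after scaling each to a common depth.

    replicates: list of (coverage, depth) pairs, where coverage is a list of
    per-bin values and depth is that replicate's total alignment count.
    target_depth: hypothetical common depth used for scaling.
    """
    scaled = []
    for coverage, depth in replicates:
        factor = target_depth / depth
        scaled.append([value * factor for value in coverage])
    # Mean across replicates at each bin position.
    return [sum(values) / len(values) for values in zip(*scaled)]


rep1 = ([2.0, 4.0, 6.0], 2_000_000)  # deeper replicate, scaled down by half
rep2 = ([3.0, 3.0, 1.0], 1_000_000)  # already at the target depth
print(depth_scaled_mean([rep1, rep2]))
```

Without the scaling step, the first replicate would contribute twice the signal simply because it was sequenced twice as deeply.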
One control may be used for all samples, or sample-matched controls may be provided by repeating the --control option, keeping the same order as the ChIP samples. Control replicas may be provided as comma-delimited lists. If some, but not all, ChIP samples share a control, still list the control individually for each ChIP sample; duplicate controls will be properly handled. If no control is available (for example, ATACSeq often has no genomic input), then a global mean coverage will be calculated from the ChIP samples and used as the control.
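The ordering rules for samples, names, and controls can be sketched as a small helper that assembles the argument list. Only the --chip, --name, --control, --size, and --dryrun options come from this page; the helper itself, the sample names, and the file names are hypothetical. Rejecting a mix of samples with and without controls is a simplification of this sketch, not a documented rule of the pipeline.

```python
import shlex


def build_pipeline_args(samples, shared_control=None, extra=()):
    """Assemble an argument list for multirep_macs2_pipeline.pl.

    samples: list of (name, chip_files, control_files) tuples, in the order
             the samples should be reported.
    shared_control: optional list of control files used for every sample.
    extra: further options, passed through unchanged.
    """
    args = ["multirep_macs2_pipeline.pl"]
    # One --chip and one --name per sample; replicas are comma-delimited.
    for name, chips, _controls in samples:
        args += ["--chip", ",".join(chips), "--name", name]
    if shared_control:
        # A single control applies to all samples.
        args += ["--control", ",".join(shared_control)]
    else:
        control_sets = [controls for _name, _chips, controls in samples]
        if any(control_sets):
            if not all(control_sets):
                # Simplification made by this sketch only.
                raise ValueError("give a control for every sample or for none")
            # Sample-matched controls, repeated in the same order as --chip.
            for controls in control_sets:
                args += ["--control", ",".join(controls)]
    args += list(extra)
    return args


samples = [
    ("FactorA", ["a_rep1.bam", "a_rep2.bam"], ["input_a.bam"]),
    ("FactorB", ["b_rep1.bam", "b_rep2.bam", "b_rep3.bam"], ["input_b.bam"]),
]
print(shlex.join(build_pipeline_args(samples, extra=["--size", "200", "--dryrun"])))
```

Adding --dryrun prints the commands the pipeline would run without executing them, which is a convenient way to check that samples and controls are paired as intended.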
Fragment size should be empirically determined by the user, especially when multiple samples and/or replicates are being used. The same fragment size is used across all samples and replicates to ensure equal comparisons. NOTE: even in paired-end mode, fragment size is used for control lambda.
By default, this employs Macs2 local lambda chromatin-bias modeling as the reference track derived from the provided input. This uses three sources to model chromatin bias: fragment (or d in Macs2 parlance), small lambda (default 1000 bp), and large lambda (default 10000 bp) fragment coverage. If desired, either small or large lambda may be turned off by setting it to 0 (--slocal or --llocal). To turn off lambda completely, set the --nolambda option, whereupon only the control fragment coverage is used directly as the reference. If no control file is provided, then the chromosomal mean from the ChIP file is used as a (poor) substitute.
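The way the three sources might combine into one reference track can be sketched as follows. This assumes the Macs2 convention of scaling the wider-window coverage down to fragment-equivalent units and then taking the per-position maximum, together with a genome-wide background value; the text above does not spell out the combination rule, so treat this as an illustration rather than the pipeline's exact procedure. The function and parameter names are hypothetical.

```python
def reference_track(frag_cov, small_cov, large_cov, background,
                    fragment=200, slocal=1000, llocal=10000, nolambda=False):
    """Combine control coverage tracks into one reference (lambda) track.

    frag_cov:  control coverage with reads extended to the fragment size (d)
    small_cov: control coverage with reads extended to slocal bp
    large_cov: control coverage with reads extended to llocal bp
    background: genome-wide mean coverage
    Setting slocal or llocal to 0 drops that source; nolambda uses only
    the fragment coverage.
    """
    if nolambda:
        return list(frag_cov)
    tracks = [list(frag_cov)]
    if slocal > 0:
        # Scale to fragment-equivalent coverage (assumed Macs2-style).
        tracks.append([v * fragment / slocal for v in small_cov])
    if llocal > 0:
        tracks.append([v * fragment / llocal for v in large_cov])
    # Per-position maximum across all enabled sources and the background.
    return [max(background, *values) for values in zip(*tracks)]


frag = [1.0, 0.0, 4.0]
small = [5.0, 15.0, 5.0]
large = [50.0, 50.0, 50.0]
print(reference_track(frag, small, large, background=1.5))
print(reference_track(frag, small, large, background=1.5, slocal=0))
print(reference_track(frag, small, large, background=1.5, nolambda=True))
```

Taking the maximum makes the reference conservative: a position is only called enriched if the ChIP signal exceeds the strongest local estimate of background.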
Advanced users may provide one processed bigWig file per ChIP or control sample.
Version: 20
Options:
 Input files
  --chip        file1,file2...  Repeat for each sample set
  --name        text            Repeat for each sample
  --control     file1,file2...  Repeat if matching multiple samples
 Output
  --dir         directory       Directory for writing all files (./MultiRepPeakCall)
  --out         file basename   Base filename for merged output files (merged)
 Genome size
  --genome      integer         Specify effective mappable genome size
                                  (default empirically determined)
 Bam options
  --pe                          Bam files are paired-end, default treat as single-end
 Alignment filtering options
  --mapq        integer         Minimum mapping quality, (0)
  --chrskip     "text"          Chromosome skip regex (chrM|MT|alt|Adapter|Lambda|PhiX)
  --blacklist   file            Bed file of repeats or hotspots to avoid
                                  Default determined empirically from control samples.
                                  Specify 'none' for no filtering.
  --min         integer         Minimum paired-end size allowed (50 bp)
  --max         integer         Maximum paired-end size allowed (500 bp)
 Duplication filtering
  --nodedup                     Skip deduplication and use all primary, nondup alignments
  --dupfrac     float           Target duplication rate for subsampling (0.05)
  --maxdepth    integer         Maximum position alignment depth ()
                                  set to 1 to remove all duplicates
  --optdist     integer         Maximum distance for optical duplicates (0)
                                  use 100 for HiSeq, 2500 for NovaSeq
  --deduppair                   Run deduplication as paired-end, but coverage as single-end
 Fragment coverage
  --cutsite                     Set multiple options specific for ATACSeq cutsite analysis
  --size        integer         Predicted fragment size. REQUIRED for single-end
  --shift       integer         Shift the fragment in special situations
  --fraction                    Record multiple-hit alignments as fraction of hits
  --slocal      integer         Small local lambda size (1000 bp)
  --llocal      integer         Large local lambda size (10000 bp)
  --cbin        integer         ChIP fragment bin size (10 bp)
  --slbin       integer         Small local lambda bin size (50 bp)
  --llbin       integer         Large local lambda bin size (100 bp)
 Chromosome-specific normalization
  --chrnorm     float           Specific chromosome normalization factor
  --chrapply    "text"          Apply factor to specified chromosomes via regex
 Peak calling
  --independent                 Call peaks independently for each replicate and merge
  --cutoff      number          Threshold q-value for calling peaks (2)
                                  Higher numbers are more significant, -1*log10(q)
  --peaksize    integer         Minimum peak size to call (2 x fragment size)
                                  Required for paired-end alignments.
  --peakgap     integer         Maximum gap between peaks before merging (1 x size)
  --broad                       Also perform broad (gapped) peak calling
  --broadcut    number          Q-value cutoff for linking broad regions (0.5)
  --broadgap    integer         Maximum link size between peaks in broad calls (4 x size bp)
  --nolambda                    Skip lambda control, compare ChIP directly with control
  --minpeakover integer         Minimum number of overlapping replicate peaks to accept
                                  in final when merging (default n-1, minimum 2)
  --samedepth                   Use same target depth when calculating per sample
                                  q-value enrichment for replicate-mean peaks
 Peak scoring
  --binsize     integer         Size of bins in 25 flanking peak bins for profile (100)
  --targetdepth text            Set method for sequence depth scaling for all count data:
                                  median (default), mean, min
  --rawcounts                   Use unscaled raw counts for re-scoring peaks
  --noplot                      Do not plot figures of results
 Job control
  --cpu         integer         Number of CPUs to use per job (4)
  --job         integer         Number of simultaneous jobs (2)
  --dryrun                      Just print the commands without execution
  --noorganize                  Do not organize files into subfolders when finished
  --savebam                     Save de-duplicated bam files
  --savebdg                     Save text bedGraph files
 Application Paths
  --bam2wig     path            (bam2wig.pl)
  --bamdedup    path            (bam_partial_dedup.pl)
  --bedtools    path            (bedtools)
  --bw2bdg      path            (bigWigToBedGraph)
  --data2wig    path            (data2wig.pl)
  --getdata     path            (get_datasets.pl)
  --getrel      path            (get_relative_data.pl)
  --geteff      path            (get_chip_efficiency.pl)
  --intersect   path            (intersect_peaks.pl)
  --macs        path            (macs2)
  --manwig      path            (manipulate_wig.pl)
  --meanbdg     path            (generate_mean_bedGraph.pl)
  --peak2bed    path            (peak2bed.pl)
  --updatepeak  path            (update_peak_file.pl)
  --pandoc      path            (pandoc)
  --plotpeak    path            (plot_peak_figures.R)
  --printchr    path            (print_chromosome_lengths.pl)
  --reportmap   path            (report_mappable_space.pl)
  --rscript     path            (Rscript)
  --wig2bw      path            (wigToBigWig)
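
Several of the Peak calling defaults listed above are simple arithmetic: --cutoff and --broadcut are -1*log10(q) scores, the size defaults are multiples of the fragment size, and --minpeakover defaults to n-1 with a floor of 2. The helper functions below are hypothetical, written only to make that arithmetic concrete; they are not part of the pipeline.

```python
import math


def q_to_cutoff(q):
    """Convert a q-value to the -1*log10(q) score expected by --cutoff."""
    return -math.log10(q)


def cutoff_to_q(cutoff):
    """Convert a -1*log10(q) score back to a q-value."""
    return 10 ** (-cutoff)


def default_peak_params(size):
    """Size defaults derived from the fragment size given to --size."""
    return {
        "peaksize": 2 * size,  # minimum peak size
        "peakgap": size,       # maximum gap before merging peaks
        "broadgap": 4 * size,  # maximum link size in broad calls
    }


def default_minpeakover(n_replicates):
    """Default --minpeakover: n-1 overlapping replicate peaks, at least 2."""
    return max(2, n_replicates - 1)


print(cutoff_to_q(2))            # q-value behind the default --cutoff
print(cutoff_to_q(0.5))          # q-value behind the default --broadcut
print(default_peak_params(200))
print(default_minpeakover(4))
```

For example, the default --cutoff of 2 corresponds to a q-value of 0.01, and a 200 bp fragment size implies a 400 bp minimum peak size unless --peaksize is given.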