
MultiRepMacsChIPSeq - annotate_to_tss


annotate_to_tss.pl

A script to annotate genomic intervals, such as enrichment peaks, to neighboring gene Transcription Start Sites (TSS).

This script wraps the bedtools window tool and collates its output into several lists of overlapping, adjacent, and neighborhood genes. The approach is predicated on the observation that regulatory regions frequently interact not just with the closest gene but with multiple neighboring genes, some at considerable distances.
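To make the categories concrete, here is a minimal Python sketch of how the TSSs near a single peak could be sorted into overlapping, immediate left and right, closest, adjacent, and neighborhood groups. It is not the script's Perl code. The Tss record, the function names, and the distance convention (bp gap from the nearest peak edge, 0 when the TSS falls inside the peak) are assumptions for illustration. The defaults mirror the documented --distance (50 Kb) and --overlap (100 bp) options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tss:
    """Hypothetical TSS record: 0-based coordinate of a transcript's 5' end."""
    name: str
    chrom: str
    pos: int
    strand: str


def tss_distance(peak_start: int, peak_end: int, pos: int) -> int:
    """Gap in bp between a half-open BED interval and a TSS; 0 if the TSS is inside."""
    if pos < peak_start:
        return peak_start - pos
    if pos >= peak_end:
        return pos - peak_end + 1
    return 0


def classify_peak(chrom, start, end, tss_list, radius_bp=50_000, overlap_gap=100):
    """Assumed categorization of TSSs around one peak (left/right are genomic sides)."""
    neighborhood = []  # (distance, name, side) for every TSS within the search radius
    for t in tss_list:
        if t.chrom != chrom:
            continue
        d = tss_distance(start, end, t.pos)
        if d > radius_bp:
            continue
        side = "left" if t.pos < start else "right" if t.pos >= end else "inside"
        neighborhood.append((d, t.name, side))

    # Anything within the overlap gap counts as overlapping, whichever side it is on
    overlapping = [n for d, n, _ in neighborhood if d <= overlap_gap]
    # Immediate left/right = nearest non-overlapping TSS on each side
    lefts = sorted((d, n) for d, n, s in neighborhood if s == "left" and d > overlap_gap)
    rights = sorted((d, n) for d, n, s in neighborhood if s == "right" and d > overlap_gap)
    left = lefts[0][1] if lefts else None
    right = rights[0][1] if rights else None
    closest = min(neighborhood)[1] if neighborhood else None
    return {
        "overlapping": overlapping,
        "left": left,
        "right": right,
        "closest": closest,
        "adjacent": overlapping + [g for g in (left, right) if g],
        "neighborhood": [n for _, n, _ in neighborhood],
    }
```

For a toy peak at chr1:1000-2000, a TSS at 950 lies 50 bp from the peak edge and is therefore reported as overlapping under the 100 bp gap, alongside any TSS inside the peak.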

Note that peaks and genes can have one-to-one, one-to-many, many-to-one, and many-to-many relationships, so expect duplicated entries in the output tables.
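The duplication follows directly from collating the same peak-gene pairs in both directions. The Python sketch below groups pairs by peak (as in a peak table) and by gene (as in a gene table). The peak and gene names are made-up placeholders, and the column layout of the real output files is not reproduced.

```python
from collections import defaultdict


def collate(pairs):
    """Group (peak, gene) pairs both ways, keeping first-seen order and dropping exact repeats."""
    by_peak = defaultdict(list)
    by_gene = defaultdict(list)
    for peak, gene in pairs:
        if gene not in by_peak[peak]:
            by_peak[peak].append(gene)
        if peak not in by_gene[gene]:
            by_gene[gene].append(peak)
    return dict(by_peak), dict(by_gene)


if __name__ == "__main__":
    # Placeholder pairs: peak1 hits two genes, and GENE_B is hit by two peaks
    pairs = [("peak1", "GENE_A"), ("peak1", "GENE_B"), ("peak2", "GENE_B"), ("peak3", "GENE_C")]
    by_peak, by_gene = collate(pairs)
    for gene, peaks in by_gene.items():
        print(gene, ",".join(peaks), sep="\t")
```

Here GENE_B appears under both peak1 and peak2 in the peak grouping, which is exactly the kind of repetition to expect across the output tables.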

The maximum search distance may be set by the user; all TSS within this radius are reported. For input regions expected to be promoter-proximal, such as H3K4me3 peaks, set this to a lower value.
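Because --distance is given in Kb while bedtools window takes its -w window size in bp (added both upstream and downstream of each input interval), the radius has to be converted before the call. This Python sketch only assembles the argument list. Whether annotate_to_tss.pl passes exactly these arguments, and the BED-formatted TSS file name used in the example, are assumptions.

```python
import shlex


def window_command(peaks_bed, tss_bed, distance_kb=50, bedtools="bedtools"):
    """Build a bedtools window call reporting every TSS within distance_kb of each peak.

    The argument layout is an assumption, not the script's documented invocation.
    """
    window_bp = distance_kb * 1000  # --distance is in Kb; bedtools -w is in bp
    return [bedtools, "window", "-a", peaks_bed, "-b", tss_bed, "-w", str(window_bp)]


if __name__ == "__main__":
    # A tighter 5 Kb radius, as might suit promoter-proximal marks such as H3K4me3
    print(shlex.join(window_command("peak.bed", "GRCh38_TSS.bed", distance_kb=5)))
```

When bedtools is on the PATH, the list can be passed to subprocess.run(cmd, capture_output=True, text=True, check=True) and the tab-delimited stdout parsed into peak-TSS pairs.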

This uses a transcript annotation file to extract the TSS. For best results, a custom annotation file based on empirical expression data should be used, preferably filtered for positive expression in the tested samples. Otherwise, a genome annotation file will suffice.

By default, three output files are written: one based on the input peaks, one based on all intersecting genes found, and a relative coverage data file.

out.annotation.tsv          Peak table with overlapping, left, right, and
                              neighborhood gene annotation if found
out.genes.tsv               Gene table with overlapping, closest, adjacent,
                              and neighborhood genes and corresponding peaks
out.tss_profile.txt         Data table of peak spatial coverage relative to
                              closest gene TSS
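The layout of out.tss_profile.txt is not specified above. As a hypothetical illustration of peak coverage relative to the closest TSS, this Python sketch converts each peak into strand-aware coordinates (negative upstream, positive downstream) and reports the fraction of peaks covering each fixed-size bin. The bin size, window, input tuple layout, and normalization are all assumptions.

```python
def tss_profile(peaks, radius_bp=5000, bin_bp=500):
    """Fraction of peaks covering each bin around their closest TSS.

    peaks: iterable of (start, end, tss, strand), where start/end form a 0-based
    half-open peak interval and tss is the 0-based TSS of the closest gene.
    Returns a list of (bin_start, fraction) with bin_start relative to the TSS.
    """
    n_bins = 2 * radius_bp // bin_bp
    covered = [0] * n_bins
    n_peaks = 0
    for start, end, tss, strand in peaks:
        n_peaks += 1
        # Strand-aware half-open relative interval: negative = upstream, positive = downstream
        if strand == "+":
            rel_start, rel_end = start - tss, end - tss
        else:
            rel_start, rel_end = tss - end + 1, tss - start + 1
        for i in range(n_bins):
            bin_start = -radius_bp + i * bin_bp
            if rel_start < bin_start + bin_bp and rel_end > bin_start:
                covered[i] += 1
    return [
        (-radius_bp + i * bin_bp, covered[i] / n_peaks if n_peaks else 0.0)
        for i in range(n_bins)
    ]
```

With this convention, a peak lying 200 to 400 bp to the right of a minus-strand TSS lands in an upstream (negative) bin, because the gene is transcribed leftward.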

If requested, additional separate gene lists may be written:

out.overlapping_genes.tsv   Table of all overlapping genes with peak names
out.closest_genes.tsv       Table of the closest genes with peak names
out.adjacent_genes.tsv      Table of all overlapping, immediate left, and 
                              immediate right genes with peak names
out.neighbor_genes.tsv      Table of all genes within search radius

If a gene annotation file was provided (--annotation), the extracted TSS file is also saved for future reference.

VERSION: 0.7

USAGE:

annotate_to_tss.pl  -i peak.bed -t GRCh38_TSS.txt

OPTIONS:

Input:
  -i --in <file>           Input peak file (bed, narrowPeak). Required.
  -o --out <file>          Output file basename (default input basename)

TSS Annotation (pick one only, required):
  -t --tss <file>          TSS data file from get_gene_regions.pl
  -a --annotation <file>   Gene annotation file (gtf, gff3, UCSC genePred)

Options:
  -d --distance <int>      Neighborhood distance for reporting in Kb (50 Kb)
  --overlap <int>          Gap distance to consider overlapping in bp (100 bp)
  -w --writelists          Write separate gene table lists

General:
  --bedtools <path>        Path to bedtools ()
  --getgene <path>         Path to get_gene_regions ()
  -h --help                Print documentation
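The option rules above, in particular that exactly one of --tss or --annotation is required, can be mirrored with Python's argparse as a sketch. This is not how the Perl script parses its options. It only encodes the documented flags and defaults, and it derives the default output basename by stripping the input file's directory and extension, which is an assumption about what "input basename" means.

```python
import argparse
import os


def build_parser():
    """Mirror of the documented annotate_to_tss.pl options (illustrative only)."""
    p = argparse.ArgumentParser(prog="annotate_to_tss.pl", add_help=False)
    p.add_argument("-i", "--in", dest="infile", required=True, help="input peak file")
    p.add_argument("-o", "--out", help="output file basename")
    # "pick one only, required": a required mutually exclusive group
    src = p.add_mutually_exclusive_group(required=True)
    src.add_argument("-t", "--tss", help="TSS data file from get_gene_regions.pl")
    src.add_argument("-a", "--annotation", help="gene annotation file")
    p.add_argument("-d", "--distance", type=int, default=50, help="neighborhood distance in Kb")
    p.add_argument("--overlap", type=int, default=100, help="overlap gap in bp")
    p.add_argument("-w", "--writelists", action="store_true", help="write separate gene lists")
    p.add_argument("--bedtools", help="path to bedtools")
    p.add_argument("--getgene", help="path to get_gene_regions")
    p.add_argument("-h", "--help", action="help", help="print documentation")
    return p


def parse(argv):
    args = build_parser().parse_args(argv)
    if args.out is None:
        # Assumed default: input file name without directory or extension
        args.out = os.path.splitext(os.path.basename(args.infile))[0]
    return args
```

Passing both -t and -a, or neither, makes argparse print an error and exit, which matches the "pick one only, required" rule.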