UMI Scripts - merge_UMI_fastq
Home | Install | Usage | Applications |
merge_UMI_fastq
A script to merge a separate UMI fastq file with sequence read fastq file(s).
Unique Molecular Indexes (UMIs) are short, random nucleotide sequences that can help to distinguish independent DNA molecules with the same sequence composition. By associating the UMI to a nucleotide molecule at the beginning of library preparation, identical sequence reads can be distinguished between unique, biologically-derived molecules and PCR-derived or sequencing artifact duplicates.
Either paired-end or single-end reads may be merged with a UMI Fastq file.
UMIs can be merged into sequence read Fastq files in three ways:
-
SAM attribute in Fastq comment Inserted as standard SAM attribute tags
RX
andQX
in the read header. This is compatible with e.g. BWA, Bowties2 aligners. DEFAULT. -
SAM attribute in unaligned Bam format Exported as an unaligned SAM/BAM format with RX and QX attribute tags. This is compatible with e.g. Bowtie2, STAR, BWA, and Novoalign. Use the
--bam
option. -
Append to name Appended to the read name as “:UMI_sequence”. This is compatible with all aligners but requires special software to utilize. Non-standard. Use the
--name
option.
Alignments can be de-duplicated utilizing the UMI codes with external software. See bam_umi_dedup.pl in this software package, or Picard ‘UmiAwareMarkDuplicatesWithMateCigar’ as possibilities.
External utilities may be required for execution, inlcuding pigz
and/or gzip
,
and samtools
. These are searched for in your environment PATH
.
Read and UMI Fastq files may be empirically determined based on size and name.
VERSION: 1.01
USAGE:
merge_umi_fastq.pl *.fastq.gz
merge_umi_fastq.pl -1 R1.fq.gz -2 R2.fq.gz -u UMI.fq.gz -o unaligned.bam
OPTIONS
Input:
-1 --read1 <file> First fastq read
-2 --read2 <file> Second fastq read, optional
-u --umi <file> UMI fastq read
-U --umi2 <file> Second UMI fastq file, optional
Output:
-o --out <file> Output file (base)name (input basename)
use 'stdout' for (interleaved) piping
-b --bam Write output as unaligned BAM format
or SAM format if stdout or samtools unavailable
-n --name Append UMI to read name instead of SAM tag RX
Other:
--samtools <path> Path to samtools (/usr/local/bin/samtools)
-h --help Show full description and help