UMI Scripts

Scripts for handling Unique Molecular Indexes in Fastq and Bam files, primarily for de-duplication purposes.

Description

Unique molecular indexes (UMIs) are short, random, nucleotide sequences incorporated into next generation sequence reads for uniquely identifying biological molecules of RNA or DNA. This aids in distinguishing biological duplicates from PCR-derived or technical (optical) duplicates.

This package of scripts aid in processing these UMIs, both from Fastq and Bam files. Embedded UMIs can be extracted from the primary sequence read in a Fastq file, or they may be provided as a separate Fastq file. They may be stored either appended to the read name or assigned to SAM/Bam attribute tag.

These scripts are designed to be fairly simplistic and work as quickly as possible. Users needing more advanced tools or reporting may need to look elsewhere.

Installation

See the Installation documentation.

Usage

See the Usage documentation.

Applications

See the list of available Applications.

Execute each script without options or with the -h or --help options to print the included documentation.

AUTHOR

Timothy J. Parnell, PhD
Cancer Bioinformatics Shared Resource
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

LICENSE

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0. For details, see the full text of the license in the file LICENSE.

This package is distributed in the hope that it will be useful, but it is provided “as is” and without any express or implied warranties. For details, see the full text of the license in the file LICENSE.