MultiRepMacsChIPSeq - sort_data_by_key

sort_data_by_key.pl

This script will sort the features in a data file by the groups indicated in a second provided file. Features are sorted numerically by the row mean values within each group. The primary intention is to improve plotting of data by extrinsic grouping rather than intrinsic ordering.
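The ordering described above can be sketched in a few lines. This is an illustration only, not the script's actual implementation: the function name `sort_by_group`, the in-memory dictionaries, and the choice of decreasing row mean within each group are all assumptions. Groups are ordered by decreasing member count, mirroring the documented default of `--sort`.

```python
from collections import Counter
from statistics import mean

def sort_by_group(rows, groups, descending=True):
    """Return feature IDs ordered by group, then by row mean within the group.

    rows:   dict of feature ID -> list of numeric values (hypothetical layout)
    groups: dict of feature ID -> group label taken from the key file
    Assumes every feature ID in rows is also present in groups.
    """
    counts = Counter(groups[fid] for fid in rows)

    def sort_key(fid):
        label = groups[fid]
        row_mean = mean(rows[fid])
        # Largest groups first, ties broken by label, then by row mean.
        return (-counts[label], label, -row_mean if descending else row_mean)

    return sorted(rows, key=sort_key)

rows = {"p1": [1.0, 3.0], "p2": [5.0, 7.0], "p3": [2.0, 2.0], "p4": [0.0, 1.0]}
groups = {"p1": "A", "p2": "A", "p3": "B", "p4": "A"}
order = sort_by_group(rows, groups)
print(order)
```

Here group A (three members) precedes group B (one member), and the members of A are ordered by their row means.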

The data file should be a table, such as one generated by get_datasets.pl or get_relative_data.pl, in which the ID and/or Name column holds a unique identifier and there are multiple columns of numeric data.

The key or matrix file should contain the same unique feature identifiers (Name or ID) in the first column as the input data file. The groups should be in one or more subsequent columns. If more than one group column is used, the values are concatenated with a '-' delimiter. The boolean intersection matrix file from intersect_peaks.pl may be used here, for example.
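As a rough sketch of how group labels could be derived from such a key file: the helper `group_labels` below is hypothetical, and it assumes tab-delimited lines with no header row, neither of which is stated here. It joins the chosen group columns with the '-' delimiter.

```python
def group_labels(key_lines, group_columns=None):
    """Map each feature ID (first column) to a '-'-joined group label.

    key_lines:     iterable of tab-delimited text lines (assumed format)
    group_columns: 0-based column indexes to use; when omitted, every
                   column after the first ID column is used
    """
    labels = {}
    for line in key_lines:
        fields = line.rstrip("\n").split("\t")
        columns = group_columns or range(1, len(fields))
        labels[fields[0]] = "-".join(fields[i] for i in columns)
    return labels

# A boolean intersection matrix with three samples per peak (made-up values).
key_lines = ["peak1\t1\t0\t1", "peak2\t1\t1\t1"]
labels = group_labels(key_lines)
print(labels)
```

With a boolean matrix, each distinct pattern of 0 and 1 values becomes its own group label.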

The data values can optionally be normalized by specifying one or more reference columns. In each row, the mean value of the reference columns is subtracted from all of the data values. This generates a relative difference table as a convenience.
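The per-row arithmetic of this normalization can be illustrated as follows. This is a minimal sketch under stated assumptions: `normalize_row` and its parameters are hypothetical names, the indexes are 0-based positions within the list of data values, and the optional rounding only loosely mirrors the `--format` option.

```python
from statistics import mean

def normalize_row(values, ref_indexes, decimals=None):
    """Subtract the mean of the reference columns from every value in a row.

    values:      numeric data values of one row
    ref_indexes: 0-based positions of the reference columns (assumed convention)
    decimals:    optional number of decimal places to round to
    """
    reference = mean(values[i] for i in ref_indexes)
    result = [v - reference for v in values]
    if decimals is not None:
        result = [round(v, decimals) for v in result]
    return result

# The reference mean of the first two columns is 3.0.
normalized = normalize_row([2.0, 4.0, 9.0], ref_indexes=[0, 1])
print(normalized)
```

When several reference columns are given, their values straddle zero after normalization, and the remaining columns show their difference from the reference mean.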

Multiple output files are written:

A sorted data table with a basename suffix of .sorted
A .row_groups.txt annotation file with the row groups
A summary data file for each group for plotting mean profiles

VERSION: 0.2

USAGE:

sort_data_by_key.pl  -i <dataset_file>  -k <matrix_file>

REQUIRED:

-i --input <file>      The input data file
-k --key <file>        The matrix file of keys and groups

OPTIONS:

-g --group <index>     The group column(s) in the key file
                       default is all columns except the first ID column
-v --values <index>    Specific data column indexes to sort by
                       default is all identifiable data columns
-n --norm <index>      Specify the reference data columns for normalization
-f --format <int>      Specify the decimal positions when normalizing
-s --sort [count|name] Specify how to sort the groups: by decreasing
                       count or by name. Default is count.
-h --help              Show this help