iteres provides several modules for transposable element analysis. The stat module reports subfamily-level alignment statistics from ChIP-based sequencing data such as ChIP-seq and MeDIP-seq. The cpgstat module serves the same purpose for MRE-seq data.
Example run
stat
A sample BAM file is provided here. Download it, together with the chromosome size file, repeat size file and rmsk.txt file from Download, then run the following command:
iteres stat hg19_lite.size subfam.size rmsk.txt sample.bam
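Before starting a run it can help to confirm that all four inputs are present. The following is a minimal sketch and not part of iteres: the function name `check_iteres_inputs` is hypothetical, and the file names in the usage comment are those from the command above.

```shell
# Hypothetical helper (not part of iteres): verify that every input file
# exists and is non-empty before starting a run.
check_iteres_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (file names taken from the example run):
# check_iteres_inputs hg19_lite.size subfam.size rmsk.txt sample.bam &&
#     iteres stat hg19_lite.size subfam.size rmsk.txt sample.bam
```

The check only tests for presence and non-zero size; it does not validate the file contents.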
iteres prints progress messages to the screen:
* Parsing the rmsk file
* Total 5298130 repeats found.
* Parsing the SAM/BAM file
* Processed read ends: 12548044
* Writing stats and Wig file
* Generating bigWig files
* Preparing report file
* Done, time used 44 seconds.
Note: because the chromosome size file does not include the supercontigs, you will see some warning messages.
The run generates the following files:
| File | Description |
| --- | --- |
| sample.iteres.report | a simple report file containing the mapping statistics |
| sample.iteres.unique.bigWig | read density on repeat consensus from uniquely mapped reads |
| sample.iteres.bigWig | read density on repeat consensus from all mapped reads |
| sample.iteres.class.stat | class-level repeat statistics, such as RPM/RPKM from uniquely mapped or all mapped reads |
| sample.iteres.family.stat | family-level repeat statistics, such as RPM/RPKM from uniquely mapped or all mapped reads |
| sample.iteres.subfamily.stat | subfamily-level repeat statistics, such as RPM/RPKM from uniquely mapped or all mapped reads |
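The output names share a prefix, which by default is the input file name without its extension (see the -o option of stat). A minimal sketch for checking that a run produced every file listed above follows; both function names are hypothetical, and the suffix list simply mirrors that file listing rather than anything queried from iteres.

```shell
# Hypothetical helpers (not part of iteres).
# Print the output file names expected for a given prefix,
# mirroring the list of generated files above.
iteres_stat_outputs() {
    prefix=$1
    for suffix in report unique.bigWig bigWig class.stat family.stat subfamily.stat; do
        echo "$prefix.iteres.$suffix"
    done
}

# Return non-zero if any expected output file is absent.
# Assumes the prefix contains no whitespace.
check_iteres_stat_outputs() {
    missing=0
    for f in $(iteres_stat_outputs "$1"); do
        [ -e "$f" ] || { echo "not found: $f" >&2; missing=1; }
    done
    return "$missing"
}
```

For the example run, `check_iteres_stat_outputs "$(basename sample.bam .bam)"` would look for the six `sample.iteres.*` files in the current directory.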
Running iteres without any parameters prints a list of the available modules.
Module parameters
stat
$ iteres stat
Obtain alignment statistics for each repeat subfamily, family and class.
Usage: iteres stat [options]
Options:
    -S  input is SAM [off]
    -Q  unique reads mapping Quality threshold [10]
    -c  coverage threshold for overlapping [0.0001]
    -N  normalized by number of (0: reads in repeats, 1: non-redundant reads, 2: mapped reads, 3: total reads) [0])
    -U  unique reads normalized by number of (0: unique mapped reads in repeats, 1: unique mapped reads, 2: total reads) [0])
    -R  remove redundant reads [off]
    -T  treat 1 paired-end read as 2 single-end reads [off]
    -D  discard if only one end mapped in a paired end reads [off]
    -w  keep the wiggle file [off]
    -B  output bed file of mapped reads [off]
    -V  output bed file of unique mapped reads [off]
    -C  Add 'chr' string as prefix of reference sequence [off]
    -E  extend reads to represent fragment [150], specify 0 if want no extension
    -I  Insert length threshold [500]
    -o  output prefix [basename of input without extension]
    -h  help message
    -?  help message
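As an illustration of combining options, the sketch below assembles a stat command that raises the unique-read quality threshold, removes redundant reads and sets an output prefix. It assumes that options precede the positional arguments, as the usage line suggests, and that the positional order matches the example run. The function name `build_stat_cmd`, the threshold 20 and the prefix `sample_q20` are illustrative choices, not recommendations. The command is only printed, not executed.

```shell
# Hypothetical helper (not part of iteres): print a stat command line
# without running it, so it can be inspected first.
# Arguments: <quality threshold> <output prefix> <bam file>
build_stat_cmd() {
    quality=$1
    prefix=$2
    bam=$3
    # -Q, -R and -o come from the stat option list; the input file
    # names are those from the example run.
    echo "iteres stat -Q $quality -R -o $prefix hg19_lite.size subfam.size rmsk.txt $bam"
}

build_stat_cmd 20 sample_q20 sample.bam
```

With -o set this way, the output files would presumably start with `sample_q20` instead of `sample`.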
filter
$ iteres filter
Obtain alignment statistics of individual loci of each repeat subfamily, family or class.
Usage: iteres filter [options]
Options:
    -S  input is SAM [off]
    -Q  mapping Quality threshold [10]
    -g  coverage threshold for overlapping [0.0001]
    -N  normalized by number of (0: non-redundant unique mapped reads, 1: unique reads, 2: mapped reads, 3: total reads) [0])
    -n  use repName (subfamily) as filter [null]
    -f  use repFamily as filter [null]
    -c  use repClass as filter [null]
    -t  only output repeats have more than [1] reads mapped
    -r  output the list of reads [off]
    -R  remove redundant reads [off]
    -T  treat 1 paired-end read as 2 single-end reads [off]
    -D  discard if only one end mapped in a paired end reads [off]
    -C  Add 'chr' string as prefix of reference sequence [off]
    -E  extend reads to represent fragment [150], specify 0 if want no extension
    -I  Insert length threshold [500]
    -o  output prefix [basename of input without extension]
    -h  help message
    -?  help message
nearby (Removed since version 0.3.1)
$ iteres nearby
Obtain nearby genes from locations listed in a bed file by querying UCSC database.
Usage: iteres nearby [options]
Options:
    -d  database to query [hg19]
    -n  output how many genes each direction [1]
    -o  output prefix [basename of input without extension]
    -h  help message
    -?  help message
Note: the bed file should contain at least 3 fields which were [chr] [start] [end]
      also you need to have an internet connection
cpgstat
$ iteres cpgstat
obtain CpG statistics for each repeat subfamily, family and class.
Usage: iteres cpgstat [options]
Options:
    -w  keep the wiggle file [off]
    -o  output prefix [basename of input without extension]
    -h  help message
    -?  help message
cpgfilter
$ iteres cpgfilter
obtain CpG statistics for each repeat locus.
Usage: iteres cpgfilter [options]
Options:
    -n  use repName (subfamily) as filter [null]
    -f  use repFamily as filter [null]
    -c  use repClass as filter [null]
    -t  only output repeats have more than [0] CpG score
    -o  output prefix [basename of input without extension]
    -h  help message
    -?  help message