Intermediate Methylation Detection Algorithm (iMet)

We developed a maximum scoring segment algorithm to identify regions where MeDIP-seq and MRE-seq signals overlap. Given normalized MeDIP-seq and MRE-seq read densities at all CpGs, the algorithm scanned the CpGs sequentially and compared the read counts from the two assays. An arbitrary score proportional to the read density was increased when the signals overlapped and decreased when they did not, and an additional penalty proportional to the distance between consecutive CpGs was applied. Once an IM region had been initiated, the scan continued until the score returned to zero; the end point of the region was then defined as the position with the highest score after the start site.
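The scan described above can be illustrated with a single-pass running score in C. This is a minimal sketch, not the code in iMet.c: the struct names, the choice of the weaker of the two densities as the reward, and the penalty constants are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical per-CpG input: position plus normalized densities. */
typedef struct {
    long pos;      /* CpG coordinate */
    double medip;  /* normalized MeDIP-seq density */
    double mre;    /* normalized MRE-seq density */
} CpgSignal;

typedef struct {
    long start, end;  /* first CpG and best-scoring CpG of the region */
    double score;     /* maximum score reached inside the region */
} Region;

/* Hypothetical scoring constants; iMet.c uses its own values. */
static const double MISMATCH_PENALTY = 1.0;  /* CpG where signals do not overlap */
static const double DIST_PENALTY = 0.01;     /* per bp between consecutive CpGs */

static void emit(Region *out, size_t *nout, size_t max_out, double min_score,
                 long start, long end, double best)
{
    if (best >= min_score && *nout < max_out) {
        out[*nout].start = start;
        out[*nout].end = end;
        out[*nout].score = best;
        (*nout)++;
    }
}

/* Scan CpGs in coordinate order; return the number of regions written. */
size_t find_regions(const CpgSignal *c, size_t n, double min_score,
                    Region *out, size_t max_out)
{
    size_t nout = 0;
    int open = 0;
    long start = 0, best_end = 0;
    double score = 0.0, best = 0.0;

    for (size_t i = 0; i < n; i++) {
        int overlap = c[i].medip > 0.0 && c[i].mre > 0.0;
        /* Assumed reward: the weaker of the two overlapping signals. */
        double reward = overlap ? (c[i].medip < c[i].mre ? c[i].medip : c[i].mre) : 0.0;

        if (open) {
            /* open implies i >= 1, so c[i - 1] is valid here */
            score += (overlap ? reward : -MISMATCH_PENALTY)
                     - DIST_PENALTY * (double)(c[i].pos - c[i - 1].pos);
            if (score > best) {      /* remember the best end point so far */
                best = score;
                best_end = c[i].pos;
            }
            if (score <= 0.0) {      /* score fell to zero: close the region */
                emit(out, &nout, max_out, min_score, start, best_end, best);
                open = 0;
            }
        }
        if (!open && overlap) {      /* start a new region at an overlapping CpG */
            open = 1;
            start = best_end = c[i].pos;
            score = best = reward;
        }
    }
    if (open)                        /* flush a region still open at the last CpG */
        emit(out, &nout, max_out, min_score, start, best_end, best);
    return nout;
}
```

Because each region ends at its best-scoring CpG rather than at the CpG where the score reached zero, trailing CpGs that only pulled the score down are trimmed off, which matches the end-point rule described above.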

The tool used for identifying IM regions can be downloaded from here.

This tool was developed by Xin Zhou and is maintained by GiNell Elliott.

Instructions for Running iMet tools

The download file above contains 3 tools for processing MeDIP-Seq and MRE-Seq data, a directory of example data, and a script that will run the pipeline on the example data using a single command. See details on each item below.

List of download contents:

README.txt
iMet.c
medipBedGraph2cpg.c
mreBed2cpg.c
run_iMet_Example.sh

Example_Data:
chr20.chromSize.txt
chr20.CpG_sites.bed
chr20.Breast_Luminal_Epithelial_Cells.Donor1.MeDIP-Seq.bedGraph
chr20.Breast_Luminal_Epithelial_Cells.Donor1.MRE-Seq.bed
chr20.Breast_Luminal_Epithelial_Cells.Donor2.MeDIP-Seq.bedGraph
chr20.Breast_Luminal_Epithelial_Cells.Donor2.MRE-Seq.bed
chr20.Fetal_Brain.Donor7.MeDIP-Seq.bedGraph
chr20.Fetal_Brain.Donor7.MRE-Seq.bed

Generate Results Using Example Data

  1. In the iMet download package, run the file run_iMet_Example.sh from the command line (you may first need to make the file executable).
    $ chmod +x run_iMet_Example.sh # make the file executable
    $ ./run_iMet_Example.sh
    
    The script will run the iMet pipeline on MeDIP-Seq and MRE-Seq data from chromosome 20 for 3 different samples. See below for details on each tool.
  2. For each sample, the pipeline will produce 4 output files in the Example_Data directory with the following suffixes: .MeDIP-Seq.cpg, .MRE-Seq.cpg, .raw.IM, .filtered.IM
    1. .MeDIP-Seq.cpg : a four-column bedgraph file with MeDIP-Seq read counts at CpGs only
          chr20   60425   60427   4
          chr20   60431   60433   4
          chr20   60550   60552   4
          chr20   60577   60579   3
      
    2. .MRE-Seq.cpg : a four-column bedgraph file with MRE-Seq read counts at CpGs only
          chr20   60425   60427   1
          chr20   60431   60433   1
          chr20   64322   64324   1
          chr20   64376   64378   16
          chr20   64380   64382   16
      
      
    3. .raw.IM : the complete output of IM regions before post-processing filters (columns: chromosome, start, stop, IM region score, region length, position of each CpG)
          chr20   62318464    62318656    31.979994   193 62318656,62318643,62318632,62318629,62318621,62318611,62318607,62318605,62318594,62318587,62318556,62318535,62318516,62318481,62318471,62318468,62318464,
          chr20   61745800    61745991    29.289997   192 61745991,61745962,61745942,61745885,61745883,61745856,61745839,61745836,61745822,61745818,61745800,
          chr20   61266856    61267004    27.120003   149 61267004,61266976,61266943,61266938,61266936,61266924,61266912,61266896,61266871,61266856,
          chr20   4836317 4836439 23.380001   123 4836439,4836409,4836366,4836358,4836353,4836346,4836337,4836334,4836324,4836317,
      
    4. .filtered.IM : a four-column file with IM regions filtered by a score cutoff of 8 (columns: chromosome, start, stop, IM region score). The score cutoff was determined by comparison to randomized data.
          chr20   62318464    62318656    31.979994
          chr20   61745800    61745991    29.289997
          chr20   61266856    61267004    27.120003
          chr20   4836317 4836439 23.380001
      
  3. Example data output as viewed on the UCSC Genome Browser at the 3 imprint control regions found on chromosome 20:

     

     

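The .filtered.IM file keeps the first four columns of each .raw.IM record whose score passes the cutoff of 8. A minimal C sketch of that post-processing step follows, assuming whitespace-separated columns and treating a score equal to the cutoff as passing (the text does not say whether the comparison is strict). The function name filter_raw_im is hypothetical, and the pipeline script may implement this step differently.

```c
#include <stdio.h>

/* Copy .raw.IM records whose score passes `cutoff` to `out` as
   chromosome, start, stop, score (the .filtered.IM layout).
   Returns the number of rows kept.
   Assumption: a score equal to the cutoff passes. */
long filter_raw_im(FILE *in, FILE *out, double cutoff)
{
    /* CpG position lists make .raw.IM lines long; assume each fits in 1 MB. */
    static char line[1 << 20];
    char chrom[64];
    long start, stop, kept = 0;
    double score;

    while (fgets(line, sizeof line, in) != NULL) {
        /* Only the first four columns are needed; skip unparsable lines. */
        if (sscanf(line, "%63s %ld %ld %lf", chrom, &start, &stop, &score) != 4)
            continue;
        if (score >= cutoff) {
            fprintf(out, "%s\t%ld\t%ld\t%f\n", chrom, start, stop, score);
            kept++;
        }
    }
    return kept;
}
```

For example, filter_raw_im(raw, filtered, 8.0) applied to the four .raw.IM rows shown above would keep all of them, since every score shown exceeds 8.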
Requirements for Running iMet Tools

  1. System Requirements

    iMet requires a machine with at least 32 GB of memory.

  2. Preliminary Files (any sample normalization should be done prior to running iMet tools)
    1. MeDIP bedGraph files
          #Chromosome Start   Stop    ReadCount
          chr20   60366   60367   1
          chr20   60367   60388   2
          chr20   60388   60404   3
          chr20   60404   60558   4
      
    2. MRE bed files (requires 6 columns with strand information in the 6th column)
          chr20   60423   60467   SOLEXA12_4:6:59:1299:1253   0   -
          chr20   64314   64359   SOLEXA12_4:6:78:387:56  0   +
          chr20   64338   64413   SOLEXA6_80:2:73:1557:1486   0   +
          chr20   64381   64454   SOLEXA11_1:3:62:1196:347    0   +
      
    3. Human chromosome size file with two tab-separated columns
          #Chromosome    Size(bp)
          chr20   63025520
      
    4. Human CpG coordinate file: a BED-style file with the start and stop coordinates of each human CpG
          chr20   60178   60180
          chr20   60425   60427
          chr20   60431   60433
          chr20   60550   60552
      
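If a CpG coordinate file is not already at hand, one way to build it is to scan the chromosome sequence for CG dinucleotides. The helper below is a hypothetical sketch (FASTA reading is left out). It assumes the convention suggested by the example rows, where each CpG spans 2 bp with a 0-based start at the C and an exclusive end (e.g. 60425 60427); check that convention against chr20.CpG_sites.bed before relying on it.

```c
#include <stdio.h>
#include <ctype.h>

/* Write one BED line per CpG dinucleotide in `seq` and return the count.
   Assumption: 0-based start at the C, exclusive end, so each CpG spans 2 bp.
   `offset` is the genomic coordinate of seq[0]; pass out == NULL to only count.
   toupper() lets soft-masked (lowercase) bases count as well. */
size_t write_cpg_bed(FILE *out, const char *chrom, const char *seq, long offset)
{
    size_t n = 0;
    for (long i = 0; seq[i] != '\0' && seq[i + 1] != '\0'; i++) {
        if (toupper((unsigned char)seq[i]) == 'C' &&
            toupper((unsigned char)seq[i + 1]) == 'G') {
            if (out != NULL)
                fprintf(out, "%s\t%ld\t%ld\n", chrom, offset + i, offset + i + 2);
            n++;
        }
    }
    return n;
}
```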

Running iMet Tools

  1. Run medipBedGraph2cpg on the MeDIP-seq bedGraph file
    1. Purpose:
      Converts whole-genome MeDIP-seq read counts to CpG-only read counts (this reduces the size of the file so it can be used as input in the iMet program)
    2. Usage
           ./medipBedGraph2cpg <chromosome size file> <CpG coordinates file> <input MeDIP bedGraph file> <output file of data at CpG sites>
      
    3. Notes:
      Remember to compile before running: cc medipBedGraph2cpg.c -o medipBedGraph2cpg
    4. Example
          $ ./medipBedGraph2cpg chr20.chromSize.txt chr20.CpG_sites.bed chr20.Breast_Luminal_Epithelial_Cells.Donor1.MeDIP-Seq.bedGraph chr20.Breast_Luminal_Epithelial_Cells.Donor1.MeDIP-Seq.cpg
      
  2. Run mreBed2cpg on the MRE-seq bed file
    1. Purpose:
      Converts whole-genome MRE-seq read counts to CpG-only read counts for iMet input
    2. Usage
          ./mreBed2cpg <chromosome size file> <CpG coordinates file> <input filtered MRE bed file> <output file of data at CpG sites>
      
    3. Notes
      To compile: cc mreBed2cpg.c -o mreBed2cpg
    4. Example
          $ ./mreBed2cpg chr20.chromSize.txt chr20.CpG_sites.bed chr20.Breast_Luminal_Epithelial_Cells.Donor1.MRE-Seq.bed chr20.Breast_Luminal_Epithelial_Cells.Donor1.MRE-Seq.cpg
      
  3. Run iMet on output files from steps 1 and 2
    1. Purpose
      Compares MeDIP-Seq and MRE-Seq data at individual CpGs to define regions of intermediate methylation
    2. Usage
          ./iMet <chromosome size file> <CpG coordinates file> <MeDIP data generated by medipBedGraph2cpg> <MRE data generated by mreBed2cpg> <output file of putative intermediately methylated regions>
      
    3. Notes
      To compile (must use -lm flag): cc iMet.c -o iMet -lm
      Parameters can be adjusted by editing the source code directly; for instance, the minimum region length is 50 bp by default.
    4. Example
          $ ./iMet chr20.chromSize.txt chr20.CpG_sites.bed chr20.Breast_Luminal_Epithelial_Cells.Donor1.MeDIP-Seq.cpg chr20.Breast_Luminal_Epithelial_Cells.Donor1.MRE-Seq.cpg chr20.Breast_Luminal_Epithelial_Cells.Donor1.raw.IM
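The lookup that medipBedGraph2cpg performs in step 1 can be sketched as a two-pointer sweep over sorted intervals. This is a hypothetical illustration, not the code in medipBedGraph2cpg.c. It assumes a single chromosome, bedGraph intervals that are sorted and non-overlapping, and that a CpG takes the value of the interval containing its start coordinate. With the sample bedGraph and CpG rows from the Requirements section, it reproduces the values in the example .MeDIP-Seq.cpg output (4 at 60425, 60431 and 60550) and leaves the CpG at 60178 uncovered, which is consistent with that CpG being absent from the example output.

```c
#include <stddef.h>

/* Hypothetical bedGraph row: half-open [start, end) with a read count. */
typedef struct {
    long start, end;
    double value;
} Interval;

/* For each CpG start (sorted ascending), find the covering bedGraph interval
   (sorted, non-overlapping) and store its value in vals[i], or 0 when no
   interval covers the CpG. Returns the number of covered CpGs. */
size_t cpg_values(const Interval *bg, size_t nbg,
                  const long *cpg, size_t ncpg, double *vals)
{
    size_t j = 0, covered = 0;
    for (size_t i = 0; i < ncpg; i++) {
        /* Skip intervals that end at or before this CpG. */
        while (j < nbg && bg[j].end <= cpg[i])
            j++;
        if (j < nbg && bg[j].start <= cpg[i]) {
            vals[i] = bg[j].value;
            covered++;
        } else {
            vals[i] = 0.0;   /* uncovered CpG */
        }
    }
    return covered;
}
```

Because both inputs are sorted, the sweep visits each interval and each CpG once, so the cost grows linearly with the input size.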