CRF-LIS: Differentially Methylated Region Detection Using Conditional Random Fields

Intro:

CRF-LIS was developed to detect differentially methylated regions (DMRs) by jointly analyzing methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylation-sensitive restriction enzyme sequencing (MRE-seq) data, at a fraction (<20%) of the cost of whole-genome bisulfite sequencing (WGBS). The CRF-LIS method consists of four steps:

  1. Data preprocessing:
    • Normalize and smooth raw MeDIP-seq and MRE-seq counts and calculate corresponding data-driven features at each CpG site.
    • Derive genomic features associated with each CpG site.
  2. Train the CRF model with a subset of labels (differentially methylated CpGs (DMCs) vs. non-DMCs) derived from WGBS data.
  3. Calculate the local index of significance (LIS) of each CpG site and identify DMCs by controlling the false discovery rate (FDR) at a nominal level.
  4. Merge nearby DMCs into DMRs and report the results.
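Steps 3 and 4 can be illustrated with a minimal base-R sketch. It assumes the LIS values are already available as estimated posterior probabilities that each CpG is not a DMC, and it applies the usual LIS step-up rule: sort CpGs by LIS and keep the largest top-ranked set whose average LIS (the estimated FDR of that set) stays at or below the nominal level. The merging step uses a simple distance rule. The function names `lis_fdr_control` and `merge_dmcs`, the parameters `max_gap` and `min_cpgs`, and the toy input values are hypothetical and are not part of the LISdmr API.

```r
# Hypothetical helpers for illustration only; not functions exported by LISdmr.

# Step 3 (sketch): LIS step-up rule.
# lis   : estimated P(CpG is NOT a DMC | data) for each CpG (assumption)
# alpha : nominal FDR level
lis_fdr_control <- function(lis, alpha = 0.05) {
  ord <- order(lis)                               # most significant CpGs first
  est_fdr <- cumsum(lis[ord]) / seq_along(ord)    # estimated FDR of the top-k set
  k <- max(c(0, which(est_fdr <= alpha)))         # largest k with estimated FDR <= alpha
  is_dmc <- logical(length(lis))
  is_dmc[ord[seq_len(k)]] <- TRUE                 # flag the k selected CpGs as DMCs
  is_dmc
}

# Step 4 (sketch): merge DMCs on one chromosome into candidate DMRs.
# pos      : CpG positions on a single chromosome
# is_dmc   : logical DMC calls, same length as pos
# max_gap  : assumed maximum distance (bp) between adjacent DMCs in one region
# min_cpgs : assumed minimum number of DMCs required to report a region
merge_dmcs <- function(pos, is_dmc, max_gap = 100, min_cpgs = 3) {
  p <- sort(pos[is_dmc])
  if (length(p) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0), n_cpg = integer(0)))
  }
  region_id <- cumsum(c(TRUE, diff(p) > max_gap)) # start a new region after a large gap
  dmr <- data.frame(
    start = as.vector(tapply(p, region_id, min)),
    end   = as.vector(tapply(p, region_id, max)),
    n_cpg = as.vector(tapply(p, region_id, length))
  )
  dmr <- dmr[dmr$n_cpg >= min_cpgs, , drop = FALSE] # drop regions with too few DMCs
  rownames(dmr) <- NULL
  dmr
}

# Toy usage with made-up values (not taken from the package data).
lis <- c(0.001, 0.002, 0.010, 0.500, 0.900, 0.003, 0.020, 0.800)
pos <- c(100, 150, 180, 1000, 1050, 2000, 2040, 2100)
is_dmc <- lis_fdr_control(lis, alpha = 0.05)
dmrs <- merge_dmcs(pos, is_dmc, max_gap = 100, min_cpgs = 2)
print(dmrs)
```

Because the LIS values are sorted in increasing order, the running average is non-decreasing, so the selected CpGs always form a prefix of the ranking. The gap-based merging rule and the `min_cpgs` filter are illustrative choices; the criteria used inside LISdmr may differ.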

We have implemented the CRF-LIS method as a set of R functions in the LISdmr R package, which also includes a real-data example.

Installation:

Before installing the LISdmr package, the user has to install five required R packages: GenomicRanges, IRanges, infotheo, DSS, and devtools.
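The five dependencies can be installed with a short script like the one below. It is a sketch that assumes GenomicRanges, IRanges, and DSS are obtained from Bioconductor through BiocManager, while infotheo and devtools come from CRAN; it only installs packages that are not already present.

```r
# Install the required dependencies if they are missing.
# Assumption: GenomicRanges, IRanges and DSS come from Bioconductor;
# infotheo and devtools come from CRAN.
cran_pkgs <- c("infotheo", "devtools")
bioc_pkgs <- c("GenomicRanges", "IRanges", "DSS")

installed <- rownames(installed.packages())

missing_cran <- setdiff(cran_pkgs, installed)
if (length(missing_cran) > 0) install.packages(missing_cran)

missing_bioc <- setdiff(bioc_pkgs, installed)
if (length(missing_bioc) > 0) {
  # BiocManager is the standard installer for Bioconductor packages
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  BiocManager::install(missing_bioc)
}
```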

  1. Load the devtools library.
    library(devtools)
  2. Install the LISdmr package from GitHub.
    install_github("nanlin999/LISdmr")
  3. Load the LISdmr package and the CRF-LIS method is ready to use.
    library(LISdmr)

Example Usage:

A real-data example, which detects DMRs between brain cells and H1ES cells using the first 1,000 CpGs on chromosome 18, is available at https://github.com/nanlin999/LISdmr.

Simulations:

The code for the simulation studies is available at https://github.com/xiaoyudai/CRF-LIS-simu.