CRF-LIS: Differentially Methylated Region Detection Using Conditional Random Fields
Intro:
CRF-LIS was developed to detect differentially methylated regions (DMRs) by jointly analyzing methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylation-sensitive restriction enzyme sequencing (MRE-seq) data, at a fraction (<20%) of the cost of whole-genome bisulfite sequencing (WGBS). The CRF-LIS method consists of four steps:
- Data pre-processing:
  - Normalize and smooth raw MeDIP-seq and MRE-seq counts and calculate the corresponding data-driven features at each CpG site.
  - Derive genomic features associated with each CpG site.
- Train the CRF model with a subset of labels (differentially methylated CpGs, DMCs vs. non-DMCs) derived from WGBS data.
- Calculate the local index of significance (LIS) of each CpG site and identify DMCs while controlling the false discovery rate (FDR) at the nominal level.
- Merge nearby DMCs into DMRs and report the results.
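The last two steps can be illustrated with a small base-R sketch. It is not the LISdmr implementation: it assumes the commonly used LIS step-up rule (reject the k smallest LIS values, where k is the largest count whose running mean stays at or below the nominal level) and a simple distance-based merge. The function names `lis_reject` and `merge_dmcs`, the `max_gap` parameter, and the toy values are all hypothetical.

```r
# Assumed LIS step-up rule: sort LIS ascending and reject the k smallest,
# where k is the largest index whose running mean is <= alpha.
lis_reject <- function(lis, alpha = 0.05) {
  ord <- order(lis)
  running_mean <- cumsum(lis[ord]) / seq_along(lis)
  k <- max(c(0, which(running_mean <= alpha)))
  is_dmc <- logical(length(lis))
  if (k > 0) is_dmc[ord[seq_len(k)]] <- TRUE
  is_dmc
}

# Simplified merge: consecutive DMCs on the same chromosome that are at most
# `max_gap` bases apart (hypothetical parameter) fall into the same DMR.
merge_dmcs <- function(chrom, pos, is_dmc, max_gap = 300) {
  keep <- which(is_dmc)
  if (length(keep) == 0) {
    return(data.frame(chrom = character(0), start = numeric(0),
                      end = numeric(0), n_cpg = integer(0)))
  }
  chrom <- chrom[keep]
  pos <- pos[keep]
  ord <- order(chrom, pos)
  chrom <- chrom[ord]
  pos <- pos[ord]
  # A new region starts at the first DMC, at a chromosome change,
  # or when the gap to the previous DMC exceeds max_gap.
  new_region <- c(TRUE,
                  chrom[-1] != chrom[-length(chrom)] | diff(pos) > max_gap)
  region_id <- cumsum(new_region)
  data.frame(
    chrom = chrom[new_region],
    start = as.vector(tapply(pos, region_id, min)),
    end   = as.vector(tapply(pos, region_id, max)),
    n_cpg = as.vector(table(region_id))
  )
}

# Toy input (made-up values, for illustration only).
lis   <- c(0.01, 0.02, 0.90, 0.03, 0.80, 0.04, 0.95)
pos   <- c(100, 150, 220, 400, 900, 950, 1500)
chrom <- rep("chr18", length(pos))

is_dmc <- lis_reject(lis, alpha = 0.05)
dmrs   <- merge_dmcs(chrom, pos, is_dmc, max_gap = 300)
print(dmrs)
```

On this toy input, four sites are called DMCs and they collapse into two regions. A real analysis would also need to decide whether non-DMC CpGs lying between two DMCs should break a region, which this sketch ignores.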
We have implemented the CRF-LIS method as a set of R functions in the LISdmr R package, which also includes a real-data example.
Installation:
Before installing the LISdmr package, users must install five required R packages: GenomicRanges, IRanges, infotheo, DSS, and devtools.
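Of the five packages listed, GenomicRanges, IRanges, and DSS are distributed through Bioconductor, while infotheo and devtools are on CRAN, so they are installed through different routes. The sketch below is one way to do this; the helper names `not_installed` and `install_dependencies` are hypothetical, and BiocManager is an additional CRAN package (not in the list above) that it assumes for the Bioconductor installs.

```r
cran_pkgs <- c("infotheo", "devtools")
bioc_pkgs <- c("GenomicRanges", "IRanges", "DSS")

# Return the subset of `pkgs` that cannot be loaded from the current library.
not_installed <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install whatever is missing: CRAN packages (plus BiocManager) first,
# then the Bioconductor packages through BiocManager.
install_dependencies <- function() {
  cran_missing <- not_installed(c(cran_pkgs, "BiocManager"))
  if (length(cran_missing) > 0) install.packages(cran_missing)
  bioc_missing <- not_installed(bioc_pkgs)
  if (length(bioc_missing) > 0) BiocManager::install(bioc_missing)
  invisible(c(cran_missing, bioc_missing))
}
```

Calling `install_dependencies()` once in a fresh R session should leave all five packages available; packages that are already installed are skipped.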
- Load the devtools library.
library(devtools)
- Install the LISdmr package from GitHub.
install_github("nanlin999/LISdmr")
- Load the LISdmr package; the CRF-LIS method is then ready to use.
library(LISdmr)
Example Usage:
We provide a real-data example that detects DMRs between brain cells and H1ES cells, based on the first 1000 CpGs on chromosome 18. It is available at https://github.com/nanlin999/LISdmr.
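The function and dataset names used in the example are documented in the repository rather than here. As a hedged starting point, the sketch below uses only base R introspection to list what an installed package exports and which datasets it bundles; `describe_package` is a hypothetical helper, and no LISdmr function names are assumed.

```r
# List the exported functions and bundled datasets of an installed package.
describe_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed.")
  }
  list(
    functions = sort(getNamespaceExports(pkg)),
    datasets  = data(package = pkg)$results[, "Item"]
  )
}

# Only inspect LISdmr if it has been installed as described above.
if (requireNamespace("LISdmr", quietly = TRUE)) {
  print(describe_package("LISdmr"))
}
```

From there, `help(package = "LISdmr")` opens the package index, which is the usual place to find the entry point for the bundled example.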
Simulations:
The code for the simulation studies is available at https://github.com/xiaoyudai/CRF-LIS-simu.