This is the online version of EpiCompare. A command-line version of EpiCompare is available at https://github.com/hcharles14/EpiCompare .





    


                        
     Keep selected foreground samples in step 3 unchanged and select background samples. If background samples are not specified, the tool will identify shared enhancers for foreground samples.

                        
If both features and samples are not chosen, the default values for them will be used, which are the feature selected in step 1 and foreground samples selected in step 3.
    

                                    





Blood

Brain

Breast

Fat

GI_Colon

GI_Duodenum

GI_Esophagus

GI_Intestine

GI_Rectum

GI_Stomach

Heart

Liver

Lung

Muscle

Ovary

Pancreas

Spleen

Thymus

Vascular

Adrenal

Blood

Brain

GI_Intestine

GI_Stomach

Heart

Kidney

Lung

Muscle

Placenta

Thymus

Blood

Bone

Brain

Breast

ESC

ESC_Derived

Fat

IPSC

Lung

Muscle

Skin

Stromal_Connective

Vascular

Blood

Cervix

Liver

Lung




Blood

Brain

Fat

GI_Colon

GI_Duodenum

GI_Esophagus

GI_Intestine

GI_Rectum

GI_Stomach

Heart

Liver

Lung

Muscle

Ovary

Pancreas

Spleen

Thymus

Vascular

Adrenal

GI_Intestine

GI_Stomach

Muscle

Placenta

Thymus

Blood

Bone

Brain

Breast

ESC

ESC_Derived

IPSC

Lung

Muscle

Skin

Stromal_Connective

Vascular

Blood

Cervix

Liver

Lung


    


                    
     If identifying shared enhancers for foreground samples, don't select background samples. This works for cutoff and clustering method, but not Fisher's exact test method.



                    


    


                        


    


                        


    


                    

Fisher's exact test method is not applicable when background samples are not specified. K-means clustering method is not applicable for analyzing uploaded data.
For a selected region, no less than 80% (default) of foreground samples have the feature in this region.
For a selected region, no more than 20% (default) of background samples have the feature in this region.
For a selected region, its q value from the test is less than 0.01 (default).
For a selected cluster, the median of the feature densities of foreground samples in this cluster is no less than 0.4 (default).
For a selected cluster, the median of the feature densities of foreground samples in this cluster is no less than the 100th (default) percentile of the feature densities of background samples.

Please select foreground and background samples






Results:


             
            
download unmerged table data



Validation:

Enrichment for H3K27ac peaks

The figure displays the enrichment fold of identified regions for H3K27ac peaks in different tissues.

download figure data


See detailed sample description


ID Sample description Type
E001 ES-I3 Cells ESC_PrimaryCulture
E002 ES-WA7 Cells ESC_PrimaryCulture
E003 H1 Cells ESC_PrimaryCulture
E004 H1 BMP4 Derived Mesendoderm Cultured Cells ESC_DERIVED_PrimaryCulture
E005 H1 BMP4 Derived Trophoblast Cultured Cells ESC_DERIVED_PrimaryCulture
E006 H1 Derived Mesenchymal Stem Cells ESC_DERIVED_PrimaryCulture
E007 H1 Derived Neuronal Progenitor Cultured Cells ESC_DERIVED_PrimaryCulture
E008 H9 Cells ESC_PrimaryCulture
E009 H9 Derived Neuronal Progenitor Cultured Cells ESC_DERIVED_PrimaryCulture
E010 H9 Derived Neuron Cultured Cells ESC_DERIVED_PrimaryCulture
E011 hESC Derived CD184+ Endoderm Cultured Cells ESC_DERIVED_PrimaryCulture
E012 hESC Derived CD56+ Ectoderm Cultured Cells ESC_DERIVED_PrimaryCulture
E013 hESC Derived CD56+ Mesoderm Cultured Cells ESC_DERIVED_PrimaryCulture
E014 HUES48 Cells ESC_PrimaryCulture
E015 HUES6 Cells ESC_PrimaryCulture
E016 HUES64 Cells ESC_PrimaryCulture
E017 IMR90 fetal lung fibroblasts Cell Line LUNG_CellLine
E018 iPS-15b Cells IPSC_PrimaryCulture
E019 iPS-18 Cells IPSC_PrimaryCulture
E020 iPS-20b Cells IPSC_PrimaryCulture
E021 iPS DF 6.9 Cells IPSC_PrimaryCulture
E022 iPS DF 19.11 Cells IPSC_PrimaryCulture
E023 Mesenchymal Stem Cell Derived Adipocyte Cultured Cells FAT_PrimaryCulture
E024 ES-UCSF4 Cells ESC_PrimaryCulture
E025 Adipose Derived Mesenchymal Stem Cell Cultured Cells FAT_PrimaryCulture
E026 Bone Marrow Derived Cultured Mesenchymal Stem Cells STROMAL_CONNECTIVE_PrimaryCulture
E027 Breast Myoepithelial Primary Cells BREAST_Adult
E028 Breast variant Human Mammary Epithelial Cells (vHMEC) BREAST_PrimaryCulture
E029 Primary monocytes from peripheral blood BLOOD_Adult
E030 Primary neutrophils from peripheral blood BLOOD_Adult
E031 Primary B cells from cord blood Blood_Fetal
E032 Primary B cells from peripheral blood BLOOD_Adult
E033 Primary T cells from cord blood Blood_Fetal
E034 Primary T cells from peripheral blood BLOOD_Adult
E035 Primary hematopoietic stem cells BLOOD_Adult
E036 Primary hematopoietic stem cells short term culture BLOOD_Adult
E037 Primary T helper memory cells from peripheral blood 2 BLOOD_Adult
E038 Primary T helper naive cells from peripheral blood BLOOD_Adult
E039 Primary T helper naive cells from peripheral blood BLOOD_Adult
E040 Primary T helper memory cells from peripheral blood 1 BLOOD_Adult
E041 Primary T helper cells PMA-I stimulated BLOOD_Adult
E042 Primary T helper 17 cells PMA-I stimulated BLOOD_Adult
E043 Primary T helper cells from peripheral blood BLOOD_Adult
E044 Primary T regulatory cells from peripheral blood BLOOD_Adult
E045 Primary T cells effector/memory enriched from peripheral blood BLOOD_Adult
E046 Primary Natural Killer cells from peripheral blood BLOOD_Adult
E047 Primary T CD8+ naive cells from peripheral blood BLOOD_Adult
E048 Primary T CD8+ memory cells from peripheral blood BLOOD_Adult
E049 Mesenchymal Stem Cell Derived Chondrocyte Cultured Cells STROMAL_CONNECTIVE_PrimaryCulture
E050 Primary hematopoietic stem cells G-CSF-mobilized Female BLOOD_Adult
E051 Primary hematopoietic stem cells G-CSF-mobilized Male BLOOD_Adult
E052 Muscle Satellite Cultured Cells MUSCLE_PrimaryCulture
E053 Cortex derived primary cultured neurospheres BRAIN_PrimaryCulture
E054 Ganglion Eminence derived primary cultured neurospheres BRAIN_PrimaryCulture
E055 Foreskin Fibroblast Primary Cells skin01 SKIN_PrimaryCulture
E056 Foreskin Fibroblast Primary Cells skin02 SKIN_PrimaryCulture
E057 Foreskin Keratinocyte Primary Cells skin02 SKIN_PrimaryCulture
E058 Foreskin Keratinocyte Primary Cells skin03 SKIN_PrimaryCulture
E059 Foreskin Melanocyte Primary Cells skin01 SKIN_PrimaryCulture
E061 Foreskin Melanocyte Primary Cells skin03 SKIN_PrimaryCulture
E062 Primary mononuclear cells from peripheral blood BLOOD_Adult
E063 Adipose Nuclei FAT_Adult
E065 Aorta VASCULAR_Adult
E066 Liver LIVER_Adult
E067 Brain Angular Gyrus BRAIN_Adult
E068 Brain Anterior Caudate BRAIN_Adult
E069 Brain Cingulate Gyrus BRAIN_Adult
E070 Brain Germinal Matrix BRAIN_Fetal
E071 Brain Hippocampus Middle BRAIN_Adult
E072 Brain Inferior Temporal Lobe BRAIN_Adult
E073 Brain_Dorsolateral_Prefrontal_Cortex BRAIN_Adult
E074 Brain Substantia Nigra BRAIN_Adult
E075 Colonic Mucosa GI_COLON_Adult
E076 Colon Smooth Muscle GI_COLON_Adult
E077 Duodenum Mucosa GI_DUODENUM_Adult
E078 Duodenum Smooth Muscle GI_DUODENUM_Adult
E079 Esophagus GI_ESOPHAGUS_Adult
E080 Fetal Adrenal Gland ADRENAL_Fetal
E081 Fetal Brain Male BRAIN_Fetal
E082 Fetal Brain Female BRAIN_Fetal
E083 Fetal Heart HEART_Fetal
E084 Fetal Intestine Large GI_INTESTINE_Fetal
E085 Fetal Intestine Small GI_INTESTINE_Fetal
E086 Fetal Kidney KIDNEY_Fetal
E087 Pancreatic Islets PANCREAS_Adult
E088 Fetal Lung LUNG_Fetal
E089 Fetal Muscle Trunk MUSCLE_Fetal
E090 Fetal Muscle Leg MUSCLE_Fetal
E091 Placenta PLACENTA_Fetal
E092 Fetal Stomach GI_STOMACH_Fetal
E093 Fetal Thymus THYMUS_Fetal
E094 Gastric GI_STOMACH_Adult
E095 Left Ventricle HEART_Adult
E096 Lung LUNG_Adult
E097 Ovary OVARY_Adult
E098 Pancreas PANCREAS_Adult
E099 Placenta Amnion PLACENTA_Fetal
E100 Psoas Muscle MUSCLE_Adult
E101 Rectal Mucosa Donor 29 GI_RECTUM_Adult
E102 Rectal Mucosa Donor 31 GI_RECTUM_Adult
E103 Rectal Smooth Muscle GI_RECTUM_Adult
E104 Right Atrium HEART_Adult
E105 Right Ventricle HEART_Adult
E106 Sigmoid Colon GI_COLON_Adult
E107 Skeletal Muscle Male MUSCLE_Adult
E108 Skeletal Muscle Female MUSCLE_Adult
E109 Small Intestine GI_INTESTINE_Adult
E110 Stomach Mucosa GI_STOMACH_Adult
E111 Stomach Smooth Muscle GI_STOMACH_Adult
E112 Thymus THYMUS_Adult
E113 Spleen SPLEEN_Adult
E114 A549 EtOH 0.02pct Lung Carcinoma Cell Line LUNG_CellLine
E115 Dnd41 TCell Leukemia Cell Line BLOOD_CellLine
E116 GM12878 Lymphoblastoid Cells BLOOD_PrimaryCulture
E117 HeLa-S3 Cervical Carcinoma Cell Line CERVIX_CellLine
E118 HepG2 Hepatocellular Carcinoma Cell Line LIVER_CellLine
E119 HMEC Mammary Epithelial Primary Cells BREAST_PrimaryCulture
E120 HSMM Skeletal Muscle Myoblasts Cells MUSCLE_PrimaryCulture
E121 HSMM cell derived Skeletal Muscle Myotubes Cells MUSCLE_PrimaryCulture
E122 HUVEC Umbilical Vein Endothelial Primary Cells VASCULAR_PrimaryCulture
E123 K562 Leukemia Cells BLOOD_PrimaryCulture
E124 Monocytes-CD14+ RO01746 Primary Cells BLOOD_Adult
E125 NH-A Astrocytes Primary Cells BRAIN_PrimaryCulture
E126 NHDF-Ad Adult Dermal Fibroblast Primary Cells SKIN_PrimaryCulture
E127 NHEK-Epidermal Keratinocyte Primary Cells SKIN_PrimaryCulture
E128 NHLF Lung Fibroblast Primary Cells LUNG_PrimaryCulture
E129 Osteoblast Primary Cells BONE_PrimaryCulture



Tissue enrichment index for H3K27ac

The figure plots the distribution of the tissue enrichment index (CTM), computed from H3K27ac signal (RPKM), for identified regions in different tissues.

download figure data

EpiCompare: An online tool to define and explore genomic regions with tissue or cell type-specific epigenomic features


The Human Reference Epigenome Map, generated by the Roadmap Epigenomics Consortium, contains thousands of genome-wide epigenomic datasets that describe the epigenomes of a variety of human tissue and cell types. This map has allowed investigators to obtain a much deeper and more comprehensive view of our regulatory genome, for example by defining regulatory elements, including all promoters and enhancers, for a given tissue or cell type. An outstanding task is to combine and compare different epigenomes in order to identify regions with epigenomic features specific to certain types of tissues or cells, for example, lineage-specific regulatory elements. Currently available tools do not directly address this question. This need motivated us to develop EpiCompare, which allows investigators to easily identify regions with epigenetic features unique to the specific epigenomes that they choose, making detection of common regulatory elements and/or cell type-specific regulatory elements an interactive and dynamic experience. Investigators can design their tests by choosing different combinations of epigenomes and different classification algorithms provided by our tool. EpiCompare will then identify regions with the specified epigenomic features and provide a quality assessment of the predictions. Investigators can interact with EpiCompare by investigating Roadmap Epigenomics data or by uploading their own data for comparison. Finally, prediction results can be readily visualized and further explored in the WashU Epigenome Browser.






1. Datasets

Each feature below (a ChromHMM state or an epigenomic modification peak) is converted into binary presence or absence of the feature in each 200bp window, denoted by 1 or 0. For each feature, a table is generated that summarizes the presence or absence of the feature in all samples, across the windows where at least one sample has the feature.
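The binary table described above can be sketched as follows. This is only an illustration, not the tool's own code: the function name `build_presence_table`, the input layout, and the use of 0-based, half-open coordinates are assumptions, and any overlap with a window is counted as presence here.

```python
from collections import defaultdict

WINDOW = 200  # window size in bp, as stated in the text


def build_presence_table(features_by_sample, window=WINDOW):
    """Build a window-by-sample table of 0/1 values.

    features_by_sample: {sample_id: [(chrom, start, end), ...]}
    Coordinates are assumed to be 0-based and half-open (an assumption).
    Returns (samples, table), where table maps (chrom, window_start)
    to a list with one 0/1 entry per sample, in the order of `samples`.
    """
    samples = sorted(features_by_sample)
    hits = defaultdict(set)  # (chrom, window_start) -> samples with the feature
    for sample, intervals in features_by_sample.items():
        for chrom, start, end in intervals:
            for w in range(start // window, (end - 1) // window + 1):
                hits[(chrom, w * window)].add(sample)
    # Only windows where at least one sample has the feature appear in `hits`.
    table = {
        win: [1 if s in present else 0 for s in samples]
        for win, present in sorted(hits.items())
    }
    return samples, table


features = {"E003": [("chr1", 0, 400)], "E066": [("chr1", 200, 400)]}
samples, table = build_presence_table(features)
for win, row in table.items():
    print(win, dict(zip(samples, row)))
```

Windows in which no sample has the feature never enter the table, which matches the restriction stated above.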

1.1 Chromatin states

Chromatin state data for 15-state model and 18-state model for all tissue/cell types (127 samples have chromatin states from 15-state model and 98 samples from 18-state model) are obtained from Roadmap Epigenomics Project (Roadmap Epigenomics, et al., 2015). Enhancers for 15-state model are defined as state number 6, 7, 12 and enhancers for 18-state model are defined as state number 7, 8, 9, 10, 11, 15. Promoters for 15-state model are defined as state number 1, 2, 10 and promoters for 18-state model are defined as state number 1, 2, 3, 4, 14. Chromatin states are defined on each 200bp window by ChromHMM.

1.2 H3K27ac

H3K27ac peak data for 98 tissue/cell types are obtained from the Roadmap Epigenomics Project. The peaks are called by MACS2 (Zhang, et al., 2008). H3K27ac peaks are mapped to 200bp windows in the genome, and a window is marked as having the feature only if a peak overlaps it by at least 50bp. Only peaks with a q-value less than 0.01 are kept.
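A minimal sketch of this peak-to-window step, assuming 0-based half-open coordinates and q-values given as plain probabilities (peak files often store them on a -log10 scale, which would need converting first). The function name `peak_windows` and the tuple layout are hypothetical.

```python
WINDOW = 200      # window size in bp
MIN_OVERLAP = 50  # minimum overlap between a peak and a window, in bp
MAX_Q = 0.01      # peaks with q-value >= MAX_Q are dropped


def peak_windows(peaks, window=WINDOW, min_overlap=MIN_OVERLAP, max_q=MAX_Q):
    """Return the sorted (chrom, window_start) pairs marked by the peaks.

    peaks: iterable of (chrom, start, end, q_value).
    """
    marked = set()
    for chrom, start, end, q in peaks:
        if q >= max_q:
            continue  # keep only peaks with q-value below the cutoff
        for w in range(start // window, (end - 1) // window + 1):
            w_start = w * window
            overlap = min(end, w_start + window) - max(start, w_start)
            if overlap >= min_overlap:
                marked.add((chrom, w_start))
    return sorted(marked)


peaks = [("chr1", 170, 460, 0.001), ("chr1", 1000, 1200, 0.05)]
print(peak_windows(peaks))
```

In this example the first peak covers only 30bp of the window starting at 0, so that window is not marked, and the second peak is discarded by the q-value filter.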

1.3 H3K4me1

H3K4me1 peak data for 127 tissue/cell types are obtained from the Roadmap Epigenomics Project. The peaks are called by MACS2. H3K4me1 peaks are mapped to 200bp windows in the genome, and a window is marked as having the feature only if a peak overlaps it by at least 50bp. Only peaks with a q-value less than 0.01 are kept.

1.4 H3K4me3

H3K4me3 peak data for 127 tissue/cell types are obtained from the Roadmap Epigenomics Project. The peaks are called by MACS2. H3K4me3 peaks are mapped to 200bp windows in the genome, and a window is marked as having the feature only if a peak overlaps it by at least 50bp. Only peaks with a q-value less than 0.01 are kept.

1.5 H3K27me3

H3K27me3 peak data for 127 tissue/cell types are obtained from the Roadmap Epigenomics Project. The peaks are called by MACS2. H3K27me3 peaks are mapped to 200bp windows in the genome, and a window is marked as having the feature only if a peak overlaps it by at least 50bp. Only peaks with a q-value less than 0.01 are kept.

2. Methods


Three methods are used for identifying regions with epigenomic features specific to combinations of tissue or cell types. All methods require users to define foreground samples and background samples. Foreground samples are the group of samples for which we identify specific regions. Background samples are the group of samples against which we compare foreground samples. The principle of all methods is that, for a region to have a feature specific to the foreground samples, the feature should be enriched in the foreground samples but depleted in the background samples. Below is a visualization of the three methods in EpiCompare with a simple example. F represents foreground samples, B represents background samples, R represents regions, and C represents clusters.

2.1 Frequency cutoff

For each region (in this case each 200bp genomic window), the percentages of samples having the feature in foreground samples and background samples are calculated. If the percentage of samples having the feature in the foreground samples is greater than or equal to the defined minimal foreground cutoff (default is 80%) and the percentage of samples having the feature in the background samples is less than or equal to the defined maximal background cutoff (default is 20%), then the region is defined as a positive region. These positive regions are further ranked by the difference between the percentage of samples having the feature in foreground and background samples so users can prioritize top-ranked regions.
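The rule above can be sketched in a few lines. This is an illustrative reimplementation, not the tool's code: the function name `frequency_cutoff` and the table layout (region mapped to a list of 0/1 values per sample) are assumptions, and an empty background is treated as a background percentage of 0, which is also an assumption.

```python
def frequency_cutoff(table, fg_idx, bg_idx, fg_min=0.8, bg_max=0.2):
    """Select and rank positive regions with the frequency cutoff rule.

    table:  {region: [0/1 per sample]}
    fg_idx: column indices of foreground samples
    bg_idx: column indices of background samples (may be empty)
    Returns [(region, fg_fraction, bg_fraction)], ranked by the
    foreground-minus-background difference, largest first.
    """
    positives = []
    for region, row in table.items():
        fg = sum(row[i] for i in fg_idx) / len(fg_idx)
        # Assumption: with no background samples, the background fraction is 0.
        bg = sum(row[i] for i in bg_idx) / len(bg_idx) if bg_idx else 0.0
        if fg >= fg_min and bg <= bg_max:
            positives.append((region, fg, bg))
    positives.sort(key=lambda r: r[1] - r[2], reverse=True)
    return positives


table = {
    "R2": [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],  # 80% foreground, 20% background
    "R1": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # 100% foreground, 0% background
    "R3": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # 60% foreground: rejected
    "R4": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # 40% background: rejected
}
for region, fg, bg in frequency_cutoff(table, range(5), range(5, 10)):
    print(region, fg, bg)
```

With the default cutoffs, R1 ranks ahead of R2 because its foreground-background difference is larger.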

2.2 Fisher's exact test

For each 200bp window, a contingency table is built from the number of samples with or without the feature in the foreground samples and in the background samples. Fisher’s exact test is used to examine whether the percentage of samples having the feature is significantly greater in the foreground samples than in the background samples. The p-values are corrected for multiple hypothesis testing using the Benjamini-Hochberg procedure, and regions with a q-value less than a cutoff (default is 0.01) are identified and ranked by their q-values. The statistical power of the test depends on the number of foreground and background samples, and having more samples provides more power to reach significant q-values. Therefore, when the number of foreground samples is small, investigators can use the q-value as a ranking measure and obtain the top candidates by setting a higher q-value threshold.
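The test and the correction can be sketched in pure Python as below. This is a hedged illustration rather than the tool's implementation: the function names are hypothetical, the test is written as a one-sided hypergeometric tail probability, and the Benjamini-Hochberg step is the standard step-up adjustment.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table
    [[a, b], [c, d]] = [[fg with, fg without], [bg with, bg without]].
    Tests whether the feature is more frequent in the foreground."""
    n_fg, n_with, total = a + b, a + c, a + b + c + d
    tail = sum(
        comb(n_with, k) * comb(total - n_with, n_fg - k)
        for k in range(a, min(n_with, n_fg) + 1)
    )
    return min(tail / comb(total, n_fg), 1.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def fisher_regions(table, fg_idx, bg_idx, q_cutoff=0.01):
    """table: {region: [0/1 per sample]}. Returns [(region, q)] ranked by q."""
    regions = list(table)
    pvals = []
    for region in regions:
        row = table[region]
        a = sum(row[i] for i in fg_idx)
        c = sum(row[i] for i in bg_idx)
        pvals.append(fisher_greater(a, len(fg_idx) - a, c, len(bg_idx) - c))
    qvals = benjamini_hochberg(pvals)
    hits = [(r, q) for r, q in zip(regions, qvals) if q < q_cutoff]
    return sorted(hits, key=lambda x: x[1])


table = {"R1": [1] * 5 + [0] * 5, "R2": [1] * 10}
print(fisher_regions(table, range(5), range(5, 10)))
```

With 5 foreground and 5 background samples the smallest possible p-value is 1/252, which illustrates why small sample groups rarely pass a strict q-value cutoff once many windows are tested.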

2.3 K-means clustering

First, k-means clustering based on a Jaccard-index distance is performed on the binary data table for each feature, similar to the clustering method used in HoneyBadger2 (Roadmap Epigenomics, et al., 2015). The R package flexclust is used for clustering (Leisch, 2006). We chose the optimal cluster number by the elbow method and the silhouette method (Kodinariya and Makwana, 2013). The optimal cluster numbers for all features are close to one another, at around 140, so we fixed the optimal cluster number at 140. Besides the clustering result for the optimal cluster number, we also provide clustering results for three other cluster numbers (90, 200, 250) to suit users’ needs. Next, the percentage of regions having the feature is calculated for each cluster and each sample, giving a feature density table (number of clusters times number of samples). Finally, a cluster specific for a tissue/cell type should have a higher feature density in that tissue/cell type than in the background samples. Specifically, to identify clusters specific for foreground samples, we select clusters satisfying the following two conditions: first, the median of the feature densities of foreground samples in a cluster is greater than or equal to a threshold (default is 0.4); second, this median is also greater than or equal to the highest feature density among the background samples of that same cluster (this threshold can be set to any percentile of the feature densities in the background samples).
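The density table and the two selection conditions can be sketched as follows. The clustering itself is not reproduced: cluster assignments are taken as given. The function names, the input layout, and the nearest-rank percentile definition are assumptions made for illustration.

```python
from math import ceil
from statistics import median


def feature_density(table, cluster_of, n_samples):
    """table: {region: [0/1 per sample]}; cluster_of: {region: cluster_id}.
    Returns {cluster_id: [fraction of regions with the feature, per sample]}."""
    sums, sizes = {}, {}
    for region, row in table.items():
        c = cluster_of[region]
        acc = sums.setdefault(c, [0] * n_samples)
        for i, value in enumerate(row):
            acc[i] += value
        sizes[c] = sizes.get(c, 0) + 1
    return {c: [s / sizes[c] for s in acc] for c, acc in sums.items()}


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition); pct=100 is the maximum."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def specific_clusters(density, fg_idx, bg_idx, fg_min=0.4, bg_pct=100):
    """Return the clusters that satisfy both conditions from the text."""
    selected = []
    for c, row in density.items():
        fg_median = median([row[i] for i in fg_idx])
        if fg_median < fg_min:
            continue  # condition 1: foreground median density cutoff
        if bg_idx and fg_median < percentile([row[i] for i in bg_idx], bg_pct):
            continue  # condition 2: background percentile cutoff
        selected.append(c)
    return sorted(selected)


table = {"R1": [1, 1, 0, 0], "R2": [1, 0, 0, 1], "R3": [0, 1, 1, 1], "R4": [1, 0, 1, 1]}
cluster_of = {"R1": "A", "R2": "A", "R3": "B", "R4": "B"}
density = feature_density(table, cluster_of, n_samples=4)
print(density)
print(specific_clusters(density, fg_idx=[0, 1], bg_idx=[2, 3]))
```

In this toy input, cluster A has a foreground median density of 0.75 against a background maximum of 0.5 and is selected, while cluster B is rejected because its background maximum is 1.0.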

3. Output


3.1 Data output

Identified regions are displayed in a table whose columns are chr, start, end, and a link to the WashU Epigenome Browser, so users can visualize and compare the regions in different tissue/cell types. For Fisher’s exact test and the frequency cutoff method, a summary of the distribution of ranks for identified regions is provided. Links are provided to download either the unmerged 200bp regions or the merged regions, in which neighboring 200bp regions are merged. A link to GREAT analysis for the merged regions is also provided.
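Merging neighboring 200bp regions could look like the sketch below. It assumes that "neighboring" means directly adjacent windows on the same chromosome, which the text does not spell out; the function name `merge_windows` is hypothetical.

```python
def merge_windows(windows, window=200):
    """Merge directly adjacent windows into larger regions.

    windows: iterable of (chrom, window_start).
    Returns a list of (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start in sorted(set(windows)):
        if merged and merged[-1][0] == chrom and merged[-1][2] == start:
            # This window begins where the previous region ends: extend it.
            merged[-1] = (chrom, merged[-1][1], start + window)
        else:
            merged.append((chrom, start, start + window))
    return merged


print(merge_windows([("chr1", 200), ("chr1", 0), ("chr1", 600), ("chr2", 800)]))
```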

3.2 Validation output

3.2.1 Enrichment for H3K27ac peaks

Enrichment for H3K27ac peaks in foreground samples and background samples is calculated for identified regions. Enrichment is defined as follows: enrichment = ((#bp in overlapped regions) / (#bp in H3K27ac sites)) / ((#bp in identified regions) / (#bp in hg19 genome)). When the number of foreground samples or background samples is greater than 10, 10 samples are chosen at random from the foreground samples or background samples to calculate the enrichment.
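The enrichment formula can be illustrated with a small sketch. It assumes that each interval list is already non-overlapping (for example, merged regions and merged peaks) and uses 0-based half-open coordinates; the function names are hypothetical, and the genome size is passed in by the caller rather than hard-coded.

```python
from collections import defaultdict


def _by_chrom(intervals):
    """Group (chrom, start, end) intervals by chromosome, sorted by start."""
    grouped = defaultdict(list)
    for chrom, start, end in intervals:
        grouped[chrom].append((start, end))
    for items in grouped.values():
        items.sort()
    return grouped


def overlap_bp(a, b):
    """Total bp shared by two interval lists (each assumed non-overlapping)."""
    a_by, b_by = _by_chrom(a), _by_chrom(b)
    total = 0
    for chrom, a_list in a_by.items():
        b_list = b_by.get(chrom, [])
        i = j = 0
        while i < len(a_list) and j < len(b_list):
            lo = max(a_list[i][0], b_list[j][0])
            hi = min(a_list[i][1], b_list[j][1])
            if hi > lo:
                total += hi - lo
            # Advance whichever interval ends first.
            if a_list[i][1] < b_list[j][1]:
                i += 1
            else:
                j += 1
    return total


def enrichment(regions, peaks, genome_bp):
    """(overlap bp / peak bp) / (region bp / genome bp), as defined above."""
    region_bp = sum(end - start for _, start, end in regions)
    peak_bp = sum(end - start for _, start, end in peaks)
    return (overlap_bp(regions, peaks) / peak_bp) / (region_bp / genome_bp)


regions = [("chr1", 0, 200), ("chr1", 400, 600)]
peaks = [("chr1", 100, 500)]
print(enrichment(regions, peaks, genome_bp=10_000))  # toy genome size
```

A value above 1 means the identified regions cover a larger share of the peaks than would be expected from their share of the genome.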

3.2.2 Tissue enrichment index on H3K27ac

A tissue enrichment index for identified regions is calculated using H3K27ac RPKM (Reads Per Kilobase of transcript per Million mapped reads). Identified regions are filtered using the combined H3K27ac peaks from 98 samples, and the tissue enrichment index is calculated for the filtered regions. Tissue enrichment indices have been routinely used to identify tissue-specific genes (Chang, et al., 2011; Yanai, et al., 2005). Generally, a high tissue enrichment index indicates a tissue-specific region. The tissue enrichment index we use is a contribution measure (CTM) (Pan, et al., 2013).


4. How to use the tool


The tool can be run in the Chrome, Internet Explorer, and Firefox web browsers. It can be run in Safari only if it is used in a window without other web pages: opening other pages in the same Safari window as the tool will disconnect the tool from the server.

Because the free shiny server can only allow one user at one time, to allow multiple users to use the tool simultaneously, we created 3 copies of the tool named EpiCompare1, EpiCompare2 and EpiCompare3 and used a load balancer named EpiCompare (epigenome.wustl.edu/EpiCompare/) to assign users to the 3 copies. This allows 3 users to use the tool at one time.

The steps for using the tool, and the specific requirements of each step, are listed below.

4.1 Select a feature

Select the feature whose specificity you want to identify. It can be enhancers or promoters defined from the 15-state or 18-state ChromHMM model, or a histone mark: H3K27ac, H3K4me1, H3K4me3, or H3K27me3. Only one feature can be chosen. For example, use the default enhancer state from the 18-state ChromHMM model here.

4.2 Upload data

Upload your own data for comparison analysis alongside the default Roadmap data. Skip this step if you don't want to use your own data. You can upload one file or multiple files: select one file to upload it alone, or select multiple files to upload them together. Each file must have exactly three columns (chromosome, start, end) specifying the location of the feature. The coordinates can be merged or not. The tool maps the coordinates to 200bp windows, requiring an overlap of at least 1bp. After the files are uploaded, the name of each file is listed above the Roadmap samples for selection. File names must not contain spaces. Only the frequency cutoff and Fisher's exact test methods can be used to analyze uploaded data; the k-means clustering method cannot.

1. Here is a test file. Download the file to your local drive.

2. Upload this file from your local drive. The file will be uploaded and processed.

3. After processing, the name of the uploaded file will be listed under user-defined samples.
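Before uploading, a file can be checked against the requirements above with a short script. This is a sketch under stated assumptions: the function name `parse_upload` is hypothetical, columns are assumed to be whitespace-separated, and coordinates are assumed to be 0-based and half-open as in BED files. It also shows how coordinates would map to 200bp windows with a 1bp minimum overlap.

```python
def parse_upload(filename, text, window=200):
    """Validate a three-column feature file and map it to 200bp windows.

    Returns the sorted (chrom, window_start) pairs that overlap any
    interval by at least 1bp. Raises ValueError on malformed input.
    """
    if any(ch.isspace() for ch in filename):
        raise ValueError("file name must not contain spaces")
    windows = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 columns, got {len(fields)}")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end <= start:
            raise ValueError(f"line {number}: end must be greater than start")
        for w in range(start // window, (end - 1) // window + 1):
            windows.add((chrom, w * window))
    return sorted(windows)


print(parse_upload("my_peaks.bed", "chr1\t150\t250\nchr2\t0\t200\n"))
```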

4.3 Select foreground samples

Select the group of samples for which you want to identify specific regions. They can be chosen from Roadmap samples or uploaded samples. Click the finish selection button when you are done. The selected sample IDs for foreground samples will be listed. For example, select 5 adult brain samples here.

4.4 Select background samples

Keep the foreground samples selected in step 3 unchanged and select background samples, which are the group of samples against which we compare foreground samples. The foreground samples selected in step 3 must not be unselected, because the tool subtracts the selection in step 3 from all selections in step 4 to obtain the selected background samples. Click the finish selection button when you are done. The selected sample IDs for background samples will be listed. If background samples are not specified, the tool will identify shared enhancers for foreground samples. Only the frequency cutoff and k-means clustering methods can be used without background samples; Fisher's exact test cannot. For example, select all adult blood samples here.

4.5 Select features and samples for visualization

Select features and samples for visualization in the WashU Epigenome Browser. Click the finish selection button when you are done. The selected sample IDs will be listed. If neither features nor samples are chosen, the default values are used, which are the feature selected in step 1 and the foreground samples selected in step 3. For example, select ChromHMM18 and H3K27ac as features, and 2 brain samples from the foreground samples and 2 blood samples from the background samples as samples here.

4.6 Select a method

Select frequency cutoff, Fisher's exact test or k-means clustering method and the parameters for each method.

1) Frequency cutoff method: a region is defined as a positive region if the percentage of samples having the feature in the foreground samples is greater than or equal to a cutoff (default is 80%, called foreground cutoff) and the percentage of samples having the feature in the background samples is less than or equal to a cutoff (default is 20%, called background cutoff).

2) Fisher's exact test method: a region is defined as a positive region if the q-value from Fisher's exact test is less than a cutoff (default is 0.01, called q-value cutoff). When the number of foreground samples is small, Fisher’s exact test may not identify any regions with a q-value less than 0.01. In this case, investigators can use the q-value as a ranking measure and obtain the top candidates by setting a higher q-value cutoff.

3) K-means clustering method: Clusters specific for foreground samples should satisfy the following two conditions. First, the median of feature densities of foreground samples in this cluster is greater than or equal to a cutoff (default is 0.4, called foreground density cutoff); second, the median of feature densities of foreground samples is also greater than or equal to the percentile cutoff of feature densities of the background samples (default is 100%, which is the maximal feature density of the background samples, called percentile cutoff). The feature density in one cluster is the proportion of regions for one sample having the feature in the cluster. A value of 50% means half of the regions have the feature in the cluster.

Fisher's exact test method is not applicable when background samples are not specified. K-means clustering method is not applicable for analyzing uploaded data.
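The two restrictions above can be summarized as a small lookup. The function name `applicable_methods` and the method labels are hypothetical and only restate the rules given in the text.

```python
def applicable_methods(has_background, uses_uploaded_data):
    """Return the methods that can be used for a given analysis setup."""
    methods = ["frequency cutoff"]  # usable in every setup described above
    if has_background:
        methods.append("Fisher's exact test")  # needs background samples
    if not uses_uploaded_data:
        methods.append("k-means clustering")   # Roadmap data only
    return methods


print(applicable_methods(has_background=False, uses_uploaded_data=False))
print(applicable_methods(has_background=True, uses_uploaded_data=True))
```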

For example, use the default frequency cutoff method here.


4.7 Submit

Click the submit button to start the analysis. The result will be available in about 3 minutes.

This is the table of identified regions.


These are the validation results: enrichment for H3K27ac peaks and tissue enrichment index on H3K27ac.


This is the visualization in the WashU Epigenome Browser for an identified region.



5. References

Chang, C.W., et al. Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis. PLoS One 2011;6(7):e22859.

Leisch, F. A toolbox for K-centroids cluster analysis. Computational Statistics and Data Analysis 2006;51(2):526-544.

Pan, J.B., et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One 2013;8(12):e80747.

Roadmap Epigenomics Consortium, et al. Integrative analysis of 111 reference human epigenomes. Nature 2015;518(7539):317-330.

Yanai, I., et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 2005;21(5):650-659.

Zhang, Y., et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 2008;9(9):R137.

Any questions are welcome. Please contact yu.he (at) wustl.edu .

A paper describing EpiCompare has been published in the journal Bioinformatics. Please cite the tool as:

Yu He, Ting Wang; EpiCompare: An online tool to define and explore genomic regions with tissue or cell type-specific epigenomic features. Bioinformatics 2017 btx371. doi: 10.1093/bioinformatics/btx371