ENCODE Project at UCSC

Genome Browser

Cell Types

Antibodies

Release Log

Downloads

Contributors

Publications

Data Policy

Pilot Project

Jobs

About the ENCODE Data Coordination Center (DCC)

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of the consortium is to build a comprehensive parts list of the functional elements of the human genome, including elements that act at the protein level (coding genes) and RNA level (non-coding genes), and regulatory elements that control the cells and circumstances in which a gene is active. The discovery and annotation of gene elements is accomplished primarily by sequencing RNA from a diverse range of sources, comparative genomics, integrative bioinformatic methods, and human curation. Regulatory elements are typically investigated through DNase hypersensitivity assays, assays of DNA methylation, and chromatin immunoprecipitation (ChIP) of proteins that interact with DNA, including modified histones and transcription factors, followed by sequencing (ChIP-seq). The results of ENCODE experiments, collected in the ENCODE DCC database, are displayed on the UCSC Genome Browser. The data can also be downloaded from the ENCODE DCC website in text format.

ENCODE data are now available for the entire human genome. To access ENCODE data, open the Genome Browser, select the March 2006 assembly of the human genome, and go to your region of interest. ENCODE tracks are marked with the NHGRI logo. The bulk of the ENCODE data can be found in the Expression and Regulation track groups, with a few tracks in the Mapping, Genes, and Variation groups. Although most participating research groups have provided several tracks, generally only selected data from each research group are displayed by default. Click the hyperlinked name of a particular track to display a page containing configuration options and details about the methods used to generate the data. See the Genome Browser User's Guide for further information about displaying tracks and navigating in the Genome Browser. To receive notifications of ENCODE data releases and related news by email, subscribe to the encode-announce mailing list.

Data from the earlier ENCODE project pilot phase, which covered approximately 1% of the genome, are available on the May 2004 and March 2006 human assemblies. The ENCODE Pilot Project web pages provide convenient browser access to these regions.

Before publishing research that uses ENCODE data, please read the data release policy, which places some restrictions on publication use of data for nine months following the data release.

News

18 March 2010 - February and March 2010 ENCODE news

Release 2 of the UW Affy Exon track: This track displays human tissue microarray data using the Affymetrix Human Exon 1.0 GeneChip. This release includes 28 new cell types, and replaces the data for four existing tables (replicate 1 for K562, NB4, and SKMC; replicate 2 for HeLa-S3).

Initial release of the UW Histone track: This track displays maps of histone modifications genome-wide in different cell lines, using ChIP-seq high-throughput sequencing.

Release 2 of the HudsonAlpha Methyl-seq track: Release 2 adds data for five new cell types.

Release 3 of the Gencode Genes track: This track shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. Version 3 of the GENCODE gene set presents a full merge between HAVANA and ENSEMBL, giving priority to the manually curated HAVANA objects and using ENSEMBL objects where they are different or fall into un-annotated regions.

Initial release of the CSHL Small RNA-seq track: This track depicts NextGen sequencing information for RNAs between 20 and 200 nt in length, isolated from RNA samples from tissues or subcellular compartments of ENCODE cell lines.

Release 3 of the UW DNaseI HS track: This track shows DNaseI sensitivity measured genome-wide in different cell lines using the Digital DNaseI methodology, and DNaseI hypersensitive sites. This release includes 19 new cell lines as well as a new version of NB4 replicate 1.

6 January 2010 - December 2009 ENCODE news

"ENCODE whole-genome data in the UCSC Genome Browser": This paper addresses the history of the ENCODE project, summarizes the datasets available as of September 2009, and outlines methods to access the data. See Nucleic Acids Res. 2010 Jan;38(Database issue):D620-5.

Initial release of the Caltech RNA-seq track: This track contains sequence reads and RPKM transcript abundance measures for sequences that map to either the genome or to known RNA splice sites. The results of four different mapping algorithms are provided, enabling comparison among them. Results are available for polyA+ and total RNA for the two ENCODE Tier 1 cell lines.

Release 2 of the Broad Histone track: This track displays maps of chromatin state generated using ChIP-seq. Release 2 adds data for the ENCODE Tier 2 cell lines H1-hESC and HepG2, plus NHLF (normal human lung fibroblasts) and HMEC (human mammary epithelial) cells. This expands the track data to 9 cell lines, and 11 antibodies plus an input control.

Release 2 of the CSHL Long RNA-seq track: This track depicts sequencing of long RNAs of more than 200 nucleotides in length. Release 2 adds data from strand-specific assays of total RNA for the two ENCODE Tier 1 cell lines.

Release 2 of the ENCODE Open Chromatin track: This track displays evidence of open chromatin as identified by two complementary methods, DNaseI hypersensitivity and FAIRE, combined with ChIP identification methods. Release 2 adds data from eight additional cell types, expanding the track to 41 experiments in 13 cell lines.

7 November 2009 - October 2009 ENCODE news

Sep 2009 data freeze complete: The ENCODE Consortium has just completed data submissions for the fourth production data freeze (Sep 09). The first set of data from this freeze to complete quality review is now available on the UCSC public server, in Release 2 of the ENCODE Transcription Factor Binding Sites from Yale/UC-Davis/Harvard track. Release 2 adds 59 ChIP-seq experiments to this track.

Other October track releases: The Affymetrix/CSHL Subcellular RNA Localization by Tiling Array track was expanded to include 4 additional experiments.

encodeproject.org: By request of the ENCODE Consortium, the domain encodeproject.org has been registered by the ENCODE Data Coordination Center, and is redirected to the ENCODE portal at UCSC.

New grants funded: NHGRI has funded 5 new ENCODE grants, as part of the American Recovery and Reinvestment Act. The new grants include expansion of ENCODE to the mouse genome and proteogenomics.

Job openings at UCSC: The UCSC Genome Browser and ENCODE projects are currently accepting applications for Software Developer and Biological Database Testing/User Support Technician positions. We are looking for talented individuals who would like to use their skills in computer science, biology, and bioinformatics on fast-paced projects featuring the work of top genomics scientists worldwide.

24 September 2009 - ENCODE data releases since July 1

During this period a total of 10 new ENCODE tracks were released to the UCSC public server. Functional elements and region characterization in these tracks include:

DNaseI hypersensitive sites and hotspots (University of Washington)
Regions of DNA methylation, by Methyl-seq and Illumina 27 array (HudsonAlpha Institute)
Bi-directional promoters and negative regulatory elements (NHGRI Elnitski lab)
Transcriptome profiling by RNA-seq, including single-molecule sequencing (CSHL, Helicos)
Expression levels by exon array (University of Washington)
Regions of copy number variation (HudsonAlpha Institute)
Mappability/uniqueness of n-mers (University of Massachusetts, Duke, Broad, Rosetta)

For track names and file access, see the Release Log and Downloads links listed in the left menu bar.

We would like to thank the contributing ENCODE labs and the DCC team at UCSC for their efforts in completing these tracks.

1 July 2009 - ENCODE data releases for the period April - June 2009

During this period, a total of seven new ENCODE tracks were released to the UCSC public server. These tracks include high-quality gene annotations; maps of transcription factor binding, histone modifications, and open chromatin; RNA subcellular localization; and RNA/protein binding sites. Read more.

Conditions of Use

The sequence and annotation data displayed in the Genome Browser are freely available for academic, nonprofit, and personal use under the following conditions:

ENCODE data is covered by the ENCODE Consortium Data Release Policy.
The general Conditions of Use for the UCSC Genome Browser apply.