Data Coordination Center (DCC)
The Encyclopedia of
DNA Elements (ENCODE) Consortium is an international collaboration
of research groups funded by the National Human Genome Research
Institute (NHGRI).
The goal of the consortium is to build a comprehensive parts list of
the functional elements of the human genome, including elements that
act at the protein level (coding genes) and RNA level (non-coding
genes), and regulatory elements that control the cells and
circumstances in which a gene is active. The discovery and annotation
of gene elements is accomplished primarily by sequencing RNA from a
diverse range of sources, comparative genomics, integrative
bioinformatic methods, and human curation. Regulatory elements are
typically investigated through DNase hypersensitivity assays, assays of
DNA methylation, and chromatin immunoprecipitation (ChIP) of proteins
that interact with DNA, including modified histones and transcription
factors, followed by sequencing (ChIP-Seq). The results of ENCODE
experiments, collected in the ENCODE DCC database, are displayed on the
UCSC Genome Browser. The data can also be downloaded from the ENCODE
DCC website in text format.
ENCODE data are now available for the
entire human genome. To access ENCODE data, open the Genome
Browser, select the
March 2006 assembly of the human genome, and go to your region of
interest. ENCODE tracks will be marked with the NHGRI logo. The bulk of
the ENCODE data can be found in the Expression and Regulation
track groups, with a few in the Mapping, Genes, and
Variation groups. Although most
participating research groups have provided several tracks, generally
only selected data from each research group are displayed by default.
Click the hyperlinked name of a particular track to display a page
containing configuration options and details about the methods used to
generate the data. See the Genome Browser User's Guide for
further information about displaying tracks and navigating in the
Genome Browser. To receive notifications of ENCODE data releases and
related news by email, subscribe to the encode-announce mailing list.
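The access steps above can also be expressed as a direct link. The following is a minimal sketch, assuming the Genome Browser's standard hgTracks URL with its db and position parameters, and assuming hg18 as the database name for the March 2006 human assembly. The helper name browser_url and the example coordinates are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode

# Assumed base address of the Genome Browser track display.
BROWSER_CGI = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(chrom: str, start: int, end: int, db: str = "hg18") -> str:
    """Build a Genome Browser link for a region (hypothetical helper).

    db defaults to hg18, the UCSC name for the March 2006 human assembly.
    Coordinates are 1-based and inclusive, as typed into the position box.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    position = f"{chrom}:{start}-{end}"
    query = urlencode({"db": db, "position": position})
    return f"{BROWSER_CGI}?{query}"


# Example region; the coordinates are arbitrary placeholders.
url = browser_url("chr7", 115597757, 115598757)
print(url)
```

Opening the printed link should show the region on the chosen assembly; track visibility is then configured on the page itself, as described above.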
Data from the earlier ENCODE project
pilot phase, which covered approximately 1% of the genome, are
available on the May 2004 and March 2006 human
assemblies. The ENCODE Pilot Project web
pages provide convenient browser access to these regions.
Before publishing research that uses
ENCODE data, please read the data release policy, which places
restrictions on publication use of data for nine
months following the data release.
18 March 2010 - February and March 2010 ENCODE news
UW Affy Exon track:
This track displays
human tissue microarray data using the
Affymetrix Human Exon 1.0 GeneChip. This release includes 28 new cell types, and replaces the data for four existing tables (replicate 1 for K562, NB4, and SKMC; replicate 2 for HeLa-S3).
UW Histone track:
This track displays
maps of histone modifications genome-wide in different cell lines, using ChIP-seq high-throughput sequencing.
Release 2 of the HudsonAlpha
Release 2 adds data for five new cell
Release 3 of the Gencode
Genes track: This track shows high-quality
manual annotations in the ENCODE regions generated by the
Version 3 of the
Gencode gene set presents a full merge between HAVANA and ENSEMBL,
giving priority to the manually curated HAVANA objects and using
ENSEMBL objects where they are different or fall into un-annotated
Initial release of the CSHL
RNA-seq track: This track depicts
next-generation sequencing data for RNAs 20-200 nt in length,
isolated from RNA samples from tissues or subcellular compartments
of ENCODE cell lines.
Release 3 of the UW
DNaseI HS track:
This track shows
DNaseI sensitivity measured genome-wide in different cell lines using the Digital DNaseI
methodology, and DNaseI hypersensitive sites. This
release includes 19 new cell lines as
well as a new version of NB4 replicate 1.
6 January 2010 - December 2009 ENCODE news
"ENCODE whole-genome data in the UCSC
Genome Browser": This paper addresses the history of the
ENCODE project, summarizes the datasets available as of September 2009,
and outlines methods to access the data. See
Nucleic Acids Res. 2010 Jan;38(Database issue):D620-5.
Release of the Caltech
RNA-seq track: This track contains sequence reads and RPKM
transcript abundance measures for sequences that map to either the
genome or to known RNA splice sites. The results of four
different mapping algorithms are provided, enabling comparison
among them. Results are available for polyA+
and total RNA for the two ENCODE Tier 1 cell lines.
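For readers unfamiliar with the RPKM measure mentioned above, the following is a minimal sketch of the standard definition (reads per kilobase of transcript per million mapped reads). It is not the Caltech pipeline; the function name rpkm and the example numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def rpkm(reads_in_transcript: int, transcript_length_bp: int,
         total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    Standard definition: RPKM = 1e9 * C / (N * L), where C is the number
    of reads mapped to the transcript, N is the total number of mapped
    reads in the sample, and L is the transcript length in base pairs.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return 1e9 * reads_in_transcript / (total_mapped_reads * transcript_length_bp)


# Illustrative numbers only: 500 reads on a 2,000 bp transcript
# in a sample with 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
print(value)  # 25.0
```

Because the measure normalizes by both transcript length and sequencing depth, values can be compared across transcripts and across samples sequenced to different depths.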
Release 2 of the Broad
Histone track: This track displays maps of chromatin state
generated using ChIP-seq. Release 2 adds data for the ENCODE Tier
2 cell lines H1-hESC and HepG2, plus NHLF (normal human lung
fibroblasts) and HMEC (human mammary epithelial) cells. This
expands the track data to 9 cell lines and 11 antibodies plus
an input control.
Release 2 of the CSHL
RNA-seq track: This track depicts sequencing of long
RNAs of more than 200 nucleotides in length. Release 2 adds data
from strand-specific assays of total RNA for the two ENCODE Tier 1 cell
lines.
Release 2 of the ENCODE
Chromatin track: This track displays evidence of open
chromatin as identified by two complementary methods, DNaseI
hypersensitivity and FAIRE, combined with ChIP identification
methods. Release 2 adds data from eight additional cell types,
expanding the track to 41 experiments in 13 cell lines.
November 2009 - October ENCODE News
Sep 2009 data freeze complete: The
just completed data submissions for the fourth
production data freeze (Sep 09). The first set of data from this freeze
to complete quality review is now available on the UCSC public server,
in Release 2 of the ENCODE
Transcription Factor Binding Sites from Yale/UC-Davis/Harvard
track. Release 2 adds 59 ChIP-seq experiments to this track.
Other October track releases: The
by Tiling Array track was expanded to
include 4 additional experiments.
ENCODE Consortium, the domain encodeproject.org
has been registered by the ENCODE Data Coordination Center, and is
redirected to the ENCODE portal at UCSC.
New grants funded: NHGRI has funded
new ENCODE grants, as part of the American Recovery and
Reinvestment Act. The new grants include expansion of ENCODE to the mouse
genome and proteogenomics.
Job openings at UCSC: The
and ENCODE projects are currently accepting
applications for Software
Developer and Biological
Technician positions. We are looking
for talented individuals who would like to use their skills in computer
science, biology, and bioinformatics on fast-paced projects featuring
the work of top genomics scientists worldwide.
September 2009 - ENCODE data releases since July 1
During this period a total of 10 new
ENCODE tracks were released to the UCSC public server. Functional
elements and region characterization in these tracks include:
For track names and file access, see
the Release Log and Downloads links listed in the
left menu bar.
We would like to thank the
contributing ENCODE labs and the DCC team at UCSC for their efforts
in completing these tracks.
2009 - ENCODE data releases for the period April - June 2009
During this period, a total of seven
new ENCODE tracks were released to the UCSC public server. These tracks
include high-quality gene annotations, maps of transcription factor
binding, histone modifications, and open chromatin, RNA subcellular
localization, and RNA/protein binding sites.
The sequence and annotation data
displayed in the Genome Browser are freely available for academic,
nonprofit, and personal use with the following conditions: