In recent years, the sequencing of various eukaryotic genomes and the systematic characterization of transcribed sequences have intensified efforts to decipher a so-called "regulatory code" within each genome. This regulatory code mediates the controlled expression of specific subsets of genes in a particular cell type, developmental stage, disease state, or environmental response. Determining this code has profound implications for our understanding of human evolution, development, and disease processes.
The complex control of transcription in eukaryotic cells makes deciphering the regulatory code particularly challenging. For instance, transcriptional initiation of a gene requires combinatorial interactions between sequence-specific transcription factors and cognate regulatory sequences, as well as remodeling of local chromatin structures. Each eukaryotic genome encodes between several hundred and several thousand transcription factors. In addition, the histones, a major component of chromatin, can be subject to over 100 different types of modifications and have been shown to exist as non-allelic variants targeted to particular functional sites in the genome. Integrating information across these layers of control makes deducing the combinatorial code far from straightforward. Furthermore, in mammalian genomes, the transcriptional regulatory sequences for a gene are usually scattered over large regions, and our knowledge of these sequences remains limited.
Research in my lab has been devoted to identifying and characterizing the transcriptional regulatory code of the human genome. We are taking an integrative approach to this problem. In the last several years, we have developed an efficient experimental strategy that allows comprehensive determination of transcription factor binding sites in the genome, and we have used this approach to systematically identify the transcriptional promoters, enhancers, and insulators in human cells. In addition, we have successfully used this strategy to determine the downstream target genes for c-Myc and β-catenin, two oncogenes involved in colorectal carcinoma and other human cancers. Results from these studies have not only demonstrated the utility of our approaches for rapid discovery of regulatory sequences in a genome, but also provided new insights into gene regulation and tumorigenesis. We are currently expanding these efforts to identify all the regulatory sequences in the human genome, dissect their combinatorial interactions with sequence-specific transcription factors, and define the functional relationships among chromatin modifications, DNA methylation, and gene regulation.