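Chinese Abstract
Our understanding of the genome began with decoding the 1D genome sequence (HGP, ENCODE), advanced with the 2D genome, which revealed epigenetic modification signals (Epigenetic Roadmap), and has now entered a new era of exploring the 4D genome (the three-dimensional structure of chromatin and its dynamic changes, e.g. 4D Nucleome). High-throughput sequencing technologies such as Hi-C and ChIA-PET have been developed to study the 3D structure of the genome and its function, but several key technical challenges remain. Drawing on the strengths of our research group, in this study we will therefore: (1) develop BL-Hi-C technology to reduce the experimental noise of Hi-C, and integrate epigenetic signals to improve the resolution of chromatin interaction studies; (2) develop the accompanying data processing algorithms and improve the statistical models used to process ChIA-PET/Hi-C/BL-Hi-C data; (3) develop computational tools, based on stochastic processes, for comparing time-series 3D chromatin structure data; (4) integrate existing tools to build a complete platform for 4D genome data analysis and visualization. Finally, we plan to evaluate the effectiveness of this methodological framework in human blood/leukemia cell models.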
English Abstract
Human genome research, which started with decoding the 1D primary genomic sequence (e.g. HGP, ENCODE Project) and continued through delineating 2D epigenomic profiles for hundreds of cell types (e.g. Roadmap Epigenome Mapping Project), has entered the era of dissecting 4D chromatin architecture, that is, the 3D spatial contact structure and its temporal dynamics (e.g. 4D Nucleome Project). Chromatin is the complex of genomic DNA and proteins that makes up the chromosomes within the cell nucleus. The organization of genomic material into chromatin is presumed to play an important role in regulating gene expression. However, the precise relationship between spatial genome organization and the expression of resident genes in health and disease remains unclear. To understand 3D genome architecture and its relationship to gene regulation, new high-throughput technologies based on next-generation sequencing, such as Hi-C and ChIA-PET, have emerged that allow in-depth investigation of 3D chromatin interactions at the genome-wide level. In the proposed study, we aim to solve specific technical problems that have hindered progress in this rapidly developing field. In particular, drawing on the strengths of our research team, we will focus on (1) developing a new BL-Hi-C technology to substantially reduce the noise of the conventional Hi-C method while increasing the resolution and the enrichment for active promoter/enhancer DNA loops; (2) developing the related data analysis algorithms and further improving the statistical models for ChIA-PET and Hi-C/BL-Hi-C data; (3) developing data analysis tools, based on stochastic processes, for comparing differential changes and dynamics; (4) developing an integrative 4D genome data analysis pipeline/platform and user-friendly visualization tools.
In addition to exploring modern machine learning methods on existing public data and benchmarking mathematical models with simulations, we also plan to apply and validate our new methods in human blood cells, both normal and leukemic (K562, Hi-60, NM4, Kusumi, SKNO).
