Chinese Abstract
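Genetic variation and natural selection are the main driving forces of speciation and biological evolution. However, current research still concentrates on protein-coding regions. Most variants lie in noncoding regions, yet these variants and the selective pressures acting on them have received little attention. This study centers on one core question: how do genetic variation and selection act on the genome? It focuses on the differences and connections between the ways they act on coding and noncoding regions, and it will develop new computational methods and analysis tools. Using an Evolutionary Index method, we will compare coding-region mutation patterns at the ultra-microscopic, microscopic, and macroscopic levels in human population polymorphism data and in ICGC/TCGA cancer genome data. We will develop analysis methods based on coalescent trees, providing an important tool for the subsequent identification of genes and regulatory elements under adaptive evolution. Integrating functional genomics data, we will use computational biology approaches such as statistics and pattern recognition to establish methods for assessing the functional impact of genetic variants. These studies will help us establish the relationship between genetic variation and phenotypic variation and will further advance the refinement and development of microevolution theory.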
English Abstract
Genetic variation and natural selection are major driving forces of speciation and molecular evolution. Most current studies focus on genetic variation in coding regions and neglect variation in noncoding regions. A large fraction of eukaryotic genomes consists of noncoding DNA that is not translated into protein, and little is known about its functional significance and evolutionary impact. In this proposal, we aim to study the functional impact of genetic variation and natural selection on coding and noncoding genomic regions, and to develop new algorithms and computational tools. First, we will investigate mutational patterns and signatures in human polymorphism data and in ICGC/TCGA cancer genome data using an Evolutionary Index method. Second, we will develop new approaches to detect positive selection based on coalescent trees; the genes and regulatory elements identified will be used for downstream functional assays and phenotypic evaluation. Third, we will build new algorithms and tools that combine statistical and machine learning approaches to evaluate the phenotypic impact of genetic variation using functional genomic data. This study will help us understand the relationship between genetic variation and phenotypic variation, and will contribute to the refinement and development of microevolution theory.
