Identification of Genomic Structural Variations from Paired and Family Sequencing Data

Chinese Abstract

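A growing body of research shows that genomic structural variations are closely associated with a wide range of complex human diseases. As a core technology in precision medicine research, the detection of genomic structural variations has become one of the fastest and most effective ways to screen for disease-related genes. However, existing algorithms for mining structural variations from population genome sequencing data are far from mature; in particular, they are poor at detecting low-frequency structural variations and those in repetitive regions. This study addresses key problems in structural variation mining, such as the precise localization of breakpoints and the identification of structural variations near repetitive sequences. We will propose new computational methods and establish a more complete statistical model and quality assessment criteria, so that genomic structural variations can be mined quickly and accurately from massive datasets. In addition, we will focus on structural variation detection techniques suited to paired and family data, developing a statistical method that integrates multiple signals together with a heterozygosity estimation model, and build an automated pipeline for detecting de novo structural variations and homozygous deletions in tumor data and in family data from inherited diseases. The results of this project will provide important tools for understanding the molecular mechanisms of complex diseases, identifying susceptibility genes, and clarifying the relationship between genetic variation and disease phenotypes.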

English Abstract

Recently, extensive studies have shown that genomic structural variation (SV) is involved in a variety of human genetic disorders. As a key technique in precision medicine, SV detection has proven to be one of the most efficient ways to screen for disease-related candidate genes. However, current SV detection algorithms are far from perfect and are limited in their ability to detect low-frequency and heterozygous SVs, especially those adjacent to repetitive regions. In this study, we aim to develop new computational algorithms, based on machine learning and statistical approaches, for identifying SVs associated with repetitive sequences and determining their precise breakpoints. We will focus on SV detection from paired and family trio data, employing a multi-signal strategy to build a statistical model that estimates heterozygosity and filters false positives; this will help detect de novo SVs and homozygous deletions in the personal genomes of patients with inherited diseases. In addition, we will set up a distributed system for SV detection and annotation, and use this platform to explore SV patterns in human personal genomes. This study will facilitate the discovery of SVs and susceptibility genes in the human genome and deepen our understanding of the relationship between DNA structural variation and human disease.
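The trio-based step described above (genotyping each SV, then flagging de novo calls and homozygous deletions) can be illustrated with a minimal Python sketch. It assumes a simple binomial model on the number of reads supporting the reference and SV alleles at one candidate site, with a hypothetical fixed error rate `ERROR_RATE`; the names `call_genotype` and `classify_trio` are illustrative only. This is not the proposed method, which integrates multiple signals rather than a single read count.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical error rate; an assumption for illustration, not a value from the study.
ERROR_RATE = 0.02
# Assumed probability that a read supports the SV allele under each genotype.
ALT_FRACTION = {"0/0": ERROR_RATE, "0/1": 0.5, "1/1": 1.0 - ERROR_RATE}


def genotype_log_likelihoods(ref_reads: int, alt_reads: int) -> Dict[str, float]:
    """Binomial log-likelihood of the observed read counts under each genotype."""
    n = ref_reads + alt_reads
    log_binom = math.lgamma(n + 1) - math.lgamma(ref_reads + 1) - math.lgamma(alt_reads + 1)
    return {
        gt: log_binom + alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for gt, p in ALT_FRACTION.items()
    }


def call_genotype(ref_reads: int, alt_reads: int) -> Tuple[str, float]:
    """Return the maximum-likelihood genotype and its log10 margin over the runner-up."""
    ll = genotype_log_likelihoods(ref_reads, alt_reads)
    ranked = sorted(ll.items(), key=lambda kv: kv[1], reverse=True)
    (best_gt, best_ll), (_, second_ll) = ranked[0], ranked[1]
    return best_gt, (best_ll - second_ll) / math.log(10)


def _alleles(gt: str) -> Set[str]:
    """Alleles a parent with genotype `gt` can transmit, e.g. '0/1' -> {'0', '1'}."""
    return set(gt.split("/"))


def classify_trio(child: str, father: str, mother: str) -> str:
    """Label a child's SV genotype relative to both parents' genotypes."""
    if child == "0/0":
        return "absent"
    if father == "0/0" and mother == "0/0":
        return "de novo candidate"
    a, b = child.split("/")
    fa, mo = _alleles(father), _alleles(mother)
    # Mendelian rule: one child allele must come from each parent.
    if not ((a in fa and b in mo) or (b in fa and a in mo)):
        return "Mendelian inconsistent"
    if child == "1/1":
        return "inherited homozygous"
    return "inherited heterozygous"


if __name__ == "__main__":
    trio = {"child": (12, 10), "father": (25, 0), "mother": (22, 1)}
    calls = {name: call_genotype(*counts)[0] for name, counts in trio.items()}
    print(calls, classify_trio(calls["child"], calls["father"], calls["mother"]))
```

For example, a child called 0/1 at a site where both parents are called 0/0 is reported as a de novo candidate. In practice such a call would only be trusted after integrating split-read, read-pair and read-depth evidence and filtering likely false positives, which a single binomial count model does not capture.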