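Novel Methods for Integrating Group Structure Information in Building Prediction Models for High-Dimensional Genetic Data, and Their Applications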

Chinese Abstract

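High-dimensional genetic data commonly contain biologically meaningful group structure. How to use this group structure information to build new integrative analysis models that accurately predict an individual's disease risk is an active research topic in public health. This project will exploit the distinctive strength of Bayesian models in integrating prior information. By introducing new group structure parameters, it will integrate group structure information effectively; by constructing a new group spike-and-slab mixture prior distribution and developing fast, robust algorithms, it aims to estimate gene effects accurately, control false positives, and markedly improve predictive performance. The result will be a complete set of new model-building methods that integrate group structure information for high-dimensional GLM and Cox models. The project will use simulation studies to verify the advantages of the proposed models in group information integration and predictive performance, and will demonstrate the broad applicability of the proposed methods by analyzing case-control data on ischemic stroke, cancer survival data, and microbiome data with multilevel group structure. Completing this project will provide new theory and methods for integrating group structure information in high-dimensional data, an important theoretical innovation, and will offer new approaches to building individualized risk prediction models for the related diseases, giving it substantial practical value.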

English Abstract

Group structure is widespread in high-dimensional genetic data. Using group structure information to build new integrative analysis models for personalized risk prediction is an active research area in disease prevention and control. Building on the unique advantage of Bayesian models in integrating prior information, this project will propose a new methodology for integrating group structure information when modeling high-dimensional genetic data. A new group spike-and-slab mixture prior distribution and a fast, stable algorithm will be developed to estimate gene effect sizes, control false positives, and improve predictive power. The approach will be extended to high-dimensional generalized linear models (GLMs) and Cox models, yielding a complete set of novel methods for modeling group structure information in high-dimensional genetic data. The effectiveness and advantages of the proposed model in integrating group structure information will be validated through extensive simulation studies. Its wide applicability will be demonstrated by analyzing case-control data on ischemic stroke, cancer survival data, and microbiome data with multilevel group structure. The completion of this project will lay a new theoretical and methodological foundation for modeling group structure information in high-dimensional genetic data. The personalized risk prediction models built for each real data set will provide new insight into personalized risk prediction, which is of considerable significance for public health.