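Chinese Abstract (English translation)
In China, more than half of patients with type 2 diabetes mellitus (T2DM) are not detected early. Diabetic retinopathy (DR) is a major complication of T2DM and the leading cause of irreversible blindness, yet big-data risk assessment and prediction models that are based on the general population and take DR as the endpoint event are currently lacking. Building on our group's previous work on glucose metabolism, glycomics, and disease prediction from medical images, this study adopts time-dependent competing risk and Gaussian process models, which are suited to large-sample, high-dimensional, unstructured, and heterogeneous big data. We will analyze data from a health examination population of 240,000 people, including physical examination and behavioral factor data at baseline and at yearly follow-up, and investigate time-dependent competing risk models in order to build a T2DM risk assessment tool for the adult population. At the previously established T2DM surveillance site covering 160,000 community residents, undiagnosed individuals will be screened, and both newly diagnosed and previously diagnosed T2DM patients will be followed up. Fundus images and biological specimens will be collected from newly diagnosed and previously diagnosed DR patients and from controls. Integrating big data composed of N-glycan biomarkers, high-dimensional fundus image texture features, and related data, we will investigate Gaussian process models and other methods to build a model predicting the development of DR in T2DM. The study will provide a scientific basis for risk assessment, early detection, and predictive early warning of DR caused by T2DM, and methodological support for big-data analysis of disease.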
English Abstract
To date, nearly half of patients with type 2 diabetes mellitus (T2DM), in China and abroad, are not diagnosed at an early stage. Diabetic retinopathy (DR) caused by T2DM is the principal cause of irreversible blindness, yet studies on big-data risk assessment and prediction models that take DR as the outcome in the general population are rare. Based on our previous experience in statistical inference with big data (such as studies on the association between glycomics and glucose metabolism, and on disease prediction from medical images), as well as the latest advances in this field, we consider time-dependent competing risk and Gaussian process models to be advanced and suitable methods for addressing this problem with big data. In this study, T2DM risk assessment will be conducted with a time-dependent competing risk model, using physical examination and behavioral data of healthy people at baseline and at yearly follow-up (already collected from a cohort of 240,000 people undergoing medical examinations). Furthermore, in the previously established T2DM monitoring population of 160,000 community residents, new patients diagnosed by oral glucose tolerance screening and previously diagnosed patients will be followed up. Big data comprising multidimensional texture features extracted from fundus images and N-glycosylation biomarkers of newly and previously diagnosed patients and a control population will be analyzed to build DR prediction models using Gaussian processes and other data mining methods. The study will provide scientific evidence for risk assessment, early intervention, and comprehensive prevention of DR caused by T2DM, and a methodological reference for statistical prediction using big data.
