Testing for Homogeneity in Linkage Studies of Family Sequence Data

Chinese Abstract

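Whole-exome sequence data are usually collected in small families. Single-nucleotide polymorphisms (SNPs) can be used for linkage analysis, but because only a small number of families are currently sequenced, the power of such analyses is low. Traditional linkage analysis assumes homogeneity among families (the null hypothesis) and searches for regions with high LOD scores; the alternative hypothesis is heterogeneity, that is, that disease genes lie at different genomic positions in different families. Traditional linkage analysis built on this assumption may gain power, but for complex traits shaped jointly by multiple susceptibility loci the assumption is counterintuitive. We therefore propose a new method that reverses the traditional hypotheses: the null hypothesis is that disease genes lie at different genomic positions in different families, and homogeneity is the alternative hypothesis. Under this null hypothesis, it is unlikely that genetic variants with high LOD scores in two families fall in the same (approximate) region. In fact, the probability that two such variants lie on the same chromosome is only about 5%, and the probability that they lie within 100 bp of each other is far smaller still. This "surprise factor" is one of the rationales for this project. On this basis, we propose to test how likely it is that disease susceptibility variants in two families occur at the same position. Preliminary studies indicate that our approach is more effective than existing methods.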

English Abstract

Exome sequence data are often obtained in small human families. Extracting SNPs from DNA variants allows for linkage analysis, but power is often low because only small numbers of families are generally sequenced. The customary procedure in linkage analysis is to initially assume homogeneity among families (null hypothesis, H0) and search for high lod scores. A subsequent analysis then generally tests for heterogeneity (alternative hypothesis, H1) to see whether potential disease loci occur at different genomic locations in different families. Combining the two steps may yield increased power, but the concept is counterintuitive for complex traits, where multiple susceptibility variants presumably contribute to disease. Here we propose a novel approach that reverses the traditional hypothesis testing scenario: we initially assume that disease variants in different families can occur anywhere in the genome (null hypothesis, H0). Under this hypothesis, it is unlikely that variants with large lod scores in two families occur at (approximately) the same position; in fact, two such variants have a probability of only about 5% of occurring on the same chromosome, and a much smaller probability of occurring within, say, 100 bp of each other. This "surprise factor" has previously been expressed in an ad hoc manner; here we quantify the effect and develop a general hypothesis testing framework and resulting software in which homogeneity is the alternative hypothesis (H1), for which we have developed two test procedures as outlined below. In other words, we want to test whether potential disease variants occur at approximately the same positions in two families. A significant result of such a test would indicate that (1) the two families share some genetic vulnerabilities and (2) there is significant evidence for the presence of variants linked with disease.
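The "about 5%" figure can be checked with a short calculation. The sketch below is illustrative only and is not the test procedure or software described in the abstract. It assumes that, under H0, each family's variant falls uniformly at random along the physical length of the 22 autosomes, and it uses rounded autosome lengths in megabases. The function names and the uniform-placement model are assumptions introduced here; a model weighted by exome content or genetic map length would give somewhat different numbers.

```python
# Rounded autosome lengths in megabases (chr1..chr22); illustrative values,
# assumed here only to reproduce the order of magnitude quoted in the abstract.
AUTOSOME_MB = [249, 242, 198, 190, 182, 171, 159, 145, 138, 134, 135,
               133, 114, 107, 102, 90, 83, 80, 59, 64, 47, 51]


def same_chromosome_probability(lengths):
    """P(two independent, uniformly placed variants share a chromosome)."""
    total = sum(lengths)
    # Each variant lands on chromosome i with probability length_i / total.
    return sum((length / total) ** 2 for length in lengths)


def within_distance_probability(lengths_bp, d):
    """P(two independent, uniformly placed variants lie within d bp)."""
    total = sum(lengths_bp)
    prob = 0.0
    for length in lengths_bp:
        d_eff = min(d, length)
        # For two uniform points on [0, L]: P(|x - y| <= d) = (2dL - d^2) / L^2.
        within = (2 * d_eff * length - d_eff ** 2) / length ** 2
        prob += (length / total) ** 2 * within
    return prob


p_same = same_chromosome_probability(AUTOSOME_MB)
p_near = within_distance_probability([mb * 1_000_000 for mb in AUTOSOME_MB], 100)

print(f"P(same chromosome) = {p_same:.4f}")
print(f"P(within 100 bp)   = {p_near:.2e}")
```

Under these assumptions the same-chromosome probability comes out slightly above 5%, and the within-100-bp probability is close to 2 × 100 divided by the total autosome length, which is many orders of magnitude smaller.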