Chinese Abstract
The proposed project will seek means of interpreting high-throughput resequencing results from human cDNA and DNA in the context of existing biological information. A broad spectrum of disorders, including cancer and both early- and late-onset diseases, has been associated with common or de novo genetic variants. Recent advances in high-throughput sequencing technology allow genetic variants that may compromise gene integrity to be examined in great detail. Next-generation sequencing data are notoriously difficult to process and interpret, and many routine analysis tasks require cloud computing services. These tasks include variant calling based on the alignment of raw reads against the reference genome, as well as analysis of gene expression and isoform composition. Trio studies and whole-exome and whole-genome resequencing projects uncover large numbers of potentially harmful genetic variants that need further interpretation in the context of existing biological data. While non-synonymous variants can be relatively easy to interpret according to their potential to disrupt protein integrity, non-coding variants that may compromise splicing or transcription initiation require additional computational modeling. This is especially true in cases where RNA-Seq cannot readily be performed to confirm the presence of aberrant gene isoforms. The project can also enable fast diagnostic data analysis in time-critical domains such as cancer treatment guidance and preimplantation screening. Eventually, private submissions of anonymized genomic information will be supported, allowing individual genomes to be analyzed and visualized in the context of the accumulated information available in the public domain. The proposed approach may offer an avenue for preventive measures to ameliorate suffering from childhood and common hereditary disorders.
