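Chinese Abstract
Standardization is an urgent problem for the sharing and use of health statistical big data. The crux is threefold: the expression of statistical knowledge and statistical variables (data elements) lacks suitable data standards; existing data standards are poorly harmonized with one another; and the processing of statistical data lacks a clear metadata specification. Based on a comprehensive analysis of the characteristics of China's health statistical data resources in the big data environment, this study applies ontology and metadata theory and technology, with Protégé and Rose UML as modeling tools, to establish statistical data expression standards (a health statistical ontology and a data element standard) and a statistical metadata specification that can meet the needs of health statistical informatization in China. By building a conceptual model of health statistical big data, it also clarifies the harmonization mechanism among data standards (such as data set standards). The aim is to provide the necessary information standard support for human-machine sharing of statistical knowledge and data, fusion of statistical data across information systems, harmonization of data standards across business domains, and standardization of statistical data processing, thereby improving the usability of statistical data, enhancing the scientific rigor and interpretability of statistical results, and laying a solid foundation for the effective use of health statistical big data.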
English Abstract
Standardization is a key issue that urgently needs to be resolved for the sharing and use of health statistical big data. The core of the problem is the lack of suitable data presentation standards for statistical knowledge and variables (data elements), of good harmonization among existing data standards, and of an explicit metadata specification for the processing of statistical data. In this study, based on a comprehensive analysis of the characteristics of health big data resources in China, we apply ontology and metadata theory and technology, using Protégé and Rose UML as modeling tools, to construct statistical data presentation standards (a health statistical ontology and a data element standard) and a statistical metadata specification that meet the needs of health statistical informatization in China. We also construct a conceptual data model of health statistical big data to clarify the harmonization mechanism among existing data standards, such as health data set standards. The aim is to provide the necessary information standard support for sharing statistical knowledge and data, improving the usability of statistical data, enhancing the scientific rigor and interpretability of statistical results, and laying a firm foundation for the efficient use of health statistical big data.
