Evaluation of Relational and NoSQL Approaches for Cohort Identification from Heterogeneous Data Sources in the National Sleep Research Resource

Ningzhou Zeng; Guo-Qiang Zhang; Xiaojin Li; Licong Cui

Abstract

Patient cohort identification across heterogeneous data sources is a challenging task that may involve a complicated process of data loading, harmonization and querying. Most existing cohort identification tools store patient data in a relational database model implemented in SQL. However, SQL databases limit the maximum number of columns in a table, so high-dimensional data must be broken down into multiple tables, which in turn affects query performance. In this paper, we developed two NoSQL-based patient cohort query systems based on an existing SQL-based system for cross-cohort query in the National Sleep Research Resource (NSRR). We used eight NSRR datasets in our experiment to evaluate the performance of the NoSQL-based and SQL-based systems in data loading, harmonization and querying. Our experiment showed that the NoSQL-based approaches outperformed the SQL-based approach and are promising for developing patient cohort query systems across heterogeneous data sources.

Journal of Health & Medical Informatics