Novel Incremental Ranking Framework for Biomedical Data Analytics and Dimensionality Reduction: Big Data Challenges and Opportunities

Emad Elsebakhi; Ognian Asparouhov; Rashid Al-Ali

Abstract

Currently, owing to the availability of massive biomedical data on each individual, both healthcare and the life sciences are becoming data-driven. The input attributes are structured and unstructured data that pose many challenges, including sparse binary attributes with imbalanced outcomes, a non-unique distributed structure, and high dimensionality, all of which hamper clinical decision-making in practice. In recent decades, considerable effort has been made to overcome most of these challenges, but significant improvements are still needed in this field, especially now that omics and phenotype data are being integrated for future personalized medicine. These challenges motivate us to use state-of-the-art big data analytics and large-scale machine learning frameworks to confront most of them and to provide proper clinical solutions that assist physicians at the bedside, so that they can deliver high-quality care at reduced cost.

This research proposes a new recursive screening incremental ranking machine learning paradigm that empowers the desired classifiers, especially on imbalanced training data, to create suitable data-driven clusters without prior information and subsequently to reduce the dimensionality of large biomedical data sets. The framework combines many binary attributes based on two criteria: (i) the minimum power value of each combination and (ii) the classification power of that combination. Physicians then review these sets of combined attributes to select the rules that make clinical sense, and the selected rules are used to empower the prediction of the desired healthcare event (a binary or multinomial target) at the bedside. After empowering the target class categories, we select the k significant risk drivers that have a suitable volume of data and high correlation with the desired outcome, and we then establish the proper segmentation using AND-OR associative relationships. Finally, we use propensity scores to handle the imbalanced data and build machine learning/data mining predictive models based on functional networks, using maximum-likelihood estimation with a Newton-Raphson iterative matrix computation mechanism, to expedite implementation on high-performance computing platforms such as scalable MapReduce/HDFS, Spark MLlib, and Google Sibyl.
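The abstract names maximum-likelihood estimation with Newton-Raphson iterative matrix computation as the numerical core of its predictive models. The paper builds this on functional networks, which are not reproduced here. As a hedged stand-in, the sketch below applies the same Newton-Raphson update to an ordinary logistic-regression likelihood in pure Python. Each iteration solves (XᵀWX)δ = Xᵀ(y − μ) and sets β ← β + δ. The function names `solve` and `fit_logistic_newton` and the optional `weights` hook (for example, reweighting imbalanced classes) are hypothetical illustrations and are not taken from the paper.

```python
import math


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the original inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic_newton(X, y, weights=None, max_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson.

    X: list of feature rows (an intercept column is prepended here).
    y: list of 0/1 outcomes.
    weights: optional per-sample weights (hypothetical hook, e.g. for class imbalance).
    Assumes the data are not perfectly separable, so the MLE exists.
    """
    rows = [[1.0] + list(r) for r in X]
    n, p = len(rows), len(rows[0])
    w = weights or [1.0] * n
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # X^T W0 (y - mu): score of the log-likelihood
        hess = [[0.0] * p for _ in range(p)]  # X^T W X: negative Hessian
        for xi, yi, wi in zip(rows, y, w):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += wi * (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += wi * mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(hess, grad)              # Newton step: (X^T W X)^{-1} X^T (y - mu)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # One binary predictor: positive rate 1/4 when x = 0 and 3/4 when x = 1.
    X = [[0]] * 4 + [[1]] * 4
    y = [1, 0, 0, 0, 1, 1, 1, 0]
    print(fit_logistic_newton(X, y))
```

With a single binary predictor the MLE has a closed form: the intercept equals the log-odds when x = 0, and the slope equals the log-odds ratio. The example data should therefore return approximately (−ln 3, 2 ln 3) ≈ (−1.099, 2.197), which gives a simple check on the iteration. The per-row sums that build the gradient and XᵀWX decompose over rows, and this is the part of the computation that a MapReduce-style platform could evaluate in parallel across data partitions.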

Comparative studies with both simulated and real-life biomedical databases are carried out to identify specific biomedical and healthcare outcomes, such as asthma, breast cancer, gene mutation selection, and genomic association studies for specific complex diseases. The results show that the proposed incremental learning scheme gives the new classifier reliable and stable performance. The new classifier outperforms existing predictive models in both outcome quality and execution time, especially on imbalanced, sparse, high-dimensional biomedical big data. We recommend that future work use real-life integrated clinico-genomic big data together with genome-wide association studies for personalized medicine.

Journal of Computer Science & Systems Biology