Cohort Identification for Trampoline-associated Traumatic Dental Injuries among Pediatric Patients from Clinical Notes using Machine Learning

Joseph W. Sirrianni¹*, Jin Peng¹, Yungui Huang¹ and Homa Amini²

Abstract

Background: Cohort identification is a crucial step in retrospective clinical analysis. Natural language processing, especially modern deep learning approaches, may improve this task by classifying patients by cohort status more accurately. However, such approaches have not yet been applied in the dental domain.

Objective: We aim to identify patients who suffered trampoline-associated traumatic dental injuries among all patients with trampoline-associated injuries.

Methods: We developed and applied a natural language processing cohort identification pipeline consisting of text filtering rules and a machine learning model trained on historical data. The pipeline processes a patient's clinical notes for a series of temporally related encounters and produces a binary prediction of whether the patient has suffered a trampoline-associated injury. We experimented with six machine learning models: logistic regression, random forest, decision trees, linear SVM, naïve Bayes, and a fine-tuned ClinicalBERT model.
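The pipeline described in the Methods has two stages: rule-based text filtering, then a trained classifier applied to the notes from a series of temporally related encounters. The Python sketch below illustrates only that structure and is not the authors' implementation. The 30-day window used to link encounters, the `trampolin` keyword pattern, and the names `Encounter`, `group_encounters`, and `identify_patient` are all hypothetical; the classifier is any callable returning a probability, such as a wrapper around a fine-tuned ClinicalBERT model or one of the traditional models.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List

# Hypothetical filtering rule: keep only note text that mentions a trampoline
# (matches "trampoline", "trampolines", "trampolining", case-insensitively).
TRAMPOLINE_PATTERN = re.compile(r"\btrampolin", re.IGNORECASE)

# Hypothetical maximum gap for treating consecutive encounters as related.
WINDOW = timedelta(days=30)


@dataclass
class Encounter:
    patient_id: str
    encounter_date: date
    note_text: str


def group_encounters(encounters: List[Encounter]) -> List[List[Encounter]]:
    """Split one patient's encounters into series whose consecutive gaps are <= WINDOW."""
    ordered = sorted(encounters, key=lambda e: e.encounter_date)
    series: List[List[Encounter]] = []
    for enc in ordered:
        if series and enc.encounter_date - series[-1][-1].encounter_date <= WINDOW:
            series[-1].append(enc)
        else:
            series.append([enc])
    return series


def identify_patient(
    encounters: List[Encounter],
    classifier: Callable[[str], float],
    threshold: float = 0.5,
) -> bool:
    """Return True if any encounter series passes both the keyword filter and the classifier."""
    for series in group_encounters(encounters):
        text = "\n".join(e.note_text for e in series)
        # Stage 1: text filtering rule; skip series that never mention a trampoline.
        if not TRAMPOLINE_PATTERN.search(text):
            continue
        # Stage 2: trained model scores the concatenated notes for the binary label.
        if classifier(text) >= threshold:
            return True
    return False


notes = [
    Encounter("p1", date(2020, 6, 1), "Fell off trampoline, chipped incisor."),
    Encounter("p1", date(2020, 6, 10), "Follow-up visit."),
]
dummy_model = lambda text: 0.9 if "incisor" in text.lower() else 0.1  # stand-in classifier
print(identify_patient(notes, dummy_model))  # True
```

Running the cheap keyword rule first means the more expensive model only scores encounter series that could plausibly be relevant.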

Results: The fine-tuned ClinicalBERT model performed best on our evaluation data, with a positive predictive value (PPV) of 0.836 and a sensitivity of 0.898. Applying the pipeline to our data increased the cohort of all trampoline-associated injuries from an initial 7,454 patients to 15,010 patients, and the trampoline-associated traumatic dental injury cohort from an initial 102 patients to 140 patients.
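Both reported metrics come from a confusion matrix over the evaluation data. PPV = TP / (TP + FP) is the fraction of predicted positives that are true positives, and sensitivity = TP / (TP + FN) is the fraction of actual positives the model recovers. The minimal Python illustration below uses hypothetical counts, not the study's evaluation data.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): TP / (TP + FN)."""
    return tp / (tp + fn)


# Hypothetical counts, for illustration only.
tp, fp, fn = 45, 5, 15
print(round(ppv(tp, fp), 3), round(sensitivity(tp, fn), 3))  # 0.9 0.75
```

A high PPV means few false positives enter the cohort, while a high sensitivity means few eligible patients are missed; cohort identification usually needs both.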

Conclusion: We present a novel pipeline powered by natural language processing for identifying a trampoline-associated injury cohort for dental research. Our results show that deep learning outperformed traditional machine learning models on our specific task. Our process for identifying patient encounters by activity type generalizes to other types of injuries and is applicable to other research cohorts.

Journal of Health & Medical Informatics