A Statistical Approach to Correcting Cross Annotations in a
Metagenomic Functional Profile Generated by Short Reads

Du R; Mercante D; An L; Fang Z

Abstract

Background: Grouping protein-coding sequences into families whose members encode proteins with the same biochemical function, and then tabulating the relative abundances of those families, is a widely adopted practice for functional profiling of a metagenomic sample. When metagenomic sequencing reads are searched for homology against a protein database, the relative abundance of a family can be represented by the number of reads aligned to its members. However, some short reads generated by next-generation sequencing platforms are erroneously assigned to functional families with which they are not associated. This common phenomenon is termed cross-annotation. Current methods for functional profiling of a metagenomic sample either use empirical cutoff values to select alignments and ignore the cross-annotation problem, or apply a summary equation as a simple adjustment.

Results: By introducing latent variables, we use Probabilistic Latent Semantic Analysis to model the proportions of reads assigned to functional families in a metagenomic sample. The approach can be applied to a metagenomic sample once the list of true functional families has been obtained or estimated. It was implemented for metagenomic samples functionally characterized with the Clusters of Orthologous Groups of proteins database, and it successfully addressed the cross-annotation issue on in vitro-simulated metagenomic samples, bioinformatics-tool-simulated metagenomic samples, and a real-world data set.

Conclusions: Correcting cross-annotation increases the accuracy of the functional profile of a metagenome generated from short reads. It will also benefit differential abundance analysis of metagenomic samples under different conditions.
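The latent-variable idea described in the abstract can be illustrated with a small mixture model. The sketch below is not the authors' implementation. It assumes that the list of true families is given and that the probability of a read from true family k being assigned to family j (the hypothetical `assign_prob` matrix) is already known, for example from simulation. Under those assumptions, the true family proportions are estimated with a standard expectation-maximization loop. All function and parameter names are illustrative.

```python
from typing import List


def correct_cross_annotation(
    observed_counts: List[float],
    assign_prob: List[List[float]],
    n_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate true family proportions by EM (illustrative sketch).

    observed_counts[j]: number of reads assigned to family j.
    assign_prob[k][j]: ASSUMED-known probability that a read from true
        family k is assigned to family j (each row sums to 1).
    Returns pi[k], the estimated proportion of reads from true family k.
    """
    n_true = len(assign_prob)
    pi = [1.0 / n_true] * n_true  # start from uniform proportions
    for _ in range(n_iter):
        expected = [0.0] * n_true
        for j, count in enumerate(observed_counts):
            if count == 0:
                continue
            # E-step: posterior that a read seen in family j came from k
            joint = [pi[k] * assign_prob[k][j] for k in range(n_true)]
            denom = sum(joint)
            if denom == 0.0:
                continue  # no true family can explain family j; skip
            for k in range(n_true):
                expected[k] += count * joint[k] / denom
        # M-step: turn expected read counts into proportions
        norm = sum(expected)
        new_pi = [v / norm for v in expected]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Toy example: two true families (A, B) and three observed families
# (A, B, and a spurious family C that only receives cross-annotated reads).
assign_prob = [
    [0.8, 0.1, 0.1],  # reads truly from A
    [0.2, 0.7, 0.1],  # reads truly from B
]
observed = [560, 340, 100]  # counts consistent with a 60% / 40% mix
estimated = correct_cross_annotation(observed, assign_prob)
print([round(p, 4) for p in estimated])
```

In this toy case the raw profile would report family C at 10% even though no read truly belongs to it, whereas the estimated proportions redistribute those reads to families A and B.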

Journal of Biometrics & Biostatistics
