Non-Parametric Bayesian Modelling of Digital Gene Expression Data

Dimitrios V Vavoulis; Julian Gough

Abstract

Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or “reads”, which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and more recent de novo methods for the analysis of RNA-seq data are hampered by low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of the model parameters conditional on these counts. The algorithm compensates for the low number of biological replicates by clustering together genes whose tag counts are likely sampled from a common distribution and using this augmented sample to estimate the parameters of that distribution. The number of clusters is not fixed a priori; it is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm to a public dataset obtained from cancerous and non-cancerous neural tissues. Source code implementing the methodology presented in this paper is provided as the Python package DGEclust, which is freely available at the following link: https://bitbucket.org/DimitrisVavoulis/dgeclust.
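The clustering idea in the abstract can be illustrated with a small generative sketch. It is not the DGEclust implementation or its API, and it covers only the simulation side, not the blocked Gibbs sampler. It assumes, for illustration only, a truncated stick-breaking prior over clusters and a negative binomial (gamma-Poisson) distribution for over-dispersed counts; the abstract names neither. All function names, hyperparameters, and the base distribution over cluster parameters are hypothetical.

```python
import numpy as np


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights (assumed prior, not stated in the abstract)."""
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # closing the last break makes the weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def simulate_clustered_counts(n_genes, n_replicates, alpha=2.0, truncation=50, seed=0):
    """Simulate genes that share count distributions through cluster membership."""
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)

    # Hypothetical base distribution over per-cluster parameters.
    log_mean = rng.normal(4.0, 1.5, size=truncation)
    dispersion = rng.gamma(2.0, 0.25, size=truncation) + 0.01

    # Each gene picks a cluster; genes in one cluster share mean and dispersion.
    labels = rng.choice(truncation, size=n_genes, p=weights)
    mean = np.exp(log_mean[labels])
    shape = 1.0 / dispersion[labels]

    # Gamma-Poisson mixture, equivalent to a negative binomial:
    # variance = mean + dispersion * mean**2 (over-dispersed relative to Poisson).
    rates = rng.gamma(shape[:, None], (mean / shape)[:, None],
                      size=(n_genes, n_replicates))
    counts = rng.poisson(rates)
    return counts, labels, weights


counts, labels, weights = simulate_clustered_counts(n_genes=200, n_replicates=2)
print("occupied clusters:", len(np.unique(labels)))
```

With only two replicates per gene, each gene alone says little about its dispersion, but the genes that share a label form a much larger pooled sample. The number of occupied clusters is a by-product of the sampled weights rather than a fixed input, which mirrors the statement that the cluster count is inferred rather than set in advance.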

Journal of Computer Science & Systems Biology