Showing resources 1 - 20 of 62

  1. Modeling Skewed Spatial Data Using a Convolution of Gaussian and Log-Gaussian Processes

    Zareifard, Hamid; Khaledi, Majid Jafari; Rivaz, Firoozeh; Vahidi-Asl, Mohammad Q.
    In spatial statistics, it is usual to consider a Gaussian process for spatial latent variables. As the data often exhibit non-normality, we introduce a novel skew process, named hereafter the Gaussian-log Gaussian convolution (GLGC), to construct latent spatial models which provide great flexibility in capturing skewness. Some properties including closed-form expressions for the moments and the skewness of the GLGC process are derived. Particularly, we show that the mean square continuity and differentiability of the GLGC process are established by those of the Gaussian and log-Gaussian processes considered in its structure. Moreover, the usefulness of the proposed approach is demonstrated through...

  2. Merging MCMC Subposteriors through Gaussian-Process Approximations

    Nemeth, Christopher; Sherlock, Chris
    Markov chain Monte Carlo (MCMC) algorithms have become powerful tools for Bayesian inference. However, they do not scale well to large-data problems. Divide-and-conquer strategies, which split the data into batches and, for each batch, run independent MCMC algorithms targeting the corresponding subposterior, can spread the computational burden across a number of separate computer cores. The challenge with such strategies is in recombining the subposteriors to approximate the full posterior. By creating a Gaussian-process approximation for each log-subposterior density we create a tractable approximation for the full posterior. This approximation is exploited through three methodologies: firstly a Hamiltonian Monte Carlo algorithm...
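
The recombination idea in this abstract can be illustrated in a toy setting. The sketch below replaces the paper's Gaussian-process surrogates with exact Gaussian (Laplace-style) summaries of each log-subposterior, which is what a surrogate would recover in this conjugate normal model; summing the surrogate log-densities then reduces to a precision-weighted combination of batch means. The model and all numbers are illustrative assumptions, not taken from the paper.

```python
import random
import statistics

# Toy divide-and-conquer recombination.  Simplification: each
# log-subposterior is summarised by a Gaussian (Laplace-style) surrogate
# rather than the paper's Gaussian-process surrogate.  Model
# (illustrative): y_i ~ N(theta, 1), flat prior on theta.

random.seed(1)
theta_true = 2.0
data = [random.gauss(theta_true, 1.0) for _ in range(1000)]

n_batches = 5
batch_size = len(data) // n_batches
batches = [data[i * batch_size:(i + 1) * batch_size] for i in range(n_batches)]

# Each subposterior is N(mean(batch), 1/len(batch)); its log-density is
# exactly quadratic here, so the Gaussian surrogate is exact.
sub_means = [statistics.fmean(b) for b in batches]
sub_precisions = [len(b) for b in batches]  # 1 / variance

# Summing log-subposteriors = multiplying Gaussians:
merged_precision = sum(sub_precisions)
merged_mean = sum(m * p for m, p in zip(sub_means, sub_precisions)) / merged_precision

# Full-data posterior for comparison: N(mean(data), 1/n).
full_mean = statistics.fmean(data)
print(merged_mean, full_mean)
```

Because the batch log-posteriors are exactly quadratic here, the merged mean coincides with the full-data posterior mean; the Gaussian-process surrogate earns its keep when the subposteriors are non-Gaussian.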

  3. Variational Hamiltonian Monte Carlo via Score Matching

    Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai
    Traditionally, the field of computational Bayesian statistics has been divided into two main subfields: variational methods and Markov chain Monte Carlo (MCMC). In recent years, however, several methods have been proposed based on combining variational Bayesian inference and MCMC simulation in order to improve their overall accuracy and computational efficiency. This marriage of fast evaluation and flexible approximation provides a promising means of designing scalable Bayesian inference methods. In this paper, we explore the possibility of incorporating variational approximation into a state-of-the-art MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the required expensive computation involved in the sampling procedure, which...

  4. Testing Un-Separated Hypotheses by Estimating a Distance

    Salomond, Jean-Bernard
    In this paper we propose a Bayesian answer to testing problems when the hypotheses are not well separated. The idea of the method is to study the posterior distribution of a discrepancy measure between the parameter and the model we want to test for. This is shown to be equivalent to a modification of the testing loss. An advantage of this approach is that it can easily be adapted to complex testing problems which are in general difficult to address. Asymptotic properties of the test can be derived from the asymptotic behaviour of the posterior distribution of the discrepancy...
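
A minimal, hypothetical instance of the discrepancy idea (not the paper's general construction): test H0: theta = 0 for a normal mean with known unit variance by examining the posterior mass placed on small values of the discrepancy d(theta) = |theta|. All names and numbers below are toy choices.

```python
import math

# Posterior of the discrepancy |theta| under a conjugate normal model:
# theta ~ N(0, tau^2) prior, y_i ~ N(theta, 1), so the posterior is
# N(post_mean, post_var) with precision n + 1/tau^2.

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_prob_discrepancy_small(ybar, n, tau2, eps):
    """P(|theta| <= eps | data) under the conjugate normal posterior."""
    post_var = 1.0 / (n + 1.0 / tau2)
    post_mean = post_var * n * ybar
    s = math.sqrt(post_var)
    return normal_cdf((eps - post_mean) / s) - normal_cdf((-eps - post_mean) / s)

# Data far from 0: little posterior mass near H0, evidence against it.
p_far = posterior_prob_discrepancy_small(ybar=1.0, n=100, tau2=10.0, eps=0.1)
# Data consistent with 0: most of the mass sits near H0.
p_near = posterior_prob_discrepancy_small(ybar=0.0, n=100, tau2=10.0, eps=0.3)
print(p_far, p_near)
```

A decision rule would then reject H0 when this posterior probability falls below a calibrated threshold.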

  5. Efficient Model Comparison Techniques for Models Requiring Large Scale Data Augmentation

    Touloupou, Panayiota; Alzahrani, Naif; Neal, Peter; Spencer, Simon E. F.; McKinley, Trevelyan J.
    Selecting between competing statistical models is a challenging problem, especially when the competing models are non-nested. In this paper we offer a simple solution by devising an algorithm which combines MCMC and importance sampling to obtain computationally efficient estimates of the marginal likelihood which can then be used to compare the models. The algorithm is successfully applied to a longitudinal epidemic data set, where calculating the marginal likelihood is made more challenging by the presence of large amounts of missing data. In this context, our importance sampling approach is shown to outperform existing methods for computing the marginal likelihood.

  6. Bayesian Analysis of RNA-Seq Data Using a Family of Negative Binomial Models

    Zhao, Lili; Wu, Weisheng; Feng, Dai; Jiang, Hui; Nguyen, XuanLong
    The analysis of RNA-Seq data has been focused on three main categories, including gene expression, relative exon usage and transcript expression. Methods have been proposed independently for each category using a negative binomial (NB) model. However, counts following a NB distribution on one feature (e.g., exon) do not guarantee a NB distribution for the other two features (e.g., gene/transcript). In this paper we propose a family of negative binomial models, which integrates the gene, exon and transcript analysis under a coherent NB model. The proposed model easily incorporates the uncertainty of assigning reads to transcripts and substantially simplifies the estimation...

  7. Sequential Bayesian Analysis of Multivariate Count Data

    Aktekin, Tevfik; Polson, Nick; Soyer, Refik
    We develop a new class of dynamic multivariate Poisson count models that allow for fast online updating. We refer to this class as multivariate Poisson-scaled beta (MPSB) models. The MPSB model allows for serial dependence in count data as well as dependence with a random common environment across time series. Notable features of our model are analytic forms for state propagation, predictive likelihood densities, and sequential updating via sufficient statistics for the static model parameters. Our approach leads to a fully adapted particle learning algorithm and a new class of predictive likelihoods and marginal distributions which we refer to as...

  8. On the Use of Cauchy Prior Distributions for Bayesian Logistic Regression

    Ghosh, Joyee; Li, Yingbo; Mitra, Robin
    In logistic regression, separation occurs when a linear combination of the predictors can perfectly classify part or all of the observations in the sample, and as a result, finite maximum likelihood estimates of the regression coefficients do not exist. Gelman et al. (2008) recommended independent Cauchy distributions as default priors for the regression coefficients in logistic regression, even in the case of separation, and reported posterior modes in their analyses. As the mean does not exist for the Cauchy prior, a natural question is whether the posterior means of the regression coefficients exist under separation. We prove theorems that provide...
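
The separation phenomenon described here is easy to reproduce. The sketch below is a toy, one-predictor example with invented data: under complete separation the log-likelihood keeps increasing as the slope grows, so the MLE does not exist, while the Cauchy(0, 2.5) default prior of Gelman et al. (2008) yields a finite posterior mode (found here by a simple grid search, not by the authors' machinery).

```python
import math

# Completely separated toy data: sign(x) perfectly classifies y.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [-1, -1, -1, 1, 1, 1]

def log_lik(b):
    # Logistic log-likelihood for slope b, no intercept.
    return -sum(math.log1p(math.exp(-yi * b * xi)) for xi, yi in zip(x, y))

def log_post(b, scale=2.5):
    # Add a Cauchy(0, scale) log-prior on the slope.
    return log_lik(b) - math.log(1.0 + (b / scale) ** 2)

grid = [i / 100.0 for i in range(0, 5001)]   # b in [0, 50]
b_mle = max(grid, key=log_lik)               # pushed to the grid boundary
b_map = max(grid, key=log_post)              # finite interior mode
print(b_mle, b_map)
```

The likelihood maximiser runs off to the edge of any finite grid, whereas the posterior mode settles at a moderate slope; the paper's question of whether posterior *means* exist under separation is more delicate than this mode computation.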

  9. A Comparison of Truncated and Time-Weighted Plackett–Luce Models for Probabilistic Forecasting of Formula One Results

    Henderson, Daniel A.; Kirrane, Liam J.
    We compare several variants of the Plackett–Luce model, a commonly-used model for permutations, in terms of their ability to accurately forecast Formula One motor racing results. A Bayesian approach to forecasting is adopted and a Gibbs sampler for sampling from the posterior distributions of the model parameters is described. Prediction of the results from the 2010 to 2013 Formula One seasons highlights clear strengths and weaknesses of the various models. We demonstrate by example that down-weighting past results can improve forecasts, and that some of the models we consider are competitive with the forecasts implied by bookmakers' odds.
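
The Plackett–Luce likelihood at the core of these models is compact: the probability of a finishing order is built up sequentially, with each position a choice among the competitors not yet placed, proportional to their strengths. The sketch below uses arbitrary toy strengths, not fitted quantities, and omits the truncation and time-weighting variants the paper compares.

```python
import itertools

def plackett_luce_prob(order, strength):
    """Probability of `order` (a tuple of items) under the given strengths."""
    remaining = list(order)
    prob = 1.0
    for item in order:
        # Choose `item` among those still unplaced, proportional to strength.
        prob *= strength[item] / sum(strength[j] for j in remaining)
        remaining.remove(item)
    return prob

strength = {"A": 3.0, "B": 2.0, "C": 1.0}  # toy strengths

# The model defines a proper distribution over all 3! = 6 permutations:
total = sum(plackett_luce_prob(p, strength) for p in itertools.permutations(strength))
best = plackett_luce_prob(("A", "B", "C"), strength)  # (3/6) * (2/3) * 1 = 1/3
print(total, best)
```

Time-weighting amounts to raising each past race's likelihood contribution to a power that decays with its age, so stale results count for less.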

  10. A New Monte Carlo Method for Estimating Marginal Likelihoods

    Wang, Yu-Bo; Chen, Ming-Hui; Kuo, Lynn; Lewis, Paul O.
    Evaluating the marginal likelihood in Bayesian analysis is essential for model selection. Estimators based on a single Markov chain Monte Carlo sample from the posterior distribution include the harmonic mean estimator and the inflated density ratio estimator. We propose a new class of Monte Carlo estimators based on this single Markov chain Monte Carlo sample. This class can be thought of as a generalization of the harmonic mean and inflated density ratio estimators using a partition weighted kernel (likelihood times prior). We show that our estimator is consistent and has better theoretical properties than the harmonic mean and inflated density...
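
The classical harmonic mean estimator that this paper generalises can be sketched in a beta-Bernoulli model where the exact marginal likelihood is known in closed form. The prior below is deliberately chosen heavy enough that the estimator's variance is finite; in general the harmonic mean estimator can be badly unstable, which is part of the paper's motivation.

```python
import math
import random

random.seed(7)
k, n = 2, 3                      # 2 successes in 3 Bernoulli trials
a, b = 4.0, 3.0                  # Beta(a, b) prior (toy choice)

def likelihood(theta):
    return theta ** k * (1.0 - theta) ** (n - k)

def beta_fn(p, q):
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

# Exact marginal likelihood: m = B(a + k, b + n - k) / B(a, b).
exact_marginal = beta_fn(a + k, b + n - k) / beta_fn(a, b)

# Posterior is Beta(a + k, b + n - k); draw from it directly and take the
# harmonic mean of the likelihood over the draws.
draws = [random.betavariate(a + k, b + n - k) for _ in range(20000)]
harmonic_mean_est = 1.0 / (sum(1.0 / likelihood(t) for t in draws) / len(draws))
print(exact_marginal, harmonic_mean_est)
```

The partition-weighted-kernel class in the paper keeps this single-sample convenience while repairing the instability that plagues the plain harmonic mean.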

  11. Computationally Efficient Multivariate Spatio-Temporal Models for High-Dimensional Count-Valued Data (with Discussion)

    Bradley, Jonathan R.; Holan, Scott H.; Wikle, Christopher K.
    We introduce a computationally efficient Bayesian model for predicting high-dimensional dependent count-valued data. In this setting, the Poisson data model with a latent Gaussian process model has become the de facto model. However, this model can be difficult to use in high dimensional settings, where the data may be tabulated over different variables, geographic regions, and times. These computational difficulties are further exacerbated by acknowledging that count-valued data are naturally non-Gaussian. Thus, many of the current approaches, in Bayesian inference, require one to carefully calibrate a Markov chain Monte Carlo (MCMC) technique. We avoid MCMC methods that require tuning by...

  12. Locally Adaptive Smoothing with Markov Random Fields and Shrinkage Priors

    Faulkner, James R.; Minin, Vladimir N.
    We present a locally adaptive nonparametric curve fitting method that operates within a fully Bayesian framework. This method uses shrinkage priors to induce sparsity in order-$k$ differences in the latent trend function, providing a combination of local adaptation and global control. Using a scale mixture of normals representation of shrinkage priors, we make explicit connections between our method and $k$th-order Gaussian Markov random field smoothing. We call the resulting processes shrinkage prior Markov random fields (SPMRFs). We use Hamiltonian Monte Carlo to approximate the posterior distribution of model parameters because this method provides superior performance in the...
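
The order-$k$ difference operator at the heart of this construction is simple to write down. In the sketch below, a Gaussian prior on $k$th differences corresponds to $k$th-order GMRF smoothing (global control), while a heavier-tailed prior on the same differences permits occasional large jumps (local adaptation); the Laplace log-prior shown is one simple heavy-tailed stand-in, not the specific shrinkage priors studied in the paper.

```python
def kth_differences(theta, k):
    """Apply the k-th order forward-difference operator to a sequence."""
    d = list(theta)
    for _ in range(k):
        d = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    return d

def gmrf_log_prior(theta, k, lam):
    """Gaussian prior on k-th differences (up to an additive constant)."""
    return -0.5 * lam * sum(x * x for x in kth_differences(theta, k))

def laplace_log_prior(theta, k, lam):
    """Heavy-tailed alternative: Laplace prior on k-th differences."""
    return -lam * sum(abs(x) for x in kth_differences(theta, k))

# A quadratic trend has constant 2nd differences and zero 3rd differences,
# so it is maximally favoured by a k = 3 smoothness prior:
quad = [0.5 * t * t for t in range(8)]
print(kth_differences(quad, 2))  # all 1.0
print(kth_differences(quad, 3))  # all 0.0
```

Sparsity in these differences means the fitted trend is piecewise-polynomial of degree $k - 1$ with a few adaptive breakpoints.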

  13. Optimal Gaussian Approximations to the Posterior for Log-Linear Models with Diaconis–Ylvisaker Priors

    Johndrow, James; Bhattacharya, Anirban
    In contingency table analysis, sparse data are frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis–Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. Here we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis–Ylvisaker priors, and provide convergence rate and finite-sample...

  14. Dirichlet Process Mixture Models for Modeling and Generating Synthetic Versions of Nested Categorical Data

    Hu, Jingchen; Reiter, Jerome P.; Wang, Quanli
    We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the...

  15. Regularization and Confounding in Linear Regression for Treatment Effect Estimation

    Hahn, P. Richard; Carvalho, Carlos M.; Puelz, David; He, Jingyu
    This paper investigates the use of regularization priors in the context of treatment effect estimation using observational data where the number of control variables is large relative to the number of observations. First, the phenomenon of “regularization-induced confounding” is introduced, which refers to the tendency of regularization priors to adversely bias treatment effect estimates by over-shrinking control variable regression coefficients. Then, a simultaneous regression model is presented which permits regularization priors to be specified in a way that avoids this unintentional “re-confounding”. The new model is illustrated on synthetic and empirical data.

  16. Improving the Efficiency of Fully Bayesian Optimal Design of Experiments Using Randomised Quasi-Monte Carlo

    Drovandi, Christopher C.; Tran, Minh-Ngoc
    Optimal experimental design is an important methodology for most efficiently allocating resources in an experiment to best achieve some goal. Bayesian experimental design considers the potential impact that various choices of the controllable variables have on the posterior distribution of the unknowns. Optimal Bayesian design involves maximising an expected utility function, which is an analytically intractable integral over the prior predictive distribution. These integrals are typically estimated via standard Monte Carlo methods. In this paper, we demonstrate that the use of randomised quasi-Monte Carlo can bring significant reductions to the variance of the estimated expected utility. This variance reduction can...
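
The variance-reduction idea can be demonstrated with one of the simplest randomised quasi-Monte Carlo rules, a randomly shifted one-dimensional lattice (the paper considers more general randomised QMC point sets, and a utility function rather than the toy integrand below). Both estimators are unbiased for the integral; the randomised QMC one has far lower variance on a smooth integrand.

```python
import random
import statistics

random.seed(3)

def integrand(u):
    return u * u            # integral over [0, 1] is 1/3

def mc_estimate(n):
    # Plain Monte Carlo: n i.i.d. uniform points.
    return statistics.fmean(integrand(random.random()) for _ in range(n))

def rqmc_estimate(n):
    # Randomised QMC: an equispaced lattice with a single random shift,
    # wrapped back into [0, 1) (a Cranley-Patterson rotation).
    shift = random.random()
    return statistics.fmean(integrand((i / n + shift) % 1.0) for i in range(n))

n, reps = 128, 200
mc_vals = [mc_estimate(n) for _ in range(reps)]
rqmc_vals = [rqmc_estimate(n) for _ in range(reps)]
print(statistics.variance(mc_vals), statistics.variance(rqmc_vals))
```

Replicating each estimator shows the randomised lattice's variance is orders of magnitude below plain Monte Carlo at the same budget, which is exactly the lever the paper pulls inside the expected-utility maximisation.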

  17. Real-Time Bayesian Parameter Estimation for Item Response Models

    Weng, Ruby Chiu-Hsing; Coad, D. Stephen
    Bayesian item response models have been used in modeling educational testing and Internet ratings data. Typically, the statistical analysis is carried out using Markov chain Monte Carlo methods. However, these may not be computationally feasible when real-time data continuously arrive and online parameter estimation is needed. We develop an efficient algorithm based on a deterministic moment-matching method to adjust the parameters in real time. The proposed online algorithm works well for two real datasets, achieving good accuracy but with considerably less computational time.

  18. Latent Marked Poisson Process with Applications to Object Segmentation

    Ghanta, Sindhu; Dy, Jennifer G.; Niu, Donglin; Jordan, Michael I.
    In difficult object segmentation tasks, utilizing image information alone is not sufficient; incorporation of object shape prior models is necessary to obtain competitive segmentation performance. Most formulations that incorporate both shape and image information are in the form of energy functional optimization problems. This paper introduces a Bayesian latent marked Poisson process for segmenting multiple objects in an image. The model takes both shape and image feature/appearance into account—it generates object locations from a spatial Poisson process, then generates shape parameters from a shape prior model as the latent marks. Inferentially, this partitions the image: pixels inside objects are assumed...

  19. Approximation of Bayesian Predictive $p$-Values with Regression ABC

    Nott, David J.; Drovandi, Christopher C.; Mengersen, Kerrie; Evans, Michael
    In the Bayesian framework a standard approach to model criticism is to compare some function of the observed data to a reference predictive distribution. The result of the comparison can be summarized in the form of a $p$-value, and computation of some kinds of Bayesian predictive $p$-values can be challenging. The use of regression adjustment approximate Bayesian computation (ABC) methods is explored for this task. Two problems are considered. The first is approximation of distributions of prior predictive $p$-values for the purpose of choosing weakly informative priors in the case where the model checking statistic is expensive...
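
The regression adjustment step these methods rely on can be sketched in a linear-Gaussian toy model where the true posterior is known: theta ~ N(0, 1), summary s | theta ~ N(theta, 0.5^2), observed s_obs = 1. Because the regression of theta on s is exactly linear here, the adjusted draws match the true posterior N(0.8, 0.2); everything below is an illustrative assumption, not the paper's $p$-value machinery.

```python
import random
import statistics

random.seed(11)
s_obs = 1.0
n = 20000

# Simulate (theta, s) pairs from the prior predictive.
thetas = [random.gauss(0.0, 1.0) for _ in range(n)]
summaries = [t + random.gauss(0.0, 0.5) for t in thetas]

# Least-squares slope of theta on s (no acceptance step is needed in this
# fully linear toy example).
mean_t = statistics.fmean(thetas)
mean_s = statistics.fmean(summaries)
cov_ts = sum((t - mean_t) * (s - mean_s)
             for t, s in zip(thetas, summaries)) / (n - 1)
var_s = sum((s - mean_s) ** 2 for s in summaries) / (n - 1)
beta = cov_ts / var_s

# Regression adjustment: move each draw as if its summary had been s_obs.
adjusted = [t + beta * (s_obs - s) for t, s in zip(thetas, summaries)]

mean_adj = statistics.fmean(adjusted)
var_adj = statistics.variance(adjusted)
print(mean_adj, var_adj)   # true posterior: mean 0.8, variance 0.2
```

In nonlinear problems the same adjustment is applied locally, to draws whose summaries fall near the observed one, which is the setting the paper exploits for approximating predictive $p$-values.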

  20. Bayesian Inference and Testing of Group Differences in Brain Networks

    Durante, Daniele; Dunson, David B.
    Network data are increasingly collected along with other variables of interest. Our motivation is drawn from neurophysiology studies measuring brain connectivity networks for a sample of individuals along with their membership to a low or high creative reasoning group. It is of paramount importance to develop statistical methods for testing of global and local changes in the structural interconnections among brain regions across groups. We develop a general Bayesian procedure for inference and testing of group differences in the network structure, which relies on a nonparametric representation for the conditional probability mass function associated with a network-valued random variable. By...
