Collection resources
Clarke, Jennifer; Seo, David
We have developed a strategy for the analysis of newly available binary data to improve outcome predictions based on existing data (binary or non-binary). Our strategy involves two modeling approaches for the newly available data, one combining binary covariate selection via LASSO with logistic regression and one based on logic trees. The results of these models are then compared to the results of a model based on existing data with the objective of combining model results to achieve the most accurate predictions. The combination of model predictions is aided by the use of support vector machines to identify subspaces of...
Bhattacharya, Abhishek; Bhattacharya, Rabi
This article presents certain recent methodologies and some new results for the statistical analysis of probability distributions on manifolds. An important example considered in some detail here is the 2-D shape space of k-ads, comprising all configurations of k planar landmarks (k > 2) modulo translation, scaling and rotation.
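As a small illustration of the quotient described above, the sketch below (the function names and the triangle data are ours, for illustration only) maps a planar k-ad, stored as complex numbers, to its preshape by removing translation and scale, and then quotients out rotation via the full Procrustes distance.

```python
def preshape(landmarks):
    """Map k planar landmarks (complex numbers) to a preshape:
    remove translation by centering, remove scale by normalizing."""
    k = len(landmarks)
    centroid = sum(landmarks) / k
    centered = [z - centroid for z in landmarks]
    norm = sum(abs(z) ** 2 for z in centered) ** 0.5
    return [z / norm for z in centered]

def procrustes_distance(x, y):
    """Full Procrustes distance between two k-ads: the optimal rotation
    aligning the preshapes is absorbed by taking the modulus of the
    Hermitian inner product; the distance is 0 iff the shapes coincide."""
    u, v = preshape(x), preshape(y)
    inner = sum(a.conjugate() * b for a, b in zip(u, v))
    return max(0.0, 1.0 - abs(inner) ** 2) ** 0.5
```

Two k-ads that differ only by a similarity transformation (here a rotation-plus-scaling `1.2 + 1.6j` and a shift) are at Procrustes distance zero, i.e. they represent the same point of the shape space.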
Ishwaran, Hemant; Papana, Ariadni
Rescaled spike and slab models are a new Bayesian variable selection method for linear regression models. In high dimensional orthogonal settings such models have been shown to possess optimal model selection properties. We review background theory and discuss applications of rescaled spike and slab models to prediction problems involving orthogonal polynomials. We first consider global smoothing and discuss potential weaknesses. Some of these deficiencies are remedied by using local regression. The local regression approach relies on an intimate connection between local weighted regression and weighted generalized ridge regression. An important implication is that one can trace the effective degrees of...
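The orthogonal case invoked above has a simple closed form that can be sketched directly. Assuming design columns that are exactly orthonormal (our simplification, not the paper's general setting), generalized ridge regression shrinks each OLS coordinate by 1/(1 + λ_j), and the effective degrees of freedom is Σ_j 1/(1 + λ_j):

```python
def generalized_ridge_orthonormal(X, y, lams):
    """Closed-form generalized ridge regression for a design with
    orthonormal columns: beta_j = (x_j' y) / (1 + lam_j).
    Returns the coefficients and the effective degrees of freedom
    sum_j 1 / (1 + lam_j)."""
    n, p = len(X), len(X[0])
    beta = []
    for j in range(p):
        z = sum(X[i][j] * y[i] for i in range(n))  # OLS coordinate x_j' y
        beta.append(z / (1.0 + lams[j]))
    edf = sum(1.0 / (1.0 + lam) for lam in lams)
    return beta, edf
```

Setting λ_j = 0 recovers the unshrunk OLS coordinate; letting some λ_j grow traces the effective degrees of freedom down from p, which is the quantity the abstract alludes to.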
Sen, Pranab K.
High-dimensional data models, often with low sample size, abound in many interdisciplinary studies, genomics and large biological systems being most noteworthy. The conventional assumption of multinormality or linearity of regression may not be plausible for such models, which are likely to be statistically complex due to a large number of parameters as well as various underlying restraints. As such, parametric approaches may not be very effective. Anything beyond parametrics, albeit having increased scope and robustness perspectives, may generally be hampered by the low sample size and hence unable to give reasonable margins of error. Kendall’s tau statistic is exploited in...
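As a reminder of the distribution-free statistic invoked above, here is a minimal pure-Python Kendall's tau-a, counting concordant and discordant pairs (the paper's actual use of the statistic in high dimensions is, of course, more involved):

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs divided by
    n(n-1)/2. A distribution-free measure of monotone association,
    usable when multinormality or linearity is implausible."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Perfect monotone agreement gives tau = 1 and perfect reversal gives tau = -1, regardless of the marginal distributions, which is what makes the statistic attractive beyond parametrics.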
Guo, Feng; Dey, Dipak K.; Holsinger, Kent E.
We propose a hierarchical Bayesian model to estimate the proportional contribution of source populations to a newly founded colony. Samples are derived from the first generation offspring in the colony, but mating may occur preferentially among migrants from the same source population. Genotypes of the newly founded colony and source populations are used to estimate the mixture proportions, and the mixture proportions are related to environmental and demographic factors that might affect the colonizing process. We estimate an assortative mating coefficient, mixture proportions, and regression relationships between environmental factors and the mixture proportions in a single hierarchical model. The first-stage...
Malec, Donald; Müller, Peter
In public health management there is a need to produce subnational estimates of health outcomes. Often, however, funds are not available to collect samples large enough to produce traditional survey sample estimates for each subnational area. Although parametric hierarchical methods have been successfully used to derive estimates from small samples, there is a concern that the geographic diversity of the U.S. population may be oversimplified in these models. In this paper, a semi-parametric model is used to describe the geographic variability component of the model. Specifically, we assume Dirichlet process mixtures of normals for county-specific random effects. Results are compared...
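The Dirichlet process mixture prior mentioned above can be simulated with the standard stick-breaking construction. The sketch below (truncation level and hyperparameters are illustrative choices of ours, not the paper's) draws DP weights and then samples one observation from the induced mixture of normals:

```python
import random

def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking construction of Dirichlet process
    weights: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j)."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

def dp_mixture_of_normals_draw(alpha, mu0, sigma0, sigma, truncation, rng):
    """Draw one observation from a (truncated) DP mixture of normals:
    atoms are drawn from the base measure N(mu0, sigma0^2), a component
    is picked by the stick-breaking weights, and the observation is
    sampled from a normal centered at that atom."""
    weights = stick_breaking(alpha, truncation, rng)
    atoms = [rng.gauss(mu0, sigma0) for _ in weights]
    u, acc = rng.random(), 0.0
    for w, a in zip(weights, atoms):
        acc += w
        if u <= acc:
            return rng.gauss(a, sigma)
    return rng.gauss(atoms[-1], sigma)  # residual truncation mass
```

With a moderate truncation level the leftover stick mass is negligible, so the weights sum to one for practical purposes; in the paper's setting the atoms would play the role of county-specific random effects.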
van der Vaart, A. W.; van Zanten, J. H.
We review definitions and properties of reproducing kernel Hilbert spaces attached to Gaussian variables and processes, with a view to applications in nonparametric Bayesian statistics using Gaussian priors. The rate of contraction of posterior distributions based on Gaussian priors can be described through a concentration function that is expressed in the reproducing Hilbert space. Absolute continuity of Gaussian measures and concentration inequalities play an important role in understanding and deriving this result. Series expansions of Gaussian variables and transformations of their reproducing kernel Hilbert spaces under linear maps are useful tools to compute the concentration function.
James, Lancelot F.
This paper explores large sample properties of the two-parameter (α, θ) Poisson–Dirichlet Process in two contexts. In a Bayesian context of estimating an unknown probability measure, viewing this process as a natural extension of the Dirichlet process, we explore the consistency and weak convergence of the two-parameter Poisson–Dirichlet posterior process. We also establish the weak convergence of properly centered two-parameter Poisson–Dirichlet processes for large θ+nα. This latter result complements large θ results for the Dirichlet process and Poisson–Dirichlet sequences, and complements a recent result on large deviation principles for the two-parameter Poisson–Dirichlet process. A crucial component of our results...
Choi, Taeryon; Ramamoorthi, R. V.
In recent years, the literature on Bayesian asymptotics has grown rapidly. It is increasingly important to understand the concept of posterior consistency and to validate specific Bayesian methods in terms of consistency of posterior distributions. In this paper, we take up several conceptual issues in the consistency of posterior distributions and give a panoramic view of them by comparing the various approaches to posterior consistency that have been investigated in the literature. In addition, we provide interesting results on posterior consistency that deal with non-exponential consistency, improper priors and non-i.i.d. (independent but not identically distributed) observations. We describe a...
Chakrabarti, Arijit; Samanta, Tapas
In this article we study the asymptotic predictive optimality of a model selection criterion based on the cross-validatory predictive density, already available in the literature. For a dependent variable and associated explanatory variables, we consider a class of linear models as approximations to the true regression function. One selects a model among these using the criterion under study and predicts a future replicate of the dependent variable by an optimal predictor under the chosen model. We show that for squared error prediction loss, this scheme of prediction performs asymptotically as well as an oracle, where the oracle here refers to...
Bunea, Florentina
In this article we investigate consistency of selection in regression models via the popular Lasso method. Here we depart from the traditional linear regression assumption and consider approximations of the regression function f with elements of a given dictionary of M functions. The target for consistency is the index set of those functions from this dictionary that realize the most parsimonious approximation to f among all linear combinations belonging to an L_{2} ball centered at f and of radius r^{2}_{n, M}. In this framework we show that a consistent estimate of this index set can be derived via ℓ_{1} penalized least...
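In the special case of an orthonormal dictionary (a simplification of ours; the paper treats general dictionaries of M functions), the ℓ1-penalized least squares solution is coordinatewise soft-thresholding of the OLS coefficients, which makes the selected index set explicit:

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; exactly zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_orthonormal(X, y, lam):
    """l1-penalized least squares when the dictionary columns are
    orthonormal: the solution is soft-thresholding of z_j = x_j' y.
    Returns the coefficients and the selected index set."""
    n, M = len(X), len(X[0])
    coefs = []
    for j in range(M):
        z = sum(X[i][j] * y[i] for i in range(n))
        coefs.append(soft_threshold(z, lam))
    selected = {j for j, b in enumerate(coefs) if b != 0.0}
    return coefs, selected
```

Coefficients whose OLS magnitude falls below the penalty are zeroed out, so the index set of surviving dictionary elements is exactly the kind of object whose consistency the article studies.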
Bayarri, M. J.; Berger, James O.; Datta, Gauri S.
The Poisson distribution is often used as a standard model for count data. Quite often, however, such data sets are not well fit by a Poisson model because they have more zeros than are compatible with this model. For these situations, a zero-inflated Poisson (ZIP) distribution is often proposed. This article addresses testing a Poisson versus a ZIP model, using Bayesian methodology based on suitable objective priors. Specific choices of objective priors are justified and their properties investigated. The methodology is extended to include covariates in regression models. Several applications are given.
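The ZIP model itself is easy to state: with probability π the count is a structural zero, and otherwise it is Poisson(λ). A minimal sketch of its probability mass function (parameter values below are illustrative):

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson mass function: a structural zero with
    probability pi, otherwise an ordinary Poisson(lam) count."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson
```

For any π > 0 the mass at zero strictly exceeds the plain Poisson value e^{-λ}, which is precisely the excess-of-zeros feature that motivates testing Poisson against ZIP.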
Angers, Jean-François; Delampady, Mohan
A simple Bayesian approach to nonparametric regression is described using fuzzy sets and membership functions. Membership functions are interpreted as likelihood functions for the unknown regression function, so that with the help of a reference prior they can be transformed to prior density functions. The unknown regression function is decomposed into wavelets and a hierarchical Bayesian approach is employed for making inferences on the resulting wavelet coefficients.
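The wavelet decomposition step can be illustrated with the simplest orthonormal family, the Haar basis (the paper's wavelet family is not specified in the abstract; Haar is our choice for illustration). The transform repeatedly splits a signal into scaled averages and details, and the inverse reassembles it exactly:

```python
import math

def haar_transform(signal):
    """Full orthonormal Haar decomposition of a signal whose length is
    a power of two: returns [scaling coef, coarsest detail, ..., finest
    details]."""
    data = list(signal)
    out = []
    while len(data) > 1:
        avgs = [(data[2 * i] + data[2 * i + 1]) / math.sqrt(2)
                for i in range(len(data) // 2)]
        dets = [(data[2 * i] - data[2 * i + 1]) / math.sqrt(2)
                for i in range(len(data) // 2)]
        out = dets + out        # prepend so coarser details come first
        data = avgs
    return data + out

def haar_inverse(coefs):
    """Invert haar_transform, doubling the resolution at each level."""
    data = [coefs[0]]
    idx = 1
    while idx < len(coefs):
        dets = coefs[idx: idx + len(data)]
        nxt = []
        for a, d in zip(data, dets):
            nxt.append((a + d) / math.sqrt(2))
            nxt.append((a - d) / math.sqrt(2))
        data = nxt
        idx += len(dets)
    return data
```

Because the basis is orthonormal, the transform preserves the sum of squares, so priors placed on the wavelet coefficients translate directly into priors on the regression function.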
Meeden, Glen
In the subjective Bayesian approach uncertainty is described by a prior distribution chosen by the statistician. Fuzzy set theory is another way of representing uncertainty. Here we give a decision theoretic approach which allows a Bayesian to convert their prior distribution into a fuzzy set membership function. This yields a formal relationship between these two different methods of expressing uncertainty.
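One naive device for turning a prior density into a membership function, shown purely for orientation and not to be confused with the paper's decision-theoretic construction, is to rescale the density so that its mode has membership one:

```python
import math

def normal_prior_density(x, mu, sigma):
    """Density of a N(mu, sigma^2) prior, used here as the example prior."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def membership_from_prior(density, mode_value):
    """Possibility-style normalization: divide the prior density by its
    maximum so the mode gets membership 1 (an illustrative device only,
    not Meeden's decision-theoretic rule)."""
    def membership(x):
        return density(x) / mode_value
    return membership
```

Under this normalization the membership function inherits the shape of the prior but is no longer a probability density, which is one reason a principled decision-theoretic bridge between the two representations is of interest.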
Sweeting, Trevor J.
We revisit the question of priors that achieve approximate matching of Bayesian and frequentist predictive probabilities. Such priors may be thought of as providing frequentist calibration of Bayesian prediction or simply as devices for producing frequentist prediction regions. Here we analyse the O(n^{−1}) term in the expansion of the coverage probability of a Bayesian prediction region, as derived in [Ann. Statist. 28 (2000) 1414–1426]. Unlike the situation for parametric matching, asymptotic predictive matching priors may depend on the level α. We investigate uniformly predictive matching priors (UPMPs); that is, priors for which this O(n^{−1}) term is zero for all α....
Hall, W. J.; Ding, Keyue
Often in sequential trials additional data become available after a stopping boundary has been reached. A method of incorporating such information from overrunning is developed, based on the “adding weighted Zs” method of combining p-values. This yields a combined p-value for the primary test and a median-unbiased estimate and confidence bounds for the parameter under test. When the amount of overrunning information is proportional to the amount available upon terminating the sequential test, exact inference methods are provided; otherwise, approximate methods are given and evaluated. The context is that of observing a Brownian motion with drift, with either linear stopping...
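The "adding weighted Zs" rule for combining p-values has a compact form: convert each one-sided p-value to a normal score, take a weighted sum, renormalize by the root of the sum of squared weights, and convert back. A sketch using only the standard library (the weights below are illustrative):

```python
from statistics import NormalDist

def combine_pvalues_weighted_z(pvalues, weights):
    """Weighted-Z (weighted Stouffer) combination of one-sided p-values:
    z_i = Phi^{-1}(1 - p_i), combined z = sum(w_i z_i) / sqrt(sum w_i^2),
    combined p = 1 - Phi(combined z)."""
    nd = NormalDist()
    zs = [nd.inv_cdf(1.0 - p) for p in pvalues]
    num = sum(w * z for w, z in zip(weights, zs))
    den = sum(w * w for w in weights) ** 0.5
    return 1.0 - nd.cdf(num / den)
```

In the overrunning setting, the weight on the post-boundary data would be chosen from the relative amounts of information; here two moderately small p-values reinforce each other into a smaller combined p-value.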
Sun, Dongchu; Berger, James O.
Objective priors for sequential experiments are considered. Common priors, such as the Jeffreys prior and the reference prior, will typically depend on the stopping rule used for the sequential experiment. New expressions for reference priors are obtained in various contexts, and computational issues involving such priors are considered.
Clarke, Bertrand; Ghosal, Subhashis
Professor Jayanta Kumar Ghosh has contributed massively to various areas of Statistics over the last five decades. Here, we survey some of his most important contributions. In roughly chronological order, we discuss his major results in the areas of sequential analysis, foundations, asymptotics, and Bayesian inference. It is seen that he progressed from thinking about data points, to thinking about data summarization, to the limiting cases of data summarization as they relate to parameter estimation, and then to more general aspects of modeling including prior and model selection.
Pranab K. Sen has contributed extensively to many areas of Statistics including order statistics, nonparametrics, robust inference, sequential methods, asymptotics, biostatistics, clinical trials, bioenvironmental studies and bioinformatics. His long list of over 600 publications and 22 books and volumes along with numerous citations during the past 5 decades bear testimony to his work.
All three of us have had the good fortune of being associated with him in different capacities. He has given professional and personal advice on many occasions to all of us, and we feel that our lives have certainly been enriched by our association with him. He has...
Gupta, Mayetri
In this article we propose a maximum a posteriori (MAP) criterion for model selection in the motif discovery problem and investigate conditions under which the MAP criterion asymptotically gives a correct prediction of model size. We also investigate the robustness of the MAP criterion to prior specification and provide guidelines for choosing prior hyperparameters for motif models based on sensitivity considerations.