Collection resources
Project Euclid (Hosted at Cornell University Library) (203,669 resources)
Statistical Science
Bru, Marie-France; Bru, Bernard
Translated from the French by Glenn Shafer. The French text will appear as Chapter 1 of Volume 2 of Les jeux de l’infini et du hasard, by Marie-France and Bernard Bru, to be published by the Presses universitaires de Franche-Comté. The translation is published here with the permission of the publisher and the surviving author. The text has been edited to omit most references to other parts of the book. The authors’ extensive notes, which provide many additional references and historical details, have also been omitted.
Shafer, Glenn
This note introduces Marie-France and Bernard Bru’s forthcoming book on the history of probability, especially its chapter on dice games, translated in this issue of Statistical Science, and its commentary on the history of fair price in the settlement of contracts.
¶ As the Brus remind us, the traditions of counting chances in dice games and estimating fair price came together in the correspondence between Pascal and Fermat in 1654. To solve the problem of dividing the stakes in a prematurely halted game, Fermat used combinatorial principles that had been used for centuries to analyze dice games, while Pascal used principles...
Sun, Yifei; Qin, Jing; Huang, Chiung-Yu
It is well known that truncated survival data are subject to sampling bias, where the sampling weight depends on the underlying truncation time distribution. Recently, there has been rising interest in developing methods that better exploit the information about the truncation time, and thus the sampling weight function, to obtain more efficient estimation. In this paper, we propose to treat truncation and censoring as a “missing data mechanism” and apply the missing information principle to develop a unified framework for analyzing left-truncated and right-censored data with unspecified or known truncation time distributions. Our framework is structured in a way that is...
Fithian, William; Mazumder, Rahul
We explore a general statistical framework for low-rank modeling of matrix-valued data, based on convex optimization with a generalized nuclear norm penalty. We study several related problems: the usual low-rank matrix completion problem with flexible loss functions arising from generalized linear models; reduced-rank regression and multi-task learning; and generalizations of both problems where side information about rows and columns is available, in the form of features or smoothing kernels. We show that our approach encompasses maximum a posteriori estimation arising from Bayesian hierarchical modeling with latent factors, and discuss ramifications of the missing-data mechanism in the context of matrix completion....
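The nuclear norm penalty described above is commonly optimized by iterative singular value soft-thresholding. The following is a minimal sketch in that spirit (a SoftImpute-style iteration for squared-error loss, not the authors' implementation; the function name, penalty value and simulated data are illustrative assumptions):

```python
import numpy as np

def soft_impute(X, mask, lam, n_iter=100):
    """Nuclear-norm-penalized matrix completion via iterative
    singular value soft-thresholding (illustrative sketch)."""
    Z = np.zeros_like(X)
    for _ in range(n_iter):
        # Keep observed entries from X; fill missing entries with current estimate
        filled = np.where(mask, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        Z = (U * s) @ Vt
    return Z

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 15))  # rank-3 target matrix
mask = rng.random(A.shape) < 0.7                         # observe ~70% of entries
Z = soft_impute(A, mask, lam=0.5)
```

Flexible losses from generalized linear models, as in the paper, would replace the squared-error fill-in step with a gradient step on the chosen loss.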
Ding, Peng; Li, Fan
Inferring causal effects of treatments is a central goal in many disciplines. The potential outcomes framework is a main statistical approach to causal inference, in which a causal effect is defined as a comparison of the potential outcomes of the same units under different treatment conditions. Because for each unit at most one of the potential outcomes is observed and the rest are missing, causal inference is inherently a missing data problem. Indeed, there is a close analogy in the terminology and the inferential framework between causal inference and missing data. Despite the intrinsic connection between the two subjects, statistical...
Linero, Antonio R.; Daniels, Michael J.
Missing data is almost always present in real datasets, and introduces several statistical issues. One fundamental issue is that, in the absence of strong uncheckable assumptions, effects of interest are typically not nonparametrically identified. In this article, we review the generic approach of using identifying restrictions from a likelihood-based perspective, and provide points of contact for several recently proposed methods. An emphasis of this review is on restrictions for nonmonotone missingness, a subject that has been treated sparingly in the literature. We also present a general, fully Bayesian, approach which is widely applicable and capable of handling a...
Seaman, Shaun R.; Vansteelandt, Stijn
Most methods for handling incomplete data can be broadly classified as inverse probability weighting (IPW) strategies or imputation strategies. The former model the occurrence of incomplete data; the latter, the distribution of the missing variables given observed variables in each missingness pattern. Imputation strategies are typically more efficient, but they can involve extrapolation, which is difficult to diagnose and can lead to large bias. Double robust (DR) methods combine the two approaches. They are typically more efficient than IPW and more robust to model misspecification than imputation. We give a formal introduction to DR estimation of the mean of a...
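The combination of IPW and imputation described above can be illustrated with the augmented IPW (AIPW) estimator of a mean under data missing at random. The simulation below is a minimal sketch (the data-generating model and working models are illustrative assumptions, not the authors' example): both working models happen to be correct here, and the DR estimate recovers the true mean while the complete-case mean is biased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)       # true mean of Y is 2.0
p = 1 / (1 + np.exp(-(0.5 + x)))       # missingness depends on x (MAR)
r = rng.random(n) < p                  # r = True: Y observed

# Imputation (outcome) model: linear regression of Y on X among complete cases
X1 = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X1[r], y[r], rcond=None)[0]
m_hat = X1 @ beta                      # predicted outcome for every unit

# IPW (missingness) model: logistic regression of R on X via Newton iterations
g = np.zeros(2)
for _ in range(25):
    pi = 1 / (1 + np.exp(-X1 @ g))
    grad = X1.T @ (r - pi)
    hess = (X1 * (pi * (1 - pi))[:, None]).T @ X1
    g += np.linalg.solve(hess, grad)
pi_hat = 1 / (1 + np.exp(-X1 @ g))

# AIPW (doubly robust) estimate of E[Y]: IPW term plus augmentation term
y_filled = np.where(r, y, 0.0)
dr = np.mean(r * y_filled / pi_hat - (r - pi_hat) / pi_hat * m_hat)
cc = y[r].mean()                       # naive complete-case mean (biased here)
```

The DR property means the estimator remains consistent if either the outcome model or the propensity model (but not necessarily both) is correctly specified.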
Audigier, Vincent; White, Ian R.; Jolani, Shahab; Debray, Thomas P. A.; Quartagno, Matteo; Carpenter, James; van Buuren, Stef; Resche-Rigon, Matthieu
We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show which of these multiple imputation methods are the most appropriate for handling missing values in a multilevel setting, and why their relative performance can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes...
Murray, Jared S.
Multiple imputation is a straightforward method for handling missing data in a principled fashion. This paper presents an overview of multiple imputation, including important theoretical results and their practical implications for generating and using multiple imputations. A review of strategies for generating imputations follows, including recent developments in flexible joint modeling and sequential regression/chained equations/fully conditional specification approaches. Finally, we compare and contrast different methods for generating imputations on a range of criteria before identifying promising avenues for future research.
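The workflow summarized above can be sketched in a few lines: generate m "proper" imputations by drawing imputation-model parameters before drawing the missing values, analyze each completed dataset, and pool with Rubin's rules. This is a minimal illustration under a single linear imputation model (the simulated data are assumptions for the example, and for brevity the residual variance is held fixed rather than drawn from its posterior):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 500, 20
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # true mean of Y is 1.0
miss = rng.random(n) < 0.3               # ~30% of y missing
obs = ~miss
y_obs = np.where(miss, np.nan, y)

# Fit the imputation model y ~ x on the observed cases
X1 = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X1[obs], y[obs], rcond=None)[0]
resid = y[obs] - X1[obs] @ beta_hat
sigma2 = resid @ resid / (obs.sum() - 2)
cov = sigma2 * np.linalg.inv(X1[obs].T @ X1[obs])

ests, variances = [], []
for _ in range(m):
    # Proper imputation: draw coefficients, then draw missing y given x
    b = rng.multivariate_normal(beta_hat, cov)
    draw = X1 @ b + rng.normal(scale=np.sqrt(sigma2), size=n)
    y_imp = np.where(obs, y_obs, draw)
    ests.append(y_imp.mean())            # analysis: estimate E[Y]
    variances.append(y_imp.var(ddof=1) / n)

# Rubin's rules: pooled point estimate and total variance
qbar = np.mean(ests)
ubar = np.mean(variances)                # within-imputation variance
b_var = np.var(ests, ddof=1)             # between-imputation variance
total_var = ubar + (1 + 1 / m) * b_var
```

The between-imputation component inflates the total variance, which is what lets the pooled inference reflect uncertainty due to the missing values.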
Josse, Julie; Reiter, Jerome P.
Zeitouni, Ofer
Sathamangalam Ranga Iyengar Srinivasa (Raghu) Varadhan was born in Chennai (then Madras). He received his Bachelor’s and Master’s degrees from Presidency College, Madras, and his PhD from the Indian Statistical Institute in Kolkata, in 1963. That same year he came to the Courant Institute, New York University as a postdoc, and remained there as a faculty member throughout his career. He has received numerous prizes and recognitions, including the Abel Prize in 2007, the US National Medal of Science in 2010 and honorary degrees from the Chennai Mathematical Institute, Duke University, the Indian Statistical Institute, Kolkata and the University of Paris.
¶...
Stigler, Stephen M.
Roughly half of Bayes’s famous essay was written by Richard Price, including the Appendix with all of the numerical examples. A study of this Appendix reveals that Price (1), unusually for the time, felt it necessary to allow in his analysis for a hypothesis having been suggested by the same data used in its analysis, (2) was motivated (covertly in 1763, overtly in 1767) to undertake the study to refute David Hume on miracles, and (3) displayed a remarkable sense of collegiality in scientific controversy that should stand as a model for the present day. Price’s analysis of the posterior in...
Bai, Shuyang; Taqqu, Murad S.
Under long memory, the limit theorems for normalized sums of random variables typically involve a positive integer called the “Hermite rank.” There is a different limit for each Hermite rank. From a statistical point of view, however, we argue that a rank other than one is unstable, whereas a rank equal to one is stable. We provide empirical evidence supporting this argument. This has important consequences. Assuming a higher-order rank when it is not really there usually results in underestimating the order of the fluctuations of the statistic of interest. We illustrate this through various examples involving the sample variance, the...
Gustafson, Paul; McCandless, Lawrence C.
Sensitivity analysis is used widely in statistical work. Yet the notion and properties of sensitivity parameters are often left quite vague and intuitive. Working in the Bayesian paradigm, we present a definition of when a sensitivity parameter is “pure,” and we discuss the implications of a parameter meeting or not meeting this definition. We also present a diagnostic with which the extent of violations of purity can be visualized.
Kendall, Michelle; Ayabina, Diepreye; Xu, Yuanwei; Stimson, James; Colijn, Caroline
Reconstructing who infected whom is a central challenge in analysing epidemiological data. Recently, advances in sequencing technology have led to increasing interest in Bayesian approaches to inferring who infected whom using genetic data from pathogens. The logic behind such approaches is that isolates that are nearly genetically identical are more likely to have been recently transmitted than those that are very different. A number of methods have been developed to perform this inference. However, testing their convergence, examining posterior sets of transmission trees and comparing methods’ performance are challenged by the fact that the object of inference—the transmission tree—is a...
Bretó, Carles
Likelihood-based statistical inference has been considered in most scientific fields involving stochastic modeling. This includes infectious disease dynamics, where scientific understanding can help capture biological processes in so-called mechanistic models and their likelihood functions. However, when the likelihood of such mechanistic models lacks a closed-form expression, computational burdens are substantial. In this context, algorithmic advances have facilitated likelihood maximization, promoting the study of novel data-motivated mechanistic models over the last decade. Reviewing these models is the focus of this paper. In particular, we highlight statistical aspects of these models like overdispersion, which is key in the interface between nonlinear infectious...
Kypraios, Theodore; O’Neill, Philip D.
The vast majority of models for the spread of communicable diseases are parametric in nature and involve underlying assumptions about how the disease spreads through a population. In this article, we consider the use of Bayesian nonparametric approaches to analysing data from disease outbreaks. Specifically, we focus on methods for estimating the infection process in simple models under the assumption that this process has an explicit time-dependence.
Birrell, Paul J.; De Angelis, Daniela; Presanis, Anne M.
In recent years, the role of epidemic models in informing public health policies has progressively grown. Models have become increasingly realistic and more complex, requiring the use of multiple data sources to estimate all quantities of interest. This review summarises the different types of stochastic epidemic models that use evidence synthesis and highlights current challenges.
Gibson, Gavin J.; Streftaris, George; Thong, David
Model criticism is a growing focus of research in stochastic epidemic modelling, following the success in recent decades of powerful, computationally intensive statistical methods in addressing model fitting and parameter estimation. In this paper, we consider a variety of stochastic representations of epidemic outbreaks, with emphasis on individual-based continuous-time models, and review the range of model comparison and assessment approaches currently applied. We highlight some of the factors that can serve to impede checking and criticism of epidemic models, such as lack of replication, partial observation of processes, lack of prior knowledge on parameters in competing models, the nonnested nature...
McKinley, Trevelyan J.; Vernon, Ian; Andrianakis, Ioannis; McCreesh, Nicky; Oakley, Jeremy E.; Nsubuga, Rebecca N.; Goldstein, Michael; White, Richard G.
Approximate Bayesian Computation (ABC) and other simulation-based inference methods are becoming increasingly used for inference in complex systems, due to their relative ease of implementation. We briefly review some of the more popular variants of ABC and their application in epidemiology, before using a real-world model of HIV transmission to illustrate some of the challenges when applying ABC methods to high-dimensional, computationally intensive models. We then discuss an alternative approach—history matching—that aims to address some of these issues, and conclude with a comparison between these different methodologies.
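The simplest ABC variant, rejection sampling, makes the ease-of-implementation point concrete: draw parameters from the prior, simulate data, and keep draws whose summary statistic falls within a tolerance of the observed one. The toy model below (a Poisson rate with a uniform prior, with the sample mean as summary) is an illustrative assumption, not the HIV model from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# "Observed" data: 100 draws from Poisson(4); summary statistic = sample mean
data = rng.poisson(4.0, size=100)
s_obs = data.mean()

# ABC rejection: sample the rate from the prior, simulate, keep close matches
n_sims, eps = 50_000, 0.1
lam = rng.uniform(0.0, 10.0, size=n_sims)             # uniform prior on the rate
s_sim = np.array([rng.poisson(l, size=100).mean() for l in lam])
posterior = lam[np.abs(s_sim - s_obs) < eps]           # approximate posterior sample
```

The high-dimensional, expensive-simulator settings discussed in the paper are precisely where this brute-force scheme breaks down: acceptance rates collapse, motivating the sequential ABC variants and the history-matching alternative the authors compare.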