Collection resources
Project Euclid (hosted at Cornell University Library) (192,674 resources)
Statistical Science
Etz, Alexander; Wagenmakers, Eric-Jan
This article brings attention to some historical developments that gave rise to the Bayes factor for testing a point null hypothesis against a composite alternative. In line with current thinking, we find that the conceptual innovation—to assign prior mass to a general law—is due to a series of three articles by Dorothy Wrinch and Sir Harold Jeffreys (1919, 1921, 1923a). However, our historical investigation also suggests that in 1932, J. B. S. Haldane made an important contribution to the development of the Bayes factor by proposing the use of a mixture prior comprising a point mass and a continuous probability...
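As a minimal illustration of a mixture prior that combines a point mass with a continuous distribution (a toy case, not an example taken from the article), the sketch below assumes binomial data, a point null theta = theta0, and a Uniform(0, 1) alternative. Under these assumptions the alternative's marginal likelihood is exactly 1/(n + 1), so the Bayes factor has a closed form. The function names are hypothetical.

```python
from math import comb


def bayes_factor_01(k: int, n: int, theta0: float = 0.5) -> float:
    """BF01 for H0: theta = theta0 vs H1: theta ~ Uniform(0, 1),
    given k successes in n Bernoulli trials (illustrative assumption)."""
    # Marginal likelihood under the point null: the binomial pmf at theta0.
    m0 = comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
    # Marginal likelihood under the uniform alternative:
    # integral over (0, 1) of C(n, k) theta^k (1 - theta)^(n - k) = 1 / (n + 1).
    m1 = 1.0 / (n + 1)
    return m0 / m1


def posterior_prob_null(k: int, n: int, prior_null: float = 0.5,
                        theta0: float = 0.5) -> float:
    """Posterior mass on the point null under a mixture prior that puts
    prior_null on theta0 and spreads the rest uniformly over (0, 1)."""
    bf = bayes_factor_01(k, n, theta0)
    return prior_null * bf / (prior_null * bf + (1 - prior_null))


if __name__ == "__main__":
    print(bayes_factor_01(5, 10))
    print(posterior_prob_null(5, 10))
```

With 5 successes in 10 trials, BF01 = 11 * 252/1024, roughly 2.71, so under these assumptions the data mildly favor the point null.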
Lohr, Sharon L.; Raghunathan, Trivellore E.
Collecting data using probability samples can be expensive, and response rates for many household surveys are decreasing. The increasing availability of large data sources opens new opportunities for statisticians to use the information in survey data more efficiently by combining survey data with information from these other sources. We review some of the work done to date on statistical methods for combining information from multiple data sources, discuss the limitations and challenges for different methods that have been proposed, and describe research that is needed for combining survey estimates.
Schonlau, Matthias; Couper, Mick P.
Web surveys can be conducted relatively fast and at relatively low cost. However, Web surveys are often conducted with nonprobability samples and, therefore, a major concern is generalizability. There are two main approaches to address this concern: One, find a way to conduct Web surveys on probability samples without losing most of the cost and speed advantages (e.g., by using mixed-mode approaches or probability-based panel surveys). Two, make adjustments (e.g., propensity scoring, post-stratification, GREG) to nonprobability samples using auxiliary variables. We review both of these approaches as well as lesser-known ones such as respondent-driven sampling. There are many different ways...
Lumley, Thomas; Scott, Alastair
Data from complex surveys are being used increasingly to build the same sort of explanatory and predictive models used in the rest of statistics. Although the assumptions underlying standard statistical methods are not even approximately valid for most survey data, analogues of most of the features of standard regression packages are now available for use with survey data. We review recent developments in the field and illustrate their use on data from NHANES.
Elliott, Michael R.; Valliant, Richard
Although selecting a probability sample has been the standard for decades when making inferences from a sample to a finite population, incentives are increasing to use nonprobability samples. In a world of “big data”, large amounts of data are available that are faster and easier to collect than are probability samples. Design-based inference, in which the distribution for inference is generated by the random mechanism used by the sampler, cannot be used for nonprobability samples. One alternative is quasi-randomization in which pseudo-inclusion probabilities are estimated based on covariates available for samples and nonsample units. Another is superpopulation modeling for the...
Chen, Qixuan; Elliott, Michael R.; Haziza, David; Yang, Ye; Ghosh, Malay; Little, Roderick J. A.; Sedransk, Joseph; Thompson, Mary
In sample surveys, the sample units are typically chosen using a complex design. This may lead to a selection effect and, if uncorrected in the analysis, may lead to biased inferences. To mitigate the effect of deviations from a simple random sample on inferences, a common technique is to use survey weights in the analysis. This article reviews approaches to address possible inefficiency in estimation resulting from such weighting.
To improve inferences we emphasize modifications of the basic design-based weight, that is, the inverse of a unit’s inclusion probability. These techniques include weight trimming, weight modelling and incorporating weights via models...
Haziza, David; Beaumont, Jean-François
Weighting is one of the central steps in surveys. The typical weighting process involves three major stages. At the first stage, each unit is assigned a base weight, which is defined as the inverse of its inclusion probability. The base weights are then modified to account for unit nonresponse. At the last stage, the nonresponse-adjusted weights are further modified to ensure consistency between survey estimates and known population totals. When needed, the weights undergo a final modification through weight trimming or weight smoothing methods in order to improve the efficiency of survey estimates. This article provides an overview of the...
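The three stages, plus optional trimming, can be sketched as follows. This is a toy sketch under stated assumptions, not the article's procedure: nonresponse is handled by a weighting-class adjustment, calibration by post-stratification to known population counts, and trimming by capping weights and redistributing the excess once. All function and variable names are hypothetical.

```python
from collections import defaultdict


def base_weights(incl_probs):
    # Stage 1: base weight = inverse of each unit's inclusion probability.
    return [1.0 / p for p in incl_probs]


def nonresponse_adjust(weights, responded, nr_class):
    # Stage 2 (assumed method: weighting-class adjustment). Within each class,
    # scale respondents' weights to the class's full-sample weight total;
    # nonrespondents get weight 0.
    total = defaultdict(float)
    resp_total = defaultdict(float)
    for w, r, c in zip(weights, responded, nr_class):
        total[c] += w
        if r:
            resp_total[c] += w
    return [w * total[c] / resp_total[c] if r else 0.0
            for w, r, c in zip(weights, responded, nr_class)]


def poststratify(weights, stratum, pop_totals):
    # Stage 3 (assumed method: post-stratification, a simple calibration).
    # Scale weights so each post-stratum sums to its known population count.
    s = defaultdict(float)
    for w, g in zip(weights, stratum):
        s[g] += w
    return [w * pop_totals[g] / s[g] if w > 0 else 0.0
            for w, g in zip(weights, stratum)]


def trim(weights, cap):
    # Optional stage (assumed method): cap large weights and spread the excess
    # proportionally over the untrimmed positive weights, keeping the total fixed.
    excess = sum(max(w - cap, 0.0) for w in weights)
    free = sum(w for w in weights if 0 < w <= cap)
    return [cap if w > cap else (w + excess * w / free if w > 0 else 0.0)
            for w in weights]


if __name__ == "__main__":
    d = base_weights([0.5, 0.5, 0.25, 0.25])
    w = nonresponse_adjust(d, [True, False, True, True], ["a", "a", "b", "b"])
    w = poststratify(w, ["x", "x", "y", "y"], {"x": 10, "y": 12})
    print(trim(w, 8.0))
```

A single redistribution pass can push some weights back above the cap; in practice such trimming is often repeated until no weight exceeds it.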
Breidt, F. Jay; Opsomer, Jean D.
This paper reviews the design-based, model-assisted approach to using data from a complex survey together with auxiliary information to estimate finite population parameters. A general recipe for deriving model-assisted estimators is presented and design-based asymptotic analysis for such estimators is reviewed. The recipe allows for a very broad class of prediction methods, with examples from the literature including linear models, linear mixed models, nonparametric regression and machine learning techniques.
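A minimal sketch of the model-assisted idea, assuming an auxiliary value x is known for every population unit: fit a prediction model on the sample using design weights 1/pi, sum its predictions over the whole population, and add a design-weighted sum of the sample residuals. The fitting routine is passed in as a function so that other prediction methods can be plugged in. The weighted linear fit and all names here are illustrative assumptions, not the paper's notation.

```python
from typing import Callable, Sequence

Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float], Sequence[float]], Predictor]


def weighted_linear_fit(x, y, w) -> Predictor:
    # Design-weighted least squares fit of y on an intercept and one covariate.
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return lambda xi: a + b * xi


def zero_fit(x, y, w) -> Predictor:
    # Trivial "model" predicting 0: the estimator reduces to Horvitz-Thompson.
    return lambda xi: 0.0


def model_assisted_total(x_pop: Sequence[float], sample_idx: Sequence[int],
                         y_sample: Sequence[float], incl_probs: Sequence[float],
                         fit: Fitter) -> float:
    d = [1.0 / p for p in incl_probs]           # design weights
    x_s = [x_pop[i] for i in sample_idx]        # auxiliary values of sampled units
    predict = fit(x_s, y_sample, d)
    # Model predictions summed over every population unit ...
    pred_total = sum(predict(xi) for xi in x_pop)
    # ... plus a design-weighted correction from the sample residuals.
    correction = sum(di * (yi - predict(xi))
                     for di, yi, xi in zip(d, y_sample, x_s))
    return pred_total + correction


if __name__ == "__main__":
    x_pop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(model_assisted_total(x_pop, [0, 2, 5], [5.0, 11.0, 20.0],
                               [0.5, 0.5, 0.5], weighted_linear_fit))
```

With a linear model that includes an intercept and is fit by design-weighted least squares, the residual term sums to zero, so the correction matters mainly for other prediction methods.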
Tillé, Yves; Wilhelm, Matthieu
The aim of this paper is twofold. First, three theoretical principles are formalized: randomization, overrepresentation and restriction. We develop these principles and give a rationale for their use in choosing the sampling design in a systematic way. In the model-assisted framework, knowledge of the population is formalized by modelling the population and the sampling design is chosen accordingly. We show how the principles of overrepresentation and of restriction naturally arise from the modelling of the population. The balanced sampling then appears as a consequence of the modelling. Second, a review of probability balanced sampling is presented through the model-assisted framework....
Skinner, Chris; Wakefield, Jon
We give a brief overview of common sampling designs used in a survey setting, and introduce the principal inferential paradigms under which data from complex surveys may be analyzed. In particular, we distinguish between design-based, model-based and model-assisted approaches. Simple examples highlight the key differences between the approaches. We discuss the interplay between inferential approaches and targets of inference and the important issue of variance estimation.
Mukhopadhyay, Nitis
Lynne Billard was born in Toowoomba, Australia. She earned her B.Sc. (Honors I) in 1966 and a Ph.D. degree in 1969, both from the University of New South Wales, Australia. She is perhaps best known for her groundbreaking research in the areas of HIV/AIDS and Symbolic Data Analysis. Broadly put, Professor Billard’s research interests include epidemic theory, stochastic processes, sequential analysis, time series analysis and symbolic data. She has written extensively in all these areas and more through numerous fundamental contributions. She has published more than 200 research papers in some of the leading international journals including Australian Journal...
Habermann, Hermann; Kennedy, Courtney; Lahiri, Partha
Professor Robert M. Groves has been among the world’s leaders in survey methodology and survey statistics for the last four decades. Groves’ research (particularly on survey nonresponse, survey errors and costs, and responsive design) helped to provide intellectual footing for a new academic discipline. In addition, Groves has had remarkable success building academic programs that integrate the social sciences with statistics and computer science. He was instrumental in the development of degree programs in survey methodology at the University of Michigan and the University of Maryland. Recently, as Provost of Georgetown University, he has championed the use of big data sets to increase...
Lerch, Sebastian; Thorarinsdottir, Thordis L.; Ravazzolo, Francesco; Gneiting, Tilmann
In public discussions of the quality of forecasts, attention typically focuses on the predictive performance in cases of extreme events. However, the restriction of conventional forecast evaluation methods to subsets of extreme observations has unexpected and undesired effects, and is bound to discredit skillful forecasts when the signal-to-noise ratio in the data generating process is low. Conditioning on outcomes is incompatible with the theoretical assumptions of established forecast evaluation methods, thereby confronting forecasters with what we refer to as the forecaster’s dilemma. For probabilistic forecasts, proper weighted scoring rules have been proposed as decision-theoretically justifiable alternatives for forecast evaluation with...
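One example of a proper weighted scoring rule is the threshold-weighted continuous ranked probability score, twCRPS(F, y) = integral of (F(z) - 1{y <= z})^2 w(z) dz, here with weight w(z) = 1{z >= r}. It emphasizes outcomes above a threshold r without restricting evaluation to the cases where extreme outcomes actually occurred. The sketch assumes a normal predictive distribution and approximates the integral with a midpoint rule; the grid size and truncation point are arbitrary choices, and the function names are hypothetical.

```python
import math


def normal_cdf(z: float, mu: float, sigma: float) -> float:
    # CDF of a Normal(mu, sigma^2) predictive distribution.
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))


def tw_crps(mu: float, sigma: float, y: float, threshold: float,
            n: int = 20000) -> float:
    # Threshold-weighted CRPS with w(z) = 1{z >= threshold}:
    # integral over [threshold, upper] of (F(z) - 1{y <= z})^2 dz.
    # Beyond `upper` the integrand is numerically zero (F ~ 1 and y <= z).
    upper = max(y, mu, threshold) + 10.0 * sigma
    h = (upper - threshold) / n
    total = 0.0
    for i in range(n):
        z = threshold + (i + 0.5) * h            # midpoint of the i-th cell
        indicator = 1.0 if y <= z else 0.0
        total += (normal_cdf(z, mu, sigma) - indicator) ** 2
    return total * h


if __name__ == "__main__":
    print(tw_crps(0.0, 1.0, 0.0, threshold=-10.0))  # close to the ordinary CRPS
    print(tw_crps(0.0, 1.0, 0.0, threshold=1.0))    # upper-tail emphasis only
```

Setting the threshold far below the bulk of the predictive distribution recovers the ordinary CRPS, which for a standard normal forecast and y = 0 equals 2/sqrt(2*pi) - 1/sqrt(pi), about 0.2337.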
Flynn, Cheryl J.; Hurvich, Clifford M.; Simonoff, Jeffrey S.
The Lasso is a computationally efficient regression regularization procedure that can produce sparse estimators when the number of predictors $(p)$ is large. Oracle inequalities provide probability loss bounds for the Lasso estimator at a deterministic choice of the regularization parameter. These bounds tend to zero if $p$ is appropriately controlled, and are thus commonly cited as theoretical justification for the Lasso and its ability to handle high-dimensional settings. Unfortunately, in practice the regularization parameter is not selected to be a deterministic quantity, but is instead chosen using a random, data-dependent procedure. To address this shortcoming of previous theoretical work, we...
Chopin, Nicolas; Ridgway, James
Whenever a new approach to perform Bayesian computation is introduced, a common practice is to showcase this approach on a binary regression model and datasets of moderate size. This paper discusses the extent to which this practice is sound. It also reviews the current state of the art of Bayesian computation, using binary regression as a running example. Both sampling-based algorithms (importance sampling, MCMC and SMC) and fast approximations (Laplace, VB and EP) are covered. Extensive numerical results are provided, and are used to make recommendations to both end users and Bayesian computation experts. Implications for other problems (variable selection) and...
Chen, Jiahua
The large-sample properties of likelihood-based statistical inference under mixture models have received much attention from statisticians. Although the consistency of the nonparametric MLE is regarded as a standard conclusion, many researchers ignore the precise conditions required on the mixture model. An incorrect claim of consistency can lead to false conclusions even if the mixture model under investigation seems well behaved. Under a finite normal mixture model, for instance, the consistency of the plain MLE is often erroneously assumed in spite of recent research breakthroughs. This paper streamlines the consistency results for the nonparametric MLE in general, and in particular for...
Simpson, Daniel; Rue, Håvard; Riebler, Andrea; Martins, Thiago G.; Sørbye, Sigrunn H.
Dunson, David B.
Robert, Christian P.; Rousseau, Judith
Hodges, James S.