Collection resources
Project Euclid (Hosted at Cornell University Library) (203,209 resources)
The Annals of Statistics
Battey, Heather; Fan, Jianqing; Liu, Han; Lu, Junwei; Zhu, Ziwei
This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large $k$ can be, as $n$ grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with...
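A minimal sketch of the divide-and-conquer idea, assuming the simplest possible statistic (the sample mean) and plain averaging across the $k$ subsamples as the aggregation rule. The paper's test statistics and sparse high-dimensional estimators are more involved. For the mean, the aggregate coincides with the full-sample mean whenever $k$ divides $n$, so nothing is lost at any $k$. The question of how large $k$ can be only becomes nontrivial for nonlinear estimators.

```python
import random
import statistics


def divide_and_conquer_mean(data, k):
    """Split `data` into k contiguous subsamples of size about n/k,
    estimate the mean on each one, and aggregate by simple averaging."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    size = n // k
    estimates = []
    for j in range(k):
        start = j * size
        # The last subsample absorbs the remainder when k does not divide n
        stop = n if j == k - 1 else start + size
        estimates.append(statistics.fmean(data[start:stop]))
    return statistics.fmean(estimates)


random.seed(0)
sample = [random.gauss(2.0, 1.0) for _ in range(10_000)]
full = statistics.fmean(sample)
for k in (1, 10, 100):
    print(k, divide_and_conquer_mean(sample, k), full)
```

When $k$ does not divide $n$, the subsamples have unequal sizes and the unweighted average no longer equals the full-sample mean. A weighted average would restore equality.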
Lounici, Karim; Meziani, Katia; Peyré, Gabriel
In quantum optics, the quantum state of a light beam is represented through the Wigner function, a density on $\mathbb{R}^{2}$, which may take negative values but must respect intrinsic positivity constraints imposed by quantum physics. In the framework of noisy quantum homodyne tomography with efficiency parameter $1/2<\eta\leq1$, we study the theoretical performance of a kernel estimator of the Wigner function. We prove that it is minimax efficient, up to a logarithmic factor in the sample size, for the $\mathbb{L}_{\infty}$-risk over a class of infinitely differentiable functions. We also compute the lower bound for the $\mathbb{L}_{2}$-risk. We construct an adaptive estimator,...
Mukherjee, Rajarshi; Mukherjee, Sumit; Sen, Subhabrata
In this paper, we study sharp thresholds for detecting sparse signals in $\beta$-models for potentially sparse random graphs. The results demonstrate an interesting interplay between graph sparsity, signal sparsity and signal strength. In regimes of moderately dense signals, irrespective of graph sparsity, the detection thresholds mirror corresponding results in independent Gaussian sequence problems. For sparser signals, extreme graph sparsity implies that all tests are asymptotically powerless, irrespective of the signal strength. On the other hand, sharp detection thresholds are obtained, up to matching constants, on denser graphs. The phase transitions mentioned above are sharp. As a crucial ingredient, we study a...
Tibshirani, Ryan J.; Rinaldo, Alessandro; Tibshirani, Rob; Wasserman, Larry
Recently, Tibshirani et al. [J. Amer. Statist. Assoc. 111 (2016) 600–620] proposed a method for making inferences about parameters defined by model selection, in a typical regression setting with normally distributed errors. Here, we study the large sample properties of this method, without assuming normality. We prove that the test statistic of Tibshirani et al. (2016) is asymptotically valid, as the number of samples $n$ grows and the dimension $d$ of the regression problem stays fixed. Our asymptotic result holds uniformly over a wide class of nonnormal error distributions. We also propose an efficient bootstrap version of this test that...
Gao, Fuqing; Xiong, Jie; Zhao, Xingqiu
This paper considers self-normalized limits and moderate deviations of nonparametric maximum likelihood estimators for monotone functions. We obtain their self-normalized Cramér-type moderate deviations and limit distribution theorems for the nonparametric maximum likelihood estimator in the current status model and the Grenander-type estimator. As applications of the results, we present a new procedure to construct asymptotic confidence intervals and asymptotic rejection regions of hypothesis tests for monotone functions. The theoretical results guarantee that the type II error probability of the new test tends to 0 exponentially fast. Simulation studies also show that the new nonparametric test works well for the...
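For reference, the classical Grenander estimator of a nonincreasing density on $[0,\infty)$ is the left derivative of the least concave majorant of the empirical CDF. Equivalently, it is the antitonic regression of the raw ECDF slopes with weights equal to the interval lengths, which the pool-adjacent-violators algorithm computes. The sketch below covers only this point estimator. It does not implement the paper's self-normalized moderate deviation results or its confidence intervals, and the positivity check on the sample is an assumption of the sketch.

```python
import random


def grenander(sample):
    """Grenander estimator of a nonincreasing density on [0, inf).

    Returns (knots, heights): the estimate equals heights[i] on the
    interval (knots[i], knots[i + 1]] and is zero beyond knots[-1]."""
    xs = sorted(sample)
    if not xs or xs[0] <= 0:
        raise ValueError("need a nonempty sample of positive values")
    n = len(xs)
    # Each block is [probability mass, interval length, right endpoint];
    # the raw ECDF slope on (x_(i-1), x_(i)] is (1/n) / (x_(i) - x_(i-1)).
    blocks = []
    left = 0.0
    for x in xs:
        blocks.append([1.0 / n, x - left, x])
        left = x
        # Pool adjacent violators: merge while the previous block's slope is
        # smaller than the last block's slope (cross-multiplied so that
        # zero-length intervals from tied observations are handled)
        while len(blocks) >= 2 and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]:
            mass, length, right = blocks.pop()
            blocks[-1][0] += mass
            blocks[-1][1] += length
            blocks[-1][2] = right
    knots = [0.0] + [b[2] for b in blocks]
    heights = [b[0] / b[1] for b in blocks]
    return knots, heights


rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(200)]
knots, heights = grenander(data)
print(len(heights), heights[0], heights[-1])
```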
Wu, Weichi; Zhou, Zhou
We consider structural change testing for a wide class of time series M-estimation with nonstationary predictors and errors. Flexible predictor-error relationships, including exogenous, state-heteroscedastic and autoregressive regressions and their mixtures, are allowed. New uniform Bahadur representations are established with nearly optimal approximation rates. A CUSUM-type test statistic based on the gradient vectors of the regression is considered. In this paper, a simple bootstrap method is proposed and is proved to be consistent for M-estimation structural change detection under both abrupt and smooth nonstationarity and temporal dependence. Our bootstrap procedure is shown to have certain asymptotically optimal properties in terms of...
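To make "a CUSUM-type test statistic based on the gradient vectors" concrete, here is a sketch for the simplest assumed case: a location model fitted by least squares to i.i.d. data. The gradient of the squared loss at the fitted location is proportional to the residual $y_i - \hat\mu$, so the CUSUM of gradients reduces to the classical CUSUM of residuals. The paper handles general M-estimation, nonstationary and dependent data, and calibrates the test with a bootstrap. None of that is implemented here, and the scaling by the sample standard deviation is valid only under independence.

```python
import random
import statistics


def cusum_gradient_stat(y):
    """max_k |sum_{i<=k} g_i| / (sigma_hat * sqrt(n)) with g_i = y_i - mean(y),
    i.e. the squared-loss gradient evaluated at the fitted location."""
    n = len(y)
    mu_hat = statistics.fmean(y)
    sigma_hat = statistics.stdev(y)
    partial, largest = 0.0, 0.0
    for yi in y:
        partial += yi - mu_hat
        largest = max(largest, abs(partial))
    return largest / (sigma_hat * n ** 0.5)


rng = random.Random(1)
no_change = [rng.gauss(0, 1) for _ in range(500)]
# Hypothetical abrupt change: the mean shifts from 0 to 1 halfway through
with_change = [rng.gauss(0, 1) for _ in range(250)] + [rng.gauss(1, 1) for _ in range(250)]
print(cusum_gradient_stat(no_change), cusum_gradient_stat(with_change))
```

Under i.i.d. errors and no change, this statistic converges to the supremum of a Brownian bridge in absolute value. Under dependence that limit no longer applies directly, which is what motivates a bootstrap calibration.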
Escanciano, Juan Carlos; Pardo-Fernández, Juan Carlos; Van Keilegom, Ingrid
This article proposes a new general methodology for constructing nonparametric and semiparametric Asymptotically Distribution-Free (ADF) tests for semiparametric hypotheses in regression models for possibly dependent data coming from a strictly stationary process. Classical tests based on the difference between the estimated distributions of the restricted and unrestricted regression errors are not ADF. In this article, we introduce a novel transformation of this difference that leads to ADF tests with well-known critical values. The general methodology is illustrated with applications to testing for parametric models against nonparametric or semiparametric alternatives, and semiparametric constrained mean–variance models. Several Monte Carlo studies and an...
Gregory, Karl B.; Lahiri, Soumendra N.; Nordman, Daniel J.
Quantile regression allows for broad (conditional) characterizations of a response distribution beyond conditional means and is of increasing interest in economic and financial applications. Because quantile regression estimators have complex limiting distributions, several bootstrap methods for the independent data setting have been proposed, many of which involve smoothing steps to improve bootstrap approximations. Currently, no similar advances in smoothed bootstraps exist for quantile regression with dependent data. To this end, we establish a smooth tapered block bootstrap procedure for approximating the distribution of quantile regression estimators for time series. This bootstrap involves two rounds of smoothing in resampling: individual observations...
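As a baseline for comparison, the sketch below implements the plain moving block bootstrap. It resamples overlapping blocks of consecutive observations so that short-range dependence is preserved within each block. This is deliberately the unsmoothed, untapered version, not the paper's smooth tapered block bootstrap. The AR(1) series, the block length of 20 and the use of the sample median (the intercept-only quantile regression estimator at level 0.5) are illustrative assumptions.

```python
import random
import statistics


def moving_block_bootstrap(series, block_len, n_boot, stat=statistics.fmean, seed=None):
    """Moving block bootstrap: concatenate randomly chosen overlapping blocks
    of length `block_len`, truncate to len(series), and apply `stat`."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must satisfy 1 <= block_len <= len(series)")
    rng = random.Random(seed)
    # All n - block_len + 1 overlapping blocks of consecutive observations
    blocks = [series[i:i + block_len] for i in range(n - block_len + 1)]
    n_blocks = -(-n // block_len)  # ceil(n / block_len)
    replicates = []
    for _ in range(n_boot):
        resample = []
        for _ in range(n_blocks):
            resample.extend(rng.choice(blocks))
        replicates.append(stat(resample[:n]))
    return replicates


# Hypothetical AR(1) series x_t = 0.7 x_{t-1} + e_t with standard normal e_t
gen = random.Random(2)
x, series = 0.0, []
for _ in range(2000):
    x = 0.7 * x + gen.gauss(0, 1)
    series.append(x)

se_iid = statistics.stdev(moving_block_bootstrap(series, 1, 500, seed=3))
se_block = statistics.stdev(moving_block_bootstrap(series, 20, 500, seed=3))
se_median = statistics.stdev(
    moving_block_bootstrap(series, 20, 500, stat=statistics.median, seed=3))
print(se_iid, se_block, se_median)
```

With `block_len=1` the procedure degenerates to the i.i.d. bootstrap, which understates the standard error of the mean for a positively autocorrelated series. Longer blocks capture more of the dependence.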
Pan, Wenliang; Tian, Yuan; Wang, Xueqin; Zhang, Heping
In this paper, we first introduce Ball Divergence, a novel measure of the difference between two probability measures in separable Banach spaces, and show that the Ball Divergence of two probability measures is zero if and only if the two measures are identical, without any moment assumption. Using Ball Divergence, we present a metric rank test procedure for testing the equality of the distributions underlying independent samples. It is therefore robust to outliers and heavy-tailed data. We show that this multivariate two-sample test statistic is consistent with the Ball Divergence, and it converges to a mixture of $\chi^{2}$...
Kong, Xin-Bing
In this paper, we separate the integrated (spot) volatility of an individual Itô process into integrated (spot) systematic and idiosyncratic volatilities, and estimate them by aggregation of local factor analysis (localization) with large-dimensional high-frequency data. We show that, when both the sampling frequency $n$ and the dimensionality $p$ go to infinity and $p\geq C\sqrt{n}$ for some constant $C$, our estimators of the integrated (spot) systematic and idiosyncratic volatilities are $\sqrt{n}$ ($n^{1/4}$ for spot estimates) consistent, the best rate achieved in estimating the...
Bai, Zhidong; Choi, Kwok Pui; Fujikoshi, Yasunori
In this paper, we study the problem of estimating the number of significant components in principal component analysis (PCA), which corresponds to the number of dominant eigenvalues of the covariance matrix of $p$ variables. Our purpose is to examine the consistency of the estimation criteria AIC and BIC, based on the model selection criteria of Akaike [In 2nd International Symposium on Information Theory (1973) 267–281, Akadémiai Kiadó] and Schwarz [Estimating the dimension of a model, Ann. Statist. 6 (1978) 461–464], under a high-dimensional asymptotic framework. Using random matrix theory techniques, we derive sufficient conditions for the criterion to be strongly consistent for...
Cai, T. Tony; Guntuboyina, Adityanand; Wei, Yuting
In this paper, we consider adaptive estimation of an unknown planar compact, convex set from noisy measurements of its support function. Both the problem of estimating the support function at a point and that of estimating the whole convex set are studied. For pointwise estimation, we consider the problem in a general nonasymptotic framework, which evaluates the performance of a procedure at each individual set, instead of the worst case performance over a large parameter space as in conventional minimax theory. A data-driven adaptive estimator is proposed and is shown to be optimally adaptive to every compact, convex set. For...
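The observation model behind this abstract can be illustrated briefly. The support function of a compact convex set $K \subset \mathbb{R}^2$ in direction $\theta$ is $h_K(\theta) = \max_{x \in K} \langle x, (\cos\theta, \sin\theta) \rangle$, and the data are noisy evaluations of $h_K$ at given angles. The sketch assumes $K$ is a polygon, where the maximum is attained at a vertex, and uses equally spaced angles with Gaussian noise of standard deviation 0.1. It generates data only and does not implement the paper's adaptive estimator.

```python
import math
import random


def support_function(vertices, theta):
    """h_K(theta) = max over x in K of <x, (cos theta, sin theta)>;
    for a convex polygon the maximum is attained at one of its vertices."""
    ux, uy = math.cos(theta), math.sin(theta)
    return max(x * ux + y * uy for x, y in vertices)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rng = random.Random(3)
angles = [2 * math.pi * i / 8 for i in range(8)]
# Noisy support-function data Y_i = h_K(theta_i) + sigma * eps_i (sigma = 0.1 assumed)
data = [(t, support_function(square, t) + 0.1 * rng.gauss(0, 1)) for t in angles]
for t, y in data:
    print(round(t, 3), round(y, 3))
```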
Fan, Jianqing; Shao, Qi-Man; Zhou, Wen-Xin
Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$...
Cui, Hengjian; Guo, Wenwen; Zhong, Wei
Testing a hypothesis about high-dimensional regression coefficients is of fundamental importance in statistical theory and applications. In this paper, we develop a new test for the overall significance of coefficients in high-dimensional linear regression models based on an estimated U-statistic of order two. With the aid of the martingale central limit theorem, we prove that the asymptotic distributions of the proposed test are normal under two different distributional assumptions. Refitted cross-validation (RCV) variance estimation is used to avoid overestimating the variance and to enhance the empirical power. We examine the finite-sample performance of the proposed test via Monte...
Shi, Chengchun; Fan, Ailin; Song, Rui; Lu, Wenbin
Precision medicine is a medical paradigm that focuses on finding the most effective treatment decision based on individual patient information. For many complex diseases, such as cancer, treatment decisions need to be tailored over time according to patients’ responses to previous treatments. Such an adaptive strategy is referred to as a dynamic treatment regime. A major challenge in deriving an optimal dynamic treatment regime arises when an extraordinarily large number of prognostic factors, such as a patient’s genetic information, demographic characteristics, medical history and clinical measurements over time, are available, but not all of them are necessary for making treatment decisions. This...
Qiu, Yumou; Chen, Song Xi; Nettleton, Dan
Motivated by the analysis of RNA sequencing (RNA-seq) data for genes differentially expressed across multiple conditions, we consider detecting rare and faint signals in high-dimensional response variables. We address the signal detection problem under a general framework, which includes generalized linear models for count-valued responses as special cases. We propose a test statistic that carries out a multi-level thresholding on maximum likelihood estimators (MLEs) of the signals, based on a new Cramér-type moderate deviation result for multidimensional MLEs. Based on the multi-level thresholding test, a multiple testing procedure is proposed for signal identification. Numerical simulations and a case study on...
Chen, Xi; Liu, Weidong
Testing independence among a number of (ultra) high-dimensional random samples is a fundamental and challenging problem. By arranging $n$ identically distributed $p$-dimensional random vectors into a $p\times n$ data matrix, we investigate the problem of testing independence among columns under the matrix-variate normal modeling of data. We propose a computationally simple and tuning-free test statistic, characterize its limiting null distribution, analyze the statistical power and prove its minimax optimality. As an important by-product of the test statistic, a ratio-consistent estimator for the quadratic functional of a covariance matrix from correlated samples is developed. We further study the effect of correlation...
Kaufmann, Emilie
This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view of the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight into what kind of exploration rates could be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or the Finite-Horizon Gittins indices provide a justification for...
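A sketch of the Bayes-UCB index policy in one concrete case, assuming Gaussian rewards with known variance and a flat (improper) prior. Under these assumptions the posterior of each arm after $n_a$ pulls is $N(\bar x_a, \sigma^2/n_a)$, so its quantile of order $1 - 1/(t(\log T)^c)$, with $T$ the horizon, is the posterior mean plus a standard normal quantile times the posterior standard deviation. The arm means, horizon and $c = 0$ are illustrative choices. The paper's results cover a larger class of priors and exponential families.

```python
import math
import random
from statistics import NormalDist


def bayes_ucb_gaussian(arm_means, horizon, sigma=1.0, c=0, seed=None):
    """Simulate Bayes-UCB on Gaussian arms with known variance sigma**2.

    Under a flat prior, arm a's posterior after n_a pulls is
    N(sample mean_a, sigma**2 / n_a); its index at round t is the posterior
    quantile of order 1 - 1 / (t * log(horizon)**c). Returns pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    std_normal = NormalDist()
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialise: pull every arm once so each posterior is proper
        else:
            level = 1 - 1 / (t * math.log(horizon) ** c)
            z = std_normal.inv_cdf(level)
            # Posterior quantile = posterior mean + z * posterior standard deviation
            arm = max(range(k), key=lambda a: sums[a] / counts[a] + z * sigma / math.sqrt(counts[a]))
        reward = rng.gauss(arm_means[arm], sigma)  # simulated environment
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = bayes_ucb_gaussian([0.2, 0.5, 0.9], horizon=2000, seed=4)
print(counts)
```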
Fan, Jianqing; Liu, Han; Sun, Qiang; Zhang, Tong
We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high-dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihoods. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second stage by iteratively solving a sequence of convex programs with smaller precision tolerances. Theoretically, we establish a phase transition: the first stage has a sublinear iteration complexity, while the second stage achieves an...
Lv, Shaogao; Lin, Huazhen; Lian, Heng; Huang, Jian
This paper considers the estimation of sparse additive quantile regression (SAQR) models in high-dimensional settings. Given the nonsmooth nature of the quantile loss function and the nonparametric complexities of the component function estimation, it is challenging to analyze the theoretical properties of ultrahigh-dimensional SAQR. We propose a regularized learning approach with a two-fold Lasso-type regularization in a reproducing kernel Hilbert space (RKHS) for SAQR. We establish nonasymptotic oracle inequalities for the excess risk of the proposed estimator without any coherence conditions. If additional assumptions including an extension of the restricted eigenvalue condition are satisfied, the proposed method enjoys sharp oracle...
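The nonsmoothness mentioned above comes from the quantile (check) loss $\rho_\tau(u) = u(\tau - \mathbf{1}\{u < 0\})$, which is piecewise linear with a kink at zero. The sketch shows only this loss: minimizing the empirical check loss over a constant recovers a sample $\tau$-quantile. Because the empirical risk is convex and piecewise linear with kinks at the data points, a minimizer can be found among the observations. The toy sample is an assumption, and the paper's RKHS estimator is not implemented here.

```python
def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(q, sample, tau):
    """Average check loss of the residuals y - q."""
    return sum(check_loss(y - q, tau) for y in sample) / len(sample)


def quantile_by_check_loss(sample, tau):
    # The risk is convex and piecewise linear in q with kinks at the data
    # points, so searching over the observations finds a minimizer.
    return min(sample, key=lambda q: empirical_risk(q, sample, tau))


sample = [1.0, 2.0, 3.0, 4.0, 10.0]
print(quantile_by_check_loss(sample, 0.5), quantile_by_check_loss(sample, 0.9))
```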