Showing resources 1 - 20 of 94

  1. Distributed testing and estimation under sparse high dimensional models

    Battey, Heather; Fan, Jianqing; Liu, Han; Lu, Junwei; Zhu, Ziwei
    This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large $k$ can be, as $n$ grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with...
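    The divide-and-conquer aggregation is easy to illustrate in its simplest form. Below is a minimal Python sketch, assuming plain ordinary least squares as the per-subsample estimator and simple averaging as the aggregation rule (the paper works in a general likelihood-based framework and, in the sparse high-dimensional case, aggregates debiased estimators); the function name is ours.

```python
import numpy as np

def divide_and_conquer_ols(X, y, k):
    """Average per-subsample OLS estimates over k folds of roughly n/k rows each."""
    folds = np.array_split(np.arange(X.shape[0]), k)
    estimates = [np.linalg.lstsq(X[idx], y[idx], rcond=None)[0] for idx in folds]
    return np.mean(estimates, axis=0)

# Toy check: the aggregated estimator should be close to the full-sample OLS fit.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
beta = np.arange(1.0, d + 1.0)
y = X @ beta + rng.normal(size=n)
print(divide_and_conquer_ols(X, y, k=10))
print(np.linalg.lstsq(X, y, rcond=None)[0])
```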

  2. Adaptive sup-norm estimation of the Wigner function in noisy quantum homodyne tomography

    Lounici, Karim; Meziani, Katia; Peyré, Gabriel
    In quantum optics, the quantum state of a light beam is represented through the Wigner function, a density on $\mathbb{R}^{2}$, which may take negative values but must respect intrinsic positivity constraints imposed by quantum physics. In the framework of noisy quantum homodyne tomography with efficiency parameter $1/2<\eta\leq1$, we study the theoretical performance of a kernel estimator of the Wigner function. We prove that it is minimax efficient, up to a logarithmic factor in the sample size, for the $\mathbb{L}_{\infty}$-risk over a class of infinitely differentiable functions. We also compute the lower bound for the $\mathbb{L}_{2}$-risk. We construct an adaptive estimator,...

  3. Detection thresholds for the $\beta$-model on sparse graphs

    Mukherjee, Rajarshi; Mukherjee, Sumit; Sen, Subhabrata
    In this paper, we study sharp thresholds for detecting sparse signals in $\beta$-models for potentially sparse random graphs. The results demonstrate interesting interplay between graph sparsity, signal sparsity and signal strength. In regimes of moderately dense signals, irrespective of graph sparsity, the detection thresholds mirror corresponding results in independent Gaussian sequence problems. For sparser signals, extreme graph sparsity implies that all tests are asymptotically powerless, irrespective of the signal strength. On the other hand, sharp detection thresholds are obtained, up to matching constants, on denser graphs. The phase transitions mentioned above are sharp. As a crucial ingredient, we study a...

  4. Uniform asymptotic inference and the bootstrap after model selection

    Tibshirani, Ryan J.; Rinaldo, Alessandro; Tibshirani, Rob; Wasserman, Larry
    Recently, Tibshirani et al. [J. Amer. Statist. Assoc. 111 (2016) 600–620] proposed a method for making inferences about parameters defined by model selection, in a typical regression setting with normally distributed errors. Here, we study the large sample properties of this method, without assuming normality. We prove that the test statistic of Tibshirani et al. (2016) is asymptotically valid, as the number of samples $n$ grows and the dimension $d$ of the regression problem stays fixed. Our asymptotic result holds uniformly over a wide class of nonnormal error distributions. We also propose an efficient bootstrap version of this test that...

  5. Moderate deviations and nonparametric inference for monotone functions

    Gao, Fuqing; Xiong, Jie; Zhao, Xingqiu
    This paper considers self-normalized limits and moderate deviations of nonparametric maximum likelihood estimators for monotone functions. We obtain their self-normalized Cramér-type moderate deviations and limit distribution theorems for the nonparametric maximum likelihood estimator in the current status model and the Grenander-type estimator. As applications of the results, we present a new procedure to construct asymptotic confidence intervals and asymptotic rejection regions for hypothesis tests about monotone functions. The theoretical results guarantee that the probability of a type II error for the new test tends to 0 exponentially. Simulation studies also show that the new nonparametric test works well for the...

  6. Gradient-based structural change detection for nonstationary time series M-estimation

    Wu, Weichi; Zhou, Zhou
    We consider structural change testing for a wide class of time series M-estimation with nonstationary predictors and errors. Flexible predictor-error relationships, including exogenous, state-heteroscedastic and autoregressive regressions and their mixtures, are allowed. New uniform Bahadur representations are established with nearly optimal approximation rates. A CUSUM-type test statistic based on the gradient vectors of the regression is considered. In this paper, a simple bootstrap method is proposed and is proved to be consistent for M-estimation structural change detection under both abrupt and smooth nonstationarity and temporal dependence. Our bootstrap procedure is shown to have certain asymptotically optimal properties in terms of...
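    As a rough illustration of a CUSUM-type statistic built from gradient vectors, the sketch below uses the squared loss (ordinary least squares) as the M-estimation criterion and takes a max-norm over scaled partial sums of the score contributions; this is only a toy version, and the paper's actual test calibrates critical values with a bootstrap designed for nonstationary, dependent data.

```python
import numpy as np

def cusum_gradient_stat(X, y):
    """Max norm of the scaled partial sums of least-squares score vectors.

    Each observation contributes the gradient -x_i * (y_i - x_i' beta_hat)
    evaluated at the full-sample fit; large partial sums hint at a change point.
    """
    n = X.shape[0]
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    grads = -X * (y - X @ beta_hat)[:, None]            # n x d score contributions
    partial_sums = np.cumsum(grads, axis=0) / np.sqrt(n)
    return np.max(np.linalg.norm(partial_sums, axis=1))

# Toy check: an abrupt intercept shift halfway through inflates the statistic.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.0, 1.0]) + rng.normal(size=n)
y_shift = y.copy()
y_shift[n // 2:] += 1.5
print(cusum_gradient_stat(X, y), cusum_gradient_stat(X, y_shift))
```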

  7. Asymptotic distribution-free tests for semiparametric regressions with dependent data

    Escanciano, Juan Carlos; Pardo-Fernández, Juan Carlos; Van Keilegom, Ingrid
    This article proposes a new general methodology for constructing nonparametric and semiparametric Asymptotically Distribution-Free (ADF) tests for semiparametric hypotheses in regression models for possibly dependent data coming from a strictly stationary process. Classical tests based on the difference between the estimated distributions of the restricted and unrestricted regression errors are not ADF. In this article, we introduce a novel transformation of this difference that leads to ADF tests with well-known critical values. The general methodology is illustrated with applications to testing for parametric models against nonparametric or semiparametric alternatives, and semiparametric constrained mean–variance models. Several Monte Carlo studies and an...

  8. A smooth block bootstrap for quantile regression with time series

    Gregory, Karl B.; Lahiri, Soumendra N.; Nordman, Daniel J.
    Quantile regression allows for broad (conditional) characterizations of a response distribution beyond conditional means and is of increasing interest in economic and financial applications. Because quantile regression estimators have complex limiting distributions, several bootstrap methods for the independent data setting have been proposed, many of which involve smoothing steps to improve bootstrap approximations. Currently, no similar advances in smoothed bootstraps exist for quantile regression with dependent data. To this end, we establish a smooth tapered block bootstrap procedure for approximating the distribution of quantile regression estimators for time series. This bootstrap involves two rounds of smoothing in resampling: individual observations...
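    A stripped-down sketch of the block-resampling idea is given below, assuming a linear quantile regression fitted by direct pinball-loss minimization and a plain moving block bootstrap; the tapering and the two rounds of smoothing that define the paper's procedure are omitted.

```python
import numpy as np
from scipy.optimize import minimize

def pinball_fit(X, y, tau):
    """Linear quantile regression at level tau via direct pinball-loss minimization."""
    def loss(beta):
        u = y - X @ beta
        return np.mean(u * (tau - (u < 0)))
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS warm start
    return minimize(loss, beta0, method="Nelder-Mead").x

def moving_block_bootstrap(X, y, tau, block_len, n_boot, rng):
    """Resample whole blocks of consecutive (x_t, y_t) pairs to respect dependence."""
    n = len(y)
    starts_all = np.arange(n - block_len + 1)
    n_blocks = int(np.ceil(n / block_len))
    draws = []
    for _ in range(n_boot):
        starts = rng.choice(starts_all, size=n_blocks, replace=True)
        idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:n]
        draws.append(pinball_fit(X[idx], y[idx], tau))
    return np.array(draws)

rng = np.random.default_rng(2)
n = 300
x = np.cumsum(rng.normal(size=n)) / 10                  # slowly varying regressor
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.standard_t(df=4, size=n)
boot = moving_block_bootstrap(X, y, tau=0.5, block_len=15, n_boot=200, rng=rng)
print(boot.std(axis=0))                                 # bootstrap SEs for the median fit
```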

  9. Ball Divergence: Nonparametric two sample test

    Pan, Wenliang; Tian, Yuan; Wang, Xueqin; Zhang, Heping
    In this paper, we first introduce Ball Divergence, a novel measure of the difference between two probability measures in separable Banach spaces, and show that the Ball Divergence of two probability measures is zero if and only if these two probability measures are identical without any moment assumption. Using Ball Divergence, we present a metric rank test procedure to detect the equality of distribution measures underlying independent samples. It is therefore robust to outliers or heavy-tailed data. We show that this multivariate two sample test statistic is consistent with the Ball Divergence, and it converges to a mixture of $\chi^{2}$...
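    A rough numpy sketch of the empirical Ball Divergence, as we read the construction (for each ball centered at a sample point with radius equal to its distance to another point of the same sample, compare the fractions of the two samples falling inside), is given below; the exact weighting should be treated as an assumption, and the permutation calibration shown here is a stand-in for the mixture-of-$\chi^{2}$ limit derived in the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def ball_divergence(X, Y):
    """Empirical Ball Divergence between samples X (n x d) and Y (m x d).

    For balls centered at a point of one sample, with radius equal to its
    distance to another point of the same sample, compare the fractions of
    X points and of Y points falling inside the ball.
    """
    dXX, dXY = cdist(X, X), cdist(X, Y)
    dYY, dYX = cdist(Y, Y), cdist(Y, X)
    # Balls B(X_i, ||X_i - X_j||): coverage by the X sample and by the Y sample.
    A_X = (dXX[:, None, :] <= dXX[:, :, None]).mean(axis=2)
    A_Y = (dXY[:, None, :] <= dXX[:, :, None]).mean(axis=2)
    # Balls B(Y_k, ||Y_k - Y_l||): same comparison.
    C_X = (dYX[:, None, :] <= dYY[:, :, None]).mean(axis=2)
    C_Y = (dYY[:, None, :] <= dYY[:, :, None]).mean(axis=2)
    return ((A_X - A_Y) ** 2).mean() + ((C_X - C_Y) ** 2).mean()

def permutation_pvalue(X, Y, n_perm=199, rng=None):
    """Calibrate the statistic by permuting pooled sample labels."""
    rng = rng or np.random.default_rng()
    pooled, n = np.vstack([X, Y]), len(X)
    observed = ball_divergence(X, Y)
    exceed = sum(
        ball_divergence(pooled[p[:n]], pooled[p[n:]]) >= observed
        for p in (rng.permutation(len(pooled)) for _ in range(n_perm))
    )
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
Y = rng.normal(size=(40, 2)) + 0.8                      # shifted second sample
print(permutation_pvalue(X, Y, rng=rng))
```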

  10. On the systematic and idiosyncratic volatility with large panel high-frequency data

    Kong, Xin-Bing
    In this paper, we separate the integrated (spot) volatility of an individual Itô process into integrated (spot) systematic and idiosyncratic volatilities, and estimate them by aggregation of local factor analysis (localization) with large-dimensional high-frequency data. We show that, when both the sampling frequency $n$ and the dimensionality $p$ go to infinity and $p\geq C\sqrt{n}$ for some constant $C$, our estimators of the integrated systematic and idiosyncratic volatilities are $\sqrt{n}$ ($n^{1/4}$ for spot estimates) consistent, the best rate achieved in estimating the...

  11. Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis

    Bai, Zhidong; Choi, Kwok Pui; Fujikoshi, Yasunori
    In this paper, we study the problem of estimating the number of significant components in principal component analysis (PCA), which corresponds to the number of dominant eigenvalues of the covariance matrix of $p$ variables. Our purpose is to examine the consistency of the estimation criteria AIC and BIC based on the model selection criteria by Akaike [In 2nd International Symposium on Information Theory (1973) 267–281, Akadémiai Kiadó] and Schwarz [Ann. Statist. 6 (1978) 461–464] under a high-dimensional asymptotic framework. Using random matrix theory techniques, we derive sufficient conditions for the criterion to be strongly consistent for...

  12. Adaptive estimation of planar convex sets

    Cai, T. Tony; Guntuboyina, Adityanand; Wei, Yuting
    In this paper, we consider adaptive estimation of an unknown planar compact, convex set from noisy measurements of its support function. Both the problem of estimating the support function at a point and that of estimating the whole convex set are studied. For pointwise estimation, we consider the problem in a general nonasymptotic framework, which evaluates the performance of a procedure at each individual set, instead of the worst case performance over a large parameter space as in conventional minimax theory. A data-driven adaptive estimator is proposed and is shown to be optimally adaptive to every compact, convex set. For...

  13. Are discoveries spurious? Distributions of maximum spurious correlations and their applications

    Fan, Jianqing; Shao, Qi-Man; Zhou, Wen-Xin
    Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$...

  14. Test for high-dimensional regression coefficients using refitted cross-validation variance estimation

    Cui, Hengjian; Guo, Wenwen; Zhong, Wei
    Testing a hypothesis for high-dimensional regression coefficients is of fundamental importance in statistical theory and applications. In this paper, we develop a new test for the overall significance of coefficients in high-dimensional linear regression models based on an estimated U-statistic of order two. With the aid of the martingale central limit theorem, we prove that the asymptotic distributions of the proposed test are normal under two different distribution assumptions. Refitted cross-validation (RCV) variance estimation is utilized to avoid the overestimation of the variance and enhance the empirical power. We examine the finite-sample performances of the proposed test via Monte...
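    The refitted cross-validation step admits a simple sketch: split the sample into two halves, select variables on one half, refit by ordinary least squares on the other, and average the two resulting residual-variance estimates. The selector below (marginal screening) and the function names are illustrative placeholders, not the paper's choices.

```python
import numpy as np

def rcv_variance(X, y, n_select=10, rng=None):
    """Refitted cross-validation estimate of the noise variance in y = X beta + eps.

    Selection and refitting use disjoint halves, so the selection step does not
    bias the residual variance downward; the two directions are averaged.
    """
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(y))
    halves = (perm[: len(y) // 2], perm[len(y) // 2:])

    def one_direction(select_idx, refit_idx):
        # Placeholder selector: keep the n_select columns with the largest
        # absolute marginal covariance with y on the selection half.
        Xs, ys = X[select_idx], y[select_idx]
        scores = np.abs((Xs - Xs.mean(0)).T @ (ys - ys.mean()))
        keep = np.argsort(scores)[-n_select:]
        # Refit OLS with the selected columns on the other half.
        Xr, yr = X[refit_idx][:, keep], y[refit_idx]
        resid = yr - Xr @ np.linalg.lstsq(Xr, yr, rcond=None)[0]
        return resid @ resid / (len(yr) - len(keep))

    return 0.5 * (one_direction(*halves) + one_direction(*halves[::-1]))

# Toy check: the estimate should be close to the true noise variance of 1.
rng = np.random.default_rng(4)
n, p = 200, 500
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)
print(rcv_variance(X, y, rng=rng))
```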

  15. High-dimensional $A$-learning for optimal dynamic treatment regimes

    Shi, Chengchun; Fan, Ailin; Song, Rui; Lu, Wenbin
    Precision medicine is a medical paradigm that focuses on finding the most effective treatment decision based on individual patient information. For many complex diseases, such as cancer, treatment decisions need to be tailored over time according to patients’ responses to previous treatments. Such an adaptive strategy is referred to as a dynamic treatment regime. A major challenge in deriving an optimal dynamic treatment regime arises when an extraordinarily large number of prognostic factors, such as patient’s genetic information, demographic characteristics, medical history and clinical measurements over time are available, but not all of them are necessary for making treatment decisions. This...

  16. Detecting rare and faint signals via thresholding maximum likelihood estimators

    Qiu, Yumou; Chen, Song Xi; Nettleton, Dan
    Motivated by the analysis of RNA sequencing (RNA-seq) data for genes differentially expressed across multiple conditions, we consider detecting rare and faint signals in high-dimensional response variables. We address the signal detection problem under a general framework, which includes generalized linear models for count-valued responses as special cases. We propose a test statistic that carries out a multi-level thresholding on maximum likelihood estimators (MLEs) of the signals, based on a new Cramér-type moderate deviation result for multidimensional MLEs. Based on the multi-level thresholding test, a multiple testing procedure is proposed for signal identification. Numerical simulations and a case study on...

  17. Testing independence with high-dimensional correlated samples

    Chen, Xi; Liu, Weidong
    Testing independence among a number of (ultra) high-dimensional random samples is a fundamental and challenging problem. By arranging $n$ identically distributed $p$-dimensional random vectors into a $p\times n$ data matrix, we investigate the problem of testing independence among columns under the matrix-variate normal modeling of data. We propose a computationally simple and tuning-free test statistic, characterize its limiting null distribution, analyze the statistical power and prove its minimax optimality. As an important by-product of the test statistic, a ratio-consistent estimator for the quadratic functional of a covariance matrix from correlated samples is developed. We further study the effect of correlation...

  18. On Bayesian index policies for sequential resource allocation

    Kaufmann, Emilie
    This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight on what kind of exploration rates could be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or the Finite-Horizon Gittins indices provide a justification for...
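    A minimal sketch of the Bayes-UCB index is given below for Bernoulli rewards with independent Beta(1, 1) priors: at round $t$ each arm is scored by an upper quantile of its Beta posterior and the highest-scoring arm is pulled. The quantile level $1-1/(t(\log T)^{c})$ follows the form used in this line of work, but the constants and the Bernoulli setting are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import beta

def bayes_ucb_bernoulli(reward_fn, n_arms, horizon, c=0, rng=None):
    """Bayes-UCB for Bernoulli bandits with independent Beta(1, 1) priors."""
    rng = rng or np.random.default_rng()
    successes = np.zeros(n_arms)
    pulls = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        # Index = posterior quantile at level 1 - 1/(t (log T)^c).
        level = 1.0 - 1.0 / (t * max(np.log(horizon), 1.0) ** c)
        indices = beta.ppf(level, successes + 1.0, pulls - successes + 1.0)
        arm = int(np.argmax(indices))
        successes[arm] += reward_fn(arm, rng)
        pulls[arm] += 1
    return successes, pulls

# Toy run with three arms of means 0.3, 0.5 and 0.7.
means = np.array([0.3, 0.5, 0.7])
reward = lambda arm, rng: rng.binomial(1, means[arm])
successes, pulls = bayes_ucb_bernoulli(reward, n_arms=3, horizon=2000)
print(pulls)   # most pulls should concentrate on the best arm
```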

  19. I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error

    Fan, Jianqing; Liu, Han; Sun, Qiang; Zhang, Tong
    We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high-dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihoods. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second stage by iteratively solving a sequence of convex programs with smaller precision tolerances. Theoretically, we establish a phase transition: the first stage has a sublinear iteration complexity, while the second stage achieves an...

  20. Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space

    Lv, Shaogao; Lin, Huazhen; Lian, Heng; Huang, Jian
    This paper considers the estimation of the sparse additive quantile regression (SAQR) in high-dimensional settings. Given the nonsmooth nature of the quantile loss function and the nonparametric complexities of the component function estimation, it is challenging to analyze the theoretical properties of ultrahigh-dimensional SAQR. We propose a regularized learning approach with a two-fold Lasso-type regularization in a reproducing kernel Hilbert space (RKHS) for SAQR. We establish nonasymptotic oracle inequalities for the excess risk of the proposed estimator without any coherent conditions. If additional assumptions including an extension of the restricted eigenvalue condition are satisfied, the proposed method enjoys sharp oracle...
