Showing resources 1 - 20 of 109

  1. Accuracy assessment for high-dimensional linear regression

    Cai, T. Tony; Guo, Zijian
    This paper considers point and interval estimation of the $\ell_{q}$ loss of an estimator in high-dimensional linear regression with random design. We establish the minimax rate for estimating the $\ell_{q}$ loss and the minimax expected length of confidence intervals for the $\ell_{q}$ loss of rate-optimal estimators of the regression vector, including commonly used estimators such as Lasso, scaled Lasso, square-root Lasso and Dantzig Selector. Adaptivity of confidence intervals for the $\ell_{q}$ loss is also studied. Both the setting of the known identity design covariance matrix and known noise level and the setting of unknown design covariance matrix and unknown noise...
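
    As a rough illustration of the quantity being estimated, the sketch below computes the $\ell_{q}$ loss of a Lasso fit in a synthetic sparse regression where the true coefficient vector is known (something the paper's estimators must do without). The dimensions, sparsity level and penalty value are placeholders, and the normalization of the loss follows one common convention rather than the paper's exact definition.

```python
# Minimal simulation sketch (assumed setup, not the paper's procedure):
# compute the l_q loss of a Lasso estimate when the true beta is known.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 500, 10                      # samples, dimension, sparsity (placeholders)
beta = np.zeros(p)
beta[:s] = 1.0                              # true sparse coefficient vector
X = rng.standard_normal((n, p))             # random design, identity covariance
y = X @ beta + rng.standard_normal(n)       # unit noise level

beta_hat = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_

for q in (1, 2):
    loss_q = np.sum(np.abs(beta_hat - beta) ** q) ** (1.0 / q)
    print(f"l_{q} loss of the Lasso estimate: {loss_q:.3f}")
```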

  2. A Bayesian approach to the selection of two-level multi-stratum factorial designs

    Chang, Ming-Chung; Cheng, Ching-Shui
    In a multi-stratum factorial experiment, there are multiple error terms (strata) with different variances that arise from complicated structures of the experimental units. For unstructured experimental units, minimum aberration is a popular criterion for choosing regular fractional factorial designs. One difficulty in extending this criterion to multi-stratum factorial designs is that the formulation of a word length pattern based on which minimum aberration is defined requires an order of desirability among the relevant words, but a natural order is often lacking. Furthermore, a criterion based only on word length patterns does not account for the different stratum variances. Mitchell, Morris...

  3. Optimal shrinkage of eigenvalues in the spiked covariance model

    Donoho, David; Gavish, Matan; Johnstone, Iain
    We show that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation. In an asymptotic framework based on the spiked covariance model and use of orthogonally invariant estimators, we show that optimal estimation of the population covariance matrix boils down to design of an optimal shrinker $\eta$ that acts elementwise on the sample eigenvalues. Indeed, to each loss function there corresponds a unique admissible eigenvalue shrinker $\eta^{*}$ dominating all other shrinkers. The shape of the optimal shrinker is determined by the choice of loss function and, crucially, by inconsistency of both...
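
    A minimal sketch of the class of estimators considered: keep the sample eigenvectors and push the sample eigenvalues through an elementwise shrinker $\eta$. The simple rule used here (pull everything below the Marchenko-Pastur bulk edge back to the noise level) is a placeholder, not the loss-specific optimal shrinker $\eta^{*}$ derived in the paper, and the spiked data are assumed for illustration.

```python
# Minimal sketch of an orthogonally invariant covariance estimator: keep the
# sample eigenvectors and apply an elementwise shrinker eta to the sample
# eigenvalues.  The rule below (send everything under the Marchenko-Pastur
# bulk edge back to 1) is a placeholder, not the optimal shrinker eta*.
import numpy as np

def shrink_covariance(S, eta):
    """V diag(eta(lambda)) V^T for the eigendecomposition S = V diag(lambda) V^T."""
    eigvals, eigvecs = np.linalg.eigh(S)
    return (eigvecs * eta(eigvals)) @ eigvecs.T

rng = np.random.default_rng(0)
p, n = 50, 200
bulk_edge = (1 + np.sqrt(p / n)) ** 2        # Marchenko-Pastur upper edge

X = rng.standard_normal((n, p))
X[:, 0] *= 3.0                               # one spiked direction (variance 9)
S = X.T @ X / n                              # sample covariance

eta = lambda lam: np.where(lam > bulk_edge, lam, 1.0)
Sigma_hat = shrink_covariance(S, eta)
```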

  4. Estimating variance of random effects to solve multiple problems simultaneously

    Yoshimori Hirose, Masayo; Lahiri, Partha
    The two-level normal hierarchical model (NHM) has played a critical role in statistical theory for the last several decades. In this paper, we propose a random effects variance estimator that simultaneously (i) improves on the estimation of the related shrinkage factors, (ii) protects empirical best linear unbiased predictors (EBLUP) [same as empirical Bayes (EB)] of the random effects from the common overshrinkage problem, and (iii) avoids complex bias correction in generating a strictly positive second-order unbiased mean squared error (MSE) (same as integrated Bayes risk) estimator, whether by the Taylor series or the single parametric bootstrap method. The idea of achieving multiple desirable properties...

  5. Empirical Bayes estimates for a two-way cross-classified model

    Brown, Lawrence D.; Mukherjee, Gourab; Weinstein, Asaf
    We develop an empirical Bayes procedure for estimating the cell means in an unbalanced, two-way additive model with fixed effects. We employ a hierarchical model, which reflects exchangeability of the effects within treatment and within block but not necessarily between them, as suggested before by Lindley and Smith [J. R. Stat. Soc., B 34 (1972) 1–41]. The hyperparameters of this hierarchical model, instead of being considered fixed, are to be substituted with data-dependent values in such a way that the point risk of the empirical Bayes estimator is small. Our method chooses the hyperparameters by minimizing an unbiased risk estimate and...
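
    The "minimize an unbiased risk estimate" step can be illustrated in a much simpler toy problem. The sketch below tunes a single shrinkage hyperparameter in a normal-means model $y_{i}\sim N(\theta_{i},1)$ by minimizing a Stein-type unbiased risk estimate over a grid; the unbalanced two-way layout and the hyperparameters of the paper are not reproduced, and the toy model is an assumption made here for illustration.

```python
# Toy illustration of hyperparameter tuning by minimizing an unbiased risk
# estimate (URE): normal means y_i ~ N(theta_i, 1), shrinkage estimator
# (1 - b) * y, and b chosen to minimize the URE.  Not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
n = 100
theta = rng.normal(0.0, 2.0, size=n)         # unknown means
y = theta + rng.standard_normal(n)           # one observation per mean

def ure(b):
    # E||(1-b)y - theta||^2 = n(1-b)^2 + b^2 * sum(theta_i^2),
    # and y_i^2 - 1 is unbiased for theta_i^2, giving an unbiased risk estimate.
    return n * (1 - b) ** 2 + b ** 2 * (np.sum(y ** 2) - n)

grid = np.linspace(0.0, 1.0, 1001)
b_hat = grid[np.argmin([ure(b) for b in grid])]
theta_hat = (1 - b_hat) * y
print(f"URE-selected shrinkage b = {b_hat:.3f}")
```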

  6. Curvature and inference for maximum likelihood estimates

    Efron, Bradley
    Maximum likelihood estimates are sufficient statistics in exponential families, but not in general. The theory of statistical curvature was introduced to measure the effects of MLE insufficiency in one-parameter families. Here, we analyze curvature in the more realistic venue of multiparameter families—more exactly, curved exponential families, a broad class of smoothly defined nonexponential family models. We show that within the set of observations giving the same value for the MLE, there is a “region of stability” outside of which the MLE is no longer even a local maximum. Accuracy of the MLE is affected by the location of the observation...

  7. An MCMC approach to empirical Bayes inference and Bayesian sensitivity analysis via empirical processes

    Doss, Hani; Park, Yeonhee
    Consider a Bayesian situation in which we observe $Y\sim p_{\theta}$, where $\theta\in\Theta$, and we have a family $\{\nu_{h},h\in\mathcal{H}\}$ of potential prior distributions on $\Theta$. Let $g$ be a real-valued function of $\theta$, and let $I_{g}(h)$ be the posterior expectation of $g(\theta)$ when the prior is $\nu_{h}$. We are interested in two problems: (i) selecting a particular value of $h$, and (ii) estimating the family of posterior expectations $\{I_{g}(h),h\in\mathcal{H}\}$. Let $m_{y}(h)$ be the marginal likelihood of the hyperparameter $h$: $m_{y}(h)=\int p_{\theta}(y)\nu_{h}(d\theta)$. The empirical Bayes estimate of $h$ is, by definition, the value of $h$ that maximizes $m_{y}(h)$. It turns out that...
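
    A toy conjugate example of the definition quoted above, in which $m_{y}(h)$ has a closed form and the empirical Bayes $\hat{h}$ is simply its maximizer over a grid; the paper's MCMC and empirical-process machinery targets settings where no such closed form exists. The model and numbers below are illustrative assumptions.

```python
# Toy illustration: empirical Bayes choice of the hyperparameter h as the
# maximizer of the marginal likelihood m_y(h) = int p_theta(y) nu_h(dtheta).
# Here theta_i ~ N(0, h) and y_i | theta_i ~ N(theta_i, 1), so marginally
# y_i ~ N(0, 1 + h) and m_y(h) is available in closed form.
import numpy as np

rng = np.random.default_rng(0)
h_true = 4.0
theta = rng.normal(0.0, np.sqrt(h_true), size=200)
y = theta + rng.standard_normal(200)

def log_marginal(h):
    var = 1.0 + h
    return -0.5 * np.sum(np.log(2 * np.pi * var) + y ** 2 / var)

grid = np.linspace(0.01, 10.0, 1000)
h_hat = grid[np.argmax([log_marginal(h) for h in grid])]
print(f"empirical Bayes h: {h_hat:.2f}")
```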

  8. Near-optimality of linear recovery in Gaussian observation scheme under $\Vert \cdot \Vert_{2}^{2}$-loss

    Juditsky, Anatoli; Nemirovski, Arkadi
    We consider the problem of recovering the linear image $Bx$ of a signal $x$ known to belong to a given convex compact set $\mathcal{X}$ from an indirect observation $\omega=Ax+\sigma\xi$ of $x$ corrupted by Gaussian noise $\xi$. It is shown that under some assumptions on $\mathcal{X}$ (satisfied, e.g., when $\mathcal{X}$ is the intersection of $K$ concentric ellipsoids/elliptic cylinders), an easy-to-compute linear estimate is near-optimal in terms of its worst-case, over $x\in\mathcal{X}$, expected $\Vert \cdot \Vert_{2}^{2}$-loss. The main novelty here is that the result imposes no restrictions on $A$ and $B$. To the best of our knowledge, preceding results on optimality of linear...

  9. Convexified modularity maximization for degree-corrected stochastic block models

    Chen, Yudong; Li, Xiaodong; Xu, Jiaming
    The stochastic block model (SBM), a popular framework for studying community detection in networks, is limited by the assumption that all nodes in the same community are statistically equivalent and have equal expected degrees. The degree-corrected stochastic block model (DCSBM) is a natural extension of SBM that allows for degree heterogeneity within communities. To find the communities under DCSBM, this paper proposes a convexified modularity maximization approach, which is based on a convex programming relaxation of the classical (generalized) modularity maximization formulation, followed by a novel doubly-weighted $\ell_{1}$-norm $k$-medoids procedure. We establish nonasymptotic theoretical guarantees for approximate and perfect clustering,...
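
    For orientation, the sketch below only builds the generalized (degree-corrected) modularity matrix $A-\lambda\,dd^{T}/\sum_{i}d_{i}$ and clusters its leading eigenvectors with $k$-means; this spectral shortcut is a stand-in for, and not, the paper's convex programming relaxation followed by the doubly-weighted $\ell_{1}$-norm $k$-medoids step. The toy two-block graph is an assumption.

```python
# Minimal sketch: generalized modularity matrix for a degree-corrected model,
# clustered spectrally.  Stand-in only; not the paper's convex relaxation.
import numpy as np
from sklearn.cluster import KMeans

def generalized_modularity_clustering(A, k, lam=1.0):
    d = A.sum(axis=1)
    B = A - lam * np.outer(d, d) / d.sum()   # generalized modularity matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    U = eigvecs[:, -k:]                      # top-k eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Toy example: two dense blocks of 30 nodes each; the split should be recovered.
probs = np.full((60, 60), 0.05)
probs[:30, :30] = probs[30:, 30:] = 0.3
A = (np.random.default_rng(0).random((60, 60)) < probs).astype(float)
A = np.triu(A, 1)
A = A + A.T                                  # symmetric adjacency, no self-loops
print(generalized_modularity_clustering(A, k=2))
```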

  10. Efficient and adaptive linear regression in semi-supervised settings

    Chakrabortty, Abhishek; Cai, Tianxi
    We consider the linear regression problem under semi-supervised settings wherein the available data typically consist of: (i) a small or moderate-sized “labeled” dataset, and (ii) a much larger “unlabeled” dataset. Such data arise naturally from settings where the outcome, unlike the covariates, is expensive to obtain, a frequent scenario in modern studies involving large databases like electronic medical records (EMR). Supervised estimators like the ordinary least squares (OLS) estimator utilize only the labeled data. It is often of interest to investigate if and when the unlabeled data can be exploited to improve estimation of the regression parameter in...

  11. Pareto quantiles of unlabeled tree objects

    Sienkiewicz, Ela; Wang, Haonan
    In this paper, we consider a set of unlabeled tree objects with topological and geometric properties. For each data object, two curve representations are developed to characterize its topological and geometric aspects. We further define the notions of topological and geometric medians as well as quantiles based on both representations. In addition, we take a novel approach to define the Pareto medians and quantiles through a multi-objective optimization problem. In particular, we study two different objective functions which measure the topological variation and geometric variation, respectively. Analytical solutions are provided for topological and geometric medians and quantiles, and in general,...

  12. Consistency and convergence rate of phylogenetic inference via regularization

    Dinh, Vu; Ho, Lam Si Tung; Suchard, Marc A.; Matsen IV, Frederick A.
    It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct “gene tree.” Although the gene tree may deviate from the “species tree” due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to...

  13. Jump filtering and efficient drift estimation for Lévy-driven SDEs

    Gloter, Arnaud; Loukianova, Dasha; Mai, Hilmar
    The problem of drift estimation for the solution $X$ of a stochastic differential equation with Lévy-type jumps is considered under discrete high-frequency observations with a growing observation window. An efficient and asymptotically normal estimator for the drift parameter is constructed under minimal conditions on the jump behavior and the sampling scheme. In the case of a bounded jump measure density, these conditions reduce to $n\Delta_{n}^{3-\varepsilon}\rightarrow 0$, where $n$ is the number of observations and $\Delta_{n}$ is the maximal sampling step. This result relaxes the condition $n\Delta_{n}^{2}\rightarrow 0$ usually required for joint estimation of drift and diffusion coefficient for SDEs with...
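
    The jump-filtering idea can be illustrated in the simplest possible model: increments whose magnitude exceeds a cutoff of order $\Delta_{n}^{\beta}$ are discarded before estimating a constant drift from what remains. The model, cutoff and constants below are assumptions made for illustration, not the efficient estimator constructed in the paper.

```python
# Minimal sketch of jump filtering for drift estimation: drop increments
# larger than a threshold of order Delta^beta and estimate a constant drift
# from the rest.  Toy model dX_t = theta dt + dW_t + rare jumps.
import numpy as np

rng = np.random.default_rng(0)
theta, n, delta = 1.5, 10_000, 0.001
jumps = rng.binomial(1, 0.01, size=n) * rng.normal(0.0, 2.0, size=n)   # rare jumps
increments = theta * delta + np.sqrt(delta) * rng.standard_normal(n) + jumps

cutoff = 3.0 * delta ** 0.49                 # threshold of order Delta^beta (assumed)
keep = np.abs(increments) <= cutoff          # filter out the jump increments
theta_hat = increments[keep].sum() / (keep.sum() * delta)
print(f"filtered drift estimate: {theta_hat:.2f}")
```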

  14. Current status linear regression

    Groeneboom, Piet; Hendrickx, Kim
    We construct $\sqrt{n}$-consistent and asymptotically normal estimates for the finite dimensional regression parameter in the current status linear regression model, which do not require any smoothing device and are based on maximum likelihood estimates (MLEs) of the infinite dimensional parameter. We also construct estimates, again only based on these MLEs, which are arbitrarily close to efficient estimates, if the generalized Fisher information is finite. This type of efficiency is also derived under minimal conditions for estimates based on smooth nonmonotone plug-in estimates of the distribution function. Algorithms for computing the estimates and for selecting the bandwidth of the smooth estimates...

  15. Large covariance estimation through elliptical factor models

    Fan, Jianqing; Liu, Han; Wang, Weichen
    We propose a general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on the approximate factor model. A set of high-level sufficient conditions for the procedure to achieve optimal rates of convergence under different matrix norms is established to better understand how POET works. Such a framework allows us to recover existing results for sub-Gaussian data in a more transparent way that only depends on the concentration properties of the sample covariance matrix. As a new theoretical contribution, for the first time, such a framework allows us to exploit conditional sparsity covariance structure for the heavy-tailed...
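
    A minimal sketch of the POET construction described above: remove the top-$K$ principal components of a covariance estimate and soft-threshold the remaining principal orthogonal complement. $K$ and the threshold are placeholders, and the elliptical-factor version in the paper replaces the plain sample covariance with robust rank-based estimates.

```python
# Minimal sketch of the POET construction: top-K principal components plus a
# soft-thresholded principal orthogonal complement.  K and tau are placeholders.
import numpy as np

def poet(S, K, tau):
    eigvals, eigvecs = np.linalg.eigh(S)
    idx = np.argsort(eigvals)[::-1][:K]                     # K largest eigenvalues
    low_rank = (eigvecs[:, idx] * eigvals[idx]) @ eigvecs[:, idx].T
    resid = S - low_rank                                    # orthogonal complement
    off = np.sign(resid) * np.maximum(np.abs(resid) - tau, 0.0)
    np.fill_diagonal(off, np.diag(resid))                   # never threshold the diagonal
    return low_rank + off

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 40))                          # toy data (assumed)
Sigma_hat = poet(np.cov(X, rowvar=False), K=3, tau=0.05)
```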

  16. Distributed testing and estimation under sparse high dimensional models

    Battey, Heather; Fan, Jianqing; Liu, Han; Lu, Junwei; Zhu, Ziwei
    This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large $k$ can be, as $n$ grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with...
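
    The aggregation step has a very simple low-dimensional analogue: split the $n$ observations into $k$ subsamples, fit on each, and average. The plain OLS averaging below, on assumed toy data, is a stand-in for the debiased high-dimensional estimators the paper actually aggregates.

```python
# Minimal sketch of divide-and-conquer aggregation: estimate on each of k
# subsamples of size n/k and average.  Plain OLS is a stand-in for the
# debiased sparse estimators analyzed in the paper.
import numpy as np

def distributed_ols(X, y, k):
    """Average the OLS fits computed on k disjoint subsamples."""
    parts = np.array_split(np.arange(len(y)), k)
    betas = [np.linalg.lstsq(X[idx], y[idx], rcond=None)[0] for idx in parts]
    return np.mean(betas, axis=0)

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d))
beta = np.arange(1.0, d + 1.0)
y = X @ beta + rng.standard_normal(n)
print(distributed_ols(X, y, k=10))           # close to [1, 2, 3, 4, 5]
```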

  17. Adaptive sup-norm estimation of the Wigner function in noisy quantum homodyne tomography

    Lounici, Karim; Meziani, Katia; Peyré, Gabriel
    In quantum optics, the quantum state of a light beam is represented through the Wigner function, a density on $\mathbb{R}^{2}$, which may take negative values but must respect intrinsic positivity constraints imposed by quantum physics. In the framework of noisy quantum homodyne tomography with efficiency parameter $1/2<\eta\leq1$, we study the theoretical performance of a kernel estimator of the Wigner function. We prove that it is minimax efficient, up to a logarithmic factor in the sample size, for the $\mathbb{L}_{\infty}$-risk over a class of infinitely differentiable functions. We also compute the lower bound for the $\mathbb{L}_{2}$-risk. We construct an adaptive estimator,...

  18. Detection thresholds for the $\beta$-model on sparse graphs

    Mukherjee, Rajarshi; Mukherjee, Sumit; Sen, Subhabrata
    In this paper, we study sharp thresholds for detecting sparse signals in $\beta$-models for potentially sparse random graphs. The results demonstrate interesting interplay between graph sparsity, signal sparsity and signal strength. In regimes of moderately dense signals, irrespective of graph sparsity, the detection thresholds mirror corresponding results in independent Gaussian sequence problems. For sparser signals, extreme graph sparsity implies that all tests are asymptotically powerless, irrespective of the signal strength. On the other hand, sharp detection thresholds are obtained, up to matching constants, on denser graphs. The phase transitions mentioned above are sharp. As a crucial ingredient, we study a...

  19. Uniform asymptotic inference and the bootstrap after model selection

    Tibshirani, Ryan J.; Rinaldo, Alessandro; Tibshirani, Rob; Wasserman, Larry
    Recently, Tibshirani et al. [J. Amer. Statist. Assoc. 111 (2016) 600–620] proposed a method for making inferences about parameters defined by model selection, in a typical regression setting with normally distributed errors. Here, we study the large-sample properties of this method, without assuming normality. We prove that the test statistic of Tibshirani et al. (2016) is asymptotically valid, as the number of samples $n$ grows and the dimension $d$ of the regression problem stays fixed. Our asymptotic result holds uniformly over a wide class of nonnormal error distributions. We also propose an efficient bootstrap version of this test that...

  20. Moderate deviations and nonparametric inference for monotone functions

    Gao, Fuqing; Xiong, Jie; Zhao, Xingqiu
    This paper considers self-normalized limits and moderate deviations of nonparametric maximum likelihood estimators for monotone functions. We obtain self-normalized Cramér-type moderate deviations and limit distribution theorems for the nonparametric maximum likelihood estimator in the current status model and for the Grenander-type estimator. As applications of these results, we present a new procedure to construct asymptotic confidence intervals and asymptotic rejection regions for hypothesis tests about monotone functions. The theoretical results guarantee that the new test has probability of type II error tending to 0 exponentially. Simulation studies also show that the new nonparametric test works well for the...
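
    The NPMLE in the current status model mentioned above is the isotonic regression (pool adjacent violators) of the censoring indicators ordered by observation time, which makes it easy to compute. The sketch below does only that, on assumed simulated data; the moderate-deviation-based confidence intervals and tests from the paper are not shown.

```python
# Minimal sketch: the NPMLE of F in the current status model is the isotonic
# regression (pool adjacent violators) of the indicators delta_i = 1{T_i <= C_i}
# ordered by the observation times C_i.  Data below are simulated/assumed.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n = 500
T = rng.exponential(1.0, size=n)             # latent event times (never observed)
C = rng.uniform(0.0, 3.0, size=n)            # observation (monitoring) times
delta = (T <= C).astype(float)               # current status indicators

order = np.argsort(C)
F_hat = IsotonicRegression(y_min=0.0, y_max=1.0).fit_transform(C[order], delta[order])
```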
