Collection resources
Project Euclid (Hosted at Cornell University Library) (191,996 resources)
Electronic Journal of Statistics
Zhang, Yuchen; Wainwright, Martin J.; Jordan, Michael I.
For the problem of high-dimensional sparse linear regression, it is known that an $\ell_{0}$-based estimator can achieve a $1/n$ “fast” rate for prediction error without any conditions on the design matrix, whereas in the absence of restrictive conditions on the design matrix, popular polynomial-time methods only guarantee the $1/\sqrt{n}$ “slow” rate. In this paper, we show that the slow rate is intrinsic to a broad class of M-estimators. In particular, for estimators based on minimizing a least-squares cost function together with a (possibly nonconvex) coordinate-wise separable regularizer, there is always a “bad” local optimum such that the associated prediction error...
Niebuhr, Tobias; Kreiss, Jens-Peter; Paparoditis, Efstathios
We investigate properties of a hybrid bootstrap procedure for general, strictly stationary sequences, called the autoregressive-aided block bootstrap, which combines a parametric autoregressive bootstrap with a nonparametric moving block bootstrap. The autoregressive-aided block bootstrap consists of two main steps, namely an autoregressive model fit and an ensuing (moving) block resampling of residuals. The linear parametric model fit prewhitens the time series so that the dependence structure of the remaining residuals gets closer to that of a white noise sequence, while the moving block bootstrap applied to these residuals captures nonlinear features that are not taken into account by the linear autoregressive...
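The two steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes an AR(1) fit by least squares, a fixed block length, and a zero starting value for the regenerated series, and the function name `ar1_aided_block_bootstrap` and its parameters are hypothetical.

```python
import random


def ar1_aided_block_bootstrap(x, block_len=5, seed=None):
    """Return one bootstrap replicate of x and the fitted AR(1) coefficient.

    Illustrative assumptions: AR order 1, least-squares fit, fixed block
    length, recursion started at zero (no burn-in).
    """
    rng = random.Random(seed)
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]

    # Step 1: autoregressive fit (prewhitening) by least squares.
    num = sum(xc[t] * xc[t - 1] for t in range(1, n))
    den = sum(xc[t - 1] ** 2 for t in range(1, n))
    phi = num / den
    resid = [xc[t] - phi * xc[t - 1] for t in range(1, n)]
    resid_mean = sum(resid) / len(resid)
    resid = [r - resid_mean for r in resid]  # centre the residuals

    # Step 2: moving block resampling of the residuals.
    m = len(resid)
    if not 1 <= block_len <= m:
        raise ValueError("block_len must be between 1 and len(x) - 1")
    e_star = []
    while len(e_star) < n:
        start = rng.randrange(m - block_len + 1)
        e_star.extend(resid[start:start + block_len])
    e_star = e_star[:n]

    # Rebuild a series by running the fitted AR(1) recursion on the
    # resampled residuals, then add the sample mean back.
    x_star = []
    prev = 0.0
    for e in e_star:
        prev = phi * prev + e
        x_star.append(prev + mean)
    return x_star, phi
```

Resampling residuals in blocks rather than one at a time is what lets the second step keep any dependence that the linear fit leaves behind.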
Chichignoud, Michaël; Hoang, Van Ha; Pham Ngoc, Thanh Mai; Rivoirard, Vincent
In the multidimensional setting, we consider the errors-in-variables model. We aim at estimating the unknown nonparametric multivariate regression function with errors in the covariates. We devise an adaptive estimator based on projection kernels on wavelets and a deconvolution operator. We propose an automatic and fully data-driven procedure to select the wavelet level resolution. We obtain an oracle inequality and optimal rates of convergence over anisotropic Hölder classes. Our theoretical results are illustrated by some simulations.
Liu, Hongmei; Rao, J. Sunil
Shrinkage estimators that possess the ability to produce sparse solutions have become increasingly important to the analysis of today’s complex datasets. Examples include the LASSO, the Elastic-Net and their adaptive counterparts. Estimation of penalty parameters still presents difficulties, however. While variable-selection-consistent procedures have been developed, their finite sample performance can often be less than satisfactory. We develop a new strategy for variable selection using the adaptive LASSO and adaptive Elastic-Net estimators with $p_{n}$ diverging. The basic idea first involves using the trace paths of their LARS solutions to bootstrap estimates of maximum frequency (MF) models conditioned on dimension....
Strähl, Christof; Ziegel, Johanna
When providing probabilistic forecasts for uncertain future events, it is common to strive for calibrated forecasts, that is, the predictive distribution should be compatible with the observed outcomes. Often, there are several competing forecasters of different skill. We extend common notions of calibration where each forecaster is analyzed individually, to stronger notions of cross-calibration where each forecaster is analyzed with respect to the other forecasters. In particular, cross-calibration distinguishes forecasters with respect to increasing information sets. We provide diagnostic tools and statistical tests to assess cross-calibration. The methods are illustrated in simulation examples and applied to probabilistic forecasts for inflation...
Stephanou, Michael; Varughese, Melvin; Macdonald, Iain
Sequential quantile estimation refers to incorporating observations into quantile estimates in an incremental fashion, thus furnishing an online estimate of one or more quantiles at any given point in time. Sequential quantile estimation is also known as online quantile estimation. This area is relevant to the analysis of data streams and to the one-pass analysis of massive data sets. Applications include network traffic and latency analysis, real-time fraud detection and high-frequency trading. We introduce new techniques for online quantile estimation based on Hermite series estimators in the settings of static quantile estimation and dynamic quantile estimation. In the...
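To make the notion of an incremental, one-pass quantile update concrete, here is a minimal sketch. It deliberately uses a plain stochastic-approximation update rather than the Hermite series estimators the abstract introduces; the class name `OnlineQuantile` and the step size `scale / n` are assumptions chosen for illustration.

```python
class OnlineQuantile:
    """One-pass quantile tracker using a stochastic-approximation update.

    This is NOT the Hermite series method from the abstract; it is a simple
    stand-in showing what "online" means: constant memory, one update per
    observation.
    """

    def __init__(self, p, scale=1.0):
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie strictly between 0 and 1")
        self.p = p
        self.scale = scale  # assumed tuning constant for the step size
        self.n = 0
        self.estimate = None

    def update(self, x):
        self.n += 1
        if self.estimate is None:
            self.estimate = float(x)  # initialise at the first observation
            return self.estimate
        step = self.scale / self.n
        # Move up by step * p when x lies above the estimate,
        # and down by step * (1 - p) otherwise.
        indicator = 1.0 if x <= self.estimate else 0.0
        self.estimate += step * (self.p - indicator)
        return self.estimate
```

The estimate is available after every call to `update`, which is the property the abstract describes; how quickly it settles depends on the step size and on the data scale.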
Goldberg, Yair; Kosorok, Michael R.
We develop a unified approach for classification and regression support vector machines when the responses are subject to right censoring. We provide finite sample bounds on the generalization error of the algorithm, prove risk consistency for a wide class of probability measures, and study the associated learning rates. We apply the general methodology to estimation of the (truncated) mean, median, and quantiles, and to classification problems. We present a simulation study that demonstrates the performance of the proposed approach.
Kurtek, Sebastian
We present a Bayesian model for pairwise nonlinear registration of functional data. We use the Riemannian geometry of the space of warping functions to define appropriate prior distributions and sample from the posterior using importance sampling. A simple square-root transformation is used to simplify the geometry of the space of warping functions, which allows for computation of sample statistics, such as the mean and median, and a fast implementation of a $k$-means clustering algorithm. These tools allow for efficient posterior inference, where multiple modes of the posterior distribution corresponding to multiple plausible alignments of the given functions are found. We...
Liu, Jianxuan; Ma, Yanyuan; Zhu, Liping; Carroll, Raymond J.
We introduce a general single index semiparametric measurement error model for the case that the main covariate of interest is measured with error and modeled parametrically, and where there are many other variables also important to the modeling. We propose a semiparametric bias-correction approach to estimate the effect of the covariate of interest. The resultant estimators are shown to be root-$n$ consistent, asymptotically normal and locally efficient. Comprehensive simulations and an analysis of an empirical data set are performed to demonstrate the finite sample performance and the bias reduction of the locally efficient estimators.
Bardet, Jean-Marc; Boularouk, Yakoub; Djaballah, Khedidja
We prove the consistency and asymptotic normality of the Laplacian Quasi-Maximum Likelihood Estimator (QMLE) for a general class of causal time series including ARMA, AR($\infty$), GARCH, ARCH($\infty$), ARMA-GARCH, APARCH, ARMA-APARCH,..., processes. We notably exhibit the advantages (moment order and robustness) of this estimator compared to the classical Gaussian QMLE. Numerical simulations confirm the accuracy of this estimator.
Joly, Emilien; Lugosi, Gábor; Imbuzeiro Oliveira, Roberto
We study the problem of estimating the mean of a multivariate distribution based on independent samples. The main result is the proof of existence of an estimator with a non-asymptotic sub-Gaussian performance for all distributions satisfying some mild moment assumptions.
Barboza, Luis A.; Viens, Frederi G.
We consider the class of all stationary Gaussian processes with explicit parametric spectral density. Under some conditions on the autocovariance function, we define a GMM estimator that satisfies consistency and asymptotic normality, using the Breuer-Major theorem and previous results on ergodicity. This result is applied to the joint estimation of the three parameters of a stationary Ornstein-Uhlenbeck (fOU) process driven by a fractional Brownian motion. The asymptotic normality of its GMM estimator holds for any $H$ in $(0,1)$ and under some restrictions on the remaining parameters. A numerical study is performed in the fOU case, to illustrate the estimator’s practical...
Kukush, Alexander; Mishura, Yuliya; Ralchenko, Kostiantyn
We consider the fractional Ornstein–Uhlenbeck process with an unknown drift parameter and known Hurst parameter $H$. We propose a new method to test the hypothesis of the sign of the parameter and prove the consistency of the test. In contrast to previous work, our approach is applicable for all $H\in(0,1)$.
Song, Rui; Luo, Shikai; Zeng, Donglin; Zhang, Hao Helen; Lu, Wenbin; Li, Zhiguo
Unlike the standard treatment discovery framework, which is used for finding single treatments for a homogeneous group of patients, personalized medicine involves finding therapies that are tailored to each individual in a heterogeneous group. In this paper, we propose a new semiparametric additive single-index model for estimating an individualized treatment strategy. The model assumes a flexible and nonparametric link function for the interaction between treatment and predictive covariates. We estimate the rule via monotone B-splines and establish the asymptotic properties of the estimators. Both simulations and a real data application demonstrate that the proposed method has a competitive performance.
Song, Yanglei; Fellouris, Georgios
Assuming that data are collected sequentially from independent streams, we consider the simultaneous testing of multiple binary hypotheses under two general setups: when the number of signals (correct alternatives) is known in advance, and when we only have a lower and an upper bound for it. In each of these setups, we propose feasible procedures that control, without any distributional assumptions, the familywise error probabilities of both type I and type II below given, user-specified levels. Then, in the case of i.i.d. observations in each stream, we show that the proposed procedures achieve the optimal expected sample size, under every...
Choi, Hee Min; Román, Jorge Carlos
We consider the intractable posterior density that results when the one-way logistic analysis of variance model is combined with a flat prior. We analyze Polson, Scott and Windle’s (2013) data augmentation (DA) algorithm for exploring the posterior. The Markov operator associated with the DA algorithm is shown to be trace-class.
González, Miguel; Minuesa, Carmen; del Puerto, Inés
Minimum disparity estimation in controlled branching processes is dealt with by assuming that the offspring law belongs to a general parametric family. Under some regularity conditions it is proved that the minimum disparity estimators proposed (based on the nonparametric maximum likelihood estimator of the offspring law when the entire family tree is observed) are consistent and asymptotically normally distributed. Moreover, the robustness of the estimators proposed is discussed. Through a simulated example, focusing on the minimum Hellinger and negative exponential disparity estimators, it is shown that both are robust against outliers, and the minimum negative exponential estimator is also robust...
Liu, Han; Wang, Lie
We propose a new procedure for optimally estimating high dimensional Gaussian graphical models. Our approach is asymptotically tuning-free and non-asymptotically tuning-insensitive: It requires very little effort to choose the tuning parameter in finite sample settings. Computationally, our procedure is significantly faster than existing methods due to its tuning-insensitive property. Theoretically, the obtained estimator simultaneously achieves minimax lower bounds for precision matrix estimation under different norms. Empirically, we illustrate the advantages of the proposed method using simulated and real examples. The R package camel implementing the proposed methods is also available on the Comprehensive R Archive Network.
Mukhopadhyay, Subhadeep
Bump-hunting or mode identification is a fundamental problem that arises in almost every scientific field of data-driven discovery. Surprisingly, very few data modeling tools are available for automatic (not requiring manual case-by-case investigation), objective (not subjective), and nonparametric (not based on restrictive parametric model assumptions) mode discovery, which can scale to large data sets. This article introduces LPMode, an algorithm based on a new theory for detecting multimodality of a probability density. We apply LPMode to answer important research questions arising in various fields from environmental science, ecology, econometrics, analytical chemistry to astronomy and cancer genomics.
Rosenthal, Jeffrey S.
This short note argues that 95% confidence intervals for MCMC estimates can be obtained even without establishing a CLT, by multiplying their widths by 2.3.
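The arithmetic of the widening rule stated in this abstract can be sketched as follows. The sketch assumes the usual interval has the form estimate ± 1.96 × standard error and that a standard error for the MCMC estimate is already available; the function name `widened_interval` and its parameters are hypothetical, and only the factor 2.3 comes from the abstract.

```python
def widened_interval(estimate, std_error, z=1.96, factor=2.3):
    """Widen a nominal 95% interval by a multiplicative factor.

    Assumptions for illustration: the nominal interval is
    estimate +/- z * std_error, and std_error was obtained separately
    from the MCMC output.
    """
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    half_width = factor * z * std_error
    return estimate - half_width, estimate + half_width


# Example: nominal interval 10 +/- 0.98 becomes 10 +/- 2.254.
low, high = widened_interval(10.0, 0.5)
```

Multiplying the half-width by the factor is equivalent to multiplying the full width, so the interval stays centred on the estimate.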