Collection resources
Project Euclid (Hosted at Cornell University Library) (202,106 resources)
Electronic Journal of Statistics
Camerlenghi, Federico; Villa, Elena
The mean density of a random closed set with integer Hausdorff dimension is a crucial notion in stochastic geometry; in fact, it is a fundamental tool in a large variety of applied problems, such as image analysis, medicine, and computer vision. Hence the estimation of the mean density is a problem of interest from both a theoretical and a computational standpoint. Different kinds of estimators are now available in the literature; here we focus on a kernel-type estimator, which may be considered a generalization of the traditional kernel density estimator of random variables to the case of random...
Clarke De la Cerda, Jorge; Alegría, Alfredo; Porcu, Emilio
We study the regularity properties of Gaussian fields defined over spheres cross time. In particular, we consider two alternative spectral decompositions for a Gaussian field on $\mathbb{S}^{d}\times \mathbb{R}$. For each decomposition, we establish regularity properties through Sobolev and interpolation spaces. We then propose a simulation method and study its level of accuracy in the $L^{2}$ sense. The method turns out to be both fast and efficient.
El Methni, Jonathan; Gardes, Laurent; Girard, Stéphane
The Regression Conditional Tail Moment (RCTM) is the risk measure defined as the moment of order $b\geq0$ of a loss distribution above the upper $\alpha$-quantile, where $\alpha\in (0,1)$, when covariate information is available. The purpose of this work is first to establish the asymptotic properties of the RCTM in the case of extreme losses, i.e., when $\alpha\to 0$ is no longer fixed, under general extreme-value conditions on their distribution tail. In particular, no assumption is made on the sign of the associated extreme-value index. Second, the asymptotic normality of a kernel estimator of the RCTM is established, which allows...
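To make the definition above concrete, here is a minimal Python sketch of an empirical tail moment of order $b$ above the upper $\alpha$-quantile. It is a deliberate simplification and not the kernel estimator studied in the paper: it ignores the covariate entirely, assumes non-negative losses, and the function name `empirical_tail_moment` is hypothetical.

```python
import math


def empirical_tail_moment(losses, alpha, b):
    """Average of y**b over the largest floor(alpha * n) observations.

    Simplifying assumptions (not from the paper): no covariate,
    non-negative losses, and at least one observation kept in the tail.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if b < 0:
        raise ValueError("b must be non-negative")
    ordered = sorted(losses, reverse=True)
    if not ordered:
        raise ValueError("losses must be non-empty")
    # Number of observations above the empirical upper alpha-quantile.
    k = max(1, math.floor(alpha * len(ordered)))
    tail = ordered[:k]
    return sum(y ** b for y in tail) / k
```

With $b=1$ this reduces to an empirical conditional tail expectation; with $b=0$ it always returns 1, since it averages the constant 1 over the tail.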
Steinsaltz, David; Dahl, Andrew; Wachter, Kenneth W.
Random-effects models are a popular tool for analysing total narrow-sense heritability for quantitative phenotypes, on the basis of large-scale SNP data. Recently, there have been disputes over the validity of conclusions that may be drawn from such analysis. We derive some of the fundamental statistical properties of heritability estimates arising from these models, showing that the bias will generally be small. We show that the score function may be manipulated into a form that facilitates intelligible interpretations of the results. We go on to use this score function to explore the behavior of the model when certain key assumptions of...
Gautier, Eric; Le Pennec, Erwan
In the random coefficients binary choice model, a binary variable equals 1 iff an index $X^{\top}\beta$ is positive. The vectors $X$ and $\beta$ are independent and belong to the sphere $\mathbb{S}^{d-1}$ in $\mathbb{R}^{d}$. We prove lower bounds on the minimax risk for estimation of the density $f_{\beta}$ over Besov bodies where the loss is a power of the $\mathrm{L}^{p}(\mathbb{S}^{d-1})$ norm for $1\le p\le \infty$. We show that a hard thresholding estimator based on a needlet expansion with data-driven thresholds achieves these lower bounds up to logarithmic factors.
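The data-generating mechanism described above can be sketched in a few lines of Python. This only simulates observations $(X, Y)$ with $Y = 1$ iff $X^{\top}\beta > 0$; it does not implement the needlet thresholding estimator. Drawing both $X$ and $\beta$ uniformly on the sphere is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math
import random


def uniform_on_sphere(d, rng):
    """Draw a point uniformly on S^{d-1} by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def simulate_binary_choice(n, d, seed=0):
    """Return n pairs (x, y) with y = 1 iff the index x^T beta is positive.

    Assumption: beta is uniform on the sphere; the model in the abstract
    allows a general density f_beta. beta is latent and not returned.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = uniform_on_sphere(d, rng)
        beta = uniform_on_sphere(d, rng)  # independent of x
        index = sum(a * b for a, b in zip(x, beta))
        data.append((x, 1 if index > 0 else 0))
    return data
```

Only $X$ and the binary outcome are observed, which is what makes recovering $f_{\beta}$ an inverse problem.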
Wei, Susan; Panaretos, Victor M.
Evolution equations comprise a broad framework for describing the dynamics of a system in a general state space: when the state space is finite-dimensional, they give rise to systems of ordinary differential equations; for infinite-dimensional state spaces, they give rise to partial differential equations. Several modern statistical and machine learning methods concern the estimation of objects that can be formalized as solutions to evolution equations, in some appropriate state space, even if not stated as such. The corresponding equations, however, are seldom known exactly, and are empirically derived from data, often by means of non-parametric estimation. This induces uncertainties on...
Jia, Mofei; Taufer, Emanuele; Dickson, Maria Michela
Consider a distribution $F$ with regularly varying tails of index $-\alpha$. An estimation strategy for $\alpha$, exploiting the relation between the behavior of the tail at infinity and of the characteristic function at the origin, is proposed. A semi-parametric regression model does the job: a nonparametric component controls the bias and a parametric one produces the actual estimate. Implementation of the estimation strategy is quite simple as it can rely on standard software packages for generalized additive models. A generalized cross validation procedure is suggested in order to handle the bias-variance trade-off. Theoretical properties of the proposed method are derived...
Ćmiel, Bogdan; Szkutnik, Zbigniew; Wojdyła, Jakub
The stereological problem of unfolding the distribution of sphere radii from linear sections, known as the Spektor-Lord-Willis problem, is formulated as a Poisson inverse problem and an $L^{2}$-rate-minimax solution is constructed over some restricted Sobolev classes. The solution is a specialized kernel-type estimator with boundary correction. For the first time for this problem, non-parametric, asymptotic confidence bands for the unfolded function are constructed. Automatic bandwidth selection procedures based on empirical risk minimization are proposed. It is shown that a version of the Goldenshluger-Lepski procedure of bandwidth selection ensures adaptivity of the estimators to the unknown smoothness. The performance of the...
Zhu, Xiaolu; Qu, Annie
In this paper, we cluster profiles of longitudinal data using a penalized regression method. Specifically, we allow heterogeneous variation of longitudinal patterns for each subject, and utilize a pairwise-grouping penalization on coefficients of the nonparametric B-spline models to form subgroups. Consequently, we identify clusters based on different patterns of the predicted longitudinal curves. One advantage of the proposed method is that there is no need to pre-specify the number of clusters; instead the number of clusters is selected automatically through a model selection criterion. Our method is also applicable for unbalanced data where different subjects could have measurements at different...
van Delft, Anne; Eichler, Michael
The literature on time series of functional data has focused on processes of which the probabilistic law is either constant over time or constant up to its second-order structure. Especially for long stretches of data it is desirable to be able to weaken this assumption. This paper introduces a framework that will enable meaningful statistical inference of functional data of which the dynamics change over time. We put forward the concept of local stationarity in the functional setting and establish a class of processes that have a functional time-varying spectral representation. Subsequently, we derive conditions that allow for fundamental results...
Chernoyarov, Oleg V.; Kutoyants, Yury A.; Trifonov, Andrei P.
The problem of parameter estimation from continuous-time observations of a deterministic signal in white Gaussian noise is considered. The properties of the maximum likelihood estimator are described in the small-noise asymptotics (large signal-to-noise ratio). We are interested in the situation where the regularity conditions are misspecified. In particular, it is supposed that the statistician uses a discontinuous (change-point type) model of the signal, when the true signal is a continuously differentiable function of the unknown parameter.
Fuentes, Claudio; Casella, George; Wells, Martin T.
Consider an experiment in which $p$ independent populations $\pi_{i}$ with corresponding unknown means $\theta_{i}$ are available, and suppose that for every $1\leq i\leq p$, we can obtain a sample $X_{i1},\ldots,X_{in}$ from $\pi_{i}$. In this context, researchers are sometimes interested in selecting the populations that yield the largest sample means as a result of the experiment, and then estimating the corresponding population means $\theta_{i}$. In this paper, we present a frequentist approach to the problem and discuss how to construct simultaneous confidence intervals for the means of the $k$ selected populations, assuming that the populations $\pi_{i}$ are independent and normally distributed...
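The selection step described above can be illustrated with a short Python simulation. It is not the confidence-interval construction of the paper; it only shows why inference after selection needs care. Under the assumed setting where all $p$ true means equal 0 (an assumption chosen for illustration, as are the default values and function names), the average sample mean of the $k$ selected populations is still clearly positive.

```python
import random
import statistics


def select_top_k(samples, k):
    """Return (indices of the k largest sample means, all sample means)."""
    means = [statistics.fmean(s) for s in samples]
    order = sorted(range(len(samples)), key=lambda i: means[i], reverse=True)
    return order[:k], means


def mean_of_selected(p=10, n=5, k=2, reps=2000, seed=0):
    """Monte Carlo average of the selected sample means.

    Assumption: every population is N(0, 1), so any value above 0
    is purely an artifact of selecting the largest sample means.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
        top, means = select_top_k(samples, k)
        total += sum(means[i] for i in top) / k
    return total / reps


selection_bias = mean_of_selected()
```

Naive intervals centered at the selected sample means inherit this upward shift, which is the motivation for intervals that account for the selection.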
Cao, Yang; Nemirovski, Arkadi; Xie, Yao; Guigues, Vincent; Juditsky, Anatoli
The goal of the paper is to develop a specific application of the convex optimization based hypothesis testing techniques developed in A. Juditsky, A. Nemirovski, “Hypothesis testing via affine detectors,” Electronic Journal of Statistics 10:2204–2242, 2016. Namely, we consider the Change Detection problem as follows: observing one by one noisy observations of outputs of a discrete-time linear dynamical system, we intend to decide, in a sequential fashion, on the null hypothesis that the input to the system is a nuisance, vs. the alternative that the input is a “nontrivial signal,” with both the nuisances and the nontrivial signals modeled as...
Dempsey, Walter; McCullagh, Peter
We study exchangeable, Markov survival processes – stochastic processes giving rise to infinitely exchangeable non-negative sequences $(T_{1},T_{2},\ldots)$. We show how these are determined by their characteristic index $\{\zeta_{n}\}_{n=1}^{\infty}$. We identify the harmonic process as the family of exchangeable, Markov survival processes that compose the natural set of statistical models for time-to-event data. In particular, this two-dimensional family comprises the set of exchangeable, Markov survival processes with weakly continuous predictive distributions. The harmonic process is easy to generate sequentially, and a simple expression exists for both the joint probability distribution and multivariate survivor function. We show a close connection with the...
Jacobovic, Royi; Zuk, Or
The field of discrete event simulation and optimization techniques motivates researchers to adjust classic ranking and selection (R&S) procedures to the settings where the number of populations is large. We use insights from extreme value theory in order to reveal the asymptotic properties of R&S procedures. Namely, we generalize the asymptotic result of Robbins and Siegmund regarding selection from independent Gaussian populations with known constant variance by their means to the case of selecting a subset of varying size out of a given set of populations. In addition, we revisit the problem of selecting the population with the highest mean...
Nowzohour, Christopher; Maathuis, Marloes H.; Evans, Robin J.; Bühlmann, Peter
We consider the problem of structure learning for bow-free acyclic path diagrams (BAPs). BAPs can be viewed as a generalization of linear Gaussian DAG models that allow for certain hidden variables. We present a first method for this problem using a greedy score-based search algorithm. We also prove some necessary and some sufficient conditions for distributional equivalence of BAPs which are used in an algorithmic approach to compute (nearly) equivalent model structures. This allows us to infer lower bounds of causal effects. We also present applications to real and simulated datasets using our publicly available R-package.
Liu, Ziqi; Smola, Alexander; Soska, Kyle; Wang, Yu-Xiang; Zheng, Qinghua; Zhou, Jun
In this paper we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as estimate the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are functions of continuous time. We formulate the problem of learning these functions as a constrained variational maximum likelihood estimation problem with total variation penalty and show...