Collection resources
Project Euclid (Hosted at Cornell University Library) (198,174 resources)
The Annals of Statistics
Zhu, Ying
We consider a two-step projection-based Lasso procedure for estimating a partially linear regression model where the number of coefficients in the linear component can exceed the sample size and these coefficients belong to the $l_{q}$-“balls” for $q\in[0,1]$. Our theoretical results regarding the properties of the estimators are nonasymptotic. In particular, we establish a new nonasymptotic “oracle” result: Although the error of the nonparametric projection per se (with respect to the prediction norm) has the scaling $t_{n}$ in the first step, it only contributes a scaling $t_{n}^{2}$ in the $l_{2}$-error of the second-step estimator for the linear coefficients. This new...
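The two-step structure the abstract describes — first project out the nonparametric component, then run the Lasso on the residuals — can be illustrated on simulated data. This is a minimal sketch, not the authors' estimator: the polynomial basis standing in for the nonparametric smoother, the penalty level, and all constants are illustrative choices.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()  # running residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r + col[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col[j]
            r += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

rng = np.random.default_rng(0)
n, p, s = 200, 300, 5
w = rng.uniform(size=n)                      # scalar covariate of the nonparametric part
beta_true = np.zeros(p); beta_true[:s] = 1.0  # sparse linear coefficients
X = rng.normal(size=(n, p))
y = X @ beta_true + np.sin(2 * np.pi * w) + 0.5 * rng.normal(size=n)

# Step 1: project y and every column of X on a smooth basis in w; keep residuals
B = np.vander(w, 8)                          # polynomial basis stands in for a smoother
H = B @ np.linalg.pinv(B)                    # hat (projection) matrix
y_res, X_res = y - H @ y, X - H @ X

# Step 2: Lasso on the projected data estimates the linear coefficients
beta_hat = lasso_cd(X_res, y_res, lam=0.15)
```

With these (arbitrary) settings the five true coefficients come out close to 1 after soft-thresholding shrinkage, while the null coordinates stay at or near zero.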
Atchadé, Yves A.
We study the contraction properties of a quasi-posterior distribution $\check{\Pi}_{n,d}$ obtained by combining a quasi-likelihood function and a sparsity-inducing prior distribution on $\mathbb{R}^{d}$, as both $n$ (the sample size) and $d$ (the dimension of the parameter) increase. We derive some general results that highlight a set of sufficient conditions under which $\check{\Pi}_{n,d}$ puts increasingly high probability on sparse subsets of $\mathbb{R}^{d}$, and contracts toward the true value of the parameter. We apply these results to the analysis of logistic regression models and binary graphical models in high-dimensional settings. For the logistic regression model, we show that for well-behaved design...
Kong, Weihao; Valiant, Gregory
We consider the problem of approximating the set of eigenvalues of the covariance matrix of a multivariate distribution (equivalently, the problem of approximating the “population spectrum”), given access to samples drawn from the distribution. We consider this recovery problem in the regime where the sample size is comparable to, or even sublinear in the dimensionality of the distribution. First, we propose a theoretically optimal and computationally efficient algorithm for recovering the moments of the eigenvalues of the population covariance matrix. We then leverage this accurate moment recovery, via a Wasserstein distance argument, to accurately reconstruct the vector of eigenvalues. Together,...
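The flavor of moment-based spectrum recovery can be seen in a small simulation: products over distinct sample indices give unbiased estimators of $\operatorname{tr}(\Sigma^{k})$, even when the sample size is below the dimension. This sketch covers only the first two moments and is not the authors' full algorithm; the spectrum and constants are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 1000                      # fewer samples than dimensions
eigs = np.concatenate([np.full(p // 2, 2.0), np.full(p // 2, 0.5)])  # population spectrum
X = rng.normal(size=(n, p)) * np.sqrt(eigs)   # samples with covariance diag(eigs)

# Unbiased estimates of tr(Sigma^k)/p from products over *distinct* sample indices
m1 = np.mean(np.sum(X ** 2, axis=1)) / p          # E||x||^2        = tr(Sigma)
G = X @ X.T                                       # Gram matrix of inner products
m2 = np.mean(G[~np.eye(n, dtype=bool)] ** 2) / p  # E[(x_i'x_j)^2]  = tr(Sigma^2)
```

Here the true normalized moments are $(2.0 + 0.5)/2 = 1.25$ and $(4.0 + 0.25)/2 = 2.125$, and the estimates land close to them despite $n < p$; only the off-diagonal (distinct-index) entries of the Gram matrix are used for the second moment, which is what removes the bias.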
Li, Meng; Ghosal, Subhashis
Detecting the boundary of an image based on noisy observations is a fundamental problem of image processing and image segmentation. For a $d$-dimensional image ($d=2,3,\ldots$), the boundary can often be described by a closed smooth $(d-1)$-dimensional manifold. In this paper, we propose a nonparametric Bayesian approach based on priors indexed by $\mathbb{S}^{d-1}$, the unit sphere in $\mathbb{R}^{d}$. We derive optimal posterior contraction rates for Gaussian processes or finite random series priors using basis functions such as trigonometric polynomials for 2-dimensional images and spherical harmonics for 3-dimensional images. For 2-dimensional images, we show that a rescaled squared exponential Gaussian process on $\mathbb{S}^{1}$ achieves...
Jin, Jiashun; Ke, Zheng Tracy; Wang, Wanjie
Consider a two-class clustering problem where we observe $X_{i}=\ell_{i}\mu+Z_{i}$, $Z_{i}\stackrel{\mathit{i.i.d.}}{\sim}N(0,I_{p})$, $1\leq i\leq n$. The feature vector $\mu\in\mathbb{R}^{p}$ is unknown but is presumably sparse. The class labels $\ell_{i}\in\{-1,1\}$ are also unknown and the main interest is to estimate them. ¶ We are interested in the statistical limits. In the two-dimensional phase space calibrating the rarity and strength of useful features, we find the precise demarcation for the Region of Impossibility and Region of Possibility. In the former, useful features are too rare/weak for successful clustering. In the latter, useful features are strong enough to allow successful clustering. The results are...
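The model $X_{i}=\ell_{i}\mu+Z_{i}$ is simple enough to simulate directly. The sketch below places the parameters inside the "Region of Possibility" (features strong enough for successful clustering) and recovers the labels with a standard spectral method — the sign of the leading left singular vector; this is an illustration of the setting, not the paper's procedure, and all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, s = 200, 1000, 30
mu = np.zeros(p); mu[:s] = 1.2                    # sparse feature vector: s useful features
ell = rng.choice([-1, 1], size=n)                 # unknown class labels
X = np.outer(ell, mu) + rng.normal(size=(n, p))   # X_i = ell_i * mu + Z_i,  Z_i ~ N(0, I_p)

# Cluster by the sign of the leading left singular vector of the data matrix
u = np.linalg.svd(X, full_matrices=False)[0][:, 0]
ell_hat = np.sign(u).astype(int)
err = min(np.mean(ell_hat != ell), np.mean(ell_hat != -ell))  # labels identifiable up to sign
```

With this signal strength the rank-one mean structure dominates the noise spectrum, so the misclassification rate `err` is near zero; weakening or further sparsifying `mu` pushes the problem toward the Region of Impossibility.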
Su, Weijie; Bogdan, Małgorzata; Candès, Emmanuel
In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity—meaning that the fraction of variables with a nonvanishing effect tends to a constant, however small—this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are....
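A numerical sketch of the regime the abstract describes — independent Gaussian design, linear sparsity (a constant fraction of nonzero coefficients), strong effects — traced with a plain coordinate-descent Lasso over a decreasing penalty grid while true and false selections are counted. The grid, effect size, and dimensions are arbitrary choices; this illustrates the setting, not the paper's proof.

```python
import numpy as np

def lasso_cd(X, y, lam, beta, n_iter=30):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1, warm-started at beta."""
    n, p = X.shape
    beta = beta.copy()
    col = (X ** 2).sum(axis=0)
    r = y - X @ beta                       # running residual
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r + col[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col[j]
            r += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

rng = np.random.default_rng(3)
n, p, k = 250, 500, 50                         # linear sparsity: k/p is a constant
beta_true = np.zeros(p); beta_true[:k] = 10.0  # strong, identical effects
X = rng.normal(size=(n, p)) / np.sqrt(n)       # independent design, near-unit-norm columns
y = X @ beta_true + rng.normal(size=n)

beta, path = np.zeros(p), []
for lam in np.geomspace(0.05, 0.002, 15):      # decreasing penalty, warm starts
    beta = lasso_cd(X, y, lam, beta)
    active = np.flatnonzero(beta)
    path.append((lam, int(np.sum(active < k)), int(np.sum(active >= k))))  # (lam, TP, FP)
```

Inspecting `path` shows null variables entering alongside the true ones as the penalty decreases, which is the interspersal phenomenon the paper analyzes.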
Feller, Chrystel; Schorning, Kirsten; Dette, Holger; Bermann, Georgina; Bornkamp, Björn
A common problem in Phase II clinical trials is the comparison of dose response curves corresponding to different treatment groups. If the effect of the dose level is described by parametric regression models and the treatments differ in the administration frequency (but not in the sort of drug), a reasonable assumption is that the regression models for the different treatments share common parameters. ¶ This paper develops optimal design theory for the comparison of different regression models with common parameters. We derive upper bounds on the number of support points of admissible designs, and explicit expressions for $D$-optimal designs are...
Gao, Chao; Ma, Zongming; Zhou, Harrison H.
Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high-dimensional datasets originating from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. Under a Gaussian canonical pair model, we first establish separate minimax estimation rates for canonical coefficient vectors of each set of random variables under no structural assumption on marginal covariance matrices. Second, we propose a computationally feasible estimator to attain the optimal rates adaptively under an additional sample size condition....
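For orientation, classical (non-sparse, low-dimensional) CCA: whiten each block with a Cholesky factor of its covariance and take the top singular pair of the whitened cross-covariance. This is textbook CCA on a toy latent-factor model, not the sparse high-dimensional estimator of the paper; the dimensions and loadings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 500, 5, 4
z = rng.normal(size=n)                               # shared latent factor links the blocks
X = np.outer(z, rng.normal(size=p)) + rng.normal(size=(n, p))
Y = np.outer(z, rng.normal(size=q)) + rng.normal(size=(n, q))

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
# Whiten each block, then take the top singular pair of the cross-covariance
Wx = np.linalg.inv(np.linalg.cholesky(Sxx))
Wy = np.linalg.inv(np.linalg.cholesky(Syy))
U, svals, Vt = np.linalg.svd(Wx @ Sxy @ Wy.T)
a, b = Wx.T @ U[:, 0], Wy.T @ Vt[0]                  # leading canonical coefficient vectors
rho = svals[0]                                       # leading canonical correlation
```

By construction the sample correlation of the projected scores `Xc @ a` and `Yc @ b` equals `rho` exactly; the paper's setting replaces this dense eigen-computation with sparse coefficient vectors in dimensions far exceeding the sample size.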
Metelkina, Asya; Pronzato, Luc
Covariate-adaptive treatment allocation is considered in the situation when a compromise must be made between information (about the dependency of the probability of success of each treatment upon influential covariates) and cost (in terms of number of subjects receiving the poorest treatment). Information is measured through a design criterion for parameter estimation, the cost is additive and is related to the success probabilities. Within the framework of approximate design theory, the determination of optimal allocations forms a compound design problem. We show that when the covariates are i.i.d. with a probability measure $\mu$, its solution possesses some similarities with the...
James, Lancelot F.
Statistical latent feature models, such as latent factor models, are models where each observation is associated with a vector of latent features. A general problem is how to select the number/types of features, and related quantities. In Bayesian statistical machine learning, one seeks (nonparametric) models where one can learn such quantities in the presence of observed data. The Indian Buffet Process (IBP), devised by Griffiths and Ghahramani (2005), generates a (sparse) latent binary matrix with columns representing a potentially unbounded number of features and where each row corresponds to an individual or object. Its generative scheme is cast in terms...