Collection resources
Project Euclid (Hosted at Cornell University Library) (198,174 resources)
The Annals of Applied Statistics
Zhu, Lingxue; Lei, Jing; Devlin, Bernie; Roeder, Kathryn
Scientists routinely compare gene expression levels in cases versus controls in part to determine genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however, statistical methods have been limited by the high-dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, sLED provides a novel perspective that accommodates what we assume to be common, namely sparse and weak signals in gene expression data, and it is closely related to...
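A minimal sketch of testing on the spectrum of the differential matrix, assuming Python with NumPy. The statistic here is the plain (non-sparse) largest absolute eigenvalue of D = cov(cases) - cov(controls), calibrated by permuting case/control labels. The sparse leading eigenvalue that actually defines sLED is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def leading_eig_stat(x, y):
    """Largest absolute eigenvalue of the differential matrix D = cov(y) - cov(x).

    Non-sparse analogue only; the sLED statistic uses a sparse leading
    eigenvalue, which is not implemented here.
    """
    d = np.cov(y, rowvar=False) - np.cov(x, rowvar=False)
    eigvals = np.linalg.eigvalsh(d)  # D is symmetric, so eigvalsh applies
    return float(np.max(np.abs(eigvals)))


def permutation_test(x, y, n_perm=200, seed=0):
    """Permutation p-value: reshuffle case/control labels and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = leading_eig_stat(x, y)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if leading_eig_stat(pooled[idx[:n_x]], pooled[idx[n_x:]]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    controls = rng.standard_normal((100, 20))
    cases = rng.standard_normal((100, 20))
    cases[:, :3] *= 2.0  # inflate the variance of three hypothetical "genes"
    stat, p = permutation_test(controls, cases)
    print(f"statistic={stat:.3f}, p-value={p:.3f}")
```

Permutation calibration is used here only because it needs no distributional assumptions; it is not a claim about how sLED obtains its p-values.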
Wang, Jue; Luo, Sheng; Li, Liang
In many clinical trials studying neurodegenerative diseases such as Parkinson’s disease (PD), multiple longitudinal outcomes are collected to fully explore the multidimensional impairment caused by this disease. If the outcomes deteriorate rapidly, patients may reach a level of functional disability sufficient to initiate levodopa therapy for ameliorating disease symptoms. An accurate prediction of the time to functional disability is helpful for clinicians to monitor patients’ disease progression and make informative medical decisions. In this article, we first propose a joint model that consists of a semiparametric multilevel latent trait model (MLLTM) for the multiple longitudinal outcomes, and a survival model...
Jiang, Runchao; Lu, Wenbin; Song, Rui; Hudgens, Michael G.; Napravnik, Sonia
In many biomedical settings, assigning every patient the same treatment may not be optimal due to patient heterogeneity. Individualized treatment regimes have the potential to dramatically improve clinical outcomes. When the primary outcome is censored survival time, a main interest is to find optimal treatment regimes that maximize the survival probability of patients. Since the survival curve is a function of time, it is important to balance short-term and long-term benefit when assigning treatments. In this paper, we propose a doubly robust approach to estimate optimal treatment regimes that optimize a user-specified function of the survival curve, including the...
Tang, Xiaoying; Miller, Michael I.; Younes, Laurent
We consider in this paper a statistical two-phase regression model in which the change point of a disease biomarker is measured relative to another point in time, such as the manifestation of the disease, which is subject to right-censoring (i.e., possibly unobserved over the entire course of the study). We develop point estimation methods for this model, based on maximum likelihood, and bootstrap validation methods. The effectiveness of our approach is illustrated by numerical simulations, and by the estimation of a change point for amygdalar atrophy in the context of Alzheimer’s disease, wherein it is related to the cognitive manifestation...
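A two-phase (broken-stick) regression can be sketched as follows, assuming Python with NumPy and Gaussian errors, so that for a fixed change point tau least squares coincides with maximum likelihood and tau can be profiled over a grid. This ignores the right-censoring of the reference time point that the paper handles, and `fit_two_phase` is a hypothetical name.

```python
import numpy as np


def fit_two_phase(t, y, candidate_taus):
    """Fit y = b0 + b1*t + b2*max(t - tau, 0) by profiling over tau.

    For each candidate change point the model is linear in (b0, b1, b2), so
    ordinary least squares gives the Gaussian maximum-likelihood coefficients;
    the tau with the smallest residual sum of squares is returned.
    """
    best_tau, best_rss, best_coef = None, np.inf, None
    for tau in candidate_taus:
        design = np.column_stack([np.ones_like(t), t, np.maximum(t - tau, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if rss < best_rss:
            best_tau, best_rss, best_coef = tau, rss, coef
    return best_tau, best_coef
```

Here t would be time measured relative to the reference event (for example, disease manifestation), and b2 is the change in slope after the change point.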
Hooghoudt, Jan-Otto; Barroso, Margarida; Waagepetersen, Rasmus
Förster resonance energy transfer (FRET) is a quantum-physical phenomenon where energy may be transferred from one molecule to a neighbor molecule if the molecules are close enough. Using fluorophore molecule marking of proteins in a cell, it is possible to measure in microscopic images to what extent FRET takes place between the fluorophores. This provides indirect information about the spatial distribution of the proteins. Questions of particular interest are whether (and if so, to what extent) proteins of possibly different types interact or whether they appear independently of each other. In this paper we propose a new likelihood-based approach to...
Kuusela, Mikael; Stark, Philip B.
The high energy physics unfolding problem is an important statistical inverse problem in data analysis at the Large Hadron Collider (LHC) at CERN. The goal of unfolding is to make nonparametric inferences about a particle spectrum from measurements smeared by the finite resolution of the particle detectors. Previous unfolding methods use ad hoc discretization and regularization, resulting in confidence intervals that can have significantly lower coverage than their nominal level. Instead of regularizing using a roughness penalty or stopping iterative methods early, we impose physically motivated shape constraints: positivity, monotonicity, and convexity. We quantify the uncertainty by constructing a nonparametric...
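As a small illustration of the three shape constraints named above, the check below tests a spectrum discretized on an equally spaced grid (Python with NumPy). Reading "monotonicity" as a decreasing spectrum is an assumption for illustration, and the function name is hypothetical; the paper uses such constraints to build confidence intervals, which this check does not attempt.

```python
import numpy as np


def satisfies_shape_constraints(f, tol=1e-12):
    """Check positivity, monotone decrease, and convexity of a discretized spectrum.

    f holds spectrum values on an equally spaced grid. Convexity is tested
    through nonnegative second differences.
    """
    f = np.asarray(f, dtype=float)
    positive = bool(np.all(f >= -tol))
    decreasing = bool(np.all(np.diff(f) <= tol))
    convex = bool(np.all(np.diff(f, n=2) >= -tol))
    return {"positive": positive, "decreasing": decreasing, "convex": convex}
```

For example, an exponentially falling spectrum such as exp(-x) passes all three checks, while an oscillating curve that dips below zero fails positivity.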
Yan, Fangrong; Lin, Xiao; Huang, Xuelin
Patients’ biomarker data are repeatedly measured over time during their follow-up visits. Statistical models are needed to predict disease progression on the basis of these longitudinal biomarker data. Such predictions must be conducted on a real-time basis so that at any time a new biomarker measurement is obtained, the prediction can be updated immediately to reflect the patient’s latest prognosis and further treatment can be initiated as necessary. This is called dynamic prediction. The challenge is that longitudinal biomarker values fluctuate over time, and their changing patterns vary greatly across patients. In this article, we apply functional principal components analysis...
Maruotti, Antonello; Bulla, Jan; Lagona, Francesco; Picone, Marco; Martella, Francesca
The assessment of pollution exposure is based on the analysis of a multivariate time series that includes the concentrations of several pollutants as well as the measurements of multiple atmospheric variables. It typically requires methods of dimensionality reduction that are capable of identifying potentially dangerous combinations of pollutants and simultaneously segmenting exposure periods according to air quality conditions. When the data are high-dimensional, however, efficient methods of dimensionality reduction are challenging because of the formidable structure of cross-correlations that arise from the dynamic interaction between weather conditions and natural/anthropogenic pollution sources. In order to assess pollution exposure in an urban...
Jarne, Ana; Commenges, Daniel; Villain, Laura; Prague, Mélanie; Lévy, Yves; Thiébaut, Rodolphe
Combination antiretroviral therapy successfully controls viral replication in most HIV-infected patients. This is normally followed by a reconstitution of the $\mathrm{CD4}^{+}$ T-cell pool, but not for all patients. For these patients, an immunotherapy based on injections of Interleukin 7 (IL-7) has been recently proposed in the hope of obtaining long-term reconstitution of the T-cell pool. Several questions arise as to the long-term efficiency of this treatment and the best protocol to apply. Mathematical and statistical models can help answer these questions.
We developed a model based on a system of ordinary differential equations and a statistical model of...
Zhu, Xiang; Stephens, Matthew
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that...
Nicosia, Aurélien; Duchesne, Thierry; Rivest, Louis-Paul; Fortin, Daniel
A multi-state version of an animal movement analysis method based on conditional logistic regression, called Step Selection Function (SSF), is proposed. In ecology SSF is developed from a comparison between the observed location of an animal and randomly sampled locations at each time step. Interpretation of the parameters in the multi-state model and the impact of different sampling schemes for the random locations are discussed. We prove the relationship between the new model, called HMM-SSF, and a random walk model on the plane. This relationship allows one to use both movement characteristics and local discrete choice behaviors when identifying the...
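The conditional logistic regression behind an SSF compares the observed location with the randomly sampled locations of the same time step. A minimal single-state sketch in Python with NumPy follows; it assumes the observed location is row 0 of each stratum and uses hypothetical coefficient values. In a multi-state version such as HMM-SSF, the coefficients would additionally depend on the hidden behavioral state.

```python
import numpy as np


def ssf_choice_probabilities(covariates, beta):
    """Conditional-logit probabilities over one stratum of candidate locations.

    covariates: (n_candidates, n_features) array; row 0 is assumed to be the
    observed location and the remaining rows the randomly sampled locations.
    beta: habitat-selection coefficients.
    """
    scores = covariates @ beta
    scores -= scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()


def stratum_log_likelihood(covariates, beta):
    """Log-likelihood contribution of one time step: log P(observed location)."""
    return float(np.log(ssf_choice_probabilities(covariates, beta)[0]))
```

Summing `stratum_log_likelihood` over time steps gives the conditional log-likelihood that would be maximized over beta.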
Jiang, Bei; Petkova, Eva; Tarpey, Thaddeus; Ogden, R. Todd
Latent class models are widely used to identify unobserved subgroups (i.e., latent classes) based upon one or more manifest variables. The probability of belonging to each subgroup is typically modeled as a function of a set of measured covariates. In this paper, we extend existing latent class models to incorporate matrix covariates. This research is motivated by a randomized placebo-controlled depression clinical trial. One study goal is to identify a subgroup of subjects who experience symptom improvement early on during antidepressant treatment, which is considered to be an indication of a placebo rather than a true pharmacological response. We want...
Liu, Binghui; Wu, Chong; Shen, Xiaotong; Pan, Wei
Next-generation sequencing studies on cancer somatic mutations have discovered that driver mutations tend to appear in most tumor samples, but they barely overlap in any single tumor sample, presumably because a single driver mutation can perturb the whole pathway. Based on the corresponding new concepts of coverage and mutual exclusivity, new methods can be designed for de novo discovery of mutated driver pathways in cancer. Since the computational problem is a combinatorial optimization with an objective function involving a discontinuous indicator function in high dimension, many existing optimization algorithms, such as a brute force enumeration, gradient descent and Newton’s methods,...
Wang, Y. Samuel; Matsueda, Ross L.; Erosheva, Elena A.
In this article, we consider modeling ranked responses from a heterogeneous population. Specifically, we analyze data from the Eurobarometer 34.1 survey regarding public policy preferences toward drugs, alcohol, and AIDS. Such policy preferences are likely to exhibit substantial differences within as well as across European nations reflecting a wide variety of cultures, political affiliations, ideological perspectives, and common practices. We use a mixed membership model to account for multiple subgroups with differing preferences and to allow each individual to possess partial membership in more than one subgroup. Previous methods for fitting mixed membership models to rank data in a univariate...
Yen, Tso-Jung; Lee, Zong-Rong; Chen, Yi-Hau; Yen, Yu-Min; Hwang, Jing-Shiang
In this paper we develop a statistical method for identifying links of a network from time-to-event data. This method models the hazard function of a node conditional on the event times of other nodes, parameterizing the conditional hazard function with the links of the network. It then estimates the hazard function by maximizing a pseudo partial likelihood function with parameters subject to a user-specified penalty function and additional constraints. To make such estimation robust, it adopts a pre-specified risk control on the number of falsely discovered links by using the Stability Selection method. A simulation study shows that under this...
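Generic stability selection can be sketched as repeated selection on half-size subsamples, keeping only features whose selection frequency exceeds a threshold. The sketch below assumes Python with NumPy; the selector `top_k_correlation` is a hypothetical stand-in for the penalized pseudo partial likelihood fit described above, and the threshold of 0.6 is an arbitrary illustrative choice.

```python
import numpy as np


def stability_selection(x, y, select, n_subsamples=100, threshold=0.6, seed=0):
    """Selection frequencies over half-size subsamples without replacement.

    select(x_sub, y_sub) must return a boolean mask of selected features.
    Features selected in at least `threshold` of the subsamples are kept.
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += select(x[idx], y[idx])
    freq = counts / n_subsamples
    return freq >= threshold, freq


def top_k_correlation(x, y, k=2):
    """Hypothetical stand-in selector: keep the k features most correlated with y."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))
    mask = np.zeros(x.shape[1], dtype=bool)
    mask[np.argsort(corr)[-k:]] = True
    return mask
```

In the network setting, each "feature" would correspond to a candidate link, and the frequency threshold is what controls how many false links survive.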
Chailan, Romain; Toulemonde, Gwladys; Bacro, Jean-Noel
Coastal hazards raise many concerns, as their assessment involves extremely high economic and ecological stakes. In particular, studies on rarely observed but damaging events are quite numerous. In order to anticipate upcoming events of this kind, specialists need to extrapolate the results of their studies to events that have not yet occurred. Such events might be more extreme than those already observed and could therefore severely impact the coast. It is therefore paramount to propose methodologies to simulate such extreme conditions. Parametric and nonparametric statistical methods have already been used to assess environmental extreme quantities, from univariate framework to spatial...
Hartmann, Marcelo; Hosack, Geoffrey R.; Hillary, Richard M.; Vanhatalo, Jarno
Density-dependent population growth functions are of central importance to population dynamics modelling because they describe the theoretical rate of recruitment of new individuals to a natural population. Traditionally, these functions are described with a fixed functional form with temporally constant parameters and without species interactions. The Ricker stock-recruitment model is one such function that is commonly used in fisheries stock assessment. In recent years, there has been increasing interest in semiparametric and temporally varying population growth models. The former are related to the general statistical approach of using semiparametric discrepancy functions, such as Gaussian processes (GP), to model deviations...
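For reference, the classical Ricker stock-recruitment curve is R = a S exp(-b S), which peaks at S = 1/b with recruitment a/(b e). Taking logs gives log(R/S) = log(a) - b S, which is linear in S. The sketch below (Python with NumPy, hypothetical function names) evaluates the fixed parametric form and fits it by least squares on that linearized scale; it is only the parametric baseline, not the semiparametric GP model discussed above.

```python
import numpy as np


def ricker_recruitment(stock, a, b):
    """Ricker stock-recruitment curve R = a * S * exp(-b * S)."""
    stock = np.asarray(stock, dtype=float)
    return a * stock * np.exp(-b * stock)


def fit_ricker(stock, recruits):
    """Least-squares fit on the linearized form log(R / S) = log(a) - b * S."""
    stock = np.asarray(stock, dtype=float)
    recruits = np.asarray(recruits, dtype=float)
    slope, intercept = np.polyfit(stock, np.log(recruits / stock), 1)
    return float(np.exp(intercept)), float(-slope)
```

A semiparametric extension would add a discrepancy term (for example, a GP) to log(R/S) instead of assuming the straight line holds exactly.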
Shi, Xu; Pashova, Hristina; Heagerty, Patrick J.
The linkage of electronic medical records (EMR) across clinics, hospitals, and healthcare systems is opening new opportunities to evaluate factors associated with both individual treatment benefit and potential harm. For example, the FDA Sentinel initiative seeks to create a surveillance network with over 100 million patient lives (Behrman et al. [N. Engl. J. Med. 364 (2011) 498–499]), while PCORnet has created multiple networks that include linked electronic medical records from geographic regions such as entire cities or states, with the ultimate goal of facilitating comparative effectiveness research (Collins et al. [Journal of the American Medical Informatics Association 4 (2014) 576–577])....
Tak, Hyungsuk; Mandel, Kaisey; van Dyk, David A.; Kashyap, Vinay L.; Meng, Xiao-Li; Siemiginowska, Aneta
The gravitational field of a galaxy can act as a lens and deflect the light emitted by a more distant object such as a quasar. Strong gravitational lensing causes multiple images of the same quasar to appear in the sky. Since the light in each gravitationally lensed image traverses a different path length from the quasar to the Earth, fluctuations in the source brightness are observed in the several images at different times. The time delay between these fluctuations can be used to constrain cosmological parameters and can be inferred from the time series of brightness data or light curves...
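To make the notion of a time delay between lensed images concrete, here is a simple grid-search baseline in Python with NumPy. This is a swapped-in illustration, not the inference method of the paper: it assumes both light curves share one regular time grid, ignores measurement error and magnification differences, and assumes image B lags image A, i.e. b(t) = a(t - delay). The function name is hypothetical.

```python
import numpy as np


def estimate_delay(t, image_a, image_b, candidate_delays):
    """Pick the delay that best aligns image_b with a shifted copy of image_a.

    image_a is linearly interpolated at t - delay and compared with image_b
    by mean squared error over the overlapping part of the time grid.
    """
    best_delay, best_err = None, np.inf
    for delay in candidate_delays:
        shifted_times = t - delay
        inside = (shifted_times >= t[0]) & (shifted_times <= t[-1])
        if inside.sum() < 3:
            continue  # too little overlap to compare
        predicted = np.interp(shifted_times[inside], t, image_a)
        err = np.mean((image_b[inside] - predicted) ** 2)
        if err < best_err:
            best_delay, best_err = delay, err
    return best_delay
```

Real light curves are irregularly sampled and noisy, which is exactly where a simple alignment criterion like this breaks down and a statistical model of the light curves becomes necessary.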
Chang, Lo-Bin; Borenstein, Eran; Zhang, Wei; Geman, Stuart
Most approaches to computer vision can be thought of as lying somewhere on a continuum between generative and discriminative. Although each approach has had its successes, recent advances have favored discriminative methods, most notably the convolutional neural network. Still, there is some doubt about whether this approach will scale to a human-level performance given the numbers of samples that are needed to train state-of-the-art systems. Here, we focus on the generative or Bayesian approach, which is more model based and, in theory, more efficient. Challenges include latent-variable modeling, computationally efficient inference, and data modeling. We restrict ourselves to the problem...