Showing resources 1 - 20 of 109

  1. Empirical assessment of programs to promote collaboration: A network model approach

    McLaughlin, Katherine R.; EmBree, Joshua D.
    Collaboration networks are thought to be desirable to foster both individual and population productivity. Often programs are implemented to promote collaboration, for example, at academic institutions. However, few tools are available to assess the efficacy of these programs, and very few are data-driven. We carried out a survey at California State University, San Marcos during the 2012–2013 academic year to measure five types of collaboration ties among professors in five science departments at the university over time. During the time period of study, professors participated in NIH-sponsored curriculum development activities with members of other departments. It was hypothesized that participation...

  2. Modeling and estimation for self-exciting spatio-temporal models of terrorist activity

    Clark, Nicholas J.; Dixon, Philip M.
    Spatio-temporal hierarchical modeling is an extremely attractive way to model the spread of crime or terrorism data over a given region, especially when the observations are counts and must be modeled discretely. The spatio-temporal diffusion is placed, as a matter of convenience, in the process model allowing for straightforward estimation of the diffusion parameters through Bayesian techniques. However, this method of modeling does not allow for the existence of self-excitation, or a temporal data model dependency, that has been shown to exist in criminal and terrorism data. In this manuscript we will use existing theories on how violence spreads to...
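
    A minimal sketch of the self-excitation idea in code, assuming a discrete-time Poisson count process with hypothetical parameter values (this is not the authors' spatio-temporal model): each event raises the intensity of future counts, and the excitation decays geometrically.

        import numpy as np

        rng = np.random.default_rng(0)

        # Hypothetical parameters: baseline rate, excitation per event, memory decay.
        mu, alpha, decay = 0.5, 0.3, 0.5
        T = 200
        counts = np.zeros(T, dtype=int)
        excitation = 0.0
        for t in range(T):
            lam = mu + excitation          # conditional intensity given the past
            counts[t] = rng.poisson(lam)
            excitation = decay * excitation + alpha * counts[t]  # events excite future rates

        print(counts[:20], counts.mean())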

  3. A unified statistical framework for single cell and bulk RNA sequencing data

    Zhu, Lingxue; Lei, Jing; Devlin, Bernie; Roeder, Kathryn
    Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for...
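
    A toy sketch of the "dropout" mechanism described above, with made-up counts and a made-up 30% dropout rate (URSM itself is a far more elaborate unified model): dropouts overlay false zeros on the true expression, so observed zeros mix two sources.

        import numpy as np

        rng = np.random.default_rng(1)

        n_cells, n_genes = 500, 100
        true_expr = rng.poisson(lam=5.0, size=(n_cells, n_genes))  # true RNA counts
        dropout = rng.random((n_cells, n_genes)) < 0.3             # hypothetical dropout events
        observed = np.where(dropout, 0, true_expr)                 # dropouts appear as zeros

        print(f"true zero rate {(true_expr == 0).mean():.3f} "
              f"vs observed zero rate {(observed == 0).mean():.3f}")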

  4. Fast inference of individual admixture coefficients using geographic data

    Caye, Kevin; Jay, Flora; Michel, Olivier; François, Olivier
    Accurately evaluating the distribution of genetic ancestry across geographic space is one of the main questions addressed by evolutionary biologists. This question has been commonly addressed through the application of Bayesian estimation programs allowing their users to estimate individual admixture proportions and allele frequencies among putative ancestral populations. Following the explosion of high-throughput sequencing technologies, several algorithms have been proposed to cope with the computational burden generated by the massive data in those studies. In this context, incorporating geographic proximity in ancestry estimation algorithms is an open statistical and computational challenge. In this study, we introduce new algorithms that use geographic...

  5. Powerful test based on conditional effects for genome-wide screening

    Liu, Yaowu; Xie, Jun
    This paper considers testing procedures for screening large genome-wide data, where we examine hundreds of thousands of genetic variants, for example, single nucleotide polymorphisms (SNPs), on a quantitative phenotype. We screen the whole genome by SNP sets and propose a new test that is based on conditional effects from multiple SNPs. The test statistic is developed for weak genetic effects and incorporates correlations among genetic variables, which may be very high due to linkage disequilibrium. The limiting null distribution of the test statistic and the power of the test are derived. Under appropriate conditions, the test is shown to be...

  6. Kernel-penalized regression for analysis of microbiome data

    Randolph, Timothy W.; Zhao, Sen; Copeland, Wade; Hullar, Meredith; Shojaie, Ali
    The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized...
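
    A rough sketch of how extrinsic information can enter a penalized fit, assuming a made-up "phylogeny-like" feature similarity matrix S and plain kernel ridge regression in place of the paper's estimator: the kernel on samples is induced by the similarity among microbial features.

        import numpy as np

        rng = np.random.default_rng(2)

        n, p = 60, 40
        X = rng.poisson(2.0, size=(n, p)).astype(float)   # abundance-like data
        # Hypothetical feature similarity (e.g., derived from phylogenetic distance).
        S = np.exp(-np.abs(np.subtract.outer(np.arange(p), np.arange(p))) / 10.0)
        K = X @ S @ X.T                                   # sample kernel induced by S
        y = rng.normal(size=n)
        lam = 1.0
        alpha = np.linalg.solve(K + lam * np.eye(n), y)   # kernel ridge solution
        fitted = K @ alpha
        print(f"training R^2: {1 - np.var(y - fitted) / np.var(y):.2f}")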

  7. MSIQ: Joint modeling of multiple RNA-seq samples for accurate isoform quantification

    Li, Wei Vivian; Zhao, Anqi; Zhang, Shihua; Li, Jingyi Jessica
    Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challenging due to the information loss in sequencing experiments. A recent accumulation of multiple RNA-seq data sets from the same tissue or cell type provides new opportunities to improve the accuracy of isoform quantification. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into...

  8. Reducing storage of global wind ensembles with stochastic generators

    Jeong, Jaehong; Castruccio, Stefano; Crippa, Paola; Genton, Marc G.
    Wind has the potential to make a significant contribution to future energy resources. Locating the sources of this renewable energy on a global scale is, however, extremely challenging, given the difficulty of storing the very large data sets generated by modern computer models. We propose a statistical model that aims at reproducing the data-generating mechanism of an ensemble of runs via a Stochastic Generator (SG) of global annual wind data. We introduce an evolutionary spectrum approach with spatially varying parameters based on large-scale geographical descriptors such as altitude to better account for different regimes across the Earth’s orography. We consider a...

  9. A multi-resolution model for non-Gaussian random fields on a sphere with application to ionospheric electrostatic potentials

    Fan, Minjie; Paul, Debashis; Lee, Thomas C. M.; Matsuo, Tomoko
    Gaussian random fields have been one of the most popular tools for analyzing spatial data. However, many geophysical and environmental processes often display non-Gaussian characteristics. In this paper, we propose a new class of spatial models for non-Gaussian random fields on a sphere based on a multi-resolution analysis. Using a special wavelet frame, named spherical needlets, as building blocks, the proposed model is constructed in the form of a sparse random effects model. The spatial localization of needlets, together with carefully chosen random coefficients, ensures that the model is non-Gaussian and isotropic. The model can also be expanded to include...

  10. Stochastic simulation of predictive space–time scenarios of wind speed using observations and physical model outputs

    Bessac, Julie; Constantinescu, Emil; Anitescu, Mihai
    We propose a statistical space–time model for predicting atmospheric wind speed based on deterministic numerical weather predictions and historical measurements. We consider a Gaussian multivariate space–time framework that combines multiple sources of past physical model outputs and measurements in order to produce a probabilistic wind speed forecast within the prediction window. We illustrate this strategy on wind speed forecasts during several months in 2012 for a region near the Great Lakes in the United States. The results show that the prediction improves on the numerical forecasts both in the mean-squared sense and in probabilistic scores. Moreover, the...

  11. Multivariate integer-valued time series with flexible autocovariances and their application to major hurricane counts

    Livsey, James; Lund, Robert; Kechagias, Stefanos; Pipiras, Vladas
    This paper examines a bivariate count time series with some curious statistical features: Saffir–Simpson Category 3 and stronger annual hurricane counts in the North Atlantic and eastern Pacific Ocean Basins. As land and ocean temperatures on our planet warm, an intense climatological debate has arisen over whether hurricanes are becoming more numerous, or whether the strengths of the individual storms are increasing. Recent literature concludes that an increase in hurricane counts occurred in the Atlantic Basin circa 1994. This increase persisted through 2012; moreover, the 1994–2012 period was one of relative inactivity in the eastern Pacific Basin. When Atlantic activity...

  12. A spatio-temporal modeling framework for weather radar image data in tropical Southeast Asia

    Liu, Xiao; Gopal, Vikneswaran; Kalagnanam, Jayant
    Tropical storms are known to be highly chaotic and extremely difficult to predict. In tropical countries such as Singapore, the official lead time for the warnings of heavy storms is usually between 15 and 45 minutes because weather systems develop quickly and have very short lifespans. A single thunderstorm cell, for example, typically lives for less than an hour. Weather radar echoes, correlated in both space and time, provide a rich source of information for short-term precipitation nowcasting. Based on a large dataset of 276 tropical storm events, this paper investigates a spatio-temporal modeling approach for two-dimensional radar reflectivity...

  13. Two-level structural sparsity regularization for identifying lattices and defects in noisy images

    Li, Xin; Belianinov, Alex; Dyck, Ondrej; Jesse, Stephen; Park, Chiwoo
    This paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy (STEM) image. In crystals, the locations of atoms are symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives. To...
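
    A sketch of the group-sparsity building block behind such penalties (not the paper's full two-level formulation): the group-lasso proximal operator either zeroes out an entire coefficient group, here standing in for a candidate lattice group, or shrinks its norm.

        import numpy as np

        def group_soft_threshold(beta, groups, tau):
            """Proximal operator of tau * (sum of group Euclidean norms)."""
            out = np.zeros_like(beta)
            for g in np.unique(groups):
                idx = np.flatnonzero(groups == g)
                norm = np.linalg.norm(beta[idx])
                if norm > tau:                       # keep the group, shrink its norm
                    out[idx] = (1.0 - tau / norm) * beta[idx]
            return out

        beta = np.array([3.0, -2.0, 0.2, 0.1, 4.0, 0.0])
        groups = np.array([0, 0, 1, 1, 2, 2])
        print(group_soft_threshold(beta, groups, tau=1.0))  # group 1 is dropped entirely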

  14. Design of vaccine trials during outbreaks with and without a delayed vaccination comparator

    Dean, Natalie E.; Halloran, M. Elizabeth; Longini, Ira M.
    Conducting vaccine efficacy trials during outbreaks of emerging pathogens poses particular challenges. The “Ebola ça suffit” trial in Guinea used a novel ring vaccination cluster randomized design to target populations at highest risk of infection. Another key feature of the trial was the use of a delayed vaccination arm as a comparator, in which clusters were randomized to immediate vaccination or vaccination 21 days later. This approach, chosen to improve ethical acceptability of the trial, complicates the statistical analysis as participants in the comparison arm are eventually protected by vaccine. Furthermore, for infectious diseases, we observe time of illness onset...

  15. Automated threshold selection for extreme value analysis via ordered goodness-of-fit tests with adjustment for false discovery rate

    Bader, Brian; Yan, Jun; Zhang, Xuebin
    Threshold selection is a critical issue for extreme value analysis with threshold-based approaches. Under suitable conditions, exceedances over a high threshold have been shown to follow the generalized Pareto distribution (GPD) asymptotically. In practice, however, the threshold must be chosen. If the chosen threshold is too low, the GPD approximation may not hold and bias can occur. If the threshold is chosen too high, reduced sample size increases the variance of parameter estimates. Commonly used selection methods such as graphical diagnostics are subjective and cannot be automated for batch analyses. We develop an efficient technique to evaluate and apply...
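
    A sketch of the automated-selection idea with two substitutions clearly flagged: scipy's GPD fit with a Kolmogorov-Smirnov test stands in for the paper's goodness-of-fit tests, and the data are synthetic; the ForwardStop rule then adjusts the ordered p-values for false discovery rate.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        data = rng.standard_t(df=4, size=2000)       # synthetic heavy-tailed sample

        # Fit a GPD to exceedances of each candidate threshold (ordered low to
        # high) and record a goodness-of-fit p-value.
        thresholds = np.quantile(data, [0.80, 0.85, 0.90, 0.95])
        pvals = []
        for u in thresholds:
            exc = data[data > u] - u
            c, loc, scale = stats.genpareto.fit(exc, floc=0.0)
            pvals.append(stats.kstest(exc, 'genpareto', args=(c, loc, scale)).pvalue)

        # ForwardStop: reject the first k ordered tests while the running average
        # of -log(1 - p) stays below alpha; the threshold after them is selected.
        alpha = 0.05
        p = np.clip(np.array(pvals), 1e-12, 1 - 1e-12)
        running = -np.cumsum(np.log(1.0 - p)) / np.arange(1, len(p) + 1)
        k = int(np.max(np.where(running <= alpha)[0]) + 1) if np.any(running <= alpha) else 0
        print("p-values:", np.round(pvals, 3), "-> reject first", k, "thresholds")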

  16. Time-varying extreme value dependence with application to leading European stock markets

    Castro-Camilo, Daniela; de Carvalho, Miguel; Wadsworth, Jennifer
    Extremal dependence between international stock markets is of particular interest in today’s global financial landscape. However, previous studies have shown this dependence is not necessarily stationary over time. We concern ourselves with modeling extreme value dependence when that dependence changes over time or another suitable covariate. Working within a framework of asymptotic dependence, we introduce a regression model for the angular density of a bivariate extreme value distribution that allows us to assess how extremal dependence evolves over a covariate. We apply the proposed model to assess the dynamics governing extremal dependence of some leading European stock markets over...

  17. Extreme value modelling of water-related insurance claims

    Rohrbeck, Christian; Eastoe, Emma F.; Frigessi, Arnoldo; Tawn, Jonathan A.
    This paper considers the dependence between weather events, for example, rainfall or snow-melt, and the number of water-related property insurance claims. Weather events which cause severe damage are of general interest; decision makers want to take efficient actions against them, while the insurance companies want to set adequate premiums. The modelling is challenging since the underlying dynamics vary across geographical regions due to differences in topology, construction designs and climate. We develop new methodology to improve the existing models which fail to model high numbers of claims. The statistical framework is based on both mixture and extremal mixture modelling, with...

  18. How Gaussian mixture models might miss detecting factors that impact growth patterns

    Heggeseth, Brianna C.; Jewell, Nicholas P.
    Longitudinal studies play a prominent role in biological, social, and behavioral sciences. Repeated measurements over time facilitate the study of an outcome level, how individuals change over time, and the factors that may impact either or both. A standard approach to modeling childhood growth over time is to use multilevel or mixed effects models to study factors that might play a role in the level and growth over time. However, there has been increased interest in using mixture models, whose inherent grouping structure can more flexibly explain heterogeneity in the longitudinal outcomes, to study growth patterns. While several possible...
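
    A toy illustration of the abstract's caution, assuming simulated two-group growth trajectories and an off-the-shelf scikit-learn mixture rather than the paper's analysis: whether the mixture recovers the groups depends on how well its covariance structure matches the data.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(4)

        t = np.arange(5)                 # five repeated measurements per subject
        n = 150                          # subjects per true group
        noise = lambda: rng.multivariate_normal(np.zeros(5), 2.0 * np.eye(5), n)
        slow = 10 + 1.0 * t + noise()    # group with slower growth
        fast = 10 + 1.5 * t + noise()    # group with faster growth
        trajectories = np.vstack([slow, fast])

        gm = GaussianMixture(n_components=2, covariance_type='diag', random_state=0)
        labels = gm.fit_predict(trajectories)
        truth = np.repeat([0, 1], n)
        accuracy = max((labels == truth).mean(), (labels != truth).mean())
        print(f"recovered grouping accuracy: {accuracy:.2f}")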

  19. Adjustment of nonconfounding covariates in case-control genetic association studies

    Zhang, Hong; Chatterjee, Nilanjan; Rader, Daniel; Chen, Jinbo
    It has recently been reported that adjustment of nonconfounding covariates in case-control genetic association analyses may lead to decreased power when the phenotype is rare. This observation contrasts with a well-known result for clinical trials, where adjustment of baseline variables always leads to increased power for testing randomized treatment effects. In this paper, we propose a unified solution that guarantees increased power through covariate adjustment regardless of whether the phenotype is rare or common. Our method exploits external phenotype prevalence data through a profile likelihood function, and can be applied to fit any commonly used penetrance model, including the logistic and...

  20. Integrative exploration of large high-dimensional datasets

    Pardy, Christopher; Galbraith, Sally; Wilson, Susan R.
    Large, high-dimensional datasets containing different types of variables are becoming increasingly common. For exploring such data, there is a need for integrated methods. For example, a single genomic experiment can contain large quantities of different types of data (including clinical data) that make it a challenge to coherently describe the patterns of variability within and between the inter-related datasets. Mutual information (MI) is a widely used information-theoretic dependency measure that can also identify nonlinear and nonmonotonic associations. First, we develop a computationally efficient implementation of MI between a discrete and a continuous variable. This implementation allows us to apply...
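
    A sketch of the discrete-versus-continuous MI computation on simulated data, using scikit-learn's nearest-neighbour estimator rather than the authors' own implementation (the variable names and effect size here are made up).

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        rng = np.random.default_rng(5)

        n = 1000
        group = rng.integers(0, 3, size=n)                     # discrete variable (e.g., genotype)
        expr = rng.normal(loc=0.8 * group, scale=1.0, size=n)  # continuous variable shifted by group

        mi = mutual_info_classif(expr.reshape(-1, 1), group, random_state=0)
        print(f"estimated MI: {mi[0]:.3f} nats")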
