Showing resources 1 - 20 of 923

  1. Exploiting TIMSS and PIRLS combined data: Multivariate multilevel modelling of student achievement

    Grilli, Leonardo; Pennoni, Fulvia; Rampichini, Carla; Romeo, Isabella
    We illustrate how to perform a multivariate multilevel analysis in the complex setting of large-scale assessment surveys, dealing with plausible values and accounting for the survey design. In particular, we consider the Italian sample of the TIMSS&PIRLS 2011 Combined International Database on fourth grade students. The multivariate approach jointly considers educational achievement in Reading, Mathematics and Science, thus allowing us to test for differential associations of the covariates with the three outcomes, and to estimate the residual correlations among pairs of outcomes within and between classes. Multilevel modelling allows us to disentangle student and contextual factors affecting achievement. We also...

  2. A phylogenetic latent feature model for clonal deconvolution

    Marass, Francesco; Mouliere, Florent; Yuan, Ke; Rosenfeld, Nitzan; Markowetz, Florian
    Tumours develop in an evolutionary process, in which the accumulation of mutations produces subpopulations of cells with distinct mutational profiles, called clones. This process leads to the genetic heterogeneity widely observed in tumour sequencing data, but identifying the genotypes and frequencies of the different clones is still a major challenge. Here, we present Cloe, a phylogenetic latent feature model to deconvolute tumour sequencing data into a set of related genotypes. Our approach extends latent feature models by placing the features as nodes in a latent tree. The resulting model can capture both the acquisition and the loss of mutations, as...

  3. Bootstrap aggregating continual reassessment method for dose finding in drug-combination trials

    Lin, Ruitao; Yin, Guosheng
    Phase I drug-combination trials are becoming commonplace in oncology. Most of the current dose-finding designs aim to quantify the toxicity probability space using certain prespecified yet complicated models. These models need to characterize not only each individual drug’s toxicity profile, but also their interaction effects, which often leads to multi-parameter models. We propose a novel Bayesian adaptive design for drug-combination trials based on a robust dimension-reduction method. We continuously update the order of dose combinations and reduce the two-dimensional searching space to a one-dimensional line based on the estimated order. As a result, the common approaches to single-agent dose finding,...

  4. A lag functional linear model for prediction of magnetization transfer ratio in multiple sclerosis lesions

    Pomann, Gina-Maria; Staicu, Ana-Maria; Lobaton, Edgar J.; Mejia, Amanda F.; Dewey, Blake E.; Reich, Daniel S.; Sweeney, Elizabeth M.; Shinohara, Russell T.
    We propose a lag functional linear model to predict a response using multiple functional predictors observed at discrete grids with noise. Two procedures are proposed to estimate the regression parameter functions: (1) an approach that ensures smoothness for each value of time using generalized cross-validation; and (2) a global smoothing approach using a restricted maximum likelihood framework. Numerical studies are presented to analyze predictive accuracy in many realistic scenarios. The methods are employed to estimate a magnetic resonance imaging (MRI)-based measure of tissue damage (the magnetization transfer ratio, or MTR) in multiple sclerosis (MS) lesions, a disease that causes damage...

  5. Bayesian inference for the Brown–Resnick process, with an application to extreme low temperatures

    Thibaud, Emeric; Aalto, Juha; Cooley, Daniel S.; Davison, Anthony C.; Heikkinen, Juha
    The Brown–Resnick max-stable process has proven to be well suited for modeling extremes of complex environmental processes, but in many applications its likelihood function is intractable and inference must be based on a composite likelihood, thereby preventing the use of classical Bayesian techniques. In this paper we exploit a case in which the full likelihood of a Brown–Resnick process can be calculated, using componentwise maxima and their partitions in terms of individual events, and we propose two new approaches to inference. The first estimates the partitions using declustering, while the second uses random partitions in a Markov chain Monte Carlo...

  6. Improving ice sheet model calibration using paleoclimate and modern data

    Chang, Won; Haran, Murali; Applegate, Patrick; Pollard, David
    Human-induced climate change may cause significant ice volume loss from the West Antarctic Ice Sheet (WAIS). Projections of ice volume change from ice sheet models and corresponding future sea-level rise have large uncertainties due to poorly constrained input parameters. In most future applications to date, model calibration has utilized only modern or recent (decadal) observations, leaving input parameters that control the long-term behavior of WAIS largely unconstrained. Many paleo-observations are in the form of localized time series, while modern observations are non-Gaussian spatial data; combining information across these types poses nontrivial statistical challenges. Here we introduce a computationally efficient calibration...

  7. A statistical model to assess (allele-specific) associations between gene expression and epigenetic features using sequencing data

    Rashid, Naim U.; Sun, Wei; Ibrahim, Joseph G.
    Sequencing techniques have been widely used to assess gene expression (i.e., RNA-seq) or the presence of epigenetic features (e.g., DNase-seq to identify open chromatin regions). In contrast to traditional microarray platforms, sequencing data are typically summarized in the form of discrete counts, and they are able to delineate allele-specific signals, which are not available from microarrays. The presence of epigenetic features is often associated with gene expression, both of which have been shown to be affected by DNA polymorphisms. However, joint models with the flexibility to assess interactions between gene expression, epigenetic features and DNA polymorphisms are currently lacking. In...

  8. Estimating odds ratios under a case-background design with an application to a study of Sorafenib accessibility

    Spivack, John H.; Cheng, Bin
    In certain epidemiologic studies such as those involving stress disorders, sexual harassment, alcohol addiction or epidemiological criminology, exposure data are readily available from cases but not from controls because it is socially inconvenient or even unethical to determine who qualifies as a true control subject. Consequently, it is impractical or even infeasible to use a case-control design to establish the case-exposure association in such situations. To address this issue, we propose a case-background design where in addition to a sample of exposure information from cases, an independent sample of exposure information from the background population is taken, without knowing the...

  9. Locally adaptive dynamic networks

    Durante, Daniele; Dunson, David B.
    Our focus is on realistically modeling and forecasting dynamic networks of face-to-face contacts among individuals. Important aspects of such data that lead to problems with current methods include the tendency of the contacts to move between periods of slow and rapid changes, and the dynamic heterogeneity in the actors’ connectivity behaviors. Motivated by this application, we develop a novel method for Locally Adaptive DYnamic (LADY) network inference. The proposed model relies on a dynamic latent space representation in which each actor’s position evolves in time via stochastic differential equations. Using a state-space representation for these stochastic processes and Pólya-gamma data...

  10. Dynamic social networks based on movement

    Scharf, Henry R.; Hooten, Mevin B.; Fosdick, Bailey K.; Johnson, Devin S.; London, Josh M.; Durban, John W.
    Network modeling techniques provide a means for quantifying social structure in populations of individuals. Data used to define social connectivity are often expensive to collect and based on case-specific, ad hoc criteria. Moreover, in applications involving animal social networks, collection of these data is often opportunistic and can be invasive. Frequently, the social network of interest for a given population is closely related to the way individuals move. Thus, telemetry data, which are minimally invasive and relatively inexpensive to collect, present an alternative source of information. We develop a framework for using telemetry data to infer social relationships among animals....

  11. Bayesian nonparametric multiresolution estimation for the American Community Survey

    Savitsky, Terrance D.
    Bayesian hierarchical methods implemented for small area estimation focus on reducing the noise variation in published government official statistics by borrowing information among dependent response values. Even the most flexible models confine parameters defined at the finest scale to link to each data observation in a one-to-one construction. We propose a Bayesian multiresolution formulation that utilizes an ensemble of observations at a variety of coarse scales in space and time to additively nest parameters we define at a finer scale, which serve as our focus for estimation. Our construction is motivated by and applied to the estimation of 1-year period...

  12. Cox regression with exclusion frequency-based weights to identify neuroimaging markers relevant to Huntington’s disease onset

    Garcia, Tanya P.; Müller, Samuel
    Biomedical studies of neuroimaging and genomics collect large amounts of data on a small subset of subjects so as to not miss informative predictors. An important goal is identifying those predictors that provide better visualization of the data and that could serve as cost-effective measures for future clinical trials. Identifying such predictors is challenging, however, when the predictors are naturally interrelated and the response is a failure time prone to censoring. We propose to handle these challenges with a novel variable selection technique. Our approach casts the problem into several smaller dimensional settings and extracts from this intermediary step the...

  13. The screening and ranking algorithm for change-points detection in multiple samples

    Song, Chi; Min, Xiaoyi; Zhang, Heping
    Chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may be associated with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single-sample based. Only a few multiple-sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare...

  14. Modelling the effect of the El Niño-Southern Oscillation on extreme spatial temperature events over Australia

    Winter, Hugo C.; Tawn, Jonathan A.; Brown, Simon J.
    When assessing the risk posed by high temperatures, it is necessary to consider not only the temperature at separate sites but also how many sites are expected to be hot at the same time. Hot events that cover a large area have the potential to put a great strain on health services and cause devastation to agriculture, leading to high death tolls and much economic damage. Southeastern Australia experienced a severe heatwave in early 2009; 374 people died in the state of Victoria and Melbourne recorded its highest temperature since records began in 1859 [Nairn and Fawcett (2013)]. One area...

  15. Inferring rooted population trees using asymmetric neighbor joining

    Zhai, Yongliang; Bouchard-Côté, Alexandre
    We introduce a new inference method to estimate evolutionary distances for any two populations to their most recent common ancestral population using single-nucleotide polymorphism allele frequencies. Our model takes fixation into consideration, making it nonreversible, and guarantees that the distribution of reconstructed ancestral frequencies is contained on the interval $[0,1]$. To scale this method to large numbers of populations, we introduce the asymmetric neighbor joining algorithm, an efficient method for reconstructing rooted bifurcating nonclock trees. Asymmetric neighbor joining provides a scalable rooting method applicable to any nonreversible evolutionary modeling setups. We explore the statistical properties of asymmetric neighbor joining, and...

  16. Modeling concurrency and selective mixing in heterosexual partnership networks with applications to sexually transmitted diseases

    Admiraal, Ryan; Handcock, Mark S.
    Network-based models for sexually transmitted disease transmission rely on initial partnership networks incorporating structures that may be related to risk of infection. In particular, initial networks should reflect the level of concurrency and attribute-based selective mixing observed in the population of interest. We consider momentary degree distributions as measures of concurrency and propensities for people of certain types to form partnerships with each other as a measure of attribute-based selective mixing. Estimation of momentary degree distributions and mixing patterns typically relies on cross-sectional survey data, and, in the context of heterosexual networks, we describe how this results in two sets...

  17. Maximizing the information content of a balanced matched sample in a study of the economic performance of green buildings

    Kilcioglu, Cinar; Zubizarreta, José R.
    Buildings have a major impact on the environment through excessive use of resources, such as energy and water, and large carbon dioxide emissions. In this paper we revisit a previously published study about the economics of environmentally sustainable buildings and estimate the effect of green building practices on market rents. For this, we use new matching methods that take advantage of the clustered structure of the buildings data. We propose a general framework for matching in observational studies and specific matching methods within this framework that simultaneously achieve three goals: (i) maximize the information content of a matched sample (and,...

  18. Predicting Melbourne ambulance demand using kernel warping

    Zhou, Zhengyi; Matteson, David S.
    Predicting ambulance demand accurately in fine resolutions in space and time is critical for ambulance fleet management and dynamic deployment. Typical challenges include data sparsity at high resolutions and the need to respect complex urban spatial domains. To provide spatial density predictions for ambulance demand in Melbourne, Australia, as it varies over hourly intervals, we propose a predictive spatio-temporal kernel warping method. To predict for each hour, we build a kernel density estimator on a sparse set of the most similar data from relevant past time periods (labeled data), but warp these kernels to a larger set of past data...

  19. Improving covariate balance in $2^{K}$ factorial designs via rerandomization with an application to a New York City Department of Education High School Study

    Branson, Zach; Dasgupta, Tirthankar; Rubin, Donald B.
    A few years ago, the New York Department of Education (NYDE) was planning to conduct an experiment involving five new intervention programs for a selected set of New York City high schools. The goal was to estimate the causal effects of these programs and their interactions on the schools’ performance. For each of the schools, about 50 premeasured covariates were available. The schools could be randomly assigned to the 32 treatment combinations of this $2^{5}$ factorial experiment, but such an allocation could have resulted in a huge covariate imbalance across treatment groups. Standard methods used to prevent confounding of treatment...

  20. Investigating differences in brain functional networks using hierarchical covariate-adjusted independent component analysis

    Shi, Ran; Guo, Ying
    Human brains perform tasks via complex functional networks consisting of separated brain regions. A popular approach to characterize brain functional networks in fMRI studies is independent component analysis (ICA), which is a powerful method to reconstruct latent source signals from their linear mixtures. In many fMRI studies, an important goal is to investigate how brain functional networks change according to specific clinical and demographic variabilities. Existing ICA methods, however, cannot directly incorporate covariate effects in ICA decomposition. Heuristic post-ICA analysis to address this need can be inaccurate and inefficient. In this paper, we propose a hierarchical covariate-adjusted ICA (hc-ICA) model...
