Showing resources 1 - 20 of 77

  1. Gamma shape mixtures for heavy-tailed distributions

    Venturini, Sergio; Dominici, Francesca; Parmigiani, Giovanni
An important question in health services research is the estimation of the proportion of medical expenditures that exceed a given threshold. Typically, medical expenditures present highly skewed, heavy-tailed distributions, for which (a) simple variable transformations are insufficient to achieve a tractable low-dimensional parametric form and (b) nonparametric methods are not efficient in estimating exceedance probabilities for large thresholds. Motivated by this context, in this paper we propose a general Bayesian approach for the estimation of tail probabilities of heavy-tailed distributions, based on a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions, it is computationally efficient,...
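
    As a rough illustration of the idea only (not the authors' Bayesian estimation procedure), a gamma shape mixture with weights w_k on a grid of shapes and a common scale has tail probability P(X > t) = sum_k w_k P(Gamma(k, scale) > t). The shapes, weights and scale in the sketch below are invented for illustration.

        # Minimal sketch: tail probability of a gamma shape mixture
        # (illustrative values only; not taken from the paper).
        import numpy as np
        from scipy.stats import gamma

        shapes = np.arange(1, 31)            # mixing occurs over the shape parameter
        weights = np.exp(-0.2 * shapes)      # arbitrary mixing weights
        weights /= weights.sum()
        scale = 500.0                        # common scale (e.g., dollars)

        def tail_prob(t):
            """P(X > t) under the mixture: sum_k w_k * P(Gamma(shape=k, scale) > t)."""
            return np.sum(weights * gamma.sf(t, a=shapes, scale=scale))

        print(tail_prob(20000.0))            # exceedance probability for a large threshold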

  2. Quantitative magnetic resonance image analysis via the EM algorithm with stochastic variation

    Zhang, Xiaoxi; Johnson, Timothy D.; Little, Roderick J. A.; Cao, Yue
Quantitative Magnetic Resonance Imaging (qMRI) provides researchers with insight into pathological and physiological alterations of living tissue, with which they hope to predict (local) therapeutic efficacy early and to determine optimal treatment schedules. However, the analysis of qMRI has been limited to ad hoc heuristic methods. Our research provides a powerful statistical framework for image analysis and sheds light on future localized adaptive treatment regimes tailored to the individual's response. We assume that, in an imperfect world, we observe only a blurred and noisy version of the underlying pathological/physiological changes via qMRI, due to measurement errors or unpredictable influences. We use a hidden Markov random field to model the spatial dependence in the...
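
    One standard way to write such a model (a generic hidden Markov random field, given here only as a sketch of the kind of specification the abstract describes, not the authors' exact model) is $$ y_v = \mu_{z_v} + \varepsilon_v, \qquad \varepsilon_v \sim N(0, \sigma^2_{z_v}), \qquad p(z) \propto \exp\Big(\beta \sum_{v \sim v'} 1\{z_v = z_{v'}\}\Big), $$ where $y_v$ is the observed qMRI value at voxel $v$, $z_v$ is the hidden tissue-change label, and the Potts-type prior on $z$ induces spatial dependence between neighboring voxels; EM-type algorithms then alternate between estimating the labels and the model parameters.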

  3. Unsupervised empirical Bayesian multiple testing with external covariates

    Ferkingstad, Egil; Frigessi, Arnoldo; Rue, Håvard; Thorleifsson, Gudmar; Kong, Augustine
    In an empirical Bayesian setting, we provide a new multiple testing method, useful when an additional covariate is available, that influences the probability of each null hypothesis being true. We measure the posterior significance of each test conditionally on the covariate and the data, leading to greater power. Using covariate-based prior information in an unsupervised fashion, we produce a list of significant hypotheses which differs in length and order from the list obtained by methods not taking covariate-information into account. Covariate-modulated posterior probabilities of each null hypothesis are estimated using a fast approximate algorithm. The new method is applied to expression quantitative trait loci (eQTL) data.
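
    In a standard two-groups formulation (offered here as a sketch, not necessarily the authors' exact model), the external covariate enters through the prior probability that a hypothesis is null: $$ P(H_{0i} \mid z_i, x_i) = \frac{\pi_0(x_i)\, f_0(z_i)}{\pi_0(x_i)\, f_0(z_i) + \{1 - \pi_0(x_i)\}\, f_1(z_i)}, $$ where $z_i$ is the test statistic for hypothesis $i$, $x_i$ is the covariate, $f_0$ and $f_1$ are the null and alternative densities, and $\pi_0(\cdot)$ is a covariate-dependent prior null proportion estimated from the data. Hypotheses with the same $z_i$ can thus receive different posterior significance depending on their covariate values.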

  4. Hidden Markov models for the assessment of chromosomal alterations using high-throughput SNP arrays

    Scharpf, Robert B.; Parmigiani, Giovanni; Pevsner, Jonathan; Ruczinski, Ingo
    Chromosomal DNA is characterized by variation between individuals at the level of entire chromosomes (e.g., aneuploidy in which the chromosome copy number is altered), segmental changes (including insertions, deletions, inversions, and translocations), and changes to small genomic regions (including single nucleotide polymorphisms). A variety of alterations that occur in chromosomal DNA, many of which can be detected using high density single nucleotide polymorphism (SNP) microarrays, are linked to normal variation as well as disease and are therefore of particular interest. These include changes in copy number (deletions and duplications) and genotype (e.g., the occurrence of regions of homozygosity). Hidden Markov models (HMM) are particularly useful for detecting such alterations, modeling the spatial dependence...
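
    As a schematic of how an HMM encodes spatial dependence along a chromosome (a generic scaled forward recursion; the states and probabilities below are made up and are not the authors' emission or transition model):

        # Generic HMM forward recursion over SNP positions (illustrative values only).
        import numpy as np

        states = ["deletion", "normal", "duplication"]   # hidden copy-number states
        pi0 = np.array([0.05, 0.90, 0.05])               # initial state probabilities
        A = np.array([[0.95, 0.05, 0.00],                # transitions between adjacent
                      [0.02, 0.96, 0.02],                # SNPs encode the spatial
                      [0.00, 0.05, 0.95]])               # dependence along the chromosome

        def forward_loglik(emissions):
            """emissions[t, s] = P(data at SNP t | hidden state s); returns log-likelihood."""
            alpha = pi0 * emissions[0]
            loglik = np.log(alpha.sum())
            alpha /= alpha.sum()
            for t in range(1, emissions.shape[0]):
                alpha = (alpha @ A) * emissions[t]
                loglik += np.log(alpha.sum())            # scale to avoid underflow
                alpha /= alpha.sum()
            return loglik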

  5. Bayesian models to adjust for response bias in survey data for estimating rape and domestic violence rates from the NCVS

    Yu, Qingzhao; Stasny, Elizabeth A.; Li, Bin
It is difficult to accurately estimate the rates of rape and domestic violence due to the sensitive nature of these crimes. There is evidence that bias in estimating the crime rates from survey data may arise because some women respondents are “gagged” in reporting some types of crimes by the use of a telephone rather than a personal interview, and by the presence of a spouse during the interview. On the other hand, since data on these crimes are collected every year, the analysis would be more efficient if we could identify and make use of information from previous years' data. In this paper we propose a model to...

  6. A study of pre-validation

    Höfling, Holger; Tibshirani, Robert
    Given a predictor of outcome derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor for disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine if the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward “one degree of freedom” analytical test from pre-validation can be biased and we propose a permutation test to remedy this problem. In simulation studies, we show that the permutation test has the nominal level and achieves roughly the same...
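
    A minimal sketch of the pre-validation idea as generally described (the logistic models and fold scheme below are illustrative choices, not the paper's exact setup): the high-dimensional predictor is fit on K-1 folds and evaluated on the held-out fold, so each subject's "pre-validated" prediction does not use its own outcome and can then be compared fairly with clinical covariates in a second model.

        # Sketch of pre-validation with K-fold splitting (illustrative, using scikit-learn).
        import numpy as np
        from sklearn.model_selection import KFold
        from sklearn.linear_model import LogisticRegression

        def prevalidate(X_highdim, y, n_folds=5):
            """Return a pre-validated prediction for each subject: the model that
            produces subject i's prediction is trained without subject i's data."""
            preval = np.zeros(len(y))
            kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
            for train, test in kf.split(X_highdim):
                model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
                model.fit(X_highdim[train], y[train])
                preval[test] = model.predict_proba(X_highdim[test])[:, 1]
            return preval

        # The pre-validated score can then be entered alongside clinical covariates
        # in a second regression and its significance assessed; the paper argues a
        # permutation test is preferable to the naive one-degree-of-freedom test here.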

  7. On spatial extremes: With application to a rainfall problem

    Buishand, T. A.; de Haan, L.; Zhou, C.
    We consider daily rainfall observations at 32 stations in the province of North Holland (the Netherlands) during 30 years. Let T be the total rainfall in this area on one day. An important question is: what is the amount of rainfall T that is exceeded once in 100 years? This is clearly a problem belonging to extreme value theory. Also, it is a genuinely spatial problem. ¶ Recently, a theory of extremes of continuous stochastic processes has been developed. Using the ideas of that theory and much computer power (simulations), we have been able to come up with a reasonable answer to the question above.
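
    For concreteness (this is just the usual definition of a return level, not the paper's estimation machinery): with one observation of the areal total per day, the amount exceeded once in 100 years is the quantile $q$ solving $P(T > q) = 1/(100 \times 365.25) \approx 2.7 \times 10^{-5}$, a daily exceedance probability far beyond the empirical range of 30 years of data, which is why extreme value theory and simulation are needed.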

  8. Forecasting time series of inhomogeneous Poisson processes with application to call center workforce management

    Shen, Haipeng; Huang, Jianhua Z.
    We consider forecasting the latent rate profiles of a time series of inhomogeneous Poisson processes. The work is motivated by operations management of queueing systems, in particular, telephone call centers, where accurate forecasting of call arrival rates is a crucial primitive for efficient staffing of such centers. Our forecasting approach utilizes dimension reduction through a factor analysis of Poisson variables, followed by time series modeling of factor score series. Time series forecasts of factor scores are combined with factor loadings to yield forecasts of future Poisson rate profiles. Penalized Poisson regressions on factor loadings guided by time series forecasts of factor scores are used to generate dynamic within-process rate updating. Methods are...
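
    A rough sketch of the pipeline the abstract outlines (factor decomposition of historical rate profiles followed by time-series forecasting of the factor scores); the SVD-based factors and AR(1) forecast below are simplifications for illustration, not the authors' penalized Poisson regression machinery.

        # Sketch: forecast tomorrow's within-day arrival-rate profile from a
        # days x intervals matrix of historical call counts (illustrative only).
        import numpy as np

        def forecast_rate_profile(counts, n_factors=2):
            """counts: (n_days, n_intervals) array of call arrivals."""
            sqrt_counts = np.sqrt(counts)              # variance-stabilizing proxy
            U, s, Vt = np.linalg.svd(sqrt_counts, full_matrices=False)
            scores = U[:, :n_factors] * s[:n_factors]  # factor-score time series
            loadings = Vt[:n_factors]                  # factor loadings over intervals

            # AR(1) forecast of each factor-score series (stand-in for richer models).
            next_scores = np.empty(n_factors)
            for k in range(n_factors):
                x = scores[:, k]
                phi = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
                next_scores[k] = phi * x[-1]

            return (next_scores @ loadings) ** 2       # back to the rate scale

        # Usage: rates_tomorrow = forecast_rate_profile(historical_counts)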

  9. Multi-center clinical trials: Randomization and ancillary statistics

    Zheng, Lu; Zelen, Marvin
The purpose of this paper is to investigate and develop methods for the analysis of multi-center randomized clinical trials that rely only on the randomization process as the basis of inference. Our motivation is prompted by the fact that most current statistical procedures used in the analysis of randomized multi-center studies are model based. The randomization feature of the trials is usually ignored. An important characteristic of model-based analysis is that it is straightforward to model covariates. Nevertheless, in nearly all model-based analyses, the effects due to different centers and, in general, the design of the clinical trials are ignored. An alternative to a model-based analysis is to have...

  10. Conservative statistical post-election audits

    Stark, Philip B.
    There are many sources of error in counting votes: the apparent winner might not be the rightful winner. Hand tallies of the votes in a random sample of precincts can be used to test the hypothesis that a full manual recount would find a different outcome. This paper develops a conservative sequential test based on the vote-counting errors found in a hand tally of a simple or stratified random sample of precincts. The procedure includes a natural escalation: If the hypothesis that the apparent outcome is incorrect is not rejected at stage s, more precincts are audited. Eventually, either the hypothesis is rejected—and the apparent outcome is confirmed—or all precincts have...
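
    A deliberately simplified illustration of the escalation logic only (this is a crude worst-case bound, not the paper's conservative sequential test statistic): audit the sampled precincts and, if the errors found plus the worst case still possible in unaudited precincts could change the winner, audit more.

        # Crude worst-case escalation check (illustrative; not Stark's test).
        def needs_escalation(margin, overstatements_found, unaudited_vote_caps):
            """margin: apparent winner's lead in votes.
            overstatements_found: total margin overstatement found in the hand tally.
            unaudited_vote_caps: per-precinct bounds on how much the margin could
                still be overstated in precincts not yet audited."""
            worst_case = overstatements_found + sum(unaudited_vote_caps)
            return worst_case >= margin   # True -> audit more precincts (escalate)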

  11. Should the Democrats move to the left on economic policy?

    Gelman, Andrew; Cai, Cexun Jeffrey
    Could John Kerry have gained votes in the 2004 Presidential election by more clearly distinguishing himself from George Bush on economic policy? At first thought, the logic of political preferences would suggest not: the Republicans are to the right of most Americans on economic policy, and so in a one-dimensional space with party positions measured with no error, the optimal strategy for the Democrats would be to stand infinitesimally to the left of the Republicans. The median voter theorem suggests that each party should keep its policy positions just barely distinguishable from the opposition. ¶ In a multidimensional setting, however, or when voters vary in their perceptions of the parties’ positions,...

  12. Stochastic modeling in nanoscale biophysics: Subdiffusion within proteins

    Kou, S. C.
    Advances in nanotechnology have allowed scientists to study biological processes on an unprecedented nanoscale molecule-by-molecule basis, opening the door to addressing many important biological problems. A phenomenon observed in recent nanoscale single-molecule biophysics experiments is subdiffusion, which largely departs from the classical Brownian diffusion theory. In this paper, by incorporating fractional Gaussian noise into the generalized Langevin equation, we formulate a model to describe subdiffusion. We conduct a detailed analysis of the model, including (i) a spectral analysis of the stochastic integro-differential equations introduced in the model and (ii) a microscopic derivation of the model from a system of interacting particles. In addition to its analytical tractability and clear physical underpinning, the model is...
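
    In a standard form of the generalized Langevin equation (sketched here with details such as a possible potential term omitted), the model reads $$ m\,\ddot{x}(t) = -\,\zeta \int_{-\infty}^{t} \dot{x}(u)\, K(t-u)\, du + G(t), $$ where $G(t)$ is fractional Gaussian noise with Hurst parameter $H > 1/2$ and, by the fluctuation-dissipation relation, the memory kernel $K$ is proportional to the covariance function of $G$; the long-range dependence of the noise is what makes the mean squared displacement grow slower than linearly in $t$, i.e., subdiffusively.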

13. Rejoinder of: Treelets—An adaptive multi-scale basis for sparse unordered data

    Lee, Ann B.; Nadler, Boaz; Wasserman, Larry

  14. Discussion of: Treelets—An adaptive multi-scale basis for sparse unordered data

    Tuglus, Catherine; van der Laan, Mark J.
We would like to congratulate Lee, Nadler and Wasserman on their contribution to clustering and data reduction methods for high p and low n situations. A composite of clustering and traditional principal components analysis, treelets is an innovative method for multi-resolution analysis of unordered data. It is an improvement over traditional PCA and an important contribution to clustering methodology. Their paper presents theory and supporting applications addressing the two main goals of the treelet method: (1) uncovering the underlying structure of the data and (2) reducing the data prior to applying statistical learning methods. We will organize our discussion into two main parts to address their methodology in terms of each of these two...

  15. Discussion of: Treelets—An adaptive multi-scale basis for sparse unordered data

    Qiu, Xing
This is a discussion of the paper “Treelets—An adaptive multi-scale basis for sparse unordered data” by Ann B. Lee, Boaz Nadler and Larry Wasserman. In this paper the authors define a new type of dimension-reduction algorithm, namely the treelet algorithm. The treelet method has the merit of being completely data driven, and its decomposition is easier to interpret than that of PCR. It is suitable in certain situations, but it also has its own limitations. I will discuss both the strengths and weaknesses of this method when applied to microarray data analysis.

  16. Discussion of: Treelets—An adaptive multi-scale basis for sparse unordered data

    Tibshirani, Robert

  17. Discussion of: Treelets—An adaptive multi-scale basis for sparse unordered data

    Meinshausen, Nicolai; Bühlmann, Peter
    We congratulate Lee, Nadler and Wasserman (henceforth LNW) on a very interesting paper on new methodology and supporting theory. Treelets seem to tackle two important problems of modern data analysis at once. For datasets with many variables, treelets give powerful predictions even if variables are highly correlated and redundant. Maybe more importantly, interpretation of the results is intuitive. Useful insights about relevant groups of variables can be gained. ¶ Our comments and questions include: (i) Could the success of treelets be replicated by a combination of hierarchical clustering and PCA? (ii) When choosing a suitable basis, treelets seem to be largely an unsupervised method. Could the results be even more interpretable and...

  18. Discussion of: Treelets—An adaptive multi-scale basis for sparse unordered data

    Bickel, Peter J.; Ritov, Ya’acov

  19. Discussion of: Treelets—An adaptive multi-scale basis for sparse unordered data

    Murtagh, Fionn

  20. Treelets—An adaptive multi-scale basis for sparse unordered data

    Lee, Ann B.; Nadler, Boaz; Wasserman, Larry
    In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered—with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this paper we present treelets—a novel construction of multi-scale bases that extends wavelets to nonsmooth signals. The method is fully adaptive, as it returns a hierarchical tree and an orthonormal basis which both reflect the internal structure of the data. Treelets are especially well-suited as a dimensionality reduction and feature selection tool prior...
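
    A bare-bones sketch of one level of the construction as described in the abstract (repeatedly rotate the two most correlated variables into local "sum" and "difference" coordinates and record the rotation); the details below, including the use of a plain 2x2 PCA rotation, are a simplification for illustration rather than the paper's full algorithm.

        # Sketch of a single treelet merge step: find the most correlated pair of
        # variables and replace them with the two principal components of that pair
        # (a local 2x2 rotation), keeping the "sum" variable as the coarse feature.
        import numpy as np

        def treelet_step(X):
            """X: (n_samples, n_variables). Returns rotated data, merged pair, rotation."""
            C = np.corrcoef(X, rowvar=False)
            np.fill_diagonal(C, 0.0)
            i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)  # most correlated pair

            cov2 = np.cov(X[:, [i, j]], rowvar=False)               # local 2x2 covariance
            eigvals, eigvecs = np.linalg.eigh(cov2)
            rotation = eigvecs[:, ::-1]                              # first PC = "sum" direction

            X_new = X.copy()
            X_new[:, [i, j]] = X[:, [i, j]] @ rotation               # rotate the pair in place
            return X_new, (i, j), rotation

        # Repeating this n_variables - 1 times yields a hierarchical tree of rotations
        # and an orthonormal basis adapted to the correlation structure of the data.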
