Collection resources
Caltech Authors (142,336 resources)
Repository of works by Caltech published authors.
Status = Submitted
Ludlam, R. M.; Miller, J. M.; Cackett, E. C.; Degenaar, N.; Bostrom, A. C.
We perform the first reflection study of the soft X-ray transient and Type 1
burst source XTE J1709-267 using NuSTAR observations during its 2016 June
outburst. There was an increase in flux near the end of the observations, which
corresponds to an increase from $\sim$0.04 L$_{\mathrm{Edd}}$ to $\sim$0.06
L$_{\mathrm{Edd}}$ assuming a distance of 8.5 kpc. We have separately examined
spectra from the low and high flux intervals, which were soft and show evidence
of a broad Fe K line. Fits to these intervals with relativistic disk reflection
models have revealed an inner disk radius of $13.8_{-1.8}^{+3.0}\ R_{g}$ (where
$R_{g} = GM/c^{2}$) for the low flux spectrum and $23.4_{-5.4}^{+15.6}\...
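To give the fitted radii a physical scale, the definition $R_{g} = GM/c^{2}$ can be evaluated directly. The sketch below is only illustrative: the abstract does not state the neutron star mass, so a canonical 1.4 solar masses is assumed, and the constants are rounded SI values.

```python
# Physical constants in SI units (rounded).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg


def gravitational_radius_km(mass_msun):
    """Return R_g = G M / c^2 in kilometres for a mass in solar masses."""
    return G * mass_msun * M_SUN / C**2 / 1000.0


# ASSUMPTION: 1.4 solar masses; the abstract does not give the mass.
rg_km = gravitational_radius_km(1.4)

# Low-flux best-fit inner disk radius quoted in the abstract: 13.8 R_g.
inner_radius_km = 13.8 * rg_km

print(f"R_g = {rg_km:.2f} km, 13.8 R_g = {inner_radius_km:.1f} km")
```

Under that mass assumption, $R_{g}$ is about 2 km, so the low-flux inner radius corresponds to a few tens of kilometres.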
Matsuoka, Kenta; Ueda, Yoshihiro
We investigate the nature of far-infrared (70 μm) and hard X-ray (3-24 keV)
selected galaxies in the COSMOS field detected with both Spitzer and Nuclear
Spectroscopic Telescope Array (NuSTAR). By matching the Spitzer-COSMOS catalog
against the NuSTAR-COSMOS catalog, we obtain a sample consisting of a
hyperluminous infrared galaxy with log(L_IR/L_sun) > 13, 12 ultraluminous
infrared galaxies with 12 < log(L_IR/L_sun) < 13, and 10 luminous infrared
galaxies with 11 < log(L_IR/L_sun) < 12, i.e., 23 Hy/U/LIRGs in total. Using
their X-ray hardness ratios, we find that 12 sources are obscured active
galactic nuclei (AGNs) with absorption column densities of N_H > 10^22 cm^-2,
including several Compton-thick (N_H ~ 10^24...
Hensinger, W. K.; Utami, D. W.; Goan, H.-S.; Schwab, K.; Monroe, C.; Milburn, G. J.
An enduring challenge for contemporary physics is to experimentally observe and control quantum behavior in macroscopic systems. We show that a single trapped atomic ion could be used to probe the quantum nature of a mesoscopic mechanical oscillator precooled to 4 K, and furthermore, to cool the oscillator with high efficiency to its quantum ground state. The proposed experiment could be performed using currently available technology.
Pachter, Lior
RNA-Seq is rapidly becoming the standard technology for transcriptome analysis. Fundamental to many of the applications of RNA-Seq is the quantification problem, which is the accurate measurement of relative transcript abundances from the sequenced reads. We focus on this problem, and review many recently published models that are used to estimate the relative abundances. In addition to describing the models and the different approaches to inference, we also explain how methods are related to each other. A key result is that we show how inference with many of the models results in identical estimates of relative abundances, even though model...
Tambe, Akshay; Doudna, Jennifer; Pachter, Lior
In a recent paper Siegfried et al. published a new sequence-based structural RNA assay that utilizes mutational profiling to detect base pairing (MaP). Output from MaP provides information about both pairing (via reactivities) and contact (via correlations). Reactivities can be coupled to partition function folding models for structural inference, while correlations can reveal pairs of sites that may be in structural proximity. The possibility for inference of 3D contacts via MaP suggests a novel approach to structural prediction for RNA analogous to covariance structural prediction for proteins. We explore this approach and show that partial correlation analysis outperforms naïve correlation...
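The difference between naïve and partial correlation can be illustrated with a toy example. This is a minimal sketch, not the authors' pipeline: it uses the first-order partial correlation formula with a single conditioning variable, and the data (`x`, `y`, `z`) are made up so that `z` drives both `x` and `y`.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


# Hypothetical data: x and y are both z plus independent perturbations.
z = [0, 1, 2, 3, 4, 5, 6, 7]
x = [0.5, 0.5, 1.5, 3.5, 4.5, 4.5, 5.5, 7.5]
y = [0.5, 1.5, 1.5, 2.5, 3.5, 4.5, 6.5, 7.5]

naive = pearson(x, y)                    # high, but only because of z
partial = partial_correlation(x, y, z)   # close to zero once z is removed
print(f"naive = {naive:.3f}, partial = {partial:.3g}")
```

Here the naïve correlation between `x` and `y` is high even though they are linked only through `z`; conditioning on `z` removes that indirect signal, which is the sense in which partial correlation can separate direct from indirect associations.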
Bray, Nicolas; Pachter, Lior
Ward and Kellis (Reports, September 5 2012) identify regulatory regions in the human genome exhibiting lineage-specific constraint and estimate the extent of purifying selection. There is no statistical rationale for the examples they highlight, and their estimates of the fraction of the genome under constraint are biased by arbitrary designations of completely constrained regions.
Guigó, Roderic; Birney, Ewan; Brent, Michael; Dermitzakis, Emmanouil; Pachter, Lior; Crollius, Hugues Roest; Solovyev, Victor; Zhang, Michael Q.
With the sponsorship of "Fundacio La Caixa" we met in Barcelona, November 21st and 22nd, to analyze the reasons why, after the completion of the human genome sequence, the identification of all protein coding genes and their variants remains a distant goal. Here we report on our discussions and summarize some of the major challenges that need to be overcome in order to complete the human gene catalog.
Schwartz, Ariel S.; Myers, Eugene W.; Pachter, Lior
We propose a metric for the space of multiple sequence alignments that can be used to compare two alignments to each other. In the case where one of the alignments is a reference alignment, the resulting accuracy measure improves upon previous approaches, and provides a balanced assessment of the fidelity of both matches and gaps. Furthermore, in the case where a reference alignment is not available, we provide empirical evidence that the distance from an alignment produced by one program to predicted alignments from other programs can be used as a control for multiple alignment experiments. In particular, we show...
Levy, Dan; Yoshida, Ruriko; Pachter, Lior
The Neighbor-Joining algorithm is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the neighbor-joining transformation, which uses estimates of phylogenetic diversity rather than pairwise distances in the tree. This leads to an improved neighbor-joining algorithm whose total running time is still polynomial in the number of taxa. On simulated data, the method outperforms other distance-based methods.
We have implemented neighbor-joining for subtree weights in a program called MJOIN, which is freely available under the GNU Public License at http://bio.math.berkeley.edu/mjoin/
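For orientation, the transformation used by classical neighbor-joining on pairwise distances can be sketched as follows. This shows only the standard pair-selection step, Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k); it is not the subtree-weight generalization described in the abstract and is not taken from MJOIN. The distance matrix is a made-up five-taxon example.

```python
from itertools import combinations


def nj_pair_to_join(dist):
    """Return the pair (i, j) minimizing the classical NJ Q-criterion, and its Q value."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]

    def q(i, j):
        # Classical neighbor-joining transformation of pairwise distances.
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    best = min(combinations(range(n), 2), key=lambda pair: q(*pair))
    return best, q(*best)


# Hypothetical symmetric distance matrix on five taxa (indices 0..4).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q_value = nj_pair_to_join(D)
print(pair, q_value)
```

A full neighbor-joining run would join the selected pair into a new node, update the distances, and repeat until the tree is resolved.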
McAuliffe, Jon D.; Jordan, Michael I.; Pachter, Lior
Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization should be grounded in two considerations: the lineal scope encompassing the biological phenomena of interest, and the optimal species within that scope for detecting functional elements. We introduce a statistical framework for optimal species subset selection, based on maximizing power to detect conserved sites. Analysis of a phylogenetic star topology shows theoretically that the optimal species subset is not in general the most evolutionarily diverged subset. We then demonstrate this finding empirically in a...
Pachter, Lior; Sturmfels, Bernd
One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of...
Pachter, L.; Speyer, D.
The tree-metric theorem provides a necessary and sufficient condition for a dissimilarity matrix to be a tree metric, and has served as the foundation for numerous distance-based reconstruction methods in phylogenetics. Our main result is an extension of the tree-metric theorem to more general dissimilarity maps. In particular, we show that a tree with n leaves is reconstructible from the weights of the m-leaf subtrees provided that n ≥ 2m - 1.
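The classical tree-metric theorem that this result extends is usually stated through the four-point condition: for every four leaves, the two largest of the three pairwise sums d(i,j) + d(k,l), d(i,k) + d(j,l), d(i,l) + d(j,k) must be equal. A minimal sketch of that check, with hypothetical example matrices and an assumed numerical tolerance `tol`, covers only the pairwise (m = 2) case and not the m-leaf subtree weights of the paper:

```python
from itertools import combinations


def satisfies_four_point(dist, tol=1e-9):
    """Check the four-point condition on a symmetric dissimilarity matrix."""
    n = len(dist)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([
            dist[i][j] + dist[k][l],
            dist[i][k] + dist[j][l],
            dist[i][l] + dist[j][k],
        ])
        # The two largest sums must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical quartet tree ((a,b),(c,d)) with all edge lengths equal to 1.
tree_like = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Same matrix with d(a,c) perturbed, which breaks the condition.
not_tree_like = [
    [0, 2, 5, 3],
    [2, 0, 3, 3],
    [5, 3, 0, 2],
    [3, 3, 2, 0],
]

print(satisfies_four_point(tree_like), satisfies_four_point(not_tree_like))
```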
Bray, Nicolas; Pachter, Lior
We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab-initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which is able to accurately align multiple genomic regions up to megabases long. MAVID is able to effectively align divergent sequences, as well as incomplete unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8...
Pachter, Lior; Sturmfels, Bernd
This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model...
Morton, Jason; Pachter, Lior; Shiu, Anne; Sturmfels, Bernd
The problem of finding periodically expressed genes from time course microarray experiments is at the center of numerous efforts to identify the molecular components of biological clocks. We present a new approach to this problem based on the cyclohedron test, which is a rank test inspired by recent advances in algebraic combinatorics. The test has the advantage of being robust to measurement errors, and can be used to ascertain the significance of top-ranked genes. We apply the test to recently published measurements of gene expression during mouse somitogenesis and find 32 genes that collectively are significant. Among these are previously...
Fu, Audrey Qiuyan; Pachter, Lior
Gene expression is stochastic and displays variation ("noise") both within and between cells. Intracellular (intrinsic) variance can be distinguished from extracellular (extrinsic) variance by applying the law of total variance to data from two-reporter assays that probe expression of identical gene pairs in single-cells. We examine established formulas for the estimation of intrinsic and extrinsic noise and provide interpretations of them in terms of a hierarchical model. This allows us to derive corrections that minimize the mean squared error, an objective that may be important when sample sizes are small. The statistical framework also highlights the need for quantile normalization,...
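The decomposition behind the two-reporter assay can be sketched with simple plug-in estimates. The formulas below are the commonly used unnormalized forms, assumed here for illustration: intrinsic variance as half the mean squared difference between the reporters, and extrinsic variance as their covariance. They do not include the corrections derived in the paper, and the reporter values `c` and `y` are made up.

```python
def noise_components(c, y):
    """Plug-in estimates of intrinsic and extrinsic variance from paired reporters.

    c[i] and y[i] are the two reporter measurements from cell i.
    """
    n = len(c)
    mean_c = sum(c) / n
    mean_y = sum(y) / n
    # Intrinsic: within-cell disagreement between the two reporters.
    intrinsic = sum((a - b) ** 2 for a, b in zip(c, y)) / (2 * n)
    # Extrinsic: covariance of the reporters across cells.
    extrinsic = sum(a * b for a, b in zip(c, y)) / n - mean_c * mean_y
    return intrinsic, extrinsic


# Hypothetical measurements from four cells.
c = [10, 12, 20, 22]
y = [12, 10, 22, 20]

intrinsic, extrinsic = noise_components(c, y)
print(intrinsic, extrinsic)  # the two parts add up to the total variance of c
```

In this toy data set the two reporters have equal means, so the two components sum to the plug-in variance of either reporter, as the law of total variance requires.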
Huggins, Peter; Pachter, Lior
Polyhedral geometry can be used to quantitatively assess the dependence of rankings on personal preference, and provides a tool for both students and universities to assess US News and World Report rankings.
Eickmeyer, Kord; Huggins, Peter; Pachter, Lior; Yoshida, Ruriko
The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated to a dissimilarity map. From this point of view, NJ is "optimal" when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps $\mathbb{R}_{+}^{\binom{n}{2}}$ to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for n ≤ 8. This requires the measurement of volumes...
Pimentel, Harold; Conboy, John G.; Pachter, Lior
We present a tool, keep me around (kma), a suite of Python scripts and an R package that finds retained introns in RNA-Seq experiments and incorporates biological replicates to reduce the number of false positives when detecting retention events. kma uses the results of existing quantification tools that probabilistically assign multi-mapping reads, thus interfacing easily with transcript quantification pipelines. The data is represented in a convenient, database style format that allows for easy aggregation across introns, genes, samples, and conditions to allow for further exploratory analysis.
Schaeffer, Lorian; Pimentel, Harold; Bray, Nicolas; Melsted, Páll; Pachter, Lior
We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data. In particular, we show that the recent idea of pseudoalignment introduced in the RNA-Seq context is suitable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software.
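How EM resolves ambiguous read assignments can be sketched on a toy input. This is an illustrative simplification, not the authors' software: reads are assumed to be already grouped into equivalence classes (sets of reference targets each read is compatible with), target lengths are ignored, and the function name `em_abundances` and the counts are hypothetical.

```python
def em_abundances(eq_classes, n_targets, n_iter=100):
    """Estimate relative abundances from equivalence-class read counts.

    eq_classes maps a tuple of compatible target indices to a read count.
    """
    theta = [1.0 / n_targets] * n_targets
    total = sum(eq_classes.values())
    for _ in range(n_iter):
        expected = [0.0] * n_targets
        # E-step: split each class's reads in proportion to current abundances.
        for targets, count in eq_classes.items():
            denom = sum(theta[t] for t in targets)
            for t in targets:
                expected[t] += count * theta[t] / denom
        # M-step: renormalize expected counts into abundances.
        theta = [e / total for e in expected]
    return theta


# Hypothetical data: 30 reads unique to target 0, 10 unique to target 1,
# and 60 reads compatible with both.
classes = {(0,): 30, (1,): 10, (0, 1): 60}

theta = em_abundances(classes, n_targets=2)
print([round(t, 3) for t in theta])
```

The ambiguous reads end up divided in the same 3:1 ratio as the uniquely assigned reads, which is the fixed point of the update for this input.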