Recursos de colección

Caltech Authors (147.820 recursos)

Repository of works by Caltech published authors.

Group = Parallel and Distributed Systems Group

Mostrando recursos 1 - 20 de 133

  1. Noise and Uncertainty in String-Duplication Systems

    Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
    Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations...

  2. Duplication Distance to the Root for Binary Sequences

    Alon, Noga; Bruck, Jehoshua; Farnoud, Farzad; Jain, Siddharth
    We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form x = abc → y = abbc, where x and y are sequences and a, b, and c are their substrings, needed to generate a binary sequence of length n starting from a square-free sequence from the set {0, 1, 01, 10, 010, 101}. This problem is a restricted case of finding the duplication/deduplication distance between two sequences, defined as the minimum number of duplication and deduplication operations required to transform one sequence to the other....

  3. Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms

    Jain, Siddharth; Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
    The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for...

  4. Secure RAID Schemes for Distributed Storage

    Huang, Wentao; Bruck, Jehoshua
    We propose secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We generalize the concept of systematic encoding to secure RAID and show that systematic schemes have significant advantages in the efficiencies of encoding, decoding and random access. For the practical high rate regime, we construct three XOR-based systematic secure RAID schemes with optimal or almost optimal encoding and decoding complexities, from the EVENODD codes and B codes, which are array codes widely used in the RAID architecture. The schemes can tolerate up to two node failures...

  5. Communication Efficient Secret Sharing

    Huang, Wentao; Langberg, Michael; Kliewer, Joerg; Bruck, Jehoshua
    A secret sharing scheme is a method to store information securely and reliably. Particularly, in the threshold secret sharing scheme (due to Shamir), a secret is divided into shares, encoded and distributed to parties, such that any large enough collection of parties can decode the secret, and a smaller (then threshold) set of parties cannot collude to deduce any information about the secret. While Shamir’s scheme was studied for more than 35 years, the question of minimizing its communication bandwidth was not considered. Specifically, assume that a user (or a collection of parties) wishes to decode the secret by receiving information from a...

  6. A Stochastic Model for Genomic Interspersed Duplication

    Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
    Mutation processes such as point mutation, insertion, deletion, and duplication (including tandem and interspersed duplication) have an important role in evolution, as they lead to genomic diversity, and thus to phenotypic variation. In this work, we study the expressive power of interspersed duplication, i.e., its ability to generate diversity, via a simple but fundamental stochastic model, where the length and the location of the subsequence that is duplicated and the point of insertion of the copy are chosen randomly. In contrast to combinatorial models, where the goal is to determine the set of possible outcomes regardless of their likelihood, in stochastic systems, we investigate...

  7. Rewriting Flash Memories by Message Passing

    En Gad, Eyal; Huang, Wentao; Li, Yue; Bruck, Jehoshua
    This paper constructs WOM codes that combine rewriting and error correction for mitigating the reliability and the endurance problems in flash memory.We consider a rewriting model that is of practical interest to flash applications where only the second write uses WOM codes. Our WOM code construction is based on binary erasure quantization with LDGM codes, where the rewriting uses message passing and has potential to share the efficient hardware implementations with LDPC codes in practice. We show that the coding scheme achieves the capacity of the rewriting model. Extensive simulations show that the rewriting performance of our scheme compares favorably with that...

  8. Capacity and Expressiveness of Genomic Tandem Duplication

    Jain, Siddharth; Farnoud, Farzad; Bruck, Jehoshua
    The majority of the human genome consists of repeated sequences. An important type of repeats common in the human genome are tandem repeats, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat, namely, it was generated from AGTCTGC by tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called the seed) by tandem duplication of length bounded by a constant. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2;...

  9. Asymmetric Error Correction and Flash-Memory Rewriting using Polar Codes

    En Gad, Eyal; Li, Yue; Kliewer, Joerg; Langberg, Michael; Jiang, Anxiao (Andrew); Bruck, Jehoshua
    We propose efficient coding schemes for two communication settings: 1. asymmetric channels, and 2. channels with an informed encoder. These settings are important in non-volatile memories, as well as optical and broadcast communication. The schemes are based on non-linear polar codes, and they build on and improve recent work on these settings. In asymmetric channels, we tackle the exponential storage requirement of previously known schemes, that resulted from the use of large Boolean functions. We propose an improved scheme, that achieves the capacity of asymmetric channels with polynomial computational complexity and storage requirement. The proposed non-linear scheme is then generalized to the setting of channel coding...

  10. Systematic Error-Correcting Codes for Rank Modulation

    Zhou, Hongchao; Schwartz, Moshe; Jiang, Anxiao (Andrew); Bruck, Jehoshua
    The rank-modulation scheme has been recently proposed for efficiently storing data in nonvolatile memories. In this paper, we explore [n, k, d] systematic error-correcting codes for rank modulation. Such codes have length n, k information symbols, and minimum distance d. Systematic codes have the benefits of enabling efficient information retrieval in conjunction with memory-scrubbing schemes. We study systematic codes for rank modulation under Kendall's T-metric as well as under the ℓ∞-metric. In Kendall's T-metric, we present [k + 2, k, 3] systematic codes for correcting a single error, which have optimal rates, unless systematic perfect codes exist. We also study...

  11. The Capacity of String-Replication Systems

    Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
    It is known that the majority of the human genome consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple replication rules, including those resembling genomic replication processes. In other words, our goal is to find out the capacity, or the expressive power, of these string-replication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-replication...

  12. Rate-Distortion for Ranking with Incomplete Information

    Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
    We study the rate-distortion relationship in the set of permutations endowed with the Kendall t-metric and the Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall t-metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.

  13. Sequence Reconstruction for Grassmann Graphs and Permutations

    Yaakobi, Eitan; Schwartz, Moshe; Langberg, Michael; Bruck, Jehoshua
    The sequence-reconstruction problem was first proposed by Levenshtein in 2001. This problem studies the model where the same word is transmitted over multiple channels. If the transmitted word belongs to some code of minimum distance d and there are at most r errors in every channel, then the minimum number of channels that guarantees a successful decoder (under the assumption that all channel outputs are distinct) has to be greater than the largest intersection of two balls of radius r and with distance at least d between their centers. This paper studies the combinatorial problem of computing the largest intersection of two balls for two cases. In the first part we...

  14. Error-Correcting Codes for Multipermutations

    Buzaglo, Sarit; Yaakobi, Eitan; Etzion, Tuvi; Bruck, Jehoshua
    THIS PAPER IS ELIGIBLE FOR THE STUDENT PAPER AWARD. Multipermutations appear in various applications in information theory. New applications such as rank modulation for flash memories and voting have suggested the need to consider error-correcting codes for multipermutations. The construction of codes is challenging when permutations are considered and it becomes even a harder problem for multipermutations. In this paper we discuss the general problem of error-correcting codes for multipermutations. We present some tight bounds on the size of error-correcting codes for several families of multipermutations. We find the capacity of the channels of multipermutations and characterize families of perfect codes in this metric which we believe are the only such perfect...

  15. Building Consensus via Iterative Voting

    Farnoud, Farzad; Yaakobi, Eitan; Touri, Behrouz; Milenkovic, Olgica; Bruck, Jehoshua
    In networked systems comprised of many agents, it is often required to reach a common operating point of all agents, termed the network consensus. We consider two iterative methods for reaching a ranking (ordering) consensus over a voter network, where the initial preference of every voter is of the form of a full ordering of candidates. The voters are allowed, one at a time and based on some random scheme, to change their vote to bring them “closer” to the opinions of selected subsets of peers. The first consensus method is based on changing votes one adjacent swap at a time; the second method is based on changing a vote...

  16. Information-Theoretic Study of Voting Systems

    Yaakobi, Eitan; Langberg, Michael; Bruck, Jehoshua
    The typical paradigm in voting theory involves n voters and m candidates. Every voter ranks the candidates resulting in a permutation of the m candidates. A key problem is to derive the aggregate result of the voting. A popular method for vote aggregation is based on the Condorcet criterion. The Condorcet winner is the candidate who wins every other candidate by pairwise majority. However, the main disadvantage of this approach, known as the Condorcet paradox, is that such a winner does not necessarily exist since this criterion does not admit transitivity. This paradox is mathematically likely (if voters assign rankings uniformly at random, then with probability approaching one with the number...

  17. In-Memory Computing of Akers Logic Array

    Yaakobi, Eitan; Jiang, Anxiao (Andrew); Bruck, Jehoshua
    This work studies memories from a different perspective, while the goal is to explore the concept of in-memory computing. Our point of departure is an old study of logic arrays by Akers in 1972. We demonstrate how these arrays can simultaneously store information and perform logic operations. We first extend the structure of these arrays for non-binary alphabets. We then show how a special structure of these arrays can both store elements and output a sorted version of them. We also study other examples of the in-memory computing concept. In this setup, it is shown how information can be stored and computed with, and the array can tolerate or detect...

  18. Codes for Network Switches

    Wang, Zhiying; Shaked, Omer; Cassuto, Yuval; Bruck, Jehoshua
    A network switch routes data packets between its multiple input and output ports. Packets from input ports are stored upon arrival in a switch fabric comprising multiple memory banks. This can result in memory contention when distinct output ports request packets from the same memory bank, resulting in a degraded switching bandwidth. To solve this problem, we propose to add redundant memory banks for storing the incoming packets. The problem we address is how to minimize the number of redundant memory banks given some guaranteed contention resolution capability. We present constructions of new switch memory architectures based on different coding techniques. The codes allow decreasing the redundancy by 1/2 or 2/3, depending...

  19. Rank-Modulation Rewriting Codes for Flash Memories

    En Gad, Eyal; Yaakobi, Eitan; Jiang, Anxiao (Andrew); Bruck, Jehoshua
    Current flash memory technology is focused on cost minimization of the stored capacity. However, the resulting approach supports a relatively small number of write-erase cycles. This technology is effective for consumer devices (smartphones and cameras) where the number of write-erase cycles is small, however, it is not economical for enterprise storage systems that require a large number of lifetime writes. Our proposed approach for alleviating this problem consists of the efficient integration of two key ideas: (i) improving reliability and endurance by representing the information using relative values via the rank modulation scheme and (ii) increasing the overall (lifetime) capacity of the flash device via rewriting codes, namely, performing multiple writes per...

  20. Long MDS Codes for Optimal Repair Bandwidth

    Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
    MDS codes are erasure-correcting codes that can correct the maximum number of erasures given the number of redundancy or parity symbols. If an MDS code has r parities and no more than r erasures occur, then by transmitting all the remaining data in the code one can recover the original information. However, it was shown that in order to recover a single symbol erasure, only a fraction of 1/r of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column, then the code forms...

Aviso de cookies: Usamos cookies propias y de terceros para mejorar nuestros servicios, para análisis estadístico y para mostrarle publicidad. Si continua navegando consideramos que acepta su uso en los términos establecidos en la Política de cookies.