Recursos de colección
Caltech Authors (143.226 recursos)
Repository of works by Caltech published authors.
Group = Parallel and Distributed Systems Group
Repository of works by Caltech published authors.
Group = Parallel and Distributed Systems Group
Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Duplication mutations play a critical role in the
generation of biological sequences. Simultaneously, they
have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation
noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations...
Alon, Noga; Bruck, Jehoshua; Farnoud, Farzad; Jain, Siddharth
We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form x = abc → y = abbc, where
x and y are sequences and a, b, and c are their substrings, needed to generate a binary sequence of length
n starting from a square-free sequence from the set
{0, 1, 01, 10, 010, 101}. This problem is a restricted case of finding the duplication/deduplication
distance between two sequences, defined as the minimum number of duplication and deduplication
operations required to transform one sequence to the other....
Jain, Siddharth; Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
The ability to store data in the DNA of a living
organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next
to the original. In particular, we present two families of codes for...
Huang, Wentao; Bruck, Jehoshua
We propose secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We generalize the concept of systematic encoding to secure RAID and show that systematic schemes have significant advantages in the efficiencies of encoding, decoding and random access. For the practical high rate regime, we construct three XOR-based systematic secure RAID schemes with optimal or almost optimal encoding and
decoding complexities, from the EVENODD codes and B codes, which are array codes widely used in the RAID architecture. The schemes can tolerate up to two node failures...
Huang, Wentao; Langberg, Michael; Kliewer, Joerg; Bruck, Jehoshua
A secret sharing scheme is a method to store information securely and reliably. Particularly, in the threshold secret sharing scheme (due to Shamir), a secret is divided
into shares, encoded and distributed to parties, such that any large enough collection of parties can decode the secret, and a smaller (then threshold) set of parties cannot
collude to deduce any information about the secret. While Shamir’s scheme was studied for more than 35 years, the question of minimizing its communication bandwidth was
not considered. Specifically, assume that a user (or a collection of parties) wishes to decode the secret by receiving information from a...
Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
Mutation processes such as point mutation, insertion,
deletion, and duplication (including tandem and interspersed
duplication) have an important role in evolution, as
they lead to genomic diversity, and thus to phenotypic variation. In this work, we study the expressive power of interspersed duplication, i.e., its ability to generate diversity, via a simple but fundamental stochastic model, where the length and the location of the subsequence that is duplicated and the point of insertion of the copy are chosen randomly. In contrast to combinatorial models, where the goal is to determine the set of possible outcomes regardless of their likelihood, in stochastic
systems, we investigate...
En Gad, Eyal; Huang, Wentao; Li, Yue; Bruck, Jehoshua
This paper constructs WOM codes that combine
rewriting and error correction for mitigating the reliability and the endurance problems in flash memory.We consider a rewriting model that is of practical interest to flash applications where only the second write uses WOM codes. Our WOM code construction is based on binary erasure quantization with LDGM codes, where the rewriting uses message passing and has potential to share the
efficient hardware implementations with LDPC codes in practice. We show that the coding scheme achieves the capacity of the rewriting model. Extensive simulations show that the rewriting performance of our scheme compares favorably with that...
Jain, Siddharth; Farnoud, Farzad; Bruck, Jehoshua
The majority of the human genome consists of repeated sequences. An important type of repeats common in the human genome are tandem repeats, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat, namely, it was generated from AGTCTGC by tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called
the seed) by tandem duplication of length bounded by a constant. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2;...
En Gad, Eyal; Li, Yue; Kliewer, Joerg; Langberg, Michael; Jiang, Anxiao (Andrew); Bruck, Jehoshua
We propose efficient coding schemes for two communication settings: 1. asymmetric channels, and 2. channels
with an informed encoder. These settings are important in non-volatile memories, as well as optical and broadcast
communication. The schemes are based on non-linear polar codes, and they build on and improve recent work
on these settings. In asymmetric channels, we tackle the exponential storage requirement of previously known
schemes, that resulted from the use of large Boolean functions. We propose an improved scheme, that achieves the
capacity of asymmetric channels with polynomial computational complexity and storage requirement.
The proposed non-linear scheme is then generalized to the setting of channel coding...
Zhou, Hongchao; Schwartz, Moshe; Jiang, Anxiao (Andrew); Bruck, Jehoshua
The rank-modulation scheme has been recently proposed for efficiently storing data in nonvolatile memories. In this paper, we explore [n, k, d] systematic error-correcting codes for rank modulation. Such codes have length n, k information symbols, and minimum distance d. Systematic codes have the benefits of enabling efficient information retrieval in conjunction with memory-scrubbing schemes. We study systematic codes for rank modulation under Kendall's T-metric as well as under the ℓ∞-metric. In Kendall's T-metric, we present [k + 2, k, 3] systematic codes for correcting a single error, which have optimal rates, unless systematic perfect codes exist. We also study...
Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
It is known that the majority of the human genome
consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple replication rules, including those resembling genomic replication processes. In other words, our goal is to find out the capacity, or the expressive power, of these string-replication
systems. Our results include exact capacities, and
bounds on the capacities, of four fundamental string-replication...
Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
We study the rate-distortion relationship in the set
of permutations endowed with the Kendall t-metric and the
Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall t-metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small
distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.
Yaakobi, Eitan; Schwartz, Moshe; Langberg, Michael; Bruck, Jehoshua
The sequence-reconstruction problem was first proposed
by Levenshtein in 2001. This problem studies the model
where the same word is transmitted over multiple channels. If
the transmitted word belongs to some code of minimum distance
d and there are at most r errors in every channel, then the minimum
number of channels that guarantees a successful decoder
(under the assumption that all channel outputs are distinct) has
to be greater than the largest intersection of two balls of radius
r and with distance at least d between their centers.
This paper studies the combinatorial problem of computing
the largest intersection of two balls for two cases. In the first
part we...
Buzaglo, Sarit; Yaakobi, Eitan; Etzion, Tuvi; Bruck, Jehoshua
THIS PAPER IS ELIGIBLE FOR THE STUDENT
PAPER AWARD. Multipermutations appear in various applications
in information theory. New applications such as rank
modulation for flash memories and voting have suggested the need
to consider error-correcting codes for multipermutations. The construction
of codes is challenging when permutations are considered
and it becomes even a harder problem for multipermutations. In
this paper we discuss the general problem of error-correcting codes
for multipermutations. We present some tight bounds on the size of
error-correcting codes for several families of multipermutations.
We find the capacity of the channels of multipermutations and
characterize families of perfect codes in this metric which we
believe are the only such perfect...
Farnoud, Farzad; Yaakobi, Eitan; Touri, Behrouz; Milenkovic, Olgica; Bruck, Jehoshua
In networked systems comprised of
many agents, it is often required to reach a common operating
point of all agents, termed the network consensus. We consider
two iterative methods for reaching a ranking (ordering) consensus
over a voter network, where the initial preference of
every voter is of the form of a full ordering of candidates. The
voters are allowed, one at a time and based on some random
scheme, to change their vote to bring them “closer” to the opinions
of selected subsets of peers. The first consensus method is
based on changing votes one adjacent swap at a time; the second
method is based on changing a vote...
Yaakobi, Eitan; Langberg, Michael; Bruck, Jehoshua
The typical paradigm in voting theory involves n
voters and m candidates. Every voter ranks the candidates resulting
in a permutation of the m candidates. A key problem is
to derive the aggregate result of the voting. A popular method
for vote aggregation is based on the Condorcet criterion. The
Condorcet winner is the candidate who wins every other candidate
by pairwise majority. However, the main disadvantage of
this approach, known as the Condorcet paradox, is that such a
winner does not necessarily exist since this criterion does not admit
transitivity. This paradox is mathematically likely (if voters
assign rankings uniformly at random, then with probability approaching
one with the number...
Yaakobi, Eitan; Jiang, Anxiao (Andrew); Bruck, Jehoshua
This work studies memories from a different perspective,
while the goal is to explore the concept of in-memory
computing. Our point of departure is an old study of logic arrays
by Akers in 1972. We demonstrate how these arrays can
simultaneously store information and perform logic operations.
We first extend the structure of these arrays for non-binary
alphabets. We then show how a special structure of these arrays
can both store elements and output a sorted version of them. We
also study other examples of the in-memory computing concept.
In this setup, it is shown how information can be stored and
computed with, and the array can tolerate or detect...
Wang, Zhiying; Shaked, Omer; Cassuto, Yuval; Bruck, Jehoshua
A network switch routes data packets between its
multiple input and output ports. Packets from input ports are
stored upon arrival in a switch fabric comprising multiple
memory banks. This can result in memory contention when
distinct output ports request packets from the same memory
bank, resulting in a degraded switching bandwidth. To solve this
problem, we propose to add redundant memory banks for storing
the incoming packets. The problem we address is how to minimize
the number of redundant memory banks given some guaranteed
contention resolution capability. We present constructions of
new switch memory architectures based on different coding
techniques. The codes allow decreasing the redundancy by 1/2
or 2/3, depending...
En Gad, Eyal; Yaakobi, Eitan; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Current flash memory technology is focused on
cost minimization of the stored capacity. However, the resulting
approach supports a relatively small number of write-erase
cycles. This technology is effective for consumer devices (smartphones
and cameras) where the number of write-erase cycles is
small, however, it is not economical for enterprise storage systems
that require a large number of lifetime writes.
Our proposed approach for alleviating this problem consists of
the efficient integration of two key ideas: (i) improving reliability
and endurance by representing the information using relative
values via the rank modulation scheme and (ii) increasing the
overall (lifetime) capacity of the flash device via rewriting codes,
namely, performing multiple writes per...
Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
MDS codes are erasure-correcting codes that can
correct the maximum number of erasures given the number of
redundancy or parity symbols. If an MDS code has r parities
and no more than r erasures occur, then by transmitting all
the remaining data in the code one can recover the original
information. However, it was shown that in order to recover a
single symbol erasure, only a fraction of 1/r of the information
needs to be transmitted. This fraction is called the repair
bandwidth (fraction). Explicit code constructions were given in
previous works. If we view each symbol in the code as a vector
or a column, then the code forms...