  1. Efficient Dynamic Automatic Memory Management And Concurrent Kernel Execution For General-Purpose Programs On Graphics Processing Units

    Pai, Sreepathi
    Modern supercomputers now use accelerators to achieve their performance, with the most widely used accelerator being the Graphics Processing Unit (GPU). However, achieving the performance potential of systems that combine a GPU and CPU is an arduous task which could be made easier with the assistance of the compiler or runtime. In particular, exploiting two features of GPU architectures -- distributed memory and concurrent kernel execution -- is critical to achieving good performance, but in current GPU programming systems, programmers must exploit them manually. This can lead to poor performance. In this thesis, we propose automatic techniques that: i) perform...

  2. Efficient Execution Of AMR Computations On GPU Systems

    Raghavan, Hari K
    Adaptive Mesh Refinement (AMR) is a method which dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. Due to high resolution discretization of localized regions of interest into rectangular mesh units called patches, AMR provides low cost of computations and high degree of accuracy. General purpose graphics processing units (GPGPUs), with their support for fine-grained parallelism, offer an attractive option for obtaining high performance for AMR applications. The data parallel computations of the finite difference schemes of AMR can be efficiently performed on GPGPUs. This research deals with challenges...

  3. Prediction Of Queue Waiting Times For Metascheduling On Parallel Batch Systems

    Rajath Kumar, *
    Production parallel systems are space-shared and employ batch queues in which the jobs submitted to the systems are made to wait before execution. Thus, jobs submitted to parallel batch systems incur queue waiting times in addition to the execution times. Prediction of these queue waiting times is important to provide overall estimates to the users and can also help meta-schedulers make scheduling decisions. In the first part of our research, we have developed an integrated framework, PQStar, for identification and prediction of jobs with short queue waiting times. Analyses of the job traces of supercomputers reveal that about 56 to...

  4. Compiling For Coarse-Grained Reconfigurable Architectures Based On Dataflow Execution Paradigm

    Alle, Mythri
    Coarse-Grained Reconfigurable Architectures (CGRAs) can be employed for accelerating computational workloads that demand both flexibility and performance. CGRAs comprise a set of computation elements interconnected using a network, and this interconnection of computation elements is referred to as a reconfigurable fabric. The size of application that can be accommodated on the reconfigurable fabric is limited by the size of the instruction buffers associated with each compute element. When an application cannot be accommodated entirely, the application is partitioned such that each of these partitions can be executed on the reconfigurable fabric. These partitions are scheduled by an orchestrator. The orchestrator employs dynamic dataflow...

  5. Memory Efficient Regular Expression Pattern Matching Architecture For Network Intrusion Detection Systems

    Kumar, Pawan
    The rampant growth of the Internet has been coupled with an equivalent growth in cyber crime over the Internet. With our increased reliance on the Internet for commerce, social networking, information acquisition, and information exchange, intruders have found financial, political, and military motives for their actions. Network Intrusion Detection Systems (NIDSs) intercept the traffic at an organization’s periphery and try to detect intrusion attempts. Signature-based NIDSs compare the packet to a signature database consisting of known attacks and malicious packet fingerprints. The signatures use regular expressions to model these intrusion activities. This thesis presents a memory efficient pattern matching system...

  6. Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterogeneous Processors

    Prasad, Ashwin
    MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program’s execution time. Today’s computer systems have tremendous computing power in the form of traditional CPU cores and also throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control flow dominated regions of a MATLAB program to the CPU and the data parallel regions to the GPU can...

  7. A Coarse Grained Reconfigurable Architecture Framework Supporting Macro-Dataflow Execution

    Varadarajan, Keshavan
    A Coarse-Grained Reconfigurable Architecture (CGRA) is a processing platform which constitutes an interconnection of coarse-grained computation units (viz. Function Units (FUs), Arithmetic Logic Units (ALUs)). These units communicate directly, viz. send-receive like primitives, as opposed to the shared memory based communication used in multi-core processors. CGRAs are a well-researched topic and the design space of a CGRA is quite large. The design space can be represented as a 7-tuple (C, N, T, P, O, M, H) where each of the terms has the following meaning: C - choice of computation unit, N - choice of interconnection network, T - choice of number of...

  8. Long-Running Multi-Component Climate Applications On Grids

    Sundari, Sivagama M
    Climate science or climatology is the scientific study of the earth’s climate, where climate is the term representing weather conditions averaged over a period of time. Climate models are mathematical models used to quantitatively describe, simulate and study the interactions among the components of the climate system: atmosphere, ocean, land and sea-ice. CCSM (Community Climate System Model) is a state-of-the-art climate model, and a long-running coupled multicomponent parallel application involving component models for simulating the components of the climate system. Each of the component models is a large-scale parallel application, and the parallel components exchange climate data through a specialized...

  9. Efficient Fault Tolerance In Chip Multiprocessors Using Critical Value Forwarding

    Subramanyan, Pramod
    Relentless CMOS scaling coupled with lower design tolerances is making ICs increasingly susceptible to transient faults, wear-out related permanent faults and process variations. Decreasing CMOS reliability implies that high-availability systems which were previously restricted to the domain of mainframe computers or specially designed fault-tolerant systems may become important for the commodity market as well. In this thesis we tackle the problem of enabling efficient, low cost and configurable fault-tolerance using Chip Multiprocessors (CMPs). Our work studies architectural fault detection methods based on redundant execution, specifically focusing on “leader-follower” architectures. In such architectures redundant execution is performed on two cores/threads of...

  10. An Extension Of Multi Layer IPSec For Supporting Dynamic QoS And Security Requirements

    Kundu, Arnab
    Governments, militaries, corporations, financial institutions and others exchange a great deal of confidential information using the Internet these days. Protecting such confidential information and ensuring its integrity and origin authenticity are of paramount importance. There exist protocols and solutions at different layers of the TCP/IP protocol stack to address these security requirements. Application level encryption viz. PGP for secure mail transfer, TLS based secure TCP communication, and IPSec for providing IP layer security are among these security solutions. Due to scalability, wide acceptance of the IP protocol, and its application independent character, the IPSec protocol has become a standard for providing Internet...

  11. Emulating Variable Block Size Caches

    Muthulaxmi, S

  12. Search-Optimized Disk Layouts For Suffix-Tree Genomic Indexes

    Bhavsar, Rajul D
    Over the last decade, biological sequence repositories have been growing at an exponential rate. Sophisticated indexing techniques are required to facilitate efficient searching through these humongous genetic repositories. A particularly attractive index structure for such sequence processing is the classical suffix-tree, a vertically compressed trie structure built over the set of all suffixes of a sequence. Its attractiveness stems from its linearity properties -- suffix-tree construction times are linear in the size of the indexed sequences, while search times are linear in the size of the query strings. In practice, however, the promise of suffix-trees is not realized for extremely long...

  13. Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture

    Biswas, Prasenjit
    Application domains such as Bio-informatics, DSP, Structural Biology, Fluid Dynamics, high resolution direction finding, state estimation, adaptive noise cancellation etc. demand high performance computing solutions for their simulation environments. The core computations of these applications are in Numerical Linear Algebra (NLA) kernels. Direct solvers are predominantly required in domains like DSP and estimation algorithms like the Kalman Filter, where the matrices on which operations need to be performed are either small or medium sized, but dense. Faddeev's Algorithm is often used for solving dense linear systems of equations. Modified Faddeev's algorithm (MFA) is a general algorithm on which LU decomposition, QR...

  14. Computational Studies Of Uncertainty In Intra-Cellular Biochemical Reaction Systems

    Dana, Saswati
    With the increased popularity of systems-based approaches in biology, a wide spectrum of techniques has been applied to the simulation and analysis of biochemical systems which involve uncertainty and stochasticity. This work is particularly concerned with modelling and analysis of metabolic pathways, regulatory and signal transduction networks for understanding intra-cellular pathway behaviour. Typically, parameter estimation in ordinary differential equation (ODE) models is used for this purpose when there is a large number of molecules involved in the reaction system. However, this approach is correct when the system is large enough to be deterministic in nature. But there is uncertainty involved in the system...

  15. A Computational Study Of Ion Crystals In Paul Traps

    Kotana, Appala Naidu
    In this thesis we present a computational study of “ion crystals”, the interesting patterns in which ions arrange themselves in ion traps such as Paul and Penning traps. In ion crystals the ions are in equilibrium due to the balance of the repulsive forces between the ions and the overall tendency of the ion trap to pull ions towards the trap centre. We have carried out a detailed investigation of ion crystals in Paul traps by solving their equations of motion numerically. We also propose a model called the spring–mass model to explain the formation of ion crystals. This model...

  16. Structural Studies On Bovine Pancreatic Phospholipase A2 And Proteins Involved In Molybdenum Cofactor Biosynthesis

    Kanaujia, Shankar Prasad
    We have carried out structural studies on bovine pancreatic phospholipase A2 (BPLA2) and two proteins involved in the molybdenum cofactor (Moco) biosynthesis pathway. In addition, molecular-dynamics simulations and other analyses have been performed to corroborate the findings obtained from the crystal structures. Crystal structures of the three active-site mutants (H48N, D49N and D49K) of BPLA2 were determined to understand the mechanism by which the mutant H48N is able to catalyze the reaction of phospholipid hydrolysis and to see the effect of the loss of the Ca²⁺ ion in the active site of the D49N and D49K mutants. We found that Asp49 could...

  17. Commit Processing In Distributed On-Line And Real-Time Transaction Processing Systems

    Gupta, Ramesh Kumar

  18. Combining Conditional Constant Propagation And Interprocedural Alias Analysis

    Nandakumar, K S

  19. Novel Energy Transfer Computation Techniques For Radiosity Based Realistic Image Synthesis

    Sidhu, Reetinder P S

  20. Parallel Voxelization Algorithms For Volume Rendering Of Unstructured Grids

    Prakash, C Edmond

