About the Course

Welcome to CSV880, Special Topics in Parallel Computing.


Parallelism has been employed for many years, mainly in high-performance computing. Interest in it has grown lately because physical constraints now limit further increases in processor clock frequency. Parallel computers can be roughly classified as either (i) shared-memory systems: multi-core or multi-processor computers with multiple processing elements in a single machine; or (ii) distributed-memory systems, which typically use multiple computers to work on the same task. In practice, most systems are hybrids: distributed-memory systems in which each computer is itself a shared-memory system.


Distributed-memory systems are more scalable than shared-memory systems and are therefore more frequently used in supercomputers. Such large-scale systems typically comprise a massive number of dedicated processors placed in close proximity to each other and connected by a dedicated network with a topology such as a mesh or a hypercube. This saves considerable time in moving data around and makes it possible for the processors to work together. Supercomputers play an important role in computational science and are used in a wide range of fields, including weather forecasting, climate research, oil and gas exploration, and physical simulations.


Developing parallel programs involves designing parallel algorithms and performing efficient communication and synchronization among the processors. In this course, we will study some basic computational routines used in supercomputing (solving systems of linear equations, fast Fourier transforms, and graph problems). We will study algorithms for solving these problems in parallel, along with their underlying communication patterns. We will also study various network topologies (mesh, hypercube, dragonfly) and relate these communication patterns to the topologies.
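As a rough illustration of what relating a communication pattern to a topology can mean, the sketch below compares hop counts for a butterfly exchange on a hypercube and on a 2D mesh. It assumes the butterfly pattern used in binary-exchange formulations of the parallel FFT, where in each stage a process exchanges data with the process whose rank differs in one bit. The function names, the row-major labeling of mesh nodes, and the choice of 16 processes are illustrative assumptions, not course material.

```python
def butterfly_partner(rank: int, stage: int) -> int:
    """Partner of `rank` in stage `stage` of a butterfly exchange: flip bit `stage`."""
    return rank ^ (1 << stage)


def hypercube_hops(a: int, b: int) -> int:
    """Hops between nodes a and b on a hypercube: Hamming distance of their labels."""
    return bin(a ^ b).count("1")


def mesh_hops(a: int, b: int, side: int) -> int:
    """Hops between a and b on a side x side 2D mesh without wraparound.

    Assumes row-major node labels (an illustrative choice).
    """
    a_row, a_col = divmod(a, side)
    b_row, b_col = divmod(b, side)
    return abs(a_row - b_row) + abs(a_col - b_col)


if __name__ == "__main__":
    side = 4  # 16 processes: a 4-dimensional hypercube or a 4 x 4 mesh
    rank = 5
    for stage in range(4):
        partner = butterfly_partner(rank, stage)
        print(
            f"stage {stage}: {rank} <-> {partner}, "
            f"hypercube hops = {hypercube_hops(rank, partner)}, "
            f"mesh hops = {mesh_hops(rank, partner, side)}"
        )
```

Under these assumptions every butterfly partner is a direct neighbor on the hypercube, while on the mesh some stages need more than one hop.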



  • January 21, 2014

    Organizational class in 501

Course Contents

The tentative contents for the course are outlined below:

  • 1. Introduction to High Performance Computing
    •    - Focus on Distributed Memory Systems (Architecture)
    •    - Network Interconnects and message passing paradigm
    •    - Applications and computation kernels (FFT, LU)
    •    - Basic Data Decomposition Strategies (Block, Cyclic, Block-Cyclic)

  • 2. Introduction to network interconnects
    •    - Hypercube
    •    - Mesh and Torus networks
    •    - Dragonfly networks
    •    - Communication collectives and primitives

  • 3. Algorithms for Computation Kernels
    •    - Solving Systems of Linear Equations (and associated communication patterns)
    •    - Fast Fourier Transforms (and associated communication patterns)
    •    - Graph Algorithms (BFS) (and associated communication patterns)

  • 4. Communication Primitives and Collectives
    •    - Blocking and Non-blocking sends/receives
    •    - Broadcast, Reduce, All-to-all, Gather/Scatter (with algorithms for different topologies)

  • + Simulator-based assignment (implementing communication patterns for different topologies)
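To give a flavor of implementing a communication pattern for a particular topology, here is a minimal sketch of a one-to-all broadcast on a hypercube, simulated in plain Python. It is not the course simulator: the function name `hypercube_broadcast` and the representation of rounds as lists of (sender, receiver) pairs are hypothetical choices made for this example.

```python
def hypercube_broadcast(dim: int, root: int = 0):
    """Simulate a one-to-all broadcast on a `dim`-dimensional hypercube.

    Returns a list of rounds; each round is a list of (sender, receiver) pairs.
    In round k, every node that already holds the message forwards it to its
    neighbor across dimension k (the node whose label differs in bit k).
    """
    has_message = {root}
    rounds = []
    for k in range(dim):
        sends = [(src, src ^ (1 << k)) for src in sorted(has_message)]
        has_message.update(dst for _, dst in sends)
        rounds.append(sends)
    return rounds


if __name__ == "__main__":
    for k, sends in enumerate(hypercube_broadcast(3)):
        print(f"round {k}: {sends}")
```

In this sketch the number of nodes holding the message doubles in each round, so all 2^dim nodes are reached after dim rounds, and every message travels over a single hypercube link.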