Sorav Bansal's Homepage

Assistant Professor,
Department of Computer Science,
IIT Delhi.

Brief Bio

Sorav Bansal is an Assistant Professor at the CS department in IIT Delhi, and works in the areas of programming languages and operating systems. His recent research efforts have been around redefining OS abstractions for modern computing environments, investigating superoptimization-based compiler design patterns, and improving virtualization performance. Sorav obtained his B.Tech. from IIT Delhi, and Ph.D. from Stanford University.


Here are the courses I am teaching in the current semester. See all my courses here. See OS lecture videos here
See a nice panel discussion on systems research here

Reading group

We try and meet regularly to discuss recent/classic papers in PL/OS areas. See details here
I spent a sabbatical for one year (2015-16) at IISc Bangalore, and we ran a systems reading group in Bangalore.


Please see some of our publicly-released software at


Please take a look at some discussion on our research at

Representative Publications

(see full publication list at
With each paper, I also provide an informal summary/comment to provide some context on the paper.

[ APLAS 2017 ] Black-box equivalence checking across compiler optimizations
The fundamental problem of checking equivalence of two computer programs is as old as computing itself. The problem is undecidable. This paper presents an algorithm that attempts to compute equivalence across end-to-end optimizations produced by GCC, LLVM, ICC and CompCert compilers. The experiments reported in this paper are exhaustive and first of its kind and provide evidence on the robustness and limitations of our algorithm. The paper also provides insight into the power of modern-day compilers.

[ HVC 2017 ] Modeling undefined behaviour semantics for checking equivalence across compiler optimizations
Undefined behaviour semantics in programming languages are typically introduced for better optimization opportunity. Indeed we find that C compilers make extensive use of undefined behaviour semantics to produce faster code. Yet, previous work on translation validation has ignored such optimizations that rely on undefined behaviour. Section 2 of this paper contains a nice example explaining the subtlety of the problem. Our solution which involves extending the simulation relation with assumptions that remove undefined-behaviour seems straight-forward in retrospect. Our alias analysis algorithm for LLVM and assembly code is another important contribution of this work. The experimental results show the importance of different types of undefined behaviour on compiler optimization.

[ TPDS 2017 ] The Unicorn Runtime: Efficient distributed shared memory programming for hybrid CPU-GPU clusters
How can a programmer harness the combined computing power of CPUs and GPUs in a computing cluster without dealing with all the nitty gritties of the underlying system and its optimizations? The paper presents an intuitive programming model (which is perhaps not entirely new). The contribution of the paper is in algorithms to automatically map a program written in this programming model on a cluster of CPUs and GPUs through compile-time and run-time support in a system called Unicorn.

[ SOSP 2013 ] Fast Dynamic Binary Translation for the Kernel [ talk video ]
Dynamic binary translation (DBT) is a technique with a wide variety of applications; innovative ways to employ DBT appear almost regularly in the research literature. This paper presents a rather innovative way to perform DBT for an OS kernel efficiently even in the presence of high rates of interrupt and exception activity. The capability of turning DBT on and off during system execution is quite cool, and did not exist in any previous DBT system previously. The orchestration of translated instructions in the processor's instruction cache to improve the performance of the system (i.e., the translated system can be faster than the native system) is quite interesting. We use our DBT implementation to monitor the sharing behaviour of the Linux kernel.

This was also the first SOSP paper ever from India.

[ ASPLOS 2013 ] Efficient Virtualization on Embedded Power Architecture Platforms
I quote one of our ASPLOS 2013 reviewers:

---begin quote---
I find it extremely interesting to observe how a small set of seemingly innocuous architectural differences between Power and x86 leads to completely different "locally optimal" BT virtualization approaches (comparing this approach with the one outlined in the Adams paper [4]). These differences are: x86 has segmentation, Power has software-loaded TLB and variable page sizes, Power has orthogonal rwx page permissions, and x86 has variable length instructions.

It is almost like asking, how would the universe differ if the fine structure constant had a value of 0.00829 instead of 0.00729. Answer: a lot!

For this reason, I think the paper is a very good fit for ASPLOS. Even if architects do their best to think carefully about ramifications of their choices, this paper is an eye opener into a world of, probably, unforeseen consequences. As such, like it very much. And I think it should be a must-read for current and future architects (and not just narrowly from a virtualization point of view).
---end quote---

Thank you dear (anonymous) reviewer for liking our work!

[ OSDI 2008 ] Binary Translation Using Peephole Superoptimizers
Can we automatically infer translations from one architecture to another using synthesis and superoptimization techniques? This paper demonstrates such capability for PowerPC to x86 translation. The performance of the translated code is better than the widely-used commercial Rosetta translator (shipped with Apple's OS X) at that time. The paper provides techniques to dynamically map the state of one machine to the state of another machine in tandem with synthesis-based code generation, for better overall quality of the generated code.

[ ASPLOS 2006 ] Automatic Generation of Peephole Superoptimizers
How to automatically generate a peephole optimizer for x86 using brute-force superoptimization? This paper has received significant attention since it was published. Our current research is targetted towards generalizing and scaling the ideas presented in this paper.

[ WINET 2006 ] Energy Efficiency and Capacity of TCP Traffic in Multi-Hop Wireless Networks
How does the TCP protocol interact with the underlying congestion-control algorithms in a multi-hop wireless network? What are the implications on throughput (capacity) and energy efficiency?

[ FAST 2004 ] CAR: Clock with Adaptive Replacement
This paper extends the well-known CLOCK algorithm for cache replacement (used in most operating systems) with ideas presented in previous work on Adaptive Replacement Cache (ARC). As mentioned on ARC's wikipedia page, variants of this approach have been implemented and deployed in several systems. The primary advantage of CAR (CLOCK with Adaptive Replacement) over ARC is that it gets rid of the LRU lock, while retaining all the benefits of ARC.