Comparing Open Source Operating Systems
In this work, we perform an in-depth analysis of the execution of three popular
open-source operating systems (Linux, FreeBSD, and OpenBSD) for a suite of 7
server-class workloads. We use a full-system simulation framework (Tejas) to
analyze the execution of the entire system (application + OS) from an architectural
perspective. This work seeks to bring out insights that can serve as directives
for OS designers. For example, we show that owing to the smaller i-cache and
d-cache footprints (measured as the number of unique cache lines accessed) of its
filesystem-related system call and interrupt handlers, FreeBSD outperforms Linux
and OpenBSD on file-intensive workloads. We also compare two versions of
these operating systems, one released in March 2014 and another in
March 2016, to analyze the net impact of all the intervening OS modifications on the
execution of the selected workloads. This is ongoing work; we will submit it to the ACM
Operating Systems Review (OSR) journal in the next 2-3 months.
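The footprint metric used above can be illustrated with a small sketch. The 64-byte line size and the toy address trace are illustrative assumptions, not data from the study:

```python
LINE_SIZE = 64  # bytes per cache line (an assumed, typical value)

def cache_footprint(addresses, line_size=LINE_SIZE):
    """Count the unique cache lines touched by a sequence of byte addresses."""
    return len({addr // line_size for addr in addresses})

# A toy instruction-address trace for a system-call handler.
trace = [0x1000, 0x1004, 0x1008, 0x1040, 0x1044, 0x2000]
print(cache_footprint(trace))  # 3 distinct lines: 0x40, 0x41, and 0x80
```

A handler whose trace maps onto fewer distinct lines leaves more of the cache available to the application, which is the effect the comparison above attributes to FreeBSD.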
Scheduler for OS Intensive Applications
Traditional OS schedulers typically execute all types of tasks (user applications,
system call handlers, and interrupt handlers) on the same core, thereby polluting
the i-cache. This reduces the performance of OS-intensive applications such as web
servers and database servers by up to 50%. We propose SchedTask, a hardware-assisted
OS scheduler that executes dissimilar tasks on separate cores. The primary
contributions of the scheduler are a novel approach that uses hardware
Bloom filters to quantify the instruction similarity between tasks at run time, and
a work-stealing algorithm that increases instruction throughput by scheduling
suitable tasks on idle cores. For a suite of 8 OS-intensive applications, SchedTask
outperforms state-of-the-art OS scheduling proposals by
up to 27% (mean: 12.7%).
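The similarity test at the heart of this idea can be sketched in software: each task hashes the cache-line addresses of the instructions it executes into a Bloom filter, and the overlap of two filters approximates the tasks' instruction similarity. The filter size, hash count, and scoring function below are illustrative assumptions, not the SchedTask hardware design:

```python
import hashlib

M = 1024  # bits in each Bloom filter (assumed size)
K = 3     # hash functions per inserted item (assumed count)

def _hashes(item):
    # Derive K deterministic bit positions for an item.
    for i in range(K):
        h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8)
        yield int.from_bytes(h.digest(), "big") % M

def bloom_insert(bits, item):
    """Set the item's K bit positions in the filter (held as a Python int)."""
    for pos in _hashes(item):
        bits |= 1 << pos
    return bits

def similarity(a, b):
    """Fraction of set bits the two filters share (a Jaccard-style score)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

# Two tasks whose instruction footprints overlap on 90 of 100 cache lines.
t1 = t2 = 0
for line in range(100):
    t1 = bloom_insert(t1, line)
for line in range(10, 110):
    t2 = bloom_insert(t2, line)
# similarity(t1, t2) is high; filters of disjoint footprints score near 0.
```

A scheduler can then co-locate tasks whose filters score above a threshold, so they share warm i-cache lines rather than evicting each other's code.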
This work has been accepted for publication at the IEEE/ACM International Symposium on Microarchitecture (MICRO), Boston, USA, 2017.
Instruction Prefetcher for OS Intensive Applications
Computer architects have long used instruction prefetching to improve the
performance of operating system (OS) intensive workloads. Sophisticated
instruction prefetchers are implemented mostly in hardware; they record the
execution history of a program in dedicated structures and use this
information to prefetch when a known execution pattern repeats. The storage
overheads of these structures are prohibitively high (64-200 KB per core). We
show that in OS-intensive applications, the i-cache misses are mostly
clustered in small execution blocks that follow an OS event. We propose a
technique to identify and prefetch these execution blocks from within the
software. Our technique uses only 4 additional registers per core, and it
still yields a performance improvement of up to 14% (mean: 7%) over
state-of-the-art instruction prefetchers for a suite of 8 OS-intensive
applications. This work was accepted for publication at the IEEE/ACM
International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 2016.
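The record-and-replay idea can be sketched as follows: the first time an OS event occurs, the i-cache misses in the execution block that follows it are recorded; on the next occurrence of the same event, that block is prefetched ahead of execution. The event names, cache-line values, and prefetch callback are illustrative assumptions:

```python
from collections import defaultdict

# event -> cache lines that missed in the block following the event
recorded_blocks = defaultdict(list)

def on_icache_miss(event, line):
    """Record a miss inside the execution block that follows an OS event."""
    recorded_blocks[event].append(line)

def on_os_event(event, prefetch):
    """On a repeated event, prefetch the block recorded previously."""
    for line in recorded_blocks[event]:
        prefetch(line)

# First occurrence of the 'read' system call: record its miss block.
on_icache_miss("sys_read", 0x40)
on_icache_miss("sys_read", 0x41)

# Next occurrence: the recorded lines are prefetched ahead of execution.
issued = []
on_os_event("sys_read", issued.append)
print(issued)  # prints [64, 65], i.e. lines 0x40 and 0x41
```

Because the trigger is the OS event itself, the lookup state can stay very small, which is what allows the hardware cost to shrink to a handful of registers per core.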
Ethical Hacking of License Managers
This work proposes a novel strategy to circumvent the license validation of
proprietary software. We first collect the execution trace of a licensed application
using a dynamic binary instrumentation or emulation tool such as Intel Pin or QEMU.
These traces are then analyzed to identify the regions of code responsible for
license checking. Such regions are then skipped in the next execution in a way that
keeps the software’s data state valid, so that it continues along the ideal
execution path. As a proof of concept, we use these techniques to crack six
applications protected by license managers. This work received the best poster
award at the Security and Privacy Symposium (SPS), 2015.
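One simple way to locate a candidate license-check region is to diff the basic-block traces of a run that passes validation against one that fails: the point where the two traces diverge typically brackets the check. The trace contents below are illustrative assumptions, not traces from the cracked applications:

```python
def divergence_point(trace_ok, trace_fail):
    """Index of the first basic block where the two traces differ."""
    for i, (a, b) in enumerate(zip(trace_ok, trace_fail)):
        if a != b:
            return i
    return min(len(trace_ok), len(trace_fail))

trace_ok   = [0x100, 0x110, 0x200, 0x300]  # run where validation succeeds
trace_fail = [0x100, 0x110, 0x250, 0x260]  # run where validation fails
print(divergence_point(trace_ok, trace_fail))  # prints 2
```

The block just before the divergence index is where the license decision is taken, making it the natural candidate region to skip while preserving the program's data state.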
Heterogeneous Processor for Clouds
In this work, we propose the design of TriKon, a manycore processor for clouds.
The primary contribution of the work is a novel cache, called the Triangle cache, that
replaces the conventional instruction cache. The Triangle cache stores the instructions
of every layer of the software stack (application, OS, and VMM) in separate memory
elements and allows these memory elements to intelligently migrate lines amongst
themselves depending on the memory requirements of each layer. A core with a
Triangle cache is better suited to executing I/O-intensive workloads. We also propose
the design of cores that cater to CPU-intensive and memory-intensive workloads.
The area of the TriKon processor is within 2% of that of a baseline processor, and
with such a system we achieve a performance gain of 12% for a suite of 16
applications. This work was accepted at the IEEE International Conference on High
Performance Computing (HiPC), Goa, India, 2014.
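The migration policy can be sketched as a demand-driven rebalancing between per-layer partitions: capacity moves from the layer that misses least to the layer that misses most. The partition sizes and the single-way migration rule are illustrative assumptions, not the Triangle cache microarchitecture:

```python
miss_counts = {"app": 0, "os": 0, "vmm": 0}
ways = {"app": 4, "os": 2, "vmm": 2}  # assumed cache ways per partition

def record_miss(layer):
    miss_counts[layer] += 1

def rebalance():
    """Move one way from the least-missing layer to the most-missing one."""
    donor = min(miss_counts, key=lambda l: (miss_counts[l], l))
    taker = max(miss_counts, key=lambda l: (miss_counts[l], l))
    if donor != taker and ways[donor] > 1:
        ways[donor] -= 1
        ways[taker] += 1

# An OS-intensive phase: the OS partition misses heavily, so it gains
# capacity from the idle VMM partition.
for _ in range(10):
    record_miss("os")
record_miss("app")
rebalance()
print(ways)  # prints {'app': 4, 'os': 3, 'vmm': 1}
```

Resetting the miss counters at the start of each phase would let the partitions track phase changes, e.g. capacity flowing back to the application partition once a compute-heavy phase begins.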
OS Jitter Mitigation
For a suite of 17 multi-threaded applications, we show that OS activity (system
call and interrupt handlers, and kernel threads) can cause timing variations of up
to 35.22% in large multi-core systems; we refer to this timing variation as jitter. We
propose to reduce OS-induced jitter using two hardware units: (1) a jitter
unit that uses a distributed protocol to keep track of the time lost to the OS and
subsequently tries to compensate for it using DVFS, and (2) a dedicated cache at
the L2 level of the memory hierarchy to store OS cache lines. The area overhead of
our scheme is limited to 1%, and it reduces the overall jitter of these
applications by 8% on average. This work was published in the IEEE Transactions
on Parallel and Distributed Systems (TPDS) journal (Volume 25, Issue 5), 2014.
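One way to express the jitter metric is as the spread of per-thread completion times for a barrier-to-barrier phase, relative to the fastest thread, since the slowest thread dictates when the whole phase finishes. This particular formula and the timings below are illustrative assumptions, not the paper's exact definition or measurements:

```python
def jitter_percent(thread_times):
    """Percent by which the slowest thread lags the fastest one."""
    fastest, slowest = min(thread_times), max(thread_times)
    return 100.0 * (slowest - fastest) / fastest

# Per-thread times (ms) for one phase; the last thread was slowed by
# OS activity (interrupts, kernel threads) on its core.
times = [100.0, 101.0, 100.5, 135.0]
print(jitter_percent(times))  # prints 35.0
```

Under this view, compensating the delayed thread with DVFS shrinks the gap between slowest and fastest, which is exactly the quantity the metric measures.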