Comparing Open Source Operating Systems
In this work, we perform an in-depth analysis of the execution of three popular
open-source operating systems (Linux, FreeBSD, and OpenBSD) for a suite of 7
server-class workloads. We use a full-system simulation framework (Tejas) to
analyze the execution of the entire system (application + OS) from an architectural
perspective. This work seeks to bring out insights that can serve as directives
for OS designers. For example, we show that owing to the smaller i-cache and
d-cache footprints (measured as the number of unique cache lines accessed) of its
filesystem-related system call and interrupt handlers, FreeBSD outperforms Linux
and OpenBSD on file-intensive workloads. We also compare two versions of
these operating systems, one released in March 2014 and another in
March 2016, to analyze the net impact of all the intervening OS modifications on the
execution of the selected workloads. This is ongoing work; we will submit it to the ACM
Operating Systems Review (OSR) journal in the next 2-3 months.
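The footprint metric used above can be illustrated with a small sketch. The 64-byte line size and the toy address trace are illustrative assumptions, not data from the study:

```python
LINE_SIZE = 64  # bytes per cache line (an assumed, typical value)

def cache_footprint(addresses, line_size=LINE_SIZE):
    """Count the unique cache lines touched by a sequence of byte addresses."""
    return len({addr // line_size for addr in addresses})

# A toy instruction-address trace for a system-call handler.
trace = [0x1000, 0x1004, 0x1008, 0x1040, 0x1044, 0x2000]
print(cache_footprint(trace))  # 3 distinct lines: 0x40, 0x41, and 0x80
```

A handler whose trace maps onto fewer distinct lines leaves more of the cache available to the application, which is the effect the comparison above attributes to FreeBSD.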
Scheduler for OS Intensive Applications
Traditional OS schedulers typically execute all types of tasks (user applications,
system call handlers, and interrupt handlers) on the same core, thereby polluting
the i-cache. This reduces the performance of OS-intensive applications such as web
servers and database servers by up to 50%. We propose SchedTask, a hardware-assisted
OS scheduler that executes dissimilar tasks on separate cores. The primary
contributions of the scheduler are a novel approach that uses hardware
Bloom filters to quantify the instruction similarity between tasks at run time, and
a work-stealing algorithm that increases instruction throughput by scheduling
suitable tasks on idle cores. For a suite of 8 OS-intensive applications, SchedTask
outperforms state-of-the-art OS scheduling proposals by
up to 27% (mean: 12.7%).
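The similarity test at the heart of this idea can be sketched in software: each task hashes the cache-line addresses of the instructions it executes into a Bloom filter, and the overlap of two filters approximates the tasks' instruction similarity. The filter size, hash count, and scoring function below are illustrative assumptions, not the SchedTask hardware design:

```python
import hashlib

M = 1024  # bits in each Bloom filter (assumed size)
K = 3     # hash functions per inserted item (assumed count)

def _hashes(item):
    # Derive K deterministic bit positions for an item.
    for i in range(K):
        h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8)
        yield int.from_bytes(h.digest(), "big") % M

def bloom_insert(bits, item):
    """Set the item's K bit positions in the filter (held as a Python int)."""
    for pos in _hashes(item):
        bits |= 1 << pos
    return bits

def similarity(a, b):
    """Fraction of set bits the two filters share (a Jaccard-style score)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

# Two tasks whose instruction footprints overlap on 90 of 100 cache lines.
t1 = t2 = 0
for line in range(100):
    t1 = bloom_insert(t1, line)
for line in range(10, 110):
    t2 = bloom_insert(t2, line)
# similarity(t1, t2) is high; filters of disjoint footprints score near 0.
```

A scheduler can then co-locate tasks whose filters score above a threshold, so they share warm i-cache lines rather than evicting each other's code.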
This work has been accepted for publication at the IEEE/ACM International Symposium on Microarchitecture (MICRO), Boston, USA, 2017.
Instruction Prefetcher for OS Intensive Applications
Computer architects have long used instruction prefetching to improve the
performance of operating system (OS) intensive workloads. Sophisticated
instruction prefetchers are implemented mostly in hardware; they record the
execution history of a program in dedicated structures and use this
information to prefetch when a known execution pattern repeats. The storage
overheads of these structures are prohibitively high (64-200 KB per core). We
show that in OS-intensive applications, the i-cache misses are mostly
clustered in small execution blocks that follow an OS event. We propose a
technique to identify and prefetch these execution blocks from within the
software. Our technique uses only 4 additional registers per core, and it
still yields a performance improvement of up to 14% (mean: 7%) over
state-of-the-art instruction prefetchers for a suite of 8 OS-intensive
applications. This work was accepted for publication at the IEEE/ACM
International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 2016.
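The record-and-replay idea can be sketched as follows: the first time an OS event occurs, the i-cache misses in the execution block that follows it are recorded; on the next occurrence of the same event, that block is prefetched ahead of execution. The event names, cache-line values, and prefetch callback are illustrative assumptions:

```python
from collections import defaultdict

# event -> cache lines that missed in the block following the event
recorded_blocks = defaultdict(list)

def on_icache_miss(event, line):
    """Record a miss inside the execution block that follows an OS event."""
    recorded_blocks[event].append(line)

def on_os_event(event, prefetch):
    """On a repeated event, prefetch the block recorded previously."""
    for line in recorded_blocks[event]:
        prefetch(line)

# First occurrence of the 'read' system call: record its miss block.
on_icache_miss("sys_read", 0x40)
on_icache_miss("sys_read", 0x41)

# Next occurrence: the recorded lines are prefetched ahead of execution.
issued = []
on_os_event("sys_read", issued.append)
print(issued)  # prints [64, 65], i.e. lines 0x40 and 0x41
```

Because the trigger is the OS event itself, the lookup state can stay very small, which is what allows the hardware cost to shrink to a handful of registers per core.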
Ethical Hacking of License Managers
This work proposes a novel strategy to circumvent the license validation of
proprietary software. We first collect the execution trace of a licensed application
using a dynamic binary instrumentation or emulation tool such as Intel Pin or QEMU.
These traces are then analyzed to identify the regions of code responsible for
license checking. Such regions are then skipped in the next execution in a way that
keeps the software’s data state valid, so that it continues along the ideal
execution path. As a proof of concept, we use these techniques to crack six
applications protected by license managers. This work received the best poster
award at the Security and Privacy Symposium (SPS), 2015.
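One simple way to locate a candidate license-check region is to diff the basic-block traces of a run that passes validation against one that fails: the point where the two traces diverge typically brackets the check. The trace contents below are illustrative assumptions, not traces from the cracked applications:

```python
def divergence_point(trace_ok, trace_fail):
    """Index of the first basic block where the two traces differ."""
    for i, (a, b) in enumerate(zip(trace_ok, trace_fail)):
        if a != b:
            return i
    return min(len(trace_ok), len(trace_fail))

trace_ok   = [0x100, 0x110, 0x200, 0x300]  # run where validation succeeds
trace_fail = [0x100, 0x110, 0x250, 0x260]  # run where validation fails
print(divergence_point(trace_ok, trace_fail))  # prints 2
```

The block just before the divergence index is where the license decision is taken, making it the natural candidate region to skip while preserving the program's data state.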
Heterogeneous Processor for Clouds
In this work, we propose the design of TriKon, a manycore processor for clouds.
The primary contribution of the work is a novel cache, called the Triangle cache, that
replaces the conventional instruction cache. The Triangle cache stores the instructions
of every layer of the software stack (application, OS, and VMM) in separate memory
elements and allows these memory elements to intelligently migrate lines amongst
themselves depending on the memory requirements of each layer. A core with a
Triangle cache is better suited to executing I/O-intensive workloads. We also propose
the design of cores that cater to CPU-intensive and memory-intensive workloads.
The area of the TriKon processor is within 2% of that of a baseline processor, and
with such a system we achieve a performance gain of 12% for a suite of 16
applications. This work was accepted at the IEEE International Conference on High
Performance Computing (HiPC), Goa, India, 2014.
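The migration policy can be sketched as a demand-driven rebalancing between per-layer partitions: capacity moves from the layer that misses least to the layer that misses most. The partition sizes and the single-way migration rule are illustrative assumptions, not the Triangle cache microarchitecture:

```python
miss_counts = {"app": 0, "os": 0, "vmm": 0}
ways = {"app": 4, "os": 2, "vmm": 2}  # assumed cache ways per partition

def record_miss(layer):
    miss_counts[layer] += 1

def rebalance():
    """Move one way from the least-missing layer to the most-missing one."""
    donor = min(miss_counts, key=lambda l: (miss_counts[l], l))
    taker = max(miss_counts, key=lambda l: (miss_counts[l], l))
    if donor != taker and ways[donor] > 1:
        ways[donor] -= 1
        ways[taker] += 1

# An OS-intensive phase: the OS partition misses heavily, so it gains
# capacity from the idle VMM partition.
for _ in range(10):
    record_miss("os")
record_miss("app")
rebalance()
print(ways)  # prints {'app': 4, 'os': 3, 'vmm': 1}
```

Resetting the miss counters at the start of each phase would let the partitions track phase changes, e.g. capacity flowing back to the application partition once a compute-heavy phase begins.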
OS Jitter Mitigation
For a suite of 17 multi-threaded applications, we show that OS activity (system
call and interrupt handlers, and kernel threads) can cause timing variations of up
to 35.22% in large multi-core systems; we refer to this timing variation as jitter. We
propose to reduce OS-induced jitter using two hardware units: (1) a jitter
unit that uses a distributed protocol to keep track of the time lost to the OS and
subsequently tries to compensate for it using DVFS, and (2) a dedicated cache at
the L2 level of the memory hierarchy to store OS cache lines. The area overhead of
our scheme is limited to 1%, and it reduces the overall jitter of these
applications by 8% on average. This work was published in the IEEE Transactions
on Parallel and Distributed Systems (TPDS) journal (Volume 25, Issue 5), 2014.
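One way to express the jitter metric is as the spread of per-thread completion times for a barrier-to-barrier phase, relative to the fastest thread, since the slowest thread dictates when the whole phase finishes. This particular formula and the timings below are illustrative assumptions, not the paper's exact definition or measurements:

```python
def jitter_percent(thread_times):
    """Percent by which the slowest thread lags the fastest one."""
    fastest, slowest = min(thread_times), max(thread_times)
    return 100.0 * (slowest - fastest) / fastest

# Per-thread times (ms) for one phase; the last thread was slowed by
# OS activity (interrupts, kernel threads) on its core.
times = [100.0, 101.0, 100.5, 135.0]
print(jitter_percent(times))  # prints 35.0
```

Under this view, compensating the delayed thread with DVFS shrinks the gap between slowest and fastest, which is exactly the quantity the metric measures.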