The decreasing feature size of transistors, driven by advances in chip-manufacturing technology, has led to an ever-larger number of features being packed into a single chip. This growth in complexity has rendered traditional pre-silicon verification techniques such as simulation and emulation inadequate, and an increasing number of functional bugs therefore escape into silicon. Such bugs often leave security backdoors that malicious actors can exploit to wreak havoc on customers, as demonstrated by the recently discovered Spectre and Meltdown vulnerabilities. Incidents of this kind tarnish the reputation of the chip manufacturer, and the affected chips may additionally have to be recalled and replaced.
To mitigate this, initial samples of the chip are vetted by the manufacturer to expose any functional bugs that may have escaped into silicon. The near-native execution speeds of these samples allow complex tests to be exercised on them. However, visibility into the internal functioning of the chip is significantly compromised in the initial samples. It is therefore comparatively easy to uncover the presence of a functional bug, but extremely hard to localize and debug it.
Design-for-Debug (DFD) hardware is inserted into the chip to increase visibility into its internal functioning. Such hardware operates under severe resource constraints because it becomes vestigial once the chip is cleared for production. The twin goals of maximizing visibility and minimizing area are in direct conflict, and striking a balance between the two is therefore difficult.
As part of my PhD thesis, I proposed two novel DFD hardware designs that target two popular debugging paradigms: run-stop debugging and at-speed debugging. These designs improved significantly upon state-of-the-art methods in an area-sensitive manner. Another highlight of the proposed hardware is its flexibility to adapt to the requirements of the bug scenarios under consideration.
In the past, I have looked at two issues that hamper the performance and predictability of high-performance computing systems: (i) operating system jitter, and (ii) thread synchronization. Operating system jitter induces variance in the runtime of an application across multiple executions of the same sequence of instructions. This prevents commodity operating systems such as Linux from being deployed on servers targeted at high-performance stream-processing applications such as the base stations of mobile networks. To alleviate this, we proposed novel hardware counters that estimate the time lost to an intervening operating system execution and compensate for it by throttling the operating frequency.
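The jitter phenomenon itself is easy to observe in software. The sketch below is a hypothetical Python illustration (not the proposed hardware counters): it times an identical instruction sequence repeatedly, and the spread across runs is largely attributable to interference from the operating system and other processes.

```python
# Illustrative sketch of OS-jitter measurement: time the *same* fixed
# instruction sequence many times and report the variance in runtime.
import statistics
import time

def fixed_work(n=200_000):
    # The same arithmetic loop on every invocation.
    total = 0
    for i in range(n):
        total += i * i
    return total

samples = []
for _ in range(50):
    start = time.perf_counter()
    fixed_work()
    samples.append(time.perf_counter() - start)

mean = statistics.mean(samples)
spread = max(samples) - min(samples)
# The spread (max - min) is the jitter: the work is identical each run,
# so the variation comes from interruptions outside the application.
print(f"mean={mean*1e3:.2f} ms, spread={spread*1e3:.2f} ms "
      f"({100*spread/mean:.1f}% of mean)")
```

On a loaded system the spread can be a substantial fraction of the mean, which is exactly what makes runtimes unpredictable for stream-processing workloads with tight deadlines.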
Similarly, we determined that synchronization constructs such as barriers tend to become performance bottlenecks as the number of threads increases. We then proposed a distributed coordination scheme in which the participating threads communicate over on-chip optical interconnects; the scheme scales well to hundreds of cores, even when multiple software barriers are active. Such fast synchronization constructs are relevant on many-core chips with very high core counts, such as Intel's Single-chip Cloud Computer (SCC).
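For context, the conventional baseline that such schemes improve upon is a centralized barrier, in which every thread contends on a single shared counter; that counter is precisely the serialization point that limits scalability at high core counts. The following is a minimal Python sketch of a centralized sense-reversing barrier (an illustrative baseline only, not the proposed optical-interconnect design):

```python
# Sketch of a centralized sense-reversing barrier. All threads update one
# shared counter under a single lock -- the serialization point that makes
# centralized barriers a bottleneck as thread counts grow.
import threading

class CentralizedBarrier:
    def __init__(self, n_threads):
        self.n = n_threads
        self.count = 0
        self.sense = False                    # global sense, flips each round
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            my_sense = not self.sense         # sense this round will flip to
            self.count += 1
            if self.count == self.n:          # last arriver releases everyone
                self.count = 0
                self.sense = my_sense
                self.cond.notify_all()
            else:
                while self.sense != my_sense: # spin-wait on the sense flip
                    self.cond.wait()

results = []
barrier = CentralizedBarrier(4)

def worker(tid):
    for phase in range(3):
        results.append((phase, tid))
        barrier.wait()                        # no thread enters phase+1 early

threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every phase-k entry precedes every phase-(k+1) entry.
phases = [p for p, _ in results]
print(phases == sorted(phases))  # prints True
```

Because every arrival serializes on the same lock and counter, the time per barrier episode grows with the thread count; distributed schemes spread this coordination across the threads (here, over on-chip optical links) to avoid the single hot spot.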