# Power Reduction in VLIW Processor with Compiler Driven Bypass Network

Neeraj Goel Dr Anshul Kumar Dr Preeti Ranjan Panda Department of Computer Science, IIT Delhi

#### Outline

- Introduction: VLIW, bypass
- Approach
- Techniques
  - Compiler support
  - Micro-architectural modifications
- Experiments
- Conclusion

## Introduction

- VLIW Processor
  - Multiple FUs
  - Static scheduled
  - Multi-ported RF
- Bypass
- Bypass complexity Time

   2\*k\*N<sup>2</sup> comparators
   I<sup>F</sup>
   2N Mux with <sup>E</sup>N+
   W<sup>B</sup>
   A Mux with <sup>E</sup>N+
   EX

   Instruction Fetch

   IF
   ID
   ID



### **Motivation**

- Conventional pipeline and register file read
   Read in 2<sup>nd</sup> stage and bypass in 3<sup>rd</sup> stage or later
- When reading from bypass, RF read is not necessary
- Most values are short lived
  - Majority of operands are read from bypass network
- RF read is costlier in multi-port RF

Read energy increases as N<sup>3</sup>, N is no. of ports

#### **Avoidance of redundant RF reads**

#### Hardware

- Bypass control need to be before RF read
  - Increase in clock cycle or
  - Increase in number of pipeline stages
- Software
  - Static scheduling in VLIW processor accurate bypass calculations in compiler
  - Bypass Hints with each operand => increase in instruction width

#### **Bypass Control Representation**

- Compiler visible bypass registers
- Addresses are assigned to bypass registers
- An example organization



### **Multi-stage Bypass**

- Result is kept in pipeline for more stages
- Hierarchy of multiplexer for performance
- More energy efficient



#### **Avoidance of Redundant Writes**

- Values read from bypass are written back to RF as
  - May be required in future
- Values read from bypass, need not to written to RF if:
  - They are not required in future
- Most registers have short life time => more advantageous

### **Compiler Support**

• Analysis

 Liveness analysis to find if an operand is a "last use" or not

- Code generation
  - Reduce available registers in processor description
  - Register renaming

add r13, r14, r8 sub r6, r13, r9



add bly3l,,m1141,m88 sub r6, by1, r9

R13 is 'Host' lase' use's in bindination tion

#### **Micro-Architectural Modification**

Bypass control in tronapilappaoarch approach m is number of bypass sources n is number of bypass destinations



#### **Experiments: Bypass reads**

- Machine model
  - 2 Integer Unit, 2 Memory Unit, each single latency



#### **Experiments: Avoided writes**

- Machine model
  - 2 Integer Unit, 2 Memory Unit, each single latency



### **Experiments: Energy Saved in RF**

- Register File energy model
  - Cacti 4.0
  - 0.18µm, 64 bit, 64 registers



#### **Issues and future work**

- Approach is limited within basic block
   No predication can be used
- Exception handling is more complicated

   Need to store bypass registers as they are part of processor state
- A better register allocation algorithm
  - Which consider "last use" and "bypass" information to reduce register requirement

### Conclusion

- Avoiding redundant reads and writes can save up to 40-60% energy of RF
- A novel approach of bypass control representation is suggested which leads to
  - Area savings of 3-5%
  - No increase in instruction width
- Multi-stage bypass can further increase the benefits

**Questions Thank you**