Computer Architecture

Course: ELL 782 and COL 718
Semester I, 2018-19
Credits: 3 (3-0-0) for ELL 782 and 4 for COL 718



Instructor: Dr. Smruti R. Sarangi

Lectures: Mon, Thu 9:30-10:50, Block IV 254. Piazza link
Course Description: This course introduces the design and programming of high-performance processors.

Course Load: Exams + assignments

Evaluation: Minor 1 (15%), Minor 2 (15%), Major (25%), 3 Assignments (12% + 12% + 16%), Attendance (5%, via Timble)

Teaching Assistants: Hameedah Sultan, Diksha Moolchandani, Ankit Gola, Anand Singh


Textbook:
Background on processors and caches: Computer Organisation and Architecture, Smruti R. Sarangi, McGrawHill India. Link to buy. Slides and videos (link)


S. No.
Date
Lecture
Slides
References
1.
26th July
Course policies. Instruction sets.
Slides on assembly language
Book website: link
Out-of-Order Pipelines
2.
30th July
Instruction sets, and basic processor design.
Slides on processor design
Read three chapters (assembly language, processor design, and principles of pipelining) from Computer Organisation and Architecture (Sarangi, McGrawHill, 2015).
3.
2nd August
Basic OOO processor design and branch predictors.
OOO Execution - I
4.
6th August
Branch Prediction

Two level prediction, Two level prediction-II, Agree predictor, General techniques, Three level adaptive
5.
9th August
Register Renaming and the select unit
OOO - II
Processor microarchitecture book
Quantifying the complexity of superscalar processors
Design space of renaming techniques
6.
13th Aug
Wakeup, Bypass, Broadcast, Select


7.
18th Aug
Load-Store Queue, Commit

Optimized Load Store Queue
8.
20th Aug.
Recovery from speculation: RRF and RRAT, SRAM vs CAM based checkpoints. ROB based OOO processor design.
OOO-III
Circular queues (photo of whiteboard)
Circular queues: link
Diagram of the pipeline: image
Pipeline loops: read paper (Section 2.2)
9.
27th Aug
Little's Law [self-study]
Intro. to scheduling and replay

Little's Law
Scheduling and Replay
10.
30th Aug
Non-Selective and Deferred Selective Replay
Token based replay
Basics of SRAM cells

Load store speculation, store sets, dynamic dependence tracking, memory cloaking and bypassing
Background: See slides for Chapter 6 (link)
The Memory System
11.
6th Sep
DRAM cells (and arrays), CAM cells
Temporal locality (+ stack distance)
Spatial locality (+ address distance)
Memory hierarchy: i-cache, d-cache (L1), L2, L3, ...
Fully associative, set associative, and direct mapped caches

Background: See slides for logic, registers, and memories (link)

[self-study]: Read the chapter on the memory system. Slides at link
12.
10th Sep
1. Types of cache misses
2. Methods to increase the hit rate, reduce miss penalty
3. Average memory access time computation
4. Virtual memory


13.
13th Sept.
Instruction prefetching
Inst. Prefetching
CGP, Markov, PIF, RDIP
14.
17th Sept
Pentium Trace Cache, stride based inst. prefetching

Trace cache patent
Survey
15.
20th Sept.
Runahead Execution, Caches with Cacti
Data Prefetching
Caches
Runahead Execution
Multi core memory systems (book), Cacti Report
16.
24th Sept.
NUCA Caches

S-NUCA, R-NUCA
17.
27th Sept.
R-NUCA overview
Network topologies

Network slides: Slides 99-109
18.
8th Oct
Basics of On Chip Networks
Routing
On chip networks (book)
19.
11th Oct
Routing and Flow Control
Flow Control
20.
22nd Oct
Design of Routers
Router Micro-arch
Allocator Implementations
Multiprocessor Systems
21.
25th Oct
1. Limits to single-core performance
2. E is proportional to CV^2
3. P is proportional to CV^2 f
4. P grows as f^3
5. T = AP
6. Leakage-temperature loop
7. Reliability and temperature relationship
8. Idea of multicores
9. Simple program in OpenMP
10. Moore's Law
Link to slides (Chapter 11)

Power and temperature (slides)
1. Slides on hard errors (link). Only the basic idea of errors is in the syllabus; leave out the details.

22.
27th Oct
1. Programs in MPI
2. Amdahl's Law
3. Flynn's classification
4. SIMD processors, and SSE instructions
5. Axioms of coherence
6. Sequential consistency


23.
29th Oct
1. Write atomicity: coherent systems need not have write atomicity
2. Memory model = atomicity + ordering
3. Difference between SC and coherence
4. Weak memory models
5. Writing programs in weak memory models


24.
1st Nov
1. Snoopy coherence protocols: write-update and write-invalidate
2. Multi-threading
3. GPUs [self-study]

Primer on Cache Coherence and Memory Consistency (book)
25.
3rd Nov
Directory Coherence and Atomic Primitives
Directory Coherence
26.
5th Nov
Memory Consistency Models
Memory Consistency
A Formal Hierarchy of Weak Memory Models (link)
Tutorial on Shared Memory Models (link)
27.
12th Nov
Transactional Memory
Transactional Memory
28.
15th Nov
Talk on Quantum Computing
Dr. Moitreyee Roy, IBM Almaden Research Labs
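
Worked example (illustrative): Several quantitative relations named in the schedule above can be tried out numerically: Little's Law (lecture 9), the average memory access time computation (lecture 12), the power relations (lecture 21), and Amdahl's Law (lecture 22). The sketch below is not taken from the course slides. The function names and sample numbers are assumptions chosen for illustration, and the f^3 result additionally assumes that the supply voltage scales linearly with frequency.

```python
# Illustrative sketch only: all names and sample values here are assumptions,
# not material from the course slides.

def littles_law_occupancy(arrival_rate, avg_time_in_system):
    """Little's Law: average occupancy = arrival rate * average time in system."""
    return arrival_rate * avg_time_in_system


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level:
    hit time + miss rate * miss penalty (all times in cycles)."""
    return hit_time + miss_rate * miss_penalty


def relative_dynamic_power(freq_scale, voltage_tracks_frequency=True):
    """Dynamic power relative to a baseline, from P proportional to C V^2 f.

    Assumption: if the supply voltage is scaled linearly with frequency,
    the V^2 term contributes freq_scale^2, so P grows as f^3.
    With a fixed voltage, P grows only linearly with f.
    """
    voltage_scale = freq_scale if voltage_tracks_frequency else 1.0
    return voltage_scale ** 2 * freq_scale


def amdahl_speedup(parallel_fraction, num_cores):
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_cores)


if __name__ == "__main__":
    # 4 instructions enter per cycle and each stays 30 cycles on average.
    print("Average occupancy:", littles_law_occupancy(4, 30))
    # 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
    print("AMAT (cycles):", amat(1, 0.05, 100))
    # Doubling the frequency with and without voltage scaling.
    print("Power at 2x f, V tracks f:", relative_dynamic_power(2.0))
    print("Power at 2x f, V fixed:", relative_dynamic_power(2.0, False))
    # 90% parallel code on 8 cores.
    print("Amdahl speedup:", amdahl_speedup(0.9, 8))
```

Under these assumptions, doubling the frequency raises dynamic power eightfold, while code that is 90% parallel gains less than 5x on 8 cores. Together these two results suggest why the schedule moves from the limits of single-core performance to the idea of multicores.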