ALS Project List
1. Cache coherence protocol simulation I (Berkeley vs. invalidate)
2. Cache coherence protocol simulation II (Illinois vs. Berkeley)
Multilevel clustered caches (inclusive or exclusive) and snoopy or directory-based protocols.
For Projects 1 and 2, follow the model and approach in "Analysis and comparison of cache coherence protocols for a packet-switched multiprocessor",
Q. Yang, L. N. Bhuyan, B. C. Liu, IEEE Transactions on Computers, 1989.
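As a starting point for the protocol state machines, here is a minimal Python sketch of the Berkeley ownership protocol for a single cache line (state and method names are illustrative; the full simulator must also model bus traffic, memory, and the packet-switched network as in the Yang/Bhuyan/Liu paper):

```python
# Simplified Berkeley ownership protocol for one cache line.
# States: Invalid, Valid (read-only copy), Shared-Dirty (owner, shared),
# Dirty (owner, exclusive). Memory traffic is not modeled in this sketch.
INVALID, VALID, SHARED_DIRTY, DIRTY = "I", "V", "SD", "D"

class BerkeleyLine:
    def __init__(self, n_caches):
        self.state = [INVALID] * n_caches

    def read(self, c):
        if self.state[c] == INVALID:          # read miss goes on the bus
            for o, s in enumerate(self.state):
                if s == DIRTY:                # owner supplies the block and
                    self.state[o] = SHARED_DIRTY  # becomes shared-dirty
                # a SHARED_DIRTY owner also supplies data but stays SD
            self.state[c] = VALID
        # read hits in V/SD/D need no bus action

    def write(self, c):
        # invalidation broadcast: all other copies die,
        # the writer becomes the exclusive (dirty) owner
        for o in range(len(self.state)):
            if o != c:
                self.state[o] = INVALID
        self.state[c] = DIRTY
```

A short trace: after cache 0 writes, it is Dirty; when cache 1 then reads, cache 0 drops to Shared-Dirty and cache 1 holds a Valid copy; a write by cache 1 invalidates cache 0.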
3. Simulation of a queuing model: finding the queue size for a p% rejection rate, and finding the p% rejection rate for a fixed queue size.
All simulations for this experiment will be for an open queue. There may be multiple request generators or multiple servers, and the queue may be separate for each server/generator or shared by all of them. You are required to simulate different types of request generators (Poisson, binomial, or any other distribution).
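The single-queue Poisson case can be sketched with a small event-driven simulator (M/M/1 with finite capacity; function names and parameters are illustrative, and the project must extend this to multiple generators/servers and other distributions):

```python
import random

def rejection_rate(arrival_rate, service_rate, capacity, n_events=20000, seed=1):
    """Event-driven sketch of an open queue with Poisson arrivals,
    exponential service, and finite system capacity (M/M/1/K style).
    Returns the fraction of arrivals rejected because the system is full."""
    rng = random.Random(seed)
    t = 0.0
    next_arrival = rng.expovariate(arrival_rate)
    next_depart = float("inf")
    in_system = arrivals = rejected = 0
    while arrivals < n_events:
        if next_arrival <= next_depart:       # next event is an arrival
            t = next_arrival
            arrivals += 1
            if in_system >= capacity:         # system full: reject
                rejected += 1
            else:
                in_system += 1
                if in_system == 1:            # server was idle: start service
                    next_depart = t + rng.expovariate(service_rate)
            next_arrival = t + rng.expovariate(arrival_rate)
        else:                                 # next event is a departure
            t = next_depart
            in_system -= 1
            next_depart = (t + rng.expovariate(service_rate)
                           if in_system else float("inf"))
    return rejected / arrivals

def min_capacity_for(p, arrival_rate, service_rate):
    """Smallest capacity whose simulated rejection rate is at most p."""
    cap = 1
    while rejection_rate(arrival_rate, service_rate, cap) > p:
        cap += 1
    return cap
```

Running both directions of the question: `rejection_rate` answers "p% for a fixed queue size", and `min_capacity_for` searches for "queue size for p% rejection".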
4. Multiprocessor interconnection protocols (tree, mesh, hypercube)
Comparison of different interconnection mechanisms for different loads and processor configurations in a multiprocessor environment shows that a complete interconnection network works well for very high loads, while for lower loads hypercube- and ring-type networks are used.
You have to find the threshold load for each interconnection network and then move dynamically from one interconnection network to another. When moving from one network to another we will assume some wake-up time delay.
We will assume the number of nodes is 8.
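To see why the topologies trade off differently, a small helper comparing average hop counts for the 8-node case may be useful (a sketch only; the actual simulator must also model load, contention, and the switching delay):

```python
def avg_hops(n, topology):
    """Average shortest-path hop count between distinct nodes for a few
    simple topologies. n is assumed to be a power of two for the hypercube."""
    def dist(a, b):
        if topology == "complete":
            return 1                          # every pair directly linked
        if topology == "ring":
            d = abs(a - b)
            return min(d, n - d)              # go the shorter way around
        if topology == "hypercube":
            return bin(a ^ b).count("1")      # Hamming distance of node IDs
        raise ValueError(topology)
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)
```

For n = 8 the complete network averages 1 hop, the hypercube 12/7, and the ring 16/7, which is one way to see why the cheaper networks suffice at low load while the complete network pays off only under heavy traffic.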
5. Simulation of an intelligent multi-banking memory service
There are multiple memory request generators, with load balancing across queues just like at a railway reservation counter: every counter can serve you, and you can change to another queue when you find that your expected waiting time is longer than in the other queues. What extra hardware is required to compute this intelligence and to change queues dynamically?
We will assume there are N processors generating requests to an N-ported shared memory. In general each processor node is tied to one port and builds its queue there. We suggest that if the queue of one processor is getting large while the other queues are small, some requests could be sent to other ports.
This load distribution depends on two factors: the address space distribution of each request generator, and the request rate of each processor node.
In the simulator you have to use a different address space distribution and a different request rate function for each processor and compare them.
The memory banks at the other end can be assumed to be high-order address interleaved.
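The queue-changing rule can be sketched as a simple routing policy (the threshold rule, the representation of queues as lists, and all names are assumptions for illustration, not part of the assignment):

```python
def route_request(queues, home, threshold=2):
    """Pick a port for a new request: stay at the home port unless its queue
    is at least `threshold` entries longer than the shortest queue, in which
    case jockey to the shortest one. With equal service rates, queue length
    is a proxy for expected waiting time."""
    shortest = min(range(len(queues)), key=lambda i: len(queues[i]))
    if len(queues[home]) - len(queues[shortest]) >= threshold:
        return shortest
    return home
```

In hardware terms, this rule implies each port must export its queue occupancy and each node needs a comparator tree over those counts, which is one concrete answer to the "extra hardware" question.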
6. Loop unrolling + PCA using the ArchOpt tool (assignment 2)
If you unroll loops in an application, you can obtain a higher degree of instruction-level parallelism. Perform the loop unrolling using the SUIF compiler infrastructure; then our automated ArchOpt tool can be used to produce the optimal architecture for the new loop-unrolled C code. The architecture optimization techniques use the standard method of reducing the dimensionality of the search space with principal components analysis (PCA).
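The PCA step can be illustrated on two design parameters in pure Python (a sketch of the dimension-reduction idea only; ArchOpt's actual implementation may differ):

```python
def pca_2d(points):
    """PCA of 2-D data via eigendecomposition of the 2x2 covariance matrix.
    Returns (largest eigenvalue, smallest eigenvalue, fraction of variance
    explained by the first principal component)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # closed-form eigenvalues of [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = ((tr / 2) ** 2 - det) ** 0.5
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    return l1, l2, l1 / (l1 + l2)
```

If two architecture parameters are strongly correlated across design points, the first component explains nearly all the variance and the search can drop a dimension; uncorrelated parameters split the variance and both must be kept.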
7. Simulation of different bus arbitration policies (assignment 3)
In assignment 3 we stated that a FIFO bus arbitration policy should be used, but that may not always give good performance. Other bus arbitration policies should therefore be tried: fixed priority, read priority, write priority, round-robin, proportional share, or a combination of these.
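Two of these policies can be sketched as simple grant functions over a request vector (illustrative; the simulator should plug such a function into assignment 3's bus model):

```python
def fixed_priority(requests):
    """Grant the lowest-numbered requesting master (fixed priority).
    `requests` is a list of booleans, one per bus master."""
    for i, r in enumerate(requests):
        if r:
            return i
    return None                      # no one is requesting

def round_robin(requests, last):
    """Grant the first requester after the previously granted master,
    wrapping around, so no master can be starved."""
    n = len(requests)
    for k in range(1, n + 1):
        i = (last + k) % n
        if requests[i]:
            return i
    return None
```

Read/write priority would take the pending transaction types as an extra input, and a proportional-share arbiter would additionally track per-master grant counts against their assigned shares.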
8. Implementation of multiple request generators with a shared I&D cache (assignment 3)
For simplicity we use a shared cache rather than multiple caches, where the cache coherence cost is higher; you are required to simulate the design of a shared-cache multiprocessor. Here the cache access time is not one cycle but multiple cycles. Taking one processor cycle as the unit, you are required to calculate what the cache access time (in units) should be for m processors under different cache policies, or vice versa.
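A minimal deterministic sketch of the contention question, assuming all m processors issue synchronously once per period and the shared cache has a single FIFO port (the issue model and names are assumptions):

```python
def avg_wait(m, period, cache_cycles, n_rounds=200):
    """m processors each issue one request every `period` cycles; a single
    shared cache port serves one request per `cache_cycles` cycles, FIFO.
    Returns the average queueing delay per request. The wait stays bounded
    only while m * cache_cycles <= period (the port keeps up)."""
    free_at = 0                      # cycle at which the port is next free
    total = count = 0
    for r in range(n_rounds):
        arrival = r * period         # all m requests arrive together
        for _ in range(m):
            start = max(free_at, arrival)
            total += start - arrival
            free_at = start + cache_cycles
            count += 1
    return total / count
```

With m = 4, period = 8, and a 2-cycle cache the port is exactly saturated and the average wait settles at 3 cycles; a 3-cycle cache overloads the port and the wait grows without bound, which is the boundary the project asks you to locate for each cache policy.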
9. Integration of a new cache simulator with dinero and cacti (assignment 3)
In general, most cache simulators take a memory access trace as input and report the number of misses and the hit ratio. Here, in place of a random request generator, we require you to use a trace of an actual application generated by dinero. Delays can be taken from the cacti simulator to do the experiments.
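Reading the trace can be sketched as below, assuming the classic "din" one-access-per-line format (a type digit plus a hex address; verify the exact format against your dinero version):

```python
def parse_din(lines):
    """Parse din-style trace lines of the form '<type> <hex-addr>', where
    type 0 = data read, 1 = data write, 2 = instruction fetch (the classic
    din convention -- an assumption to check against your dinero build).
    Returns (kind, address) tuples; unrecognized lines are skipped."""
    kinds = {"0": "read", "1": "write", "2": "ifetch"}
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] in kinds:
            out.append((kinds[parts[0]], int(parts[1], 16)))
    return out
```

Each parsed tuple then replaces one call to the random request generator, so the rest of the assignment 3 simulator can stay unchanged while cacti supplies the per-level access delays.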
10. Instruction encoding in VLIW using VIES (VLIW instruction encoding scheme)
In VLIW processors, the instruction encoding affects instruction size. Different instruction encodings are used to compress the code; the main motivation of all these techniques is to remove NOPs.
You have to implement one of the encoding schemes (there is a collection of papers here) and show the compression ratio. Example codes on which you have to show the compression will be given. You will also be given the source code of an assembler, which you have to modify to get the encoding done.
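One common NOP-removal idea, mask-based encoding, can be sketched as follows to compute a compression ratio (illustrative only; your implementation must follow whichever scheme you pick from the papers):

```python
NOP = None   # a NOP slot is represented as None in this sketch

def compressed_size(words, op_bits=32, slots=4):
    """Mask-based VLIW encoding: each long word stores a `slots`-bit
    presence mask plus only its non-NOP operations. Returns total bits."""
    total = 0
    for w in words:
        ops = sum(1 for op in w if op is not NOP)
        total += slots + ops * op_bits        # mask bits + surviving ops
    return total

def compression_ratio(words, op_bits=32, slots=4):
    """Compressed size over the fixed-width uncompressed size (lower is
    better); an all-NOP word shrinks from slots*op_bits bits to the mask."""
    uncompressed = len(words) * slots * op_bits
    return compressed_size(words, op_bits, slots) / uncompressed
```

For two 4-slot words containing two real operations and six NOPs, the fixed encoding costs 256 bits while the masked one costs 72, a ratio of about 0.28, which is the kind of figure the project asks you to report for the given example codes.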
11. Finding optimal design parameters of a VLIW processor for DSP applications
In assignment 2 we found the optimal parameters of a superscalar processor. Similarly, you have to find the optimal set of parameters for a VLIW processor. In a VLIW processor the number of parameters that can be changed is smaller, so the experiments can be performed manually. Experiments have to be performed on 5-6 applications from the DSPstone benchmarks.
As the output of the project, the variation of performance with the different parameters and an optimal architecture are required. The cost function should be discussed with the TA before implementation.
12. Design of a VLIW processor using BlockSim (fetch mechanism)
A VLIW processor is an extension of a RISC processor. In this assignment the micro-architecture of a VLIW processor is to be designed using BlockSim (a Java-based tool made in our department). In the micro-architecture, special care should be taken with the instruction fetch mechanism, assuming that you will receive compressed instructions of different sizes.
Time Chart
Assignment uploaded: 4th April
Assignment selection and discussion of design: 10th April
Submission: 28th April
Demos: 29-30 April