ALS-Project list


1.  Cache coherence protocol simulation I (Berkeley vs. invalidate)
2.  Cache coherence protocol simulation II (Illinois vs. Berkeley)

Multilevel clustered caches (inclusive or exclusive), and snoopy or directory-based protocols.
In Projects 1 and 2, follow the model and approach in "Analysis and comparison of cache coherence protocols for a packet-switched multiprocessor", Q. Yang, L. N. Bhuyan, B. C. Liu, IEEE Transactions on Computers, 1989. A starting-point sketch of a write-invalidate protocol is given below.
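
As a starting point only (the projects themselves must follow the Yang/Bhuyan/Liu model cited above), here is a minimal Python sketch of a snoopy write-invalidate protocol with MSI-like states. The state names, the atomic bus abstraction and the BusRd/BusRdX transaction names are illustrative assumptions, not the Berkeley or Illinois protocols themselves:

"""Minimal snoopy write-invalidate sketch (MSI-like states).
Illustrative only; the project should follow the packet-switched
multiprocessor model of Yang, Bhuyan and Liu (1989)."""

INVALID, SHARED, MODIFIED = "I", "S", "M"

class Cache:
    def __init__(self, cid):
        self.cid = cid
        self.lines = {}                      # block address -> state

    def state(self, addr):
        return self.lines.get(addr, INVALID)

    def snoop(self, op, addr):
        """React to a bus transaction issued by another cache."""
        st = self.state(addr)
        if op == "BusRdX" and st != INVALID:      # remote write: invalidate our copy
            self.lines[addr] = INVALID
        elif op == "BusRd" and st == MODIFIED:    # remote read: supply data, downgrade
            self.lines[addr] = SHARED

class Bus:
    def __init__(self):
        self.caches = []
        self.traffic = 0

    def broadcast(self, src, op, addr):
        self.traffic += 1
        for c in self.caches:
            if c is not src:
                c.snoop(op, addr)

def read(cache, bus, addr):
    if cache.state(addr) == INVALID:              # read miss -> BusRd
        bus.broadcast(cache, "BusRd", addr)
        cache.lines[addr] = SHARED

def write(cache, bus, addr):
    if cache.state(addr) != MODIFIED:             # need exclusive ownership
        bus.broadcast(cache, "BusRdX", addr)
        cache.lines[addr] = MODIFIED

if __name__ == "__main__":
    bus = Bus()
    bus.caches = [Cache(i) for i in range(4)]
    c0, c1 = bus.caches[0], bus.caches[1]
    read(c0, bus, 0x40)
    write(c1, bus, 0x40)                          # invalidates the copy in cache 0
    print("cache0:", c0.state(0x40), "cache1:", c1.state(0x40),
          "bus transactions:", bus.traffic)

Adding the protocol-specific states and snoop reactions, and counting bus transactions, invalidations and misses per protocol, gives the comparison the projects ask for.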

3.  Simulation of a queuing model: find the queue size needed for a p% rejection rate, and find the rejection rate p% for a fixed queue size.

All simulations for this experiment are for an open queue. There may be multiple request generators or multiple servers, and the queue may be separate for each server/generator or shared by all servers/generators. You are required to simulate different types of request generators (Poisson, binomial, or any other distribution). A minimal single-server sketch is given below.
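
A minimal Python sketch, assuming a single server, Poisson arrivals, exponential service and a finite buffer; it estimates the rejection percentage for a fixed queue size and searches for the smallest queue size that meets a target p%. The rates, capacities and the 100,000-arrival run length are placeholder assumptions:

"""Open-queue rejection sketch: Poisson arrivals, exponential service,
single server, finite buffer.  Rates and capacities are placeholders."""
import random

def rejection_percent(arrival_rate, service_rate, capacity, n_arrivals=100_000, seed=1):
    rng = random.Random(seed)
    t = 0.0                       # current arrival time
    queue = []                    # departure times of jobs still in the system
    rejected = 0
    for _ in range(n_arrivals):
        t += rng.expovariate(arrival_rate)
        queue = [d for d in queue if d > t]       # drop jobs already departed
        if len(queue) >= capacity:                # system full -> reject
            rejected += 1
            continue
        start = queue[-1] if queue else t         # FIFO single server
        queue.append(start + rng.expovariate(service_rate))
    return 100.0 * rejected / n_arrivals

def capacity_for_target(arrival_rate, service_rate, target_pct):
    """Smallest queue size whose simulated rejection stays at or below target_pct."""
    for k in range(1, 200):
        if rejection_percent(arrival_rate, service_rate, k) <= target_pct:
            return k
    return None

if __name__ == "__main__":
    print("rejection % at queue size 5 :", rejection_percent(0.9, 1.0, 5))
    print("queue size for <=1% rejection:", capacity_for_target(0.9, 1.0, 1.0))

The multi-generator and multi-server variants extend this by keeping one arrival stream per generator and one busy-until time per server, with either separate or shared queues.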

4.  Multiprocessor interconnection protocols (tree, mesh, hypercube)

Comparing different interconnection mechanisms under different loads and processor configurations in a multiprocessor environment shows that a complete interconnection network works well for very high loads, while for lower loads hypercube- and ring-type networks are used.
You have to find the threshold load for each interconnection network and then dynamically move from one interconnection network to another. When moving from one network to another, we will assume some wake-up time delay.
We will assume the number of nodes is equal to 8. A toy threshold sketch is given below.
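
A toy Python sketch of the threshold search, using ring, hypercube and complete topologies for 8 nodes. The contention model (average hop count scaled by load per link) and the wake-up delay value are placeholder assumptions; in the project the delays should come from your simulator:

"""Toy threshold sketch for dynamically switching among 8-node networks.
The contention model and the wake-up delay are placeholder assumptions."""
from itertools import product

N = 8
WAKE_DELAY = 3.0          # assumed cost (in delay units) of switching networks

def hops_hypercube(a, b):
    return bin(a ^ b).count("1")          # Hamming distance

def hops_ring(a, b):
    d = abs(a - b)
    return min(d, N - d)

def hops_complete(a, b):
    return 0 if a == b else 1

def avg_hops(fn):
    pairs = [(a, b) for a, b in product(range(N), repeat=2) if a != b]
    return sum(fn(a, b) for a, b in pairs) / len(pairs)

# (average hops, number of links) per topology
NETS = {
    "ring":      (avg_hops(hops_ring), N),
    "hypercube": (avg_hops(hops_hypercube), 3 * N // 2),
    "complete":  (avg_hops(hops_complete), N * (N - 1) // 2),
}

def delay(net, load):
    h, links = NETS[net]
    return h * (1.0 + load / links)       # placeholder contention model

if __name__ == "__main__":
    current = "ring"                       # start on the cheapest network
    for load in range(0, 101, 5):
        best = min(NETS, key=lambda n: delay(n, load))
        # switch only when the gain outweighs the assumed wake-up delay
        if best != current and delay(current, load) - delay(best, load) > WAKE_DELAY:
            print(f"load {load:3d}: switching {current} -> {best}")
            current = best
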
5.  Simulation of an intelligent multi-banking memory service

There are multiple memory request generators, with load balancing across queues much as at a railway reservation counter: every counter can serve you, and you can move to another queue when you find that your expected waiting time is longer than in the other queues. Consider what extra hardware is required to compute this intelligence and to change queues dynamically.
We will assume there are N processors generating requests to an N-ported shared memory. In general, each processor node is tied to one port and builds its queue there. We suggest that if the queue of one processor is getting large while the other queues are small, some requests could be sent to other ports. This load distribution depends on two factors: the address space distribution of each request generator, and the request rate of each processor node. In the simulator you have to use a different address space distribution and a different request rate function for each processor and compare them. The memory banks at the other end can be assumed to be high-address interleaved. A sketch of the queue-switching decision follows.
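
A minimal Python sketch of the queue-switching decision. The expected-wait estimate (backlog times a fixed service time) and the switching rule are illustrative assumptions; the extra hardware needed to compute and act on these estimates is what the project should examine:

"""Queue-jockeying sketch for an N-ported shared memory.
The wait estimate and the switch rule are illustrative assumptions."""
from collections import deque

SERVICE_TIME = 4                 # assumed bank service time (cycles)

class PortQueue:
    def __init__(self, pid):
        self.pid = pid
        self.q = deque()

    def expected_wait(self):
        return len(self.q) * SERVICE_TIME    # simple estimate: backlog x service time

    def enqueue(self, req):
        self.q.append(req)

def issue_request(home_port, ports, req):
    """Send the request to the home port unless another queue is clearly shorter."""
    best = min(ports, key=lambda p: p.expected_wait())
    target = best if best.expected_wait() < home_port.expected_wait() else home_port
    target.enqueue(req)
    return target.pid

if __name__ == "__main__":
    ports = [PortQueue(i) for i in range(4)]
    for a in range(6):                       # pre-load port 0 to make its queue long
        ports[0].enqueue(("write", a))
    chosen = issue_request(ports[0], ports, ("read", 0x100))
    print("request from processor 0 routed to port", chosen)
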
6.  Loop unrolling + PCA using the ArchOpt tool (assign 2)

If you unroll loops in an application, you can expose a higher degree of instruction-level parallelism. Loop unrolling is done using the SUIF compiler infrastructure, and then our ArchOpt automated tool can be used to produce the optimal architecture for the new loop-unrolled C code. The architecture optimization step uses the standard method of reducing the dimension of the search space with principal component analysis (PCA); a minimal PCA sketch is given below.
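
A minimal Python sketch of the PCA step only (not the ArchOpt tool or the SUIF pass), assuming each row is one architecture configuration described by numeric parameters; the configuration matrix and the parameter names are made up:

"""PCA sketch for reducing the architecture parameter search space.
The configuration matrix is made-up data; ArchOpt supplies the real
parameter sets and the search itself."""
import numpy as np

def pca(configs, k):
    """Project configurations onto the top-k principal components."""
    X = np.asarray(configs, dtype=float)
    X = X - X.mean(axis=0)                          # center each parameter
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)           # variance captured per component
    return X @ Vt[:k].T, explained[:k]

if __name__ == "__main__":
    # hypothetical configurations: [issue width, ALUs, cache KB, unroll factor]
    configs = [
        [2, 1, 16, 1],
        [2, 2, 16, 2],
        [4, 2, 32, 2],
        [4, 4, 32, 4],
        [8, 4, 64, 4],
        [8, 8, 64, 8],
    ]
    reduced, var = pca(configs, k=2)
    print("variance explained by 2 components:", var.round(3))
    print(reduced.round(2))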

7.  Simulation of different bus arbitration policies (assign 3)

In assignment 3 we specified a FIFO bus arbitration policy, but that may not always give good performance. Other bus arbitration policies should therefore be tried, such as fixed priority, read priority, write priority, round-robin, proportional share, or a combination of these; a small sketch follows.
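
A minimal Python sketch of how a few of these policies would pick the next bus grant, assuming each pending request is a (processor id, operation, arrival time) tuple; the request format and the round-robin state are illustrative assumptions:

"""Bus arbitration policy sketch.  Each pending request is a tuple
(proc_id, op, arrival_time); the format is an assumption of this sketch."""

def fifo(pending, _state):
    return min(pending, key=lambda r: r[2])            # earliest arrival wins

def fixed_priority(pending, _state):
    return min(pending, key=lambda r: r[0])            # lowest processor id wins

def read_priority(pending, _state):
    reads = [r for r in pending if r[1] == "read"]
    return fifo(reads, None) if reads else fifo(pending, None)

def round_robin(pending, state):
    n = state["num_procs"]
    start = (state["last"] + 1) % n
    for i in range(n):                                 # scan from last grant + 1
        pid = (start + i) % n
        for r in pending:
            if r[0] == pid:
                state["last"] = pid
                return r
    return None

if __name__ == "__main__":
    pending = [(2, "write", 5), (0, "read", 7), (1, "read", 3)]
    state = {"last": 2, "num_procs": 4}
    for name, policy in [("FIFO", fifo), ("fixed priority", fixed_priority),
                         ("read priority", read_priority), ("round-robin", round_robin)]:
        print(name, "grants", policy(list(pending), state))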

8.  Implementation of multiple request generators on a shared I&D cache (assign 3)

For simplicity we can use a shared cache rather than multiple caches, for which the cache coherence cost is higher; we require you to simulate the design of a shared-cache multiprocessor. Here the cache access time is not one cycle but multiple cycles. Taking the processor cycle as one unit, you are required to calculate what the cache access time (in units) should be for m processors under different cache policies, or vice versa. A back-of-the-envelope sketch is given below.
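
A back-of-the-envelope Python sketch relating the number of processors, the request rate and the shared-cache access time. The one-request-every-r-cycles model and the numbers are placeholder assumptions; the real answer should come from the assignment 3 simulator:

"""Shared multi-cycle cache sketch.  Assumes each of the m processors
issues one request every cycles_per_request processor cycles and the
shared cache serves one request every cache_access_time cycles."""

def cache_utilization(m, cycles_per_request, cache_access_time):
    demand = m / cycles_per_request            # requests per processor cycle
    return demand * cache_access_time          # > 1 means the cache saturates

def max_access_time(m, cycles_per_request):
    """Largest integer access time that keeps utilization at or below 1."""
    return cycles_per_request // m

if __name__ == "__main__":
    for m in (2, 4, 8):
        u = cache_utilization(m, cycles_per_request=8, cache_access_time=3)
        print(f"m={m}: utilization={u:.2f}, "
              f"max access time={max_access_time(m, 8)} cycles")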

9.  Integration of the new cache simulator with dinero & cacti (assign 3)

In general, most cache simulators take a memory access trace as input and produce the miss/hit ratio as a result. Here, in place of a random request generator,
we require you to use the trace of an actual application in dinero's trace format. Access delays can be taken from the cacti simulator for the experiments. A trace-driven sketch is given below.
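
A minimal Python sketch of driving the simulator from a trace instead of a random generator. It assumes the traditional 'din' trace format (one "<type> <hex address>" pair per line, with 0 = data read, 1 = data write, 2 = instruction fetch), and the hit/miss delays are placeholders to be replaced with cacti figures:

"""Trace-driven request generator sketch.  Assumes the traditional 'din'
format; the delays are placeholders for numbers taken from cacti."""

TYPE_NAMES = {0: "read", 1: "write", 2: "ifetch"}
DELAYS = {"hit": 3, "miss": 40}          # placeholder cycles, replace with cacti values

def read_din_trace(path):
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue
            kind = int(fields[0])
            if kind not in TYPE_NAMES:   # skip records this sketch does not model
                continue
            yield TYPE_NAMES[kind], int(fields[1], 16)

def run(path, cache_lookup):
    """cache_lookup(op, addr) -> True on hit; returns total cycles."""
    total = 0
    for op, addr in read_din_trace(path):
        total += DELAYS["hit" if cache_lookup(op, addr) else "miss"]
    return total

if __name__ == "__main__":
    with open("sample.din", "w") as f:   # tiny made-up trace so the sketch runs standalone
        f.write("2 400\n0 1000\n1 1000\n0 2000\n")
    tags = {}
    def lookup(op, addr):                # stand-in direct-mapped tag check (256 x 32 B lines)
        idx, tag = (addr >> 5) % 256, addr >> 13
        hit = tags.get(idx) == tag
        tags[idx] = tag
        return hit
    print("total cycles:", run("sample.din", lookup))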

10. Instruction encoding in VLIW using VIES (VLIW instruction encoding scheme)
In VLIW processors the instruction encoding affects the instruction size. Different instruction encodings are used to compress the code; the main motivation of all these techniques is to remove NOPs. You have to implement one of the encoding schemes (there is a collection of papers here) and show the compression ratio. Example codes, on which you have to show the compression, will be given. You will also be given the source code of an assembler, which you have to modify to carry out the encoding. A slot-mask sketch of NOP removal is shown below.
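
A minimal Python sketch of one common NOP-removal idea (a per-bundle slot mask, so that only non-NOP operations are stored); this is not necessarily the scheme you will pick from the paper collection, and the 4-slot width, 32-bit operations and mask layout are assumptions:

"""NOP-removal encoding sketch: each VLIW bundle gets a 1-byte slot mask,
and only the non-NOP operations are stored after it.  The slot count,
operation size and mask layout are assumptions, not the VIES scheme."""

SLOTS = 4            # assumed issue width
OP_BYTES = 4         # assumed operation size
NOP = 0

def encode(bundles):
    out = bytearray()
    for bundle in bundles:                     # bundle = list of SLOTS operations
        mask = 0
        ops = bytearray()
        for i, op in enumerate(bundle):
            if op != NOP:
                mask |= 1 << i
                ops += op.to_bytes(OP_BYTES, "little")
        out.append(mask)                       # 1-byte slot mask per bundle
        out += ops
    return bytes(out)

def compression_ratio(bundles):
    original = len(bundles) * SLOTS * OP_BYTES
    return len(encode(bundles)) / original

if __name__ == "__main__":
    code = [
        [0x11, NOP, NOP, 0x22],
        [NOP, NOP, NOP, 0x33],
        [0x44, 0x55, 0x66, 0x77],
    ]
    print("compression ratio: %.2f" % compression_ratio(code))
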
11. Finding optimal design parameters of a VLIW processor for DSP applications

In the 2nd assignment we found the optimal parameters of a superscalar processor. Similarly, you have to find the optimal set of parameters for a VLIW processor. In a VLIW processor the number of parameters that can be changed is smaller, so the experiments can be performed manually. Experiments have to be performed on 5-6 applications from the DSPstone benchmarks. As the output of the project, the variation of performance with the different parameters and an optimal architecture are required. The cost function should be discussed with the TA before implementation; one possible shape is sketched below.
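
Purely as an illustration of the shape such a cost function could take (the real one must be agreed with the TA before implementation), here is a Python sketch combining cycle count with a made-up hardware cost; all weights and the parameter-to-cost mapping are assumptions:

"""Illustrative cost function shape for the VLIW design-space search.
Weights and the hardware-cost model are assumptions, not a prescription."""

def hardware_cost(params):
    # made-up linear cost model over a few VLIW parameters
    return (2.0 * params["issue_width"]
            + 1.0 * params["num_alus"]
            + 0.5 * params["registers"] / 16)

def cost(cycles, params, alpha=1.0, beta=50.0):
    """Lower is better: alpha weights performance, beta weights hardware."""
    return alpha * cycles + beta * hardware_cost(params)

if __name__ == "__main__":
    candidates = [
        ({"issue_width": 2, "num_alus": 2, "registers": 32}, 12000),   # (params, cycles)
        ({"issue_width": 4, "num_alus": 4, "registers": 64}, 9000),
        ({"issue_width": 8, "num_alus": 8, "registers": 128}, 8500),
    ]
    best = min(candidates, key=lambda c: cost(c[1], c[0]))
    print("best configuration under this cost function:", best[0])
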
12. Design of a VLIW processor using BlockSim (fetch mechanism)


The VLIW processor is an extension of the RISC processor. In this assignment the micro-architecture of a VLIW processor is to be designed using BlockSim (a Java-based tool made in our department). In the micro-architecture, special care should be taken with the instruction fetch mechanism, assuming that you will receive compressed instructions of different sizes. A behavioral fetch-buffer sketch is given below.
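
A behavioral Python sketch of such a fetch mechanism (not a BlockSim model): fixed-size lines are pulled into a buffer and variable-size bundles are extracted using a 1-byte length prefix. The prefix format and the 16-byte fetch line are assumptions:

"""Fetch-buffer sketch for variable-size compressed VLIW bundles.
Not a BlockSim model; the length prefix and line size are assumptions."""

LINE_BYTES = 16                    # assumed fetch width per cycle

class FetchUnit:
    def __init__(self, memory):
        self.memory = memory       # compressed instruction stream (bytes)
        self.pc = 0                # next fetch address
        self.buf = bytearray()     # bytes fetched but not yet decoded

    def fetch_line(self):
        line = self.memory[self.pc:self.pc + LINE_BYTES]
        self.pc += len(line)
        self.buf += line

    def next_bundle(self):
        """Return one variable-size bundle, fetching more lines as needed."""
        while len(self.buf) < 1 or len(self.buf) < 1 + self.buf[0]:
            if self.pc >= len(self.memory):
                return None
            self.fetch_line()
        size = self.buf[0]
        bundle = bytes(self.buf[1:1 + size])
        del self.buf[:1 + size]    # consume prefix + bundle
        return bundle

if __name__ == "__main__":
    # made-up stream: bundles of 3, 8 and 5 bytes, each with a length prefix
    stream = bytes([3]) + b"ABC" + bytes([8]) + b"DEFGHIJK" + bytes([5]) + b"LMNOP"
    fu = FetchUnit(stream)
    bundle = fu.next_bundle()
    while bundle is not None:
        print(len(bundle), bundle)
        bundle = fu.next_bundle()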

Time Chart

Assignment uploaded: 4th April
Assignment selection and discussion of design: 10th April
Submission: 28th April
Demos: 29-30 April