sim-fast: This simulator does no time accounting, only functional simulation: it executes each instruction serially, simulating no instructions in parallel. sim-fast is optimized for raw speed and assumes no caches and no instruction checking.
sim-safe: This simulator also performs functional simulation, but checks for correct alignment and access permissions on each memory reference. Neither sim-fast nor sim-safe accepts any additional command-line arguments.
sim-profile: This simulator generates detailed profiles of instruction classes and addresses, text symbols, memory accesses, branches, and data segment symbols. It accepts the following additional command-line arguments, which toggle the various profiling features:
-iclass      instruction class profiling (e.g., ALU, branch).
-iprof       instruction profiling (e.g., bnez, addi).
-brprof      branch class profiling (e.g., direct, calls, conditional).
-amprof      address mode profiling (e.g., displaced, R+R).
-segprof     load/store segment profiling (e.g., data, heap).
-tsymprof    execution profile by text symbol (functions).
-dsymprof    reference profile by data segment symbol.
-taddrprof   execution profile by text address.
-all         turn on all profiling listed above.
sim-cache and sim-cheetah: These simulators are ideal for fast simulation of caches when the effect of cache performance on execution time is not needed. sim-cache accepts the following arguments, in addition to the universal arguments:
-cache:dl1 <config>   configures the level-one data cache.
-cache:dl2 <config>   configures the level-two data cache.
-cache:il1 <config>   configures the level-one instruction cache.
-cache:il2 <config>   configures the level-two instruction cache.
-tlb:dtlb <config>    configures the data TLB.
-tlb:itlb <config>    configures the instruction TLB.
-flush <boolean>      flush all caches on a system call (<boolean> = 0 | 1 | true | TRUE | false | FALSE).
-icompress            remap SimpleScalar's 64-bit instructions to a 32-bit equivalent in the simulation (i.e., model a machine with 4-byte instructions).
-pcstat <stat>        generate a text-based profile.
The cache configuration (<config>) is formatted as follows:
    <name>:<nsets>:<bsize>:<assoc>:<repl>
Each of these fields has the following meaning:
    <name>    cache name; must be unique.
    <nsets>   number of sets in the cache.
    <bsize>   block size (for TLBs, use the page size).
    <assoc>   associativity of the cache (a power of two).
    <repl>    replacement policy (l | f | r), where l = LRU, f = FIFO, and r = random replacement.
The cache size is therefore the product of <nsets>, <bsize>, and <assoc>.
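As a sketch of that arithmetic, a small Python helper (hypothetical, not part of the tool set) can parse a <config> string and compute the cache size:

```python
def cache_size(config):
    """Total cache capacity in bytes for a sim-cache
    <name>:<nsets>:<bsize>:<assoc>:<repl> configuration string."""
    name, nsets, bsize, assoc, repl = config.split(":")
    # The size is the product of the number of sets, the block size,
    # and the associativity; <name> and <repl> do not affect it.
    return int(nsets) * int(bsize) * int(assoc)

# A 2-way, 64-byte-block, 1024-set cache works out to 256 KB:
print(cache_size("ul2:1024:64:2:l"))  # 131072 bytes? No: 1024*64*2 = 131072
```

For example, a direct-mapped cache with 256 sets and 32-byte blocks (dl1:256:32:1:l) is 256 x 32 x 1 = 8192 bytes, i.e. 8 KB.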
To have a unified level in the hierarchy, "point" the instruction cache to the name of the data cache in the corresponding level, as in the following example:
-cache:il1 il1:128:64:1:l
-cache:il2 dl2
-cache:dl1 dl1:256:32:1:l
-cache:dl2 ul2:1024:64:2:l
The defaults used in sim-cache are as follows:
    L1 instruction cache:  il1:256:32:1:l   (8 KB)
    L1 data cache:         dl1:256:32:1:l   (8 KB)
    L2 unified cache:      ul2:1024:64:4:l  (256 KB)
    instruction TLB:       itlb:16:4096:4:l (64 entries)
    data TLB:              dtlb:32:4096:4:l (128 entries)
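Note that for the TLBs the <bsize> field holds the page size, so the interesting quantity is the entry count, <nsets> x <assoc>, rather than the byte capacity. A quick Python check of the TLB defaults above (an illustrative calculation, not simulator code):

```python
def tlb_entries(config):
    """Number of TLB entries implied by a sim-cache <config> string:
    sets times associativity (the <bsize> field is the page size)."""
    name, nsets, bsize, assoc, repl = config.split(":")
    return int(nsets) * int(assoc)

print(tlb_entries("itlb:16:4096:4:l"))  # 64 entries
print(tlb_entries("dtlb:32:4096:4:l"))  # 128 entries
```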
sim-cheetah accepts the following command-line arguments, in addition to the universal command-line parameters:
-refs [inst | data | unified]   specify which reference stream to analyze.
-C [fa | sa | dm]               fully associative, set-associative, or direct-mapped cache.
-R [lru | opt]                  replacement policy.
-a <sets>                       log base 2 of the minimum number of sets to simulate simultaneously.
-b <sets>                       log base 2 of the maximum number of sets.
-l <line>                       cache line size (in bytes).
-n <assoc>                      maximum associativity to analyze (in log base 2).
-in <interval>                  cache size interval to report when simulating fully associative caches.
-M <size>                       maximum cache size of interest.
-C <size>                       cache size for direct-mapped analyses.
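Because -a and -b are log-base-2 bounds, a single run covers every power-of-two set count between them. Assuming the maximum bound is inclusive, a run with -a 7 -b 14 (hypothetical values chosen for illustration) simulates the following set counts, as this Python snippet makes explicit:

```python
# Set counts covered by log2 bounds a=7, b=14 (illustrative values),
# assuming the upper bound is inclusive.
a, b = 7, 14
set_counts = [2 ** k for k in range(a, b + 1)]
print(set_counts[0], set_counts[-1])  # 128 16384
```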
sim-outorder: This simulator supports out-of-order issue and execution, based on the Register Update Unit (RUU). The RUU scheme uses a reorder buffer to automatically rename registers and hold the results of pending instructions. Each cycle, the reorder buffer retires completed instructions in program order to the architected register file.
The processor's memory system employs a load/store queue. Store values are placed in the queue if the store is speculative. Loads are dispatched to the memory system when the addresses of all previous stores are known. Loads may be satisfied either by the memory system or by an earlier store value residing in the queue, if their addresses match. Speculative loads may generate cache misses, but speculative TLB misses stall the pipeline until the branch condition is known.
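The store-to-load forwarding behavior described above can be sketched in a few lines of Python. This is a simplified functional model written for this discussion, not sim-outorder's actual implementation: a load scans earlier stores in the queue, youngest first, and forwards a matching value; otherwise it falls through to the memory system.

```python
def resolve_load(addr, store_queue, memory):
    """Satisfy a load either from the youngest earlier store to the
    same address (store-to-load forwarding) or from memory."""
    # Scan earlier stores from youngest to oldest; the first address
    # match holds the value the load must observe.
    for st_addr, st_value in reversed(store_queue):
        if st_addr == addr:
            return st_value
    return memory[addr]

memory = {0x1000: 5, 0x2000: 9}
store_queue = [(0x1000, 7)]   # a speculative store not yet retired
print(resolve_load(0x1000, store_queue, memory))  # 7, forwarded from the queue
print(resolve_load(0x2000, store_queue, memory))  # 9, from memory
```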
sim-outorder runs about an order of magnitude slower than sim-fast. In addition to the general arguments, sim-outorder uses the following command-line arguments:
Specifying the processor core
-fetch:ifqsize <size>   set the fetch width to <size> instructions; must be a power of two. The default is 4.
-fetch:speed <ratio>    set the ratio of the front-end speed relative to the execution core (allowing <ratio> times as many instructions to be fetched as decoded per cycle).
-fetch:mplat <cycles>   set the branch misprediction latency. The default is 3 cycles.
-decode:width <insts>   set the decode width to <insts>, which must be a power of two. The default is 4.
-issue:width <insts>    set the maximum issue width in a given cycle; must be a power of two. The default is 4.
-issue:inorder          force the simulator to use in-order issue. The default is false.
-issue:wrongpath        allow instructions to issue after a misspeculation. The default is true.
-ruu:size <insts>       capacity of the RUU, in instructions. The default is 16.
-lsq:size <insts>       capacity of the load/store queue, in instructions. The default is 8.
-res:ialu <num>         specify the number of integer ALUs. The default is 4.
-res:imult <num>        specify the number of integer multipliers/dividers. The default is 1.
-res:memports <num>     specify the number of L1 cache ports. The default is 2.
-res:fpalu <num>        specify the number of floating-point ALUs. The default is 4.
-res:fpmult <num>       specify the number of floating-point multipliers/dividers. The default is 1.
Specifying the memory hierarchy
All of the cache arguments and formats used in sim-cache are also used in sim-outorder, with the following additions:
-cache:dl1lat <cycles>   specify the hit latency of the L1 data cache. The default is 1 cycle.
-cache:dl2lat <cycles>   specify the hit latency of the L2 data cache. The default is 6 cycles.
-cache:il1lat <cycles>   specify the hit latency of the L1 instruction cache. The default is 1 cycle.
-cache:il2lat <cycles>   specify the hit latency of the L2 instruction cache. The default is 6 cycles.
-mem:lat <1st> <next>    specify main memory access latency (first chunk, each subsequent chunk). The defaults are 18 cycles and 2 cycles.
-mem:width <bytes>       specify the width of the memory bus in bytes. The default is 8 bytes.
-tlb:lat <cycles>        specify the latency (in cycles) to service a TLB miss. The default is 30 cycles.
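Assuming the bus transfers a block in back-to-back chunks of -mem:width bytes, the defaults imply that fetching a 64-byte block from main memory costs 18 cycles for the first 8-byte chunk plus 2 cycles for each of the remaining seven. A small Python check of that arithmetic (an illustrative model of the latency parameters, not simulator code):

```python
def block_latency(block_bytes, bus_bytes=8, first=18, rest=2):
    """Cycles to transfer a block over the memory bus: the first
    chunk costs `first` cycles, each later chunk costs `rest`."""
    chunks = block_bytes // bus_bytes
    return first + (chunks - 1) * rest

print(block_latency(64))  # 32 cycles for a 64-byte block
```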
Specifying the branch predictor
Branch prediction is specified with the -bpred flag, which takes one of the six arguments below. The default is a bimodal predictor with 2048 entries.
-bpred <type>
    nottaken   always predict not taken.
    taken      always predict taken.
    perfect    perfect predictor.
    bimod      bimodal predictor, using a branch target buffer (BTB) with 2-bit counters.
    2lev       2-level adaptive predictor.
    comb       combined predictor (bimodal and 2-level adaptive).
The predictor-specific arguments are listed below:
-bpred:bimod <size>   set the bimodal predictor table size to <size> entries.
-bpred:2lev <l1size> <l2size> <hist_size> <xor>   specify the 2-level adaptive predictor. <l1size> specifies the number of entries in the first-level table, <l2size> specifies the number of entries in the second-level table, <hist_size> specifies the history width, and <xor> allows you to xor the history and the address in the second level of the predictor. The default settings for these four parameters are 1, 1024, 8, and 0, respectively.
-bpred:comb <size>   set the meta-table size of the combined predictor to <size> entries. The default is 1024.
-bpred:ras <size>   set the return stack size to <size> entries (0 entries means no return stack). The default is 8 entries.
-bpred:btb <sets> <assoc>   configure the BTB to have <sets> sets and an associativity of <assoc>. The defaults are 512 sets and an associativity of 4.
-bpred:spec_update <stage>   allow speculative updates of the branch predictor in the decode or writeback stage (<stage> = ID | WB). The default is nonspeculative updates in the commit stage.
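To make the <l1size>, <l2size>, <hist_size>, and <xor> parameters concrete, here is a minimal 2-level adaptive predictor sketched in Python. It is a toy model written for this description, not sim-outorder's actual implementation: a history shift register of <hist_size> bits selects a 2-bit saturating counter in the second-level table, optionally xored with the branch address.

```python
class TwoLevelPredictor:
    """Toy 2-level adaptive predictor: a first-level history register
    indexes a second-level table of 2-bit saturating counters."""
    def __init__(self, l1size=1, l2size=1024, hist_size=8, xor=0):
        self.hist = [0] * l1size        # first-level history registers
        self.ctr = [1] * l2size         # 2-bit counters, weakly not-taken
        self.hist_mask = (1 << hist_size) - 1
        self.l2size = l2size
        self.xor = xor

    def _index(self, pc):
        h = self.hist[pc % len(self.hist)]
        idx = (h ^ pc) if self.xor else h
        return idx % self.l2size

    def predict(self, pc):
        return self.ctr[self._index(pc)] >= 2   # taken if counter >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.ctr[i] = min(3, self.ctr[i] + 1) if taken else max(0, self.ctr[i] - 1)
        # Shift the outcome into the history register.
        h = pc % len(self.hist)
        self.hist[h] = ((self.hist[h] << 1) | int(taken)) & self.hist_mask

# An alternating branch becomes perfectly predictable once its two
# history patterns map to trained counters.
bp = TwoLevelPredictor()
hits = 0
for i in range(200):
    outcome = (i % 2 == 0)
    if i >= 100:
        hits += bp.predict(0x400) == outcome
    bp.update(0x400, outcome)
print(hits)  # 100: every prediction after warmup is correct
```

The point of the example is the indexing path: with the default <l1size> of 1, every branch shares one history register, and <xor> = 1 mixes the branch address into the second-level index (a gshare-style variant) to reduce interference between branches.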
Visualization
-pcstat <stat>           record statistic <stat> by text address.
-ptrace <file> <range>   pipeline tracing.