GitHub - ewalke31/cs233hw7

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
Makefile		Makefile
README		README
csim.py		csim.py
gcc.trace		gcc.trace
matrix.c		matrix.c
swim.trace		swim.trace
timeit.c		timeit.c

Repository files navigation

Eric Walker & Derek Reitz
ewalke31 & dreitz5
CSF HW 7
--------------------------------------------------------------------------------
Problem 1
Part 1)
Size:          ij D1-Cache miss rate:          ji:
4096           1.3%                           20%
2048           1.3%                           20%
512            1.5%                           19.5%
256            1.6%                           18.2%
128            2.3%                           14.8%
64             3.2%                           3.2%

The reason write order (ij vs. ji) matters is because in c, arrays are stored
with j increasing and then i increasing. Therefore, elements with the same i
are stored next to each other. 
The reason the ij writes have a lower cache miss rate (and is therefore 
faster) is due to memory locality. When the CPU writes to memory, it firsts 
checks the cache for that memory address. If it's not in the cache (miss) 
the cache loads a block of values from memory. The size of this block depends 
on the architecture, but the next few writes that write to addresses close 
to the first missed address will be hits.
With the ji writes, since the writes following a miss are not as close in 
memory (4096 bytes away) although the first couple writes may be inside the
previous block, many fewer consecutive write addresses will be. This leads
to a greater miss rate.
With a 2D array of size 64 x 64, the access order does not matter. The 
reason for this is because the entire array is contained in the same block.

Part 2)
Size: 4096
Optimization:          ij D1-Cache miss rate:          ji:
O0                     1.3%                            20%
O1                     6.2%                            99.8%
O2                     6.2%                            99.8%
O3                     24.8%                           99.1%

Once the compiler flags are set, the assignments occur in an unknown order.
Since this code can't really be optimzed very much, we do not know what 
the order is after optimzation. One possible explanation is that the inner 
loop assignment is mixed up at the 1st and 2nd optimization levels. This does
not affect the ij access that significantly, because it's likely that any
jth address for a given i will still be in the same block. For the ji access
however, the random i causes the accesses to be a random number times 4096
bytes away from each other, which is almost always outside of the last block.
The third optimization may affect the outer loop as well, which hinders
the ij access significantly. 
--------------------------------------------------------------------------------
Problem 2

Comments within csim.py explain our process. We go through the trace and based
on whether it is a load or store add different numbers to the output info
depending on which case it matches. We use a list (indexed by set) of
dictionaries that map the block number to the timestamp, updating the time for
any access for LRU and only when written for FIFO. We use a 2D list to
represent the cache, and this keeps track of the tag, valid bit, and dirty bit,
updating when appropriate.