In our project, we empirically investigate the performance of various matrix multiplication
algorithms under a cache-adaptive model. We simulate a two-level memory hierarchy and
observe how each algorithm performs in the external memory model as the cache changes size. We
then test the algorithms on a real machine by counting the number of page faults each
algorithm incurs under varying cache availability. Both the simulation and the real-machine experiments
demonstrate that cache-adaptive algorithms are more stable as the cache profile varies.
Theory and Algorithm
Formally, the cache-adaptive model can be seen as an extension to the disk-access model (DAM),
where the costs are only incurred when the algorithm has to read or write information to the disk
[1]. The cache-adaptive model is the case where we allow the size of the cache to change during the
execution of the program. The intention of the cache-adaptive model is to mimic situations in real life
where the cache available to a program is not stationary but is being divided based on the CPU and
on the needs of the other competing processes.
We can specify the memory profile of a cache as a function \( m : \mathbb{N} \to \mathbb{N} \), where \( m(t) \) is defined to
be the number of cache blocks the cache holds at the tth cache miss of the algorithm, and the total
cache size is \( M(t) = B \cdot m(t) \), where \( B \) is the cache block size [1].
Definition 1. Cache-adaptivity [1]
For any memory profile \( m \), the c-speed augmentation of \( m \) is defined as the memory profile
\( m'(t) = c \cdot m(t) \).
For a problem \( P \), an algorithm \( A \) is optimal in the cache-adaptive model if there exists some \( c \)
and a sufficiently large problem input size such that, for any memory profile \( m \), the worst-case
running time of \( A \) on the c-speed augmentation of \( m \) is at most the worst-case running time of
any other algorithm solving \( P \) on the memory profile \( m \).
The cache-adaptive algorithm is shown below.
[1] Bender, M. A., Ebrahimi, R., Fineman, J. T., Ghasemiesfeh, G., Johnson, R., and
McCauley, S. Cache-adaptive algorithms. In Proceedings of the Twenty-Fifth Annual ACM-SIAM
Symposium on Discrete Algorithms (USA, 2014), SODA '14, Society for Industrial and
Applied Mathematics, pp. 958–971.
Experiments
1. The class Block represents the block data in both Memory and Cache. It is a list of integers,
each element 4 bytes in size. We rely on an object-reference trick: when we write to a Block in Cache,
the contents are also updated in Memory.
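A minimal sketch of the Block idea (the method names here are illustrative, not necessarily the project's exact code): a Block wraps a list of integers, and because Python passes objects by reference, a write through the cached reference is visible through the memory-side reference as well.

```python
class Block:
    """A block of integer data shared between Memory and Cache."""

    def __init__(self, data):
        self.data = data  # list of integers, each conceptually 4 bytes

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value


# The same Block object can live in both Memory and Cache; mutating it
# through the cached reference updates the memory-side view too.
memory_view = Block([1, 2, 3, 4])
cached_view = memory_view      # an object reference, not a copy
cached_view.write(0, 42)
```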
2. The class Memory serves as the memory that stores the data. It has only two main functions.
The first, read_block_from_memory(), returns the Block at a requested address. The second,
allocate(), initializes the memory with the matrix data used for multiplication. We store
matrices in row-major order.
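The two Memory functions described above can be sketched as follows; this is a hypothetical reconstruction under the assumption that an address is a flat element index and that blocks hold a fixed number of integers.

```python
class Block:
    def __init__(self, data):
        self.data = data


class Memory:
    """Backing store: a matrix laid out in row-major order, cut into Blocks."""

    def __init__(self, block_size):
        self.block_size = block_size  # integers per block
        self.blocks = []

    def allocate(self, matrix):
        # Flatten the matrix row by row (row-major order), then
        # partition the flat array into Blocks of block_size elements.
        flat = [x for row in matrix for x in row]
        for i in range(0, len(flat), self.block_size):
            self.blocks.append(Block(flat[i:i + self.block_size]))

    def read_block_from_memory(self, address):
        # An address is a flat element index; return the Block containing it.
        return self.blocks[address // self.block_size]
```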
3. Our Cache class is the linchpin of the simulator. It communicates with Memory through the CPU
and transfers data in units of Block.
4. We implemented three main eviction policies: LRUPolicy (Least Recently Used), LFUPolicy
(Least Frequently Used), and FIFOPolicy (First-In First-Out).
LRUPolicy evicts the least recently used Block. To do this, the policy maintains an OrderedDict,
which uses a doubly linked list as its underlying data structure. Every time a block is accessed,
its index is moved to the end; when the cache is full, the index at the beginning is removed.
LFUPolicy evicts the least frequently used Block. We use a hashmap to maintain access frequencies
per Block index; during eviction, we loop through the hashmap to find the index with the lowest
frequency and remove it.
FIFOPolicy evicts blocks in the order they were added. We maintain an OrderedDict to serve
this purpose as well.
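The OrderedDict-based LRU scheme described above can be sketched like this (an illustrative skeleton, not the project's exact class): accesses move a block index to the most-recent end, and eviction pops from the least-recent front.

```python
from collections import OrderedDict


class LRUPolicy:
    """Least-recently-used eviction over block indices, via OrderedDict."""

    def __init__(self):
        # OrderedDict is backed by a doubly linked list, so moving an
        # entry to the end and popping from the front are both O(1).
        self.order = OrderedDict()

    def touch(self, block_index):
        # Record an access: place/move the index at the most-recent end.
        self.order[block_index] = True
        self.order.move_to_end(block_index)

    def evict(self):
        # Remove and return the least recently used block index
        # (the entry at the front of the ordering).
        index, _ = self.order.popitem(last=False)
        return index
```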
5. CPU contains three important functions.
The first is read(), which reads a Block from Cache. If the Block is not in the cache,
it is fetched from Memory and stored in Cache.
The second is write(), which writes a byte to a Block in Cache. If the Block is not present,
the function requests it from Memory, updates it, and writes it to Cache.
The final function serves the cache-adaptive setting: change_cache_size() updates the size of
the cache every time a cache miss occurs, according to a specified memory profile.
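The miss path of read() together with change_cache_size() can be sketched as follows. This is a simplified reconstruction: SimpleCache stands in for the project's Cache class (with an arbitrary eviction choice rather than a real policy), and the address-to-block mapping is reduced to a dictionary lookup.

```python
class SimpleCache:
    """A stand-in cache: block index -> block data, with a capacity bound."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}

    def lookup(self, index):
        return self.store.get(index)

    def insert(self, index, block):
        if len(self.store) >= self.capacity:
            # Evict an arbitrary entry; a real policy would pick LRU/LFU/FIFO.
            self.store.pop(next(iter(self.store)))
        self.store[index] = block

    def resize(self, capacity):
        self.capacity = capacity
        while len(self.store) > self.capacity:
            self.store.pop(next(iter(self.store)))


class CPU:
    def __init__(self, cache, memory_blocks, profile):
        self.cache = cache
        self.memory = memory_blocks  # stand-in for the Memory class
        self.profile = profile       # memory profile: t -> cache size in blocks
        self.misses = 0

    def read(self, index):
        block = self.cache.lookup(index)
        if block is None:                 # cache miss
            self.misses += 1
            self.change_cache_size()      # the cache-adaptive step
            block = self.memory[index]
            self.cache.insert(index, block)
        return block

    def change_cache_size(self):
        # Resize the cache according to the memory profile m(t),
        # evaluated at the current miss count t.
        self.cache.resize(self.profile(self.misses))
```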
6. This class is responsible for changing the cache's size every time a cache miss occurs, as in [1]. The memory
profile is as follows:
$$
m(t) = \begin{cases}
m & \text{if } t \leq m^{5/2} \\
2m^{5/2} - t & \text{if } m^{5/2} < t < 2m^{5/2} - B \\
B & \text{if } t \geq 2m^{5/2} - B
\end{cases}
$$
where \(t\) indexes the \(t^{th}\) cache miss and \( m = c_1 B \), with \(c_1\) a constant speed augmentation. Here we
choose \(c_1 = 1\).
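The profile above transcribes directly into code; the sketch below takes \(B\) and \(c_1\) as parameters (with \(c_1 = 1\) by default, as in our experiments).

```python
def memory_profile(t, B, c1=1):
    """Cache size in blocks at the t-th cache miss, per the profile m(t)."""
    m = c1 * B
    threshold = m ** 2.5      # m^{5/2}
    if t <= threshold:
        return m              # phase 1: full size m
    elif t < 2 * threshold - B:
        return 2 * threshold - t  # phase 2: linear shrink
    else:
        return B              # phase 3: floor of one block's worth
```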
For the results of cache-adaptive algorithms compared to other algorithms (e.g., naive, cache-aware, and cache-oblivious)
for matrix multiplication, please see the report.