08 Memory
Principle of Locality¶
A program accesses only a small proportion of its address space at any given time; this behaviour is called locality.
| | Spatial locality | Temporal locality |
|---|---|---|
| Meaning | Data near recently-accessed data is likely to be accessed soon, so clusters of related data are loaded into cache | Recently-used data is likely to be accessed again soon |
| Example | - Multiple array elements in a loop<br />- Sequential instruction access | - Global variables from the global region of memory<br />- Instructions of a loop |
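As an illustrative sketch (not from the slides), both kinds of locality show up in an ordinary summation loop:

```python
# Illustrative sketch: one loop exhibits both kinds of locality.
data = list(range(1000))

total = 0
for i in range(len(data)):  # temporal locality: the loop's instructions and
    total += data[i]        # the variable `total` are reused every iteration;
                            # spatial locality: data[0], data[1], ... sit next
                            # to each other and are fetched block by block
print(total)  # → 499500
```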
Data can only move between two adjacent levels of the hierarchy.
Registers are the highest level; tertiary memory is the lowest.
```mermaid
flowchart LR
    CPU <--> Register <--> L1 <--> L2 <--> L3 <--> PM[Primary<br />Memory] <--> SM[Secondary<br />Memory] <--> TM[Tertiary<br />Memory]
    subgraph Cache
        L1
        L2
        L3
    end
```

Memory Performance¶
| Term | Meaning |
|---|---|
| Block (Cache Line) | Unit of copying; consists of multiple words |
| Hit | Data requested by the processor is present in the upper level |
| Miss | Data requested by the processor is absent from the upper level |
| Hit Rate/Ratio | Fraction of memory accesses found in the upper level: \(\frac{\text{Hits}}{\text{Accesses}}\) |
| Miss Rate/Ratio | \(1 - \text{Hit Ratio}\) |
| Hit Time | Time to access the upper level + time to determine whether the access is a hit or a miss |
| Miss Penalty | Time to replace a block in the upper level with the corresponding block from the lower level |
| AMAT | Average Memory Access Time \(T_\text{avg} = T_\text{cache} + (m \times \text{Miss Penalty})\), where \(m\) is the miss ratio and the miss penalty is dominated by \(T_\text{memory}\) |
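The AMAT formula can be checked with a tiny helper; the numbers below are hypothetical, chosen only for illustration:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time: every access pays the hit time;
    a fraction `miss_rate` of accesses additionally pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty

# hypothetical numbers: 1 ns cache hit, 5% miss rate, 100 ns miss penalty
print(amat(1.0, 0.05, 100.0))  # → 6.0 (ns)
```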
Cache Memory¶
The L1 cache can be located on the CPU chip.
Cache Operation¶
```mermaid
flowchart LR
    start((Start)) -->
    n1[Receive RA from CPU] -->
    n2{Is block containing RA in cache?} -->
    something
```

Types of Access¶
- Regular access
- Irregular access
Direct Mapped Cache¶
Cache Line Table¶
Let \(s\) be the number of bits needed to represent a main-memory block address (so main memory holds \(2^s\) blocks), and let \(m\) be the number of cache lines.
| Cache Line | Main memory blocks held |
|---|---|
| \(0\) | \(0, m, 2m, \dots, 2^s-m\) |
| \(1\) | \(1, m+1, 2m+1, \dots, 2^s-m+1\) |
| \(\dots\) | \(\dots\) |
| \(m-1\) | \(m-1, 2m-1, 3m-1, \dots, 2^s-1\) |
Memory Addressing¶
| | Tag | Line number/Index | Word/Byte offset |
|---|---|---|---|
| Purpose | Distinguishes the block from the other blocks that can map to the same cache line | Specifies one of the \(m = 2^r\) lines of the cache | Identifies a unique word/byte within a line of cache memory |
| Bits | Most significant \(s-r-w\) bits | Next most significant \(r\) bits | Least significant \(w\) bits |
No of cache lines \(m = 2^r\), where \(r\) is the number of index bits
Tags & Valid Bits¶
When the computer boots, every line's valid bit is 0, meaning the line holds no valid data.
Address Subdivision Diagram¶
Take from slides
Reading from Cache¶
- Use the index to select a cache line
- Check the valid bit
- If valid, compare the tag
- If the tags are equal, it is a hit
- If the line is invalid or the tags differ, it is a miss
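The steps above can be sketched as a minimal direct-mapped lookup; the line count, block size, and addresses below are hypothetical:

```python
# Minimal direct-mapped cache lookup sketch (hypothetical sizes):
# 16 lines (r = 4 index bits), 16-byte blocks (w = 4 offset bits).
NUM_LINES, BLOCK_SIZE = 16, 16

# each line holds (valid, tag); all lines start invalid (valid bit = 0)
lines = [(False, None) for _ in range(NUM_LINES)]

def split(addr):
    offset = addr % BLOCK_SIZE                # least significant w bits
    index = (addr // BLOCK_SIZE) % NUM_LINES  # next r bits select the line
    tag = addr // (BLOCK_SIZE * NUM_LINES)    # remaining s - r - w bits
    return tag, index, offset

def access(addr):
    tag, index, _ = split(addr)
    valid, stored_tag = lines[index]
    if valid and stored_tag == tag:           # valid + equal tags => hit
        return "hit"
    lines[index] = (True, tag)                # miss: fetch block, set valid
    return "miss"

print(access(0x1234))  # → miss (cold cache)
print(access(0x1234))  # → hit  (same block)
```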
Fully-Associative Cache¶
Writing to Cache¶
✅ A new block can be placed in any free line, so no index calculation happens; placement is efficient and fast
Reading from Cache¶
❌ The tag must be compared against every line, so lookup is inefficient and slower
- Check the valid bit of every line
- For each valid line, compare the tag
- If no valid line's tag matches, it is a miss; add the new block in first-come-first-serve order
Set Associative Cache¶
Combination of Direct Mapped Cache and Fully-Associative Cache
A particular block address is mapped to a particular set
An '\(n\)-way set-associative cache' means each set has \(n\) lines, where \(n \in \{2, 4, 8, \dots \}\)
A set index of \(N\) bits selects one of \(2^N\) sets
Note
- Direct Mapped Cache is basically one-way associative caching
- Fully-Associative Cache is basically associative caching with a single set containing all the blocks (\(n\) = total no of blocks)
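A minimal sketch of an \(n\)-way set-associative lookup (hypothetical sizes; simple FCFS eviction stands in for a real replacement policy):

```python
# Sketch of an n-way set-associative lookup (hypothetical: 8 sets, 2-way).
NUM_SETS, WAYS, BLOCK_SIZE = 8, 2, 16

# each set holds up to WAYS tags (presence in the list implies valid)
sets = [[] for _ in range(NUM_SETS)]

def access(addr):
    block = addr // BLOCK_SIZE
    set_index = block % NUM_SETS      # block address maps to one set
    tag = block // NUM_SETS
    if tag in sets[set_index]:        # compare tags of all lines in the set
        return "hit"
    if len(sets[set_index]) == WAYS:  # set full: evict (FCFS here)
        sets[set_index].pop(0)
    sets[set_index].append(tag)
    return "miss"

print(access(0), access(0))  # → miss hit
```

With `WAYS = 1` this degenerates to a direct-mapped cache; with `NUM_SETS = 1` (and `WAYS` equal to the total number of lines) it becomes fully associative.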
Write¶
Write-Through¶
Every write updates both the cache and main memory, so memory always holds the latest value
Write-Back¶
Writes update only the cache; a dirty bit marks a block that has been modified in the cache but not yet written back to memory. A dirty block is written back only when it is evicted
Write-Buffer¶
Writes are queued in a buffer so the processor does not stall waiting for memory. If the same data is accessed again, there is the extra overhead of also checking the queue, in case the latest value is still buffered.
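As a sketch of write-back behaviour with a dirty bit (a single-line cache with hypothetical addresses and values):

```python
# Sketch of write-back behaviour with a dirty bit (single-line cache).
memory = {0: 10, 1: 20}
line = {"valid": False, "addr": None, "data": None, "dirty": False}

def evict():
    if line["dirty"]:                        # dirty => cache copy is newer
        memory[line["addr"]] = line["data"]  # write it back to memory now

def write(addr, value):
    global line
    if line["valid"] and line["addr"] != addr:
        evict()
    line = {"valid": True, "addr": addr, "data": value, "dirty": True}
    # write-back: memory is NOT updated here, only on eviction

write(0, 99)   # memory[0] is still the stale value 10
write(1, 77)   # evicts the dirty block: memory[0] becomes 99
```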
LRU Replacement¶
Least Recently-Used: evict the block that has gone unused for the longest time
We have to implement a data structure to keep track of usage order:
- doubly linked list (maintains recency order)
- hash map (fast lookup of a block's node)
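One convenient sketch in Python is `collections.OrderedDict`, which internally combines exactly these two structures (a hash map plus a doubly linked list); the capacity and tags below are hypothetical:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # key: block tag, value: block data

    def access(self, tag, data=None):
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return "hit"
        if len(self.lines) == self.capacity:
            self.lines.popitem(last=False)   # evict least recently used
        self.lines[tag] = data
        return "miss"

cache = LRUCache(2)
cache.access("A"); cache.access("B")
cache.access("A")          # A is now most recently used
cache.access("C")          # evicts B, the least recently used
print(cache.access("B"))   # → miss
```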