================== UBM ==================

What kind of paper is this?
- Investigates a topic that has been researched for decades.
- Assumes a lot of background and is therefore harder to understand.
- Reader should expect a detailed evaluation.

Explain figure 1
- What are we seeing in Figure 1a?
- Is this an interesting reference pattern?
- What is IRG (inter-reference gap)?
- In 1b, does the graph just expand to the right with more blocks? Why?
- In 1c, LRU extends less far to the right. Why? What is the problem?
  Blocks don't fit.
- How many blocks are referenced? ~5,500.
- What is the maximum number of blocks in the cache? 1,000.
  ==> memory holds only ~20% of the working set.
- Is the LRU distribution surprising for 1,000 blocks? No: at 25, the IRG
  is 25*50 = 5,000, the block reference interval.
- How large does the cache have to be before LRU does as well as OPT?
  At least the length of the longest loop period plus the longest interval
  (~3,000 + ~2,000).
- Would FBR/LRU-k/2Q/EELRU do better? Yes, but each fails to handle
  sequential and/or looping references.
- Given the data presented, is this a serious problem to fix? Not clear;
  perhaps the machine is just too overloaded.

What is the proposed solution? UBM:
1. Detection: classifies references (sequential, looping, and other).
2. Replacement: one scheme per reference class.
3. Allocation: partitions cache memory among the 3 categories.

- Detection:
  - UBM keeps an on-line history of references (fileID, block number).
  - On a new reference:
    - Classify first references as other.
    - If an earlier reference referenced the previous block, mark this
      reference and the earlier ones as sequential. If in a sequential run,
      mark its end when a reference doesn't follow the previous block of
      the run.
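The sequential-detection rule above can be sketched as a minimal Python fragment (hypothetical structures; the paper's detector also tracks loop periods and retroactively reclassifies the start of a run, which this sketch omits):

```python
# Minimal sketch of UBM-style sequential detection, assuming per-file
# state of the form (last_block, run_length). Not the paper's actual
# FreeBSD implementation.

def make_detector():
    """Per-file detection state: file_id -> (last_block, run_length)."""
    return {}

def classify(state, file_id, block):
    """Classify one (file_id, block) reference as 'other' or 'sequential'.

    A reference to the block after the previous one extends a sequential
    run; any other reference ends the run and starts fresh.
    """
    last = state.get(file_id)
    if last is None:
        state[file_id] = (block, 1)
        return "other"            # first reference to this file
    last_block, run = last
    if block == last_block + 1:
        state[file_id] = (block, run + 1)
        return "sequential"       # continues the run
    state[file_id] = (block, 1)   # run ends; start a new one
    return "other"
```

For example, references 7, 8, 9, 3 to one file classify as other, sequential, sequential, other: the jump to block 3 marks the end of the run.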
    - If an earlier sequential reference is referenced again, mark the
      start of a loop. If in loop mode, mark the end of the loop when a
      reference isn't part of the earlier sequential reference pattern.
    - When marking an end, add a table entry (see figure 4).
- Replacement:
  - On a block miss:
    - Free memory available: read the block in.
    - No free memory:
      - Sequential partition: MRU.
      - Looping partition: replace from the loop with the longest period
        (sort of LRU); within a loop, use MRU.
      - Other partition: LRU.
- Allocator:
  - What is marginal gain? The expected number of extra hits per unit time
    from adding one more buffer.
    - sequential: 0
    - loop: 1/(p_{i_max}+1)
    - other: why not calculate it directly? Too expensive--that needs LRU
      hit counts for all queue sizes. Instead, use an approximation of
      Belady's lifetime function:
        hit(n) = 1 - c*n^{-k}
      Estimate c and k from two points (keep ghost buffers).

- Hypotheses:
  1. Does it matter?
  2. How accurate is the detection?
  3. How good is replacement vs. OPT?
  4. How good is the allocator vs. OPT?
  5. Are there simpler schemes that work as well?
  6. What is the runtime overhead of UBM?

- Environment:
  - Apps with different patterns (why does glimpse have loops?).
  - Trace-driven evaluation (what are the problems with traces?).
  - FreeBSD implementation.
  - The sum of the memory requirements of the apps is much higher than
    available memory.
  - A number of apps make a single pass through lots of memory (mpeg).
  - 3 traces.

- Results:
  Re 1: yes.
  Re 2: good.
  Re 3: close to OPT.
  Re 4: don't know, but it seems effective.
  Re 5: apparently not (unless the programmer specifies a strategy).
  Re 6: 5% overhead vs. LRU.

Explain figure 12 and table 2
- Why does UBM do better (i.e., improve more) with a cold cache?
- When would you actually have a "cold cache" as measured here?
  After rebooting or unmounting a file system; otherwise, the cache will
  be cold but full of garbage.
- How might you improve UBM in the cold case?
  - Predict that sequential references might turn into loops.
  - Keep ghost buffers in the SEQ pool to measure how useful the blocks
    there would have been.
  - How would you adjust the model? The sequential partition's marginal
    gain is then actually > 0.

Problems
- Unfair.
- Indexed data structures (B-trees).
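The allocator's Belady approximation, hit(n) = 1 - c*n^{-k}, can be fit from two measured points as sketched below (a minimal sketch; the exact fitting procedure and ghost-buffer bookkeeping are assumptions, not spelled out here):

```python
import math

def fit_belady(n1, h1, n2, h2):
    """Fit hit(n) = 1 - c * n**-k through two measured points
    (n1, h1) and (n2, h2), e.g. real buffers plus ghost buffers.

    Since 1 - h = c * n**-k, taking logs gives
    ln(1 - h) = ln(c) - k * ln(n), a line in ln(n), so two
    points determine k (the slope) and then c.
    """
    k = (math.log(1 - h1) - math.log(1 - h2)) / (math.log(n2) - math.log(n1))
    c = (1 - h1) * n1 ** k
    return c, k

def hit(c, k, n):
    """Approximate hit ratio with n buffers."""
    return 1 - c * n ** -k

def marginal_gain(c, k, n):
    """d hit/dn: expected extra hits from adding one more buffer."""
    return c * k * n ** -(k + 1)
```

For instance, measured points (100 buffers, 50% hits) and (200 buffers, 75% hits) give k = 1 and c = 50, so the curve predicts hit(400) = 0.875 and a marginal gain of 0.005 at 100 buffers; the allocator would compare that against 0 for the sequential partition and 1/(p_{i_max}+1) for the looping partition.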