UBM
===

What kind of paper is this?
- Investigates a topic that has been researched for decades.
- Assumes lots of background and is therefore harder to understand.
- Reader should expect a detailed evaluation.

Explain figure 1
- What are we seeing in Figure 1a?
  - Is this an interesting reference pattern?
  - What is IRG? The inter-reference gap: how many references pass before a
    block is referenced again.
- In 1b, does the graph just expand to the right with more blocks? Almost. Why?
  A larger IRG means the block is needed farther in the future.
  But depending on access patterns, larger-IRG blocks may still be cacheable.
  E.g., first reference only low-IRG blocks, then only high-IRG ones.
  See Fig 1b, 200 cache blocks vs. 400.
- In 1c, LRU extends less far to the right. Why? What is the problem?
  The blocks don't fit:
  - How many blocks are referenced? ~5,500.
  - What is the maximum number of blocks in the cache? 1,000
    ==> memory holds only ~20% of the working set.
  - Is the LRU distribution surprising for 1,000 blocks? No: out at x = 25 the
    IRG is ~5,000 block references, far more than a 1,000-block cache can span.
  - How large does the cache have to be before LRU does as well as OPT?
    It would need to hold the longest loop in memory;
    in Fig 1a that looks like 3,000 blocks + a 2,000-reference interval.
- Would FBR/LRU-K/2Q/EELRU do better?
  Yes, but each fails to handle sequential and/or looping references.
- Given the data presented, is this a serious problem to fix?
  Not clear; perhaps the machine is just too overloaded.

What is the proposed solution? UBM
1. Detection: classifies references (sequential, looping, and other)
2. Replacement: one scheme per reference class
3. Allocation: partitions cache memory among the 3 categories

- Detection (a classifier sketch follows the allocator notes below):
  - UBM keeps an on-line history of references (fileID, block ref).
  - On a new reference:
    - Classify first references as other.
    - If an earlier reference was to the previous block, mark this reference
      and the earlier ones as sequential.
    - If in sequential mode, mark the end when a reference isn't to the next
      block.
    - If a block from an earlier sequential pattern is referenced again, mark
      the start of a loop; if in loop mode, mark the end of the loop when a
      reference isn't part of the earlier sequential pattern.
    - When marking an end, add a table entry (see figure 4).
- Replacement (victim selection is sketched below):
  - On a block miss:
    - If free memory is available, read the block in.
    - If no free memory:
      - Sequential partition: MRU.
      - Looping partition: replace from the longest-period loop (sort of LRU);
        within a loop, use MRU.
      - Other partition: LRU.
- Allocator:
  - What is marginal gain? The expected number of extra hits per unit time
    from adding one more buffer.
    - Sequential: 0.
    - Loop: 1/(p_{i_max}+1).
    - Other: why not calculate it directly? Too expensive--that would need
      LRU simulated at every queue size. Instead, use an approximation of
      Belady's lifetime function: hit(n) = 1 - c*n^{-k}.
      Estimate c and k from two points (keep ghost buffers); a worked sketch
      follows below.
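A minimal classifier sketch in C for the detection step (structure and
function names are mine, not the paper's; the real detector also requires a
minimum run length before calling a run sequential, which is omitted here).
Consecutive blocks of a file extend the current run, a jump back to a run's
start block turns it into a loop, and anything else is classified as other:

/* Minimal sketch of UBM-style detection (my naming; the paper additionally
 * requires a minimum run length before a run counts as sequential). */
#include <stdio.h>

enum pattern { OTHER, SEQUENTIAL, LOOPING };

struct run {                        /* one (file, start..end) reference run */
    int file, start, end;
    enum pattern type;
};

#define HIST 64
static struct run hist[HIST];       /* completed runs (cf. the table in Fig 4) */
static int nhist;
static struct run cur = { -1, 0, 0, OTHER };   /* run in progress */

static void end_run(void)           /* record the finished run, if any */
{
    if (cur.file >= 0 && nhist < HIST)
        hist[nhist++] = cur;
    cur.file = -1;
}

enum pattern classify(int file, int block)
{
    /* Jumping back to the start of the current run => it is a loop. */
    if (cur.file == file && block == cur.start && cur.end > cur.start) {
        cur.type = LOOPING;
        end_run();
        cur.file = file; cur.start = cur.end = block; cur.type = LOOPING;
        return LOOPING;
    }

    /* Re-reference of the start of a recorded run => also a loop. */
    for (int i = 0; i < nhist; i++)
        if (hist[i].file == file && hist[i].start == block) {
            end_run();
            hist[i].type = LOOPING;
            cur = hist[i]; cur.end = cur.start;
            return LOOPING;
        }

    /* The next consecutive block extends the current run. */
    if (cur.file == file && block == cur.end + 1) {
        cur.end = block;
        if (cur.type == OTHER)
            cur.type = SEQUENTIAL;
        return cur.type;
    }

    /* Anything else: first reference / broken run => other. */
    end_run();
    cur.file = file; cur.start = cur.end = block; cur.type = OTHER;
    return OTHER;
}

int main(void)                      /* two passes over blocks 1..5 of file 0 */
{
    static const char *name[] = { "other", "sequential", "looping" };
    for (int pass = 0; pass < 2; pass++)
        for (int b = 1; b <= 5; b++)
            printf("block %d -> %s\n", b, name[classify(0, b)]);
    return 0;
}

The demo in main() makes two passes over blocks 1..5 of one file: the first
pass comes out sequential (after the first reference, which is other), the
second looping.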
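Victim selection per partition might look like the following (a sketch
assuming each partition keeps its blocks on a list ordered by recency; all
names are mine):

#include <stddef.h>

struct buf {
    struct buf *prev, *next;       /* neighbors on the partition's list */
};

struct list {
    struct buf *head, *tail;       /* head = least, tail = most recently used */
};

struct loop {
    struct list blocks;            /* the loop's cached blocks */
    int period;                    /* references between consecutive passes */
};

/* Sequential partition: evict MRU -- a purely sequential block is never
 * referenced again, so the block just read is the best victim. */
struct buf *victim_seq(struct list *seq)
{
    return seq->tail;
}

/* Looping partition: pick the loop with the longest period (its blocks are
 * needed farthest in the future), then MRU within that loop. */
struct buf *victim_loop(struct loop *loops, int nloops)
{
    struct loop *worst = NULL;
    for (int i = 0; i < nloops; i++)
        if (loops[i].blocks.tail != NULL &&
            (worst == NULL || loops[i].period > worst->period))
            worst = &loops[i];
    return worst != NULL ? worst->blocks.tail : NULL;
}

/* Other partition: plain LRU. */
struct buf *victim_other(struct list *other)
{
    return other->head;
}

The MRU choices fall out of the patterns: a purely sequential block is dead
the moment it is read, and within a loop the most recently touched block is
the one whose next use is a full period away.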
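A worked sketch of the allocator's marginal-gain comparison, with
hypothetical numbers (the sample hit ratios and all names are mine). Taking
logs of hit(n) = 1 - c*n^{-k} at two measured points (n1,h1) and (n2,h2)
gives k = ln((1-h1)/(1-h2)) / ln(n2/n1) and c = (1-h1)*n1^k:

#include <math.h>
#include <stdio.h>

/* Fit hit(n) = 1 - c*n^(-k) through (n1,h1) and (n2,h2). */
static void fit_belady(double n1, double h1, double n2, double h2,
                       double *c, double *k)
{
    *k = log((1.0 - h1) / (1.0 - h2)) / log(n2 / n1);
    *c = (1.0 - h1) * pow(n1, *k);
}

/* Marginal gain of the "other" partition at size n: d hit(n) / dn. */
static double mg_other(double c, double k, double n)
{
    return c * k * pow(n, -k - 1.0);
}

int main(void)
{
    double c, k;

    /* Hypothetical measurements: 60% hits with 100 buffers, 75% hits at
     * the ghost-buffer size of 200. */
    fit_belady(100, 0.60, 200, 0.75, &c, &k);

    double mg_seq  = 0.0;              /* sequential blocks never rehit */
    double mg_loop = 1.0 / (1000 + 1); /* notes: 1/(p_{i_max}+1), period 1,000 */
    double mg_oth  = mg_other(c, k, 100);

    printf("c=%.3f k=%.3f\n", c, k);
    printf("MG seq=%.5f loop=%.5f other=%.5f\n", mg_seq, mg_loop, mg_oth);
    /* UBM grows whichever partition has the largest marginal gain. */
    return 0;
}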
- Hypotheses:
  1. Does it matter?
  2. How accurate is the detection?
  3. How good is replacement vs. OPT?
  4. How good is the allocator vs. OPT?
  5. Are there simpler schemes that work well?
  6. What is the runtime overhead of UBM?
- Environment:
  - Apps with different patterns (why does glimpse have loops?)
  - Trace-driven evaluation (what are the problems with traces?)
  - FreeBSD implementation
  - The sum of the apps' memory requirements is much higher than the
    available memory.
  - A number of apps make a single pass through lots of memory (mpeg).
  - 3 traces
- Results:
  Re 1. Yes.
  Re 2. Good.
  Re 3. Close to OPT.
  Re 4. Don't know, but it seems effective.
  Re 5. Doesn't appear so (unless the programmer specifies a strategy).
  Re 6. 5% vs. LRU.

Explain figure 12 and table 2
- Why does UBM do better (i.e., improve more) with a cold cache?
- When would you actually have a "cold cache" as measured here?
  After rebooting, unmounting a file system, maybe deleting a file...
  Otherwise, the cache will be cold but full of garbage.
- How might you improve UBM in the cold case?
  - Predict that sequential references might turn into loops.
  - Keep ghost buffers in the SEQ pool to measure how useful the blocks
    there would have been (see the sketch at the end).
  - How would you adjust the model? The sequential partition's marginal gain
    is then actually > 0.

Problems
- Unfair
- Indexed data structures (B-trees)
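A speculative sketch of the ghost-buffer idea above (my construction, not
from the paper): remember the block numbers recently evicted from the SEQ
partition, count references that would have hit them, and feed the measured
rate to the allocator in place of the model's constant 0:

#include <stdbool.h>

#define NGHOST 256                 /* ghost slots = extra "would-be" buffers */

static long ghost[NGHOST];         /* block numbers evicted from SEQ;
                                      assumes real block numbers are > 0 */
static int  ghost_next;
static long ghost_hits, seq_refs;

/* Call when the SEQ partition evicts a block: remember its number only. */
void seq_evicted(long blockno)
{
    ghost[ghost_next] = blockno;
    ghost_next = (ghost_next + 1) % NGHOST;
}

/* Call on each sequential reference: would the block have hit had SEQ
 * kept it around? */
bool seq_ghost_hit(long blockno)
{
    seq_refs++;
    for (int i = 0; i < NGHOST; i++)
        if (ghost[i] == blockno) {
            ghost[i] = 0;
            ghost_hits++;
            return true;
        }
    return false;
}

/* Measured gain of giving SEQ roughly NGHOST more buffers, in extra hits
 * per sequential reference; if > 0, the allocator's "sequential = 0"
 * assumption is too pessimistic for this workload. */
double mg_seq_measured(void)
{
    return seq_refs > 0 ? (double)ghost_hits / (double)seq_refs : 0.0;
}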