Fall 2003 Final Solutions

1 (a) Yes, it is possible to have more data written to the log than to the data portion. Consider a scenario where you have a disk cache. Non-metadata changes are coalesced in the disk cache. Changes to metadata, however, must be written to the log. It could happen that a file is created and various metadata-changing operations are performed. The file is written into, but its data remains in the buffer cache. The user then deletes the file, which means one more record to add to the log. In the meantime, however, none of the data in the file ever made it out to disk.
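To make the byte counts concrete, here is a toy Python model of the scenario (the class name, record size, and file size are all made up for illustration; this is not how any real journaling filesystem is structured):

```python
# Toy model: metadata changes are logged immediately; file data sits
# in the buffer cache until a flush that, here, never happens.
class ToyJournalFS:
    def __init__(self):
        self.log_bytes = 0    # bytes written to the on-disk log
        self.data_bytes = 0   # bytes written to the data portion
        self.cache = {}       # dirty file data, not yet on disk

    def metadata_op(self, record_size=64):
        self.log_bytes += record_size   # e.g. create, chmod, unlink

    def write(self, name, nbytes):
        self.cache[name] = nbytes       # coalesced in the disk cache

    def delete(self, name, record_size=64):
        self.cache.pop(name, None)      # dirty data never reaches disk
        self.log_bytes += record_size

fs = ToyJournalFS()
fs.metadata_op()          # create the file
fs.write("tmp", 4096)     # data stays in the buffer cache
fs.metadata_op()          # e.g. change its permissions
fs.delete("tmp")          # deleted before any flush
print(fs.log_bytes, fs.data_bytes)   # 192 0: more log than data
```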

1 (b) Most people identified an instance where this statement is true. During recovery, the log is read in order to re-establish the consistency of the data. While this is going on, there is little traffic to the rest of the disk.

2. Hard link - Refers to the inode of the file it is linked to. Therefore, it needs to understand only one set of inodes.
Soft link - A file created that contains the path of the specified file.

Hard links know only about the underlying organization of the filesystem, e.g., inodes. Several problems may occur. Multiple files with the same inode number may appear in different filesystems, causing confusion as to which one was wanted. Slightly different inode organizations or policies would also cause problems, as one filesystem does not know how to interact correctly with the other. Soft links, however, know nothing about the underlying organization. Everything is just a path to a file. Thus, another filesystem is just another part of a possible path.
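You can watch the inode-level difference directly with Python's os module (a sketch assuming a POSIX system; os.link is limited on Windows, and the file names here are made up):

```python
import os
import tempfile

d = tempfile.mkdtemp()
orig = os.path.join(d, "orig")
with open(orig, "w") as f:
    f.write("hello")

hard = os.path.join(d, "hard")
soft = os.path.join(d, "soft")
os.link(orig, hard)      # hard link: another name for the same inode
os.symlink(orig, soft)   # soft link: a file containing the path

# The original and the hard link share one inode number.
assert os.stat(orig).st_ino == os.stat(hard).st_ino

os.remove(orig)          # delete the original name
print(open(hard).read()) # "hello" -- the inode is still alive
try:
    open(soft)           # the path stored in the symlink is gone
except FileNotFoundError:
    print("dangling symlink")
```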

3 (a) Most of you understood this. In general, spatial locality enables a mechanism called prefetching, which proactively fetches the next page before you need it. Doubling the page size effectively performs prefetching without actually having to have a special daemon that issues requests for pages to be prefetched. If the data being used spans multiple pages and is accessed sequentially, doubled pages will cut the number of page faults in half.
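You can verify the halving with a quick count of cold faults during a sequential scan (the byte and page sizes are arbitrary toy numbers):

```python
# Count page faults for a purely sequential scan under two page sizes.
def sequential_faults(n_bytes, page_size):
    faults, resident = 0, set()
    for addr in range(n_bytes):
        page = addr // page_size
        if page not in resident:   # first touch of this page: a fault
            resident.add(page)
            faults += 1
    return faults

print(sequential_faults(64, 4))   # 16 faults with 4-byte pages
print(sequential_faults(64, 8))   # 8 faults: doubling halves them
```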

3 (b) The idea here is that you have a group of pages that are all used frequently. However, only portions of the data on those pages are used. Hypothetically, only the first half of each page is used. If the number of "original" pages in your working set exceeds the number of physical pages, you will have a bunch of pages that need to be swapped out due to memory capacity issues. If instead you made each page half its original size, you might be able to fit your working set into physical memory. The unused portions of the original pages would not be paged in, allowing you to keep more pages in memory at a time and therefore decrease the number of page faults.
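A small LRU simulation shows the effect (the sizes are invented: 32 bytes of physical memory, and a program that repeatedly touches only the first two bytes of each of eight 8-byte "original" pages):

```python
from collections import OrderedDict

def faults(page_size, capacity_bytes=32, rounds=4):
    frames = capacity_bytes // page_size
    lru, n = OrderedDict(), 0
    for _ in range(rounds):
        for base in range(0, 64, 8):        # eight 8-byte regions
            for addr in (base, base + 1):   # only the first quarter used
                page = addr // page_size
                if page in lru:
                    lru.move_to_end(page)   # mark most recently used
                else:
                    n += 1                  # page fault
                    if len(lru) == frames:
                        lru.popitem(last=False)   # evict LRU page
                    lru[page] = None
    return n

print(faults(page_size=8))  # 32: working set of 8 pages > 4 frames
print(faults(page_size=4))  # 8: the touched halves fit in 8 frames
```

With 8-byte pages the 8-page working set thrashes in 4 frames and faults every round; with 4-byte pages only the touched halves are resident, the working set fits, and only the cold faults remain.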

4. There are cases where LRU and FIFO act the same way. However, FIFO does not automatically approximate LRU. Suppose you have four page frames and six pages.

For the access pattern A B C D E F, FIFO and LRU do the same thing. However, for the access pattern A B A C A D A E A F, FIFO would kick out A to bring in E, while LRU would kick out B to bring in E.

Hence, the two techniques are not synonymous. The item brought in first is not necessarily the item touched longest ago.
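Simulating both policies on the second reference string makes the divergence explicit (a sketch; note that after FIFO evicts A, the very next reference to A faults again, so FIFO ends up with one more fault than LRU here):

```python
from collections import OrderedDict, deque

FRAMES = 4

def fifo(refs):
    resident, order, evicted = set(), deque(), []
    for p in refs:
        if p in resident:              # hits do not reorder FIFO
            continue
        if len(resident) == FRAMES:
            v = order.popleft()        # evict the first page brought in
            resident.remove(v)
            evicted.append((p, v))     # (incoming page, victim)
        resident.add(p)
        order.append(p)
    return evicted

def lru(refs):
    resident, evicted = OrderedDict(), []
    for p in refs:
        if p in resident:
            resident.move_to_end(p)    # hits refresh recency
            continue
        if len(resident) == FRAMES:
            v, _ = resident.popitem(last=False)  # evict least recent
            evicted.append((p, v))
        resident[p] = None
    return evicted

refs = "ABACADAEAF"
print(fifo(refs))  # [('E', 'A'), ('A', 'B'), ('F', 'C')]
print(lru(refs))   # [('E', 'B'), ('F', 'C')]
```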

5. The general answer to this question is that if you employ a garbage collector, it will take care of your memory management by creating whole chunks of free space, moving live data into contiguous regions of memory. Consequently, first fit and best fit have had their job of finding good locations for data stolen by the garbage collector.

Having said that, a number of you discussed the expense of garbage collection and how this would affect how frequently garbage collection was performed. Because of this, you assumed the two fit algorithms could be used on the space freed by the GC in order to reduce the frequency of garbage collection. I gave this some credit.

6. This statement is true by definition.

Internal fragmentation occurs when an allocated resource has unused portions that nobody else can use. These unused portions are allocated and therefore are not on the free list.

The free list consists of non-allocated items. External fragmentation is defined by the existence of available items that cannot be used. Frequently, they cannot be used because they're too small and can't be coalesced together to create large blocks needed for contiguous allocation.
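A little arithmetic illustrates both kinds (the block size, request sizes, and hole sizes are all invented for the example):

```python
# Internal fragmentation: an allocator that rounds every request up to
# 64-byte blocks. The slack is allocated but unusable by anyone else.
BLOCK = 64
requests = [10, 100, 65]
internal = sum(-r % BLOCK for r in requests)   # slack per request
print(internal)   # 54 + 28 + 63 = 145 wasted bytes inside allocations

# External fragmentation: plenty of free bytes in total, but no single
# hole is large enough for a contiguous 100-byte request.
free_holes = [40, 30, 50]                 # scattered free chunks
print(sum(free_holes))                    # 120 free bytes overall
print(any(h >= 100 for h in free_holes))  # False: the request fails
```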

7. The first part of garbage collection is to discover which objects are still live and which are dead. Objects are dead if nobody can or will access them.

Mark and Sweep locates all live data by starting at a root (or several roots), following each object's links, and marking each object it reaches. Objects that cannot be reached by this traversal can be deleted. Consequently, if a circular structure exists that cannot be reached during this sweep of the data, it will be freed.
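A minimal mark-and-sweep sketch in Python (the heap here is just a dict of object names to outgoing links, which is an invented representation):

```python
def mark_sweep(heap, roots):
    marked, stack = set(), list(roots)
    while stack:                       # mark: follow links from roots
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])
    # sweep: everything left unmarked is garbage
    return [o for o in heap if o not in marked]

heap = {
    "root": ["a"],
    "a": [],
    "b": ["c"],   # b and c form an unreachable cycle
    "c": ["b"],
}
print(mark_sweep(heap, ["root"]))  # ['b', 'c'] -- the cycle is freed
```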

Reference Counting takes an alternate approach to determining whether data is live or dead. Each object contains a count of the number of other objects that point to it. If no objects point to it (its reference count is zero), it is dead and can be deleted. The problem comes when you have a structure that has only self-enclosed links. In this case, each object in the structure points to at least one other object in the structure. Consequently, all of the objects' reference counts are greater than zero, none of them are considered dead, and, hence, none of them will be freed.
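The cycle leak can be shown with a few lines of naive counting (same invented heap representation as above; real reference counters update counts incrementally on pointer assignment, but the end state is the same):

```python
# Naive reference counting over a heap of outgoing links.
heap = {
    "b": ["c"],   # b points to c
    "c": ["b"],   # c points back: a self-enclosed cycle
}
refcount = {o: 0 for o in heap}
for obj, targets in heap.items():
    for t in targets:
        refcount[t] += 1   # one incoming pointer to t

# No outside object points into the cycle, yet every count is nonzero,
# so a pure reference counter never considers b or c dead.
print(refcount)                                    # {'b': 1, 'c': 1}
print([o for o, c in refcount.items() if c == 0])  # [] -- nothing freed
```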