1 (a) Yes, it is possible to have more data written to the log than to the
data portion. Consider a scenario where you have a buffer cache.
Non-metadata changes are coalesced in the buffer cache; changes to
metadata, however, must be written to the log. It could happen that a
file is created and various metadata-changing operations are performed
on it. The file is written into, but the written data remains in the
buffer cache. The user then deletes the file, which adds one more
record to the log. In the meantime, however, none of the data in the
file ever made it out to disk.
1 (b) Most people identified a scenario in which this statement is true. On
crash recovery, the log is read in order to re-establish the
consistency of the data. While this is going on, there is little
traffic to the rest of the disk.
2. Hard link - A directory entry that specifies the inode of the file it is
linked to. Therefore, it needs to understand only one set of inodes.
Soft link - A file that contains the path of the specified file.
Hard links know only about the underlying organization of the filesystem,
e.g., inodes. Several problems may occur. The same inode number may
appear in different filesystems, causing confusion as to which file
was wanted. Slightly different inode organizations or policies would
also cause problems, as one filesystem does not know how to interact
correctly with the other filesystem. Soft links, however, know nothing
about the underlying organization. Everything is just a path to a
file. Thus, another filesystem is just another part of a possible
path.
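The distinction can be seen directly with Python's os module. This is a
minimal POSIX sketch; the directory and file names are made up:

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, "target.txt")
with open(target, "w") as f:
    f.write("hello")

hard = os.path.join(tmpdir, "hard.txt")
soft = os.path.join(tmpdir, "soft.txt")
os.link(target, hard)      # new directory entry for the SAME inode
os.symlink(target, soft)   # new file whose contents are just the path

# Hard link: identical inode number; the filesystem sees one file.
assert os.stat(target).st_ino == os.stat(hard).st_ino
# Soft link: a separate file that merely records the path string.
assert os.readlink(soft) == target
```

Because the symlink stores only a path string, it can cross filesystem
boundaries; the hard link cannot, since an inode number is meaningful only
within one filesystem.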
3 (a) Most of you understood this. In general, spatial locality enables a
mechanism called prefetching, which proactively fetches the next page
before you need it. Doubling the page size effectively performs
prefetching without needing a special daemon to issue requests for
pages to be prefetched. If the data being used spans multiple pages
and is accessed sequentially, doubled pages cut the number of page
faults in half.
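As a quick sanity check (sizes here are arbitrary), the fault count for one
sequential pass is just the number of distinct pages touched:

```python
def sequential_faults(n_bytes, page_size):
    """Page faults for one sequential pass over n_bytes of data:
    every first touch of a page faults; no prefetch daemon."""
    return -(-n_bytes // page_size)  # ceiling division

faults_4k = sequential_faults(1 << 20, 4096)  # 1 MiB data, 4 KiB pages
faults_8k = sequential_faults(1 << 20, 8192)  # same data, doubled pages
assert faults_4k == 256 and faults_8k == 128  # faults cut in half
```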
3 (b) The idea here is that you have a group of pages that are all used
frequently. However, only portions of the data on those pages are
used; hypothetically, only the first half of each page. If the number
of "original" pages in your working set exceeds the number of physical
page frames, pages will constantly be swapped out for capacity
reasons. If instead each page were half its original size, you may be
able to fit your working set into physical memory. The unused portions
of the original pages are never paged in, allowing you to keep more
useful pages in memory at a time and therefore decrease the number of
page faults.
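A back-of-the-envelope sketch of this effect, with all numbers invented for
illustration:

```python
PHYS_BYTES = 64 * 4096   # 256 KiB of physical memory (made-up size)
N_REGIONS = 100          # hot regions; each uses 2 KiB of a 4 KiB page

def frames_needed(page_size, used_per_region=2048):
    # Each hot region touches ceil(used / page_size) pages.
    pages_per_region = -(-used_per_region // page_size)
    return N_REGIONS * pages_per_region

def working_set_fits(page_size):
    frames_available = PHYS_BYTES // page_size
    return frames_needed(page_size) <= frames_available

assert not working_set_fits(4096)  # need 100 frames, have 64: thrashing
assert working_set_fits(2048)      # need 100 frames, have 128: fits
```

Halving the page size doubles the number of frames the same physical memory
provides, while the number of frames the working set needs stays at 100,
because only the used halves are ever paged in.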
4. There are cases where LRU and FIFO act the same way. However, FIFO does
not automatically approximate LRU. Suppose you have four page frames
and six pages.
For the access pattern A B C D E F, FIFO and LRU do the same thing.
However, for the access pattern A B A C A D A E A F, FIFO would kick
out A to bring in E, while LRU would kick out B to bring in E.
Hence, the two techniques are not synonymous. The first-touched item is
not necessarily the item touched longest ago.
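A small simulator makes the difference concrete. This is a sketch, not
production code; it just records which pages each policy evicts:

```python
from collections import OrderedDict, deque

def fifo(pattern, frames):
    q, evicted = deque(), []
    for p in pattern:
        if p not in q:                  # page fault
            if len(q) == frames:
                evicted.append(q.popleft())  # oldest arrival goes
            q.append(p)
    return evicted

def lru(pattern, frames):
    d, evicted = OrderedDict(), []
    for p in pattern:
        if p in d:
            d.move_to_end(p)            # a hit refreshes recency
        else:                           # page fault
            if len(d) == frames:
                evicted.append(d.popitem(last=False)[0])  # stalest goes
            d[p] = True
    return evicted

pattern = "ABACADAEAF"
# FIFO evicts A (first in) to bring in E, then faults on A yet again.
assert fifo(pattern, 4) == ["A", "B", "C"]
# LRU evicts B (least recently used) to bring in E; A stays resident.
assert lru(pattern, 4) == ["B", "C"]
```

Under FIFO, A's frequent hits do nothing to protect it: its position in the
queue is fixed at arrival time, which is exactly why FIFO is not an
approximation of LRU.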
5. The general answer to this question is that if you employ a garbage
collector, it takes care of your memory management by creating whole
chunks of free space, moving live data into contiguous regions of
memory. Consequently, first fit and best fit have had their job of
finding good locations for data stolen by the garbage collector.
Having said that, a number of you talked about the expense of garbage
collection and how this would affect how frequently garbage collection
was performed. On that assumption, the two fit algorithms could be
used on the space handed back by the GC in order to reduce the
frequency of garbage collection. I gave this some credit.
6. This statement is true by definition.
Internal fragmentation occurs when an allocated resource has unused
portions that nobody else can use. These unused portions are
allocated and therefore are not on the free list.
The free list consists of non-allocated items. External fragmentation is
the existence of available items that cannot be used. Frequently, they
cannot be used because they are too small and cannot be coalesced into
the large contiguous blocks that allocations need.
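A toy numeric illustration of both kinds of fragmentation, with made-up
sizes:

```python
# Internal fragmentation: allocations are rounded up to a fixed
# 4 KiB block, so a 3 KiB request wastes 1 KiB. That 1 KiB is
# allocated, so it never appears on the free list.
BLOCK = 4096
request = 3072
internal_waste = BLOCK - request
assert internal_waste == 1024

# External fragmentation: the free list holds 3 KiB in total, but in
# scattered 1 KiB holes that cannot be coalesced, so a contiguous
# 2 KiB request still fails.
free_list = [1024, 1024, 1024]                 # non-adjacent holes
assert sum(free_list) >= 2048                  # enough space in total...
assert all(hole < 2048 for hole in free_list)  # ...but no hole fits
```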
7. The first part of garbage collection is to discover which variables are
still live and which are dead. Items are dead if nobody can or will
access them.
Mark and Sweep locates all live data by starting at a root (or several
roots), following each object's links, and marking every object it
reaches. Objects that cannot be reached via this traversal can be
deleted. Consequently, if a circular structure exists that cannot be
reached during this traversal, it will be freed.
Reference Counting takes an alternate approach to determining whether data
is live or dead. Each object carries a count of the number of other
objects that point to it. If no objects point to it (its reference
count is zero), it is dead and can be deleted. The problem comes when
you have a structure whose only links are internal: each object in the
structure is pointed to by at least one other object in it.
Consequently, every object's reference count is greater than zero,
none of them are considered dead, and hence none are freed.
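Both behaviors can be sketched on a tiny made-up object graph. The `Obj`,
`link`, and `mark` names below are hypothetical helpers, not a real
collector:

```python
class Obj:
    def __init__(self, name):
        self.name, self.refs, self.refcount = name, [], 0

def link(src, dst):
    """src points to dst; bump dst's reference count."""
    src.refs.append(dst)
    dst.refcount += 1

def mark(roots):
    """Mark phase: everything reachable from the roots is live."""
    live, stack = set(), list(roots)
    while stack:
        o = stack.pop()
        if o not in live:
            live.add(o)
            stack.extend(o.refs)
    return live

root, a, b = Obj("root"), Obj("a"), Obj("b")
link(a, b); link(b, a)   # a <-> b: a cycle with no path from root

live = mark([root])
# Mark and Sweep: the cycle is unreachable, so the sweep frees it.
assert a not in live and b not in live
# Reference counting: both counts are nonzero, so neither is freed.
assert a.refcount == 1 and b.refcount == 1
```

The asserts show the disagreement directly: the traversal never reaches the
cycle, while the counts never drop to zero.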