Many possible answers, some of the best listed here. To optimize for small files, use a small block size or support fragments, have low metadata overhead, and cluster related files together. To optimize for large files, try to arrange files contiguously (avoid fragmentation), implement read-ahead, and separate large file data (beyond some point) from small file data so small file clustering still works.
Full credit required mentioning 3 ways to optimize total. Of those at least one must apply to small files and one to large files.
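One of the large-file optimizations above, read-ahead, can be illustrated with a toy block cache. This is only an invented sketch (the class name, the list-of-blocks "disk", and the counters are all made up, not any real filesystem's code): when a block is read on demand, the cache also prefetches the following block, so a sequential reader only waits on the disk for the very first block.

```python
# Toy read-ahead cache (all names and interfaces invented for illustration).
# On a demand miss it fetches the requested block, then also prefetches the
# next block, so a strictly sequential reader misses only once.
class ReadAheadCache:
    def __init__(self, disk):
        self.disk = disk          # a list of blocks standing in for the device
        self.cache = {}
        self.demand_misses = 0    # times the caller actually waited on "disk"

    def read(self, bno):
        if bno not in self.cache:
            self.demand_misses += 1
            self.cache[bno] = self.disk[bno]
        nxt = bno + 1             # prefetch the next block ahead of the reader
        if nxt < len(self.disk) and nxt not in self.cache:
            self.cache[nxt] = self.disk[nxt]
        return self.cache[bno]

disk = [f"block{i}" for i in range(8)]
c = ReadAheadCache(disk)
for i in range(8):
    c.read(i)                     # sequential scan of the whole "file"
print(c.demand_misses)            # → 1: only the first read waited
```

In a real system the prefetch is issued asynchronously while the application processes the current block; the point here is only that sequential demand misses drop to one.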
Assume that you have been given a fancy disk scheduling algorithm that actually contains a simulation of the disk you are using. The algorithm works by using the simulation to test all permutations of the requests in the disk queue to determine the order in which the requests can be processed fastest (i.e. it computes the optimal schedule). Assuming that this algorithm can be made to work quickly enough, what are the disadvantages compared to something like the SCAN algorithm used in existing systems?
The "optimal" algorithm could lead to starvation for requests far from the disk head if closer requests continued to arrive.
A common mistake was objecting to the amount of memory required for simulating the disk. There is no need to know what data is on the disk to simulate its behavior, and there is no need to store all the permutations of the requests in memory either.
Full credit for mentioning starvation. Most other answers were implausible and received at most 2-3 points.
Many plausible answers, all of which were considered acceptable. Some students lost points because they confused logs used for security (e.g. the system log) with filesystem logs/journals.
Explain why some file systems will try not to allocate sequential file blocks to sequential disk blocks even when they know that the file will be accessed sequentially most of the time.
Anticipated answer was that, for whatever reason, some systems cannot keep up with the full speed of the disk, so using adjacent sectors would only allow reading a single sector per rotation of the disk. Also accepted for full credit were two optimizations found in BSD FFS. First, FFS switches cylinder groups after a file grows beyond some fixed length, to allow small files to continue to cluster in the cylinder group preferred by the file's directory. Second, BSD FFS tries to place sequential file sectors in the same sector on different platters, because that too is fast.
Half credit was given for answers involving RAID, because RAID is not (normally) a filesystem feature. In UNIX parlance, it is a "block device" feature that belongs in the layer below the filesystem.
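The anticipated answer, that a host too slow to keep up with the disk reads only one contiguous sector per rotation, can be made concrete with a back-of-the-envelope rotational model. The numbers below (16 sectors per track, one sector-time of processing per read) are invented; the model just counts how long the head waits for each target sector to come around.

```python
# Rotational model of sector interleaving (all parameters invented).
# After reading a sector the host needs `gap` sector-times of processing
# before it can issue the next read. With a contiguous layout it has already
# missed the next sector and must wait nearly a full rotation; an interleave
# of gap+1 places the next sector exactly where the head will be ready.
def rotations_to_read(nsectors, sectors_per_track, gap, interleave):
    """Return the rotations needed to read nsectors laid out every
    `interleave` sectors around the track."""
    pos = 0                               # rotational position, in sector-times
    for i in range(nsectors):
        target = (i * interleave) % sectors_per_track
        while pos % sectors_per_track != target:
            pos += 1                      # wait for the target to rotate under the head
        pos += 1 + gap                    # read the sector, then process it
    return pos / sectors_per_track

print(rotations_to_read(16, 16, gap=1, interleave=1))  # → 16.0625 (≈1 rotation/sector)
print(rotations_to_read(16, 16, gap=1, interleave=2))  # → 2.0 (whole track in 2 rotations)
```

An 8x speedup from simply renumbering sectors is why early filesystems and disk formatters interleaved by default.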
Why is it bad to disable interrupts for long periods of time? Give an example of a problem it can cause.
The most common problem from disabling interrupts is increased latency, which slows interactive, network, and device responsiveness and can make real-time response impossible. Lost interrupts are not usually a problem, because hardware and OSes have to be designed to tolerate lost or combined interrupts anyhow. Note that only the kernel can disable interrupts, because enabling user programs to do so would allow them to monopolize the CPU. The kernel has to be trusted in any case not to go into infinite loops, not to spin on a lock with interrupts off, and so on, so those are not good answers either.
Full credit required mentioning increased latency or something directly related to latency. Other answers lost at least 3 points.
Describe tests for measuring (a) latency and (b) bandwidth of a file system. Give a description of what the test would do and how you would perform the measurements.
(a) Latency is the time between making a request and receiving a response, in units of time. To measure filesystem latency, we can access many randomly selected files and average the time to read a small part of them.
(b) Bandwidth is the rate at which data is transferred, in units of data per time. To measure maximum filesystem bandwidth, we access a very large file sequentially and divide the file size by the time required. To avoid cache effects, the file must be much larger than the file cache.
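Both tests can be sketched in a few lines of Python. This is an illustrative harness only: the file sizes and counts are placeholders, and a real measurement would use a file much larger than the OS file cache for (b) and many files scattered across the disk for (a).

```python
# Sketch of the two measurements (sizes and file counts are placeholders;
# a real run needs files far larger and more numerous than this demo).
import os
import tempfile
import time

def measure_bandwidth(path, chunk=1 << 20):
    """(b) Read the whole file sequentially; return bytes per second."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total / (time.perf_counter() - start)

def measure_latency(paths):
    """(a) Average time to open each file and read a small part of it."""
    start = time.perf_counter()
    for p in paths:
        with open(p, "rb") as f:
            f.read(1)
    return (time.perf_counter() - start) / len(paths)

# Tiny demo on a temporary file (far too small to defeat the file cache).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1 << 20))
print("bandwidth ~ %.0f bytes/s" % measure_bandwidth(tmp.name))
print("latency   ~ %.6f s per file" % measure_latency([tmp.name] * 8))
os.unlink(tmp.name)
```

On a cached file like the demo's, both numbers measure the cache rather than the disk, which is exactly why the answer above insists the bandwidth file exceed the file cache and the latency files be randomly selected.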