Winter 2007 Final Solutions

About This Final

Problem 1

(a)
Having the same physical page in multiple page tables is commonly used to share memory between processes. For example, running the same program twice on a modern OS will normally have the program's code pages shared using this mechanism. This condition is not obviously a bug.
(b)
Although not as obvious as question 1(a), having the same physical page at multiple different virtual addresses can be generated on a modern operating system. One easy way is to simply memory map the same file twice. Some students noted that Pintos (as well as Linux and Windows) actually maps am application's pages both in the user's part of the virtual address space and in the kernel's part. Technically these are part of the same page table so would satisfy this condition. This condition is not obviously a bug.
(c)
Because of ways modern operating systems emulate things such as reference and dirty bits, it is not uncommon to have a permission mismatch such as this. Clearly a write access to the page would generate a page fault exception. Provided that the OS handles the exception by making the page writable and without signaling an error to the application, this would work fine. Some students noted that the Linux/Unix copy-on- write optimizations to the fork() system call do this. This condition is not obviously a bug.
(d)
Many modern operating systems that implement memory mapped files use the same pages that hold the file in the file system buffer cache to map the file into the application. Doing this means that there will be coherency between the read/write and map interface to a file. This condition is not obviously a bug.

Problem 2

(a)
n=1 means that all blocks are the same size. We know that in order to have fragmentation we need different size blocks. Hence there is a simple algorithm will give no fragmentation.
(b)
Of course it is possible to have a memory allocator have no fragmentation given a particularly favorable allocation pattern. Even if the blocks are of different sizes, given nice block lifetimes like first-allocated-first-freed or never free anything would result in no fragmentation.

Problem 3

(a)
An atomic disk could be useful for crash recovery. The whole problem with crash recovery was a crash could leave the file system in an inconsistent state. With atomic group writes the file system could be modified to always keep the file system in a consistent state and hence never have to do recovery after a crash. For example, when creating a file you could do a group write to update the inode, directory, and freelist at the same time.
(b)
Write-ahead logging gives two benefits: faster crash recovery and better performance. The atomic group write would eliminate the need for crash recovery but wouldn't get the performance benefits of logging. It's clearly not pointless.
(c)
Given multiple requests it is possible to schedule them to improve performance. Clearly the group write could be optimized by the disk drive manufactures to do something like short-seek-time-first scheduling to reduce disk seek times. This optimization would be not possible if the requests were given one at a time.

Problem 4

(a)
c (extent-based) -- b (linked) -- a (indexed)
It is easy to see that (c) is the best structure for sequential access to very large files, since in (c) files are contiguously allocated and the next block to read is physically the next on the disk. No seek time to find the next block, and each block will be read sequentially as the disk head moves.
Both (b) and (a) require some looking up operation in order to know where the next block is. However, (b) performances are worse, since for "very large files" it may require multiple disk accesses to the indirect blocks.
(b)
c (extent-based) -- a (indexed) -- b (linked)
(c) is still the best structure here: just need to use an offset.
(a) will probably need to look at some levels of indirect blocks in order to find the right block to access (we are dealing with very large files).
(b) is absolutely the worst structure. In fact, in order to find a random block, we will need to traverse a linked list of blocks, which will take a time linear in the offset size.
(c)
b (linked) -- a (indexed) -- c (extent-based)
(c) can suffer heavily of external fragmentation, especially for large files. So it is not the best stucture for getting the most bytes on the disk, since lots of space will be wasted. However, for small files and large block size, (c) might prove to be better than (a) and (b).
(a) and (b) stuctures are generally more suitable for this question. The metadata overhead for (b) is likely to be smaller than the one for (a), since it only needs pointers to the next allocated block rather than an entire block (or blocks) which may or may not be totally used.

Problem 5

We first look up the inumber associated to the file handle, and then examine the ".." directory entry inumber. If they are equal, then the name of the directory is the root. Otherwise, we go up one level (chdir ..) and iterate through the entries in the parent directory. Once we find a match, we return the name associated with that entry. It is worth noticing that the file name is stored in the directory entry, not in the inode: otherwise operations like hard links (files with different names but same inode) would be impossible. Some students returned the whole directory path: starting from the root, we can do a search (e.g. breadth first) of the subdirectories, memorize the traversed directories, and stop when a match is found (or, vice versa, open the given file handle, and move up till we find the root). These solutions were accepted, even though it is more than we wanted.

Problem 6

The idea here is the use of a parity disk. We were also looking for a description of the XOR operation that produces it, and how this allows a complete recovery of a single disk failure (basically, XOR on the remaining disks and the parity disk).

Problem 7

The LRU (least recently used) algorithm tries to approximate future behavior by using past behavior, based on the assumption of temporal locality. In order to choose an element for eviction, the optimal solution is to discard the one which will be used the least amount in the future. Because we can't know the future, we approximate it by choosing the element that was least recently used in the past.

5 points were given for describing what LRU is/means. The other 5 points were given for mentioning the LRU assumption that past behavior is similar to future behavior. In-depth descriptions of how LRU might be implemented did not matter.

Problem 8

(a) Capability. The key is a capability that each user has to access the door, and the door has no idea who is opening it.
(b) Capability. This is the same as part (a), but now the capability/key is the knowledge of the combination. Some people succesfully argued that this was more similar to ACL if you assume that the each user had their own unique combination for the door representing their identity, and the door decided who to let in based on the user's specific combination.
(c) ACL. Normally, a badge identifies the person, and the door security system maintains a list of people allowed through the door, so the resource is maintaining a list of (user, privelege) pairs. Some people instead assumed that each door had a corresponding badge, and everyone who had access to a particular door all had identical badges for that door, just like having a key in part (a). In this case, this would resemble a capability.
(d) ACL. This is just like part (c). The biometric device uniquely identifies the person, just like the badge. The door security system maintains a list of (user, privelege) pairs.

Problem 9

By using the message digest as the name, we have an easy way of checking if the returned object is the correct object. Message digests are large integers generated by hashing an object's contents, and, assuming perfect hashing, are unique to that object. If we have a message_digest() function that generated the name/message digest of the object, we can check if the name is equal to message_digest(query(name)), in which case the object returned by query() was the correct object. Some people lost points by making strong assumptions about how the untrusted storage system worked, or got into implementation specifics, which didn't matter for this question.

Problem 10

You must choose b. With type a cards, once there is a collision, the colliding nodes will be in a deadlock and keep trying to retransmit at the same time and so will not be able to transmit. With part b type cards, that creates more load for every machine since they recevie all the packets. This is promiscuos mode. If we have to choose cards for our network, we can take atmost 1 card of type a and remaining of type b. Solutions that said use all cards of type b were also given full credit.

Problem 11

a) IP header has a field Time to Live (TTL). This number is decremented every hop and a packet is discarded once the TTL goes to 0. Hence a packet cannot loop forever.
b) IP supports fragmentation and reassembly. If packet size is greater than the MTU of a network, split the packet into fragments. Reassebmly is done using the byte offset field in the IP header.
c) IP supports hierarchial addresses. Every IP address has a network ID and a host ID. Routers only need to know how to forward packets to other networks as opposed to all the billions of hosts.

Problem 12

a) Since IP is best effort, it could be possible that a packet is duplicated on the network.
b) This means that receiver is using a sliding window of size 4 and so sends an ack only for every 4 packets recevied.