CS 140 Winter 2005 Midterm Solutions
For this exam, the mean was 119 and the standard deviation 15.
If you have not picked up your exam, it is available from Judy
Polenta in Gates 351.
If you're a SCPD student, then we'll send it back to you through
the SCPD (for both local and remote students).
If you believe that part of your exam was graded in error, please
request a regrade. You should preferably attend the office hours of
the person who graded the question (shown below), or as a second
choice email the staff
list. Arithmetic errors should go to the staff list or
the office hours of any of the staff.
|1, 2, 3, 4|Adam|
|5, 6, 7, 8|Sameer|
|9, 10, 11, 12|Nick|
|13, 14, 15|Mendel|
(8 points) Explain why the speed at which a CPU can raise exceptions and return from exceptions could factor into the overheads seen by a virtual machine monitor.
A VMM operates between what the OS thinks is the hardware and the real
hardware. Any time the OS executes a privileged instruction or tries to
access hardware, an exception is generated and the VMM takes over,
exactly as the OS takes over when a user process tries those things.
Since this is a fairly common occurrence, the time it takes to
generate and return from exceptions has a large influence on the
overhead of a VMM.
(12 points) Assume you are given an operating system environment with a workload running on it, a list of physical addresses, and a virtual machine monitor capable of running the operating system environment. Explain how you could modify the virtual machine monitor to record the program counter addresses that wrote into any of the physical addresses on the list.
Have the VMM mark, in its page tables, any pages that contain addresses
on the list as read-only. Any time a VM tries to write to one of those
pages, an exception is generated; the VMM can check the faulting address
against the list, record the program counter if it matches, and then let
the instruction proceed. A wide variety of methods will work, but all are
variations on that theme. Trapping _all_ memory accesses or similar
solutions were only partially acceptable, since they would be hundreds to
thousands of times slower.
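The VMM-side bookkeeping described above can be sketched as follows. This is a minimal simulation, not real VMM code; the class and method names are invented for illustration, and the fault handler is assumed to be invoked by the VMM whenever the guest writes to a protected page.

```python
PAGE_SIZE = 4096

class WriteTracker:
    """Toy model of the VMM bookkeeping: pages holding watched physical
    addresses are mapped read-only, and the write-fault handler records
    the guest program counter before letting the write proceed."""

    def __init__(self, watched_addrs):
        self.watched = set(watched_addrs)
        # Pages that must be mapped read-only in the VMM's page tables.
        self.protected_pages = {a // PAGE_SIZE for a in watched_addrs}
        self.recorded_pcs = []

    def on_write_fault(self, fault_addr, guest_pc):
        """Assumed to be called by the VMM on a write fault to a
        protected page; records the PC only for addresses on the list."""
        if fault_addr in self.watched:
            self.recorded_pcs.append((guest_pc, fault_addr))
        # The VMM would then emulate or single-step the write and
        # re-protect the page.

tracker = WriteTracker([0x1000, 0x2F00])
tracker.on_write_fault(0x1000, guest_pc=0x400123)  # watched: recorded
tracker.on_write_fault(0x1004, guest_pc=0x400200)  # same page, not on list
```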
(10 points) If you wanted to implement a spoiler/denial of service attack against a machine over a local area network, would you use UDP or TCP? Justify your answer.
Two answers were acceptable. UDP can be used to flood and overwhelm
the machine, its local routers, and the network. TCP can be used by
creating a large number of connection requests, overwhelming the network
stack with connections to keep track of and blocking any additional
legitimate connections.
(10 points) Would going from a stop-and-go protocol to one that uses a sliding window likely increase or decrease the number of acknowledgement packets sent for a network session consisting of sending a large file? Justify your answer.
Decrease. With a sliding window, one ACK packet can acknowledge more than one data packet, so fewer ACKs are needed overall.
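As a back-of-the-envelope illustration, with assumed numbers (10,000 data packets, a receiver that sends one cumulative ACK per window of 8 packets):

```python
packets = 10_000

# Stop-and-go: every data packet must be ACKed before the next is sent.
stop_and_go_acks = packets

# Sliding window with cumulative ACKs: one ACK can cover a whole window.
window = 8
sliding_window_acks = -(-packets // window)  # ceiling division

print(stop_and_go_acks, sliding_window_acks)  # 10000 vs 1250
```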
(8 points) Explain how a machine determines the destination Ethernet address based on the destination IP address.
An ARP request is broadcast over the local subnet of the destination host. The machine with the destination IP responds with an ARP reply containing its MAC address. The IP -> MAC mapping can be stored in an ARP cache (with some timeout value for each entry) to make future ARP lookups faster.
If you said "a table is used" to determine the MAC address for a given IP address, you got very little credit. The point of the question was how this table is constructed in the first place.
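The cache half of the answer can be sketched as a small timeout map. This is a toy model, not any particular OS's implementation; the timeout value and class names are assumptions.

```python
import time

ARP_TIMEOUT = 60.0  # seconds; real stacks use timeouts in this range

class ArpCache:
    """Toy ARP cache: maps IP -> (MAC, expiry time); expired entries
    count as misses, which would trigger a new broadcast ARP request."""

    def __init__(self):
        self.entries = {}

    def insert(self, ip, mac, now=None):
        now = time.monotonic() if now is None else now
        self.entries[ip] = (mac, now + ARP_TIMEOUT)

    def lookup(self, ip, now=None):
        now = time.monotonic() if now is None else now
        entry = self.entries.get(ip)
        if entry is None or now > entry[1]:
            return None  # miss or expired: must broadcast an ARP request
        return entry[0]

cache = ArpCache()
cache.insert("10.0.0.5", "aa:bb:cc:dd:ee:ff", now=0.0)
print(cache.lookup("10.0.0.5", now=1.0))    # hit: returns the MAC
print(cache.lookup("10.0.0.5", now=120.0))  # expired: returns None
```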
(10 points) Explain why the end-to-end argument might suggest that adding useful security features such as encryption to the link-level layer might not be such a good idea.
Multiple answers were accepted for this. One popular one was that encryption at the link level would be redundant, since applications that need to encrypt already encrypt at the application layer. Another popular one was that there would be overhead at each hop to encrypt / decrypt packets (this may also cause intermediate routers to know the encryption / decryption key).
Full credit was given if two or more reasons were given. Most of the credit was given if only one reason was given.
(10 points) (a) Would a BSD file system ever put two files created in the same directory in two different cylinder groups? (b) Would a BSD file system ever put files created in different directories in the same cylinder group? Justify your answer. If possible, indicate what the user of the file system would have to do to make it happen.
(a) Yes. If the space used by the files in a directory is greater than or equal to the space in a cylinder group, any subsequent files created will go into a different cylinder group.
(b) Yes. If the user creates more directories than cylinder groups.
For Part (b), many students said that, as in Part (a), when we run out of space in the cylinder group, files from two directories may end up in the same group. Full credit was not given for this answer because, although this is possible, it is not certain. The answer above guarantees this case even when the cylinder groups are significantly less than full.
(10 points) Your Pintos partner decided to add write ahead logging to your file system. He asked you if the log should be in the beginning, middle, or end of the disk. What do you think? Justify your answer.
There was no right or wrong answer for this question. We were mainly looking for the justification that you gave in support of your answer and how coherent it was. A popular answer was that the middle of the disk is best because it will minimize the seek time required to do the logging.
(10 points) A recently published paper reports an attack on the popular message digest algorithm SHA1 that some people believe may lead to SHA1 being broken. What does it mean for a message digest algorithm to be broken? What bad things can happen?
A message digest is a hashing function that is useful for authentication and message integrity. Breaking a message digest (such as SHA1) means that it is (relatively) easy to find collisions, that is, two messages that produce the same hash output. When this happens, someone could change the contents of a message while its hash output remains unchanged.
This would be bad if, for example, a cracker were mirroring a Linux distribution that uses SHA1 to verify the integrity of its files. This individual could modify the dist to include a Trojan horse / virus / spyware / whatever, but the dist would appear intact since its SHA1 hash is the same as that of the original, unmodified copy.
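The integrity check the mirror scenario relies on can be shown with Python's standard hashlib module (the message contents here are made up):

```python
import hashlib

# The distributor publishes the SHA1 digest of the genuine contents.
published_digest = hashlib.sha1(b"original distribution contents").hexdigest()

def verify(data, expected_hex):
    """Return True if the data hashes to the published digest."""
    return hashlib.sha1(data).hexdigest() == expected_hex

print(verify(b"original distribution contents", published_digest))  # True
print(verify(b"trojaned distribution contents", published_digest))  # False
# A broken digest would let an attacker craft the second input so that
# both calls return True, defeating the check.
```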
(12 points) Assume that you are given a file system that used capabilities to protect access to files. Describe an algorithm you could use to convert it into using access control lists. The access control lists should give the users the same access to files as with capabilities.
A capability list is organized by user: each user has a list of objects together with the operations allowed on them. If the user possesses a capability to perform an action, then the action is allowed.
ACLs are organized by object. Each object in the system is listed, with the users allowed to access it and the operations allowed.
To convert capability lists into access control lists:
- For each user u
- For each object o and permission p
- If object o does not exist in the ACL, insert object o into the ACL.
- Insert (u, p) in the ACL under object o.
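The loop above can be sketched in Python. The representations are assumed for illustration: capabilities as a map from user to a list of (object, permission) pairs, and the ACL as a map from object to a set of (user, permission) pairs.

```python
def capabilities_to_acl(caps):
    """Convert per-user capability lists into per-object ACLs."""
    acl = {}
    for user, grants in caps.items():        # for each user u
        for obj, perm in grants:             # for each object o, permission p
            if obj not in acl:               # insert object o into the ACL
                acl[obj] = set()
            acl[obj].add((user, perm))       # insert (u, p) under object o
    return acl

caps = {
    "alice": [("file1", "read"), ("file1", "write")],
    "bob":   [("file1", "read"), ("file2", "read")],
}
acl = capabilities_to_acl(caps)
print(acl["file1"])
```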
(10 points) Assume that you know that exactly one bit of a file system’s free block bitmap is incorrect but you do not know which bit that is. Describe an algorithm for computing which bit is incorrect.
This is accomplished by building a second free block bitmap and comparing that against the original bitmap.
- Create a new bitmap initialized to 1 (free).
- Starting from the root, walk through the file system, recursively examining each file and directory.
- For each block allocated, mark the appropriate bit in the new bitmap to 0 (allocated).
- When this is complete, compare each bit in the original with the corresponding bit in the new bitmap. The bit that is different is the one you are looking for.
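The steps above can be sketched in Python, using an assumed toy representation of the file system (a map from file name to its list of allocated block numbers):

```python
def find_bad_bit(stored_bitmap, files):
    """Rebuild the free block bitmap by walking the file system, then
    compare it against the stored bitmap to find the one bad bit."""
    n = len(stored_bitmap)
    rebuilt = [1] * n                 # step 1: everything starts as free (1)
    for blocks in files.values():     # step 2: walk every file and directory
        for b in blocks:
            rebuilt[b] = 0            # step 3: mark each used block allocated
    for i in range(n):                # step 4: compare bit by bit
        if stored_bitmap[i] != rebuilt[i]:
            return i                  # the single incorrect bit
    return None

files = {"/a": [0, 1], "/b": [4]}
stored = [0, 0, 1, 1, 0, 0]  # bit 5 wrongly marked allocated
print(find_bad_bit(stored, files))  # 5
```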
(10 points) (a) What units would you use if you were asked to measure the latency of a file system? (b) How about the units for a measure of bandwidth?
To receive full credit on this question, the units needed to be stated accompanying an explanation. As long as the units made sense, credit was given.
Latency: a unit of time (seconds, milliseconds, etc.). Latency is a measure of how much time a request takes from the moment it is issued to its completion.
Bandwidth: a unit of data per unit of time (bytes/sec, MB/sec, GB/sec, etc.). Bandwidth is the amount of data that can pass through a connection in a given period of time.
(10 points) Explain why in a Unix file system deleting a hard link to a file would require updating the file’s inode while deleting a soft link would not.
Unix’s hard links are reference counted in the file’s inode so deleting a hard link requires decrementing the link count in the inode. Soft links are simply separate “files” with a pathname and are not reference counted in the file’s inode so deleting the soft link does not imply a change to the inode.
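This bookkeeping can be observed directly with the real POSIX calls exposed by Python's os module (Unix-like systems only; the file names are made up):

```python
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "file")
open(target, "w").close()

os.link(target, os.path.join(d, "hard"))     # increments the inode's link count
os.symlink(target, os.path.join(d, "soft"))  # separate "file" holding a pathname

nlink_with_hard = os.stat(target).st_nlink   # 2: both names share one inode

os.unlink(os.path.join(d, "hard"))
nlink_after_unlink_hard = os.stat(target).st_nlink  # 1: the inode was updated

os.unlink(os.path.join(d, "soft"))
nlink_after_unlink_soft = os.stat(target).st_nlink  # still 1: removing the
                                                    # symlink touched only the
                                                    # symlink's own file
print(nlink_with_hard, nlink_after_unlink_hard, nlink_after_unlink_soft)
```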
(10 points) (a) Would an extent based file system or a linked-file file system allow you to use a greater fraction of the capacity of a disk for a workload of small files? (b) How about big files? Justify your answer.
There are two effects that determine the fraction of the disk a file system can use for file data. The first, and frequently dominant, effect is external fragmentation (internal fragmentation can be controlled by setting the block size smaller if the workload requires it). For both small and large files, external fragmentation is likely to harm the extent-based file system, giving the linked file system an advantage. The second effect is the overhead of the metadata. Since an extent-based file system uses only two pointers to describe a file, it can have less overhead than a linked file system with a link pointer per block and one at the end. For normal file systems (i.e., not special cases with files that never grow, etc.) the fragmentation issue dominates this metadata overhead.
In summary, the linked file system would probably win on usable capacity for most "normal" workloads, in both cases (a) and (b).
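The metadata-overhead effect can be made concrete with some assumed numbers (512-byte blocks, 4-byte block pointers, a ~1 MB file):

```python
BLOCK = 512          # assumed block size in bytes
PTR = 4              # assumed size of a block pointer in bytes
file_size = 1_000_000

blocks = -(-file_size // BLOCK)    # ceiling division: blocks in the file

linked_overhead = blocks * PTR     # one link pointer per block
extent_overhead = 2 * PTR          # roughly: a start pointer plus a length

print(blocks, linked_overhead, extent_overhead)
```

Even here, where the extent scheme wins by orders of magnitude on metadata, the overhead is well under 1% of the file, which is why external fragmentation dominates in practice.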
(10 points) For each of the following attributes of a disk, give the ordering of them (e.g. X <= Y < Z < Q):
# of heads
# of cylinders
# of sectors
# of platters
# of tracks per platter
# of bytes of capacity
# platters <= # of heads // Need at least one head per platter; normally two (one per surface) these days
# of cylinders == # of tracks per platter // same thing by definition
# sectors < # of bytes of capacity // Sector size greater than one
There are typically many sectors per cylinder and the number of cylinders is normally much greater than the number of disk heads. So:
# platters <= # heads < # cylinders == # tracks per platter < # sectors < # bytes
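A worked example with made-up but plausible geometry confirms the ordering (tracks per platter are counted per surface here, matching the definition used above):

```python
platters = 4
heads = platters * 2            # one head per surface, two surfaces per platter
cylinders = 20_000
tracks_per_platter = cylinders  # same thing by definition (per surface)
sectors_per_track = 500         # assumed; real disks vary by zone
sectors = heads * cylinders * sectors_per_track
bytes_capacity = sectors * 512  # assumed 512-byte sectors

# The ordering from the answer holds for these numbers:
assert platters <= heads < cylinders == tracks_per_platter < sectors < bytes_capacity
print(sectors, bytes_capacity)
```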