File systems


A design issue is the semantics of a file system operation that requires multiple disk writes. In particular, what happens if the logical update requires writing multiple disks blocks and the power fails during the update? For example, to create a new file, requires allocating an inode (which requires updating the list of free inodes on disk), writing a directory entry to record the allocated i-node under the name of the new file (which may require allocating a new block and updating the directory inode). If the power fails during the operation, the list of free inodes and blocks may be inconsistent with the blocks and inodes in use. Again this is up to implementation of the file system to keep on disk data structures consistent:

Another design issue is the semantics are of concurrent writes to the same data item. What is the order of two updates that happen at the same time? For example, two processes open the same file and write to it. Modern Unix operating systems allow the application to lock a file to get exclusive access. If file locking is not used and if the file descriptor is shared, then the bytes of the two writes will get into the file in some order (this happens often for log files). If the file descriptor is not shared, the end result is not defined. For example, one write may overwrite the other one (e.g., if they are writing to the same part of the file.) Lots of other examples with directories, names, etc.

An implementation issue is performance, because writing to magnetic disk is relatively expensive compared to computing. Three primary ways to improve performance are: careful file system layout and data structure design that induces few seeks (locality, btrees, logging, etc), an in-memory cache of frequently-accessed blocks (or even prefetching), and overlap I/O with computation so that file operations don't have to wait until their completion and so that that the disk driver has more data to write, which allows disk scheduling. (We will talk about performance in detail later.)

xv6 code examples

xv6 implements a minimal Unix file system interface. xv6 doesn't pay attention to file system layout. It overlaps computation and I/O, but doesn't do any disk scheduling. Its cache is write-through, which simplifies keeping on disk data structures consistent, but is bad for performance.

What is xv6's disk layout? Who determines how many inodes, blocks, etc. (mkfs.c). How does xv6 keep track of free blocks and inodes? See balloc()/bfree() and ialloc()/ifree(). Is this layout a good one for performance? What are other options?

Free blocks are tracked using bitmap. Free inodes are tracked by linearly scanning the inode array to find one that is free. Other options (bitmap for inodes, better locality among inodes and blocks)

On disk, files are represented by an inode (struct dinode in fs.h), and blocks. Small files have up to 12 block addresses in their inode; large files use the last address in the inode as a disk address for a block with 128 disk addresses (512/4). The size of a file is thus limited to 12 * 512 + 128*512 bytes. What would you change to support larger files? (Ans: e.g., double indirect blocks.)

Directories are files with a bit of structure to them. The file contains of records of the type struct dirent. The entry contains the name for a file (or directory) and its corresponding inode number.

Directory contents: A directory is much like a file, but user can't directly write or read from it. It can only do so using system calls like readdir(). Also, the kernel reads the contents to do path resolution (e.g., converting "x/y" to inode number). The content of a "directory file" is an array of dirents (notice that the contents are structured this time).

dirent is free if inum is zero
How many files can appear in a directory? (max file size / sizeof(struct dirent))

xv6's on-disk inode (64 bytes):

Each i-node has an i-number. To turn i-number into inode's block, use 2 + inum/IPB. (IPB = inodes per block). Blocksize=512 inodesize=64, IPB=8

Can view FS as an on-disk data structure. Draw tree: dirs, inodes, blocks. There are two allocation pools: inodes and blocks.

Q: how does xv6 create a file?:

$ echo > a
log_write 4 ialloc (44, from create 54)
# allocating an inode for a

log_write 4 iupdate (44, from create 54)
# updating the inode of a

log_write 29 writei (47, from dirlink 48, from create 54)
# adding entry for a in the contents to the current working directory

log_write 2 iupdate
# updating the inode of the current working directory.