Storage Devices

Disk Scheduling

UNIX Filesystem Interface

We will focus on magnetic disks, given their popularity. The interface design and implementation heavily depends on the characteristics of the underlying storage device. Most OSes today, have been optimized for magnetic disks.

The API for a minimal file system consists of: open, read, write, seek, close, and stat. Dup duplicates a file descriptor. For example:

  fd = open("x/y", O_RDWR);
  read (fd, buf, 100);
  write (fd, buf, 512);
  close (fd)

Notice that this interface assumes that the file is a stream of bytes. Alternatively, the data in a file could have been organized in a structured way. The structured variant is often called a database. Any particular structure is likely to be useful to only a small class of applications, and other applications will have to work hard to fit their data into one of the pre-defined structures. Besides, if you want structure, you can easily write a user-mode library program that imposes that format on any file (with performance penalties however because of layering of different abstractions one above another). (Databases have special requirements and support an important class of applications, and thus have a specialized plan.)

To allow users to remember where they stored a file, they can assign a symbolic name to a file, which appears in a directory. There's a couple of different things going on here.

Maintaining the file offset behind the read/write interface is an interesting design decision. The alternative is that the state of a read operation should be maintained by the process doing the reading (i.e., that the pointer should be passed as an argument to read). This argument is compelling in view of the UNIX fork() semantics, which clones a process which shares the file descriptors of its parent.

With offsets in the file descriptor, a read by the parent of a shared file descriptor (e.g., stdin) changes the read pointer seen by the child. This isn't always desirable: for example, consider a data file, in which the program seeks around and reads various records. If we fork(), the child and parent might interfere. On the other hand, the alternative (no offset in FD) would make it difficult to get "(echo one; echo two) > x" right. Easy to implement separate-offsets if kernel provides shared-offsets (re-open file, mostly), but not the other way around.

The file API turned out to be quite a useful abstraction. Unix uses it for many things that aren't just files on local disk, e.g. pipes, devices in /dev, network storage, etc. Plan9 took this further, and a few of those ideas came back to Linux, like the /proc filesystem.

Unix API doesn't specify that the effects of write are immediately on the disk before a write returns. It is up to the implementation of the file system within certain bounds. Choices include (that aren't non-exclusive):

Some examples of file systems: