A Filesystems reading list

KHB: A Filesystems reading list

August 21, 2006

This article was contributed by Valerie Henson

We've all been there - you're wandering around a party at some Linux event clutching your drink and looking for someone to talk to, but everyone is having some obscure technical conversation full of unfamiliar jargon. Then, as you slide past a cluster of important-looking people, you overhear the word "superblock" and think, "Superblock, that's a file system thing... I read about file systems in operating systems class once." Gratefully, you join the conversation, only to discover that while you know some of the terms - cylinder group, indirect block, inode - you're still unable to come up with stunning ripostes like, "Aha, but that's really just another version of soft updates, and it doesn't solve the nlinks problem." (Admiring silence ensues.) Now what? You want to be able to make witty remarks about the pros and cons of journaling while throwing back the last of your martini, but you don't know where to start.

Fortunately, you can get a decent grasp of modern file systems without reading a whole book on file systems. (I haven't yet read a book on file systems I would recommend, anyway.) After reading these file system papers (or at least their abstracts), you'll be able to at least fake a working knowledge of file systems - as long as everyone is drinking and it's too loud to hear anyone clearly. Enjoy!

The Basics

These papers are oldies but goodies. While the systems they describe are fairly obsolete and have been heavily improved since these initial descriptions, they make a good introduction to file system structure and terminology.

A Fast File System for UNIX by Marshall Kirk McKusick, William Joy, Samuel Leffler, and Robert Fabry. This paper describes the first version of the original UNIX file system that was suitable for production use. It became known as FFS (Fast File System) or UFS (UNIX File System). The "fast" part of the name comes from the fact that the original UNIX file system maxed out at about 5% of disk bandwidth, whereas the first iteration of FFS could use about 50% - a huge improvement. This paper is absolutely foundational, as the majority of production UNIX file systems are FFS-style file systems. While some parts of this paper are obsolete (check out the section on rotational delay), it's a simple, readable explanation of basic file system architecture that you can refer back to time and again. Also, it's pretty fun to read a paper describing the first implementation of, for example, symbolic links for a UNIX file system.

For extra credit, you can read the original file system checker paper, Fsck - The UNIX File System Check Program, by Marshall Kirk McKusick and T. J. Kowalski. It describes the major issues in checking and repairing file system metadata consistency. Improving fsck is a hot topic in file systems right now, so reading this paper might be worthwhile.
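
One of the consistency checks a checker like fsck makes is comparing each inode's stored link count with the number of directory entries that actually point to it. Here's a toy version of just that one check, assuming a made-up in-memory layout (a dict of inode numbers to stored link counts, and a dict of directories mapping names to inode numbers). It's a sketch of the idea, not fsck's actual code.

```python
from collections import Counter


def check_link_counts(inodes, directories):
    """Toy link-count check (hypothetical in-memory layout).

    inodes:      {inode_number: stored_link_count}
    directories: {dir_inode_number: {name: child_inode_number}}
                 ("." and ".." are included as ordinary entries)
    Returns (inode, stored, actual) for every inode whose stored count
    disagrees with the number of directory entries referring to it.
    """
    actual = Counter()
    for entries in directories.values():
        for child in entries.values():
            actual[child] += 1  # every entry is one reference
    return [(ino, stored, actual[ino])
            for ino, stored in sorted(inodes.items())
            if stored != actual[ino]]


if __name__ == "__main__":
    dirs = {
        2: {".": 2, "..": 2, "home": 3, "notes.txt": 4},  # root directory
        3: {".": 3, "..": 2, "notes-link": 4},            # /home
    }
    inodes = {2: 3, 3: 2, 4: 1, 5: 1}
    print(check_link_counts(inodes, dirs))
```

In this example inode 4 has two names but claims only one link (unlinking one name would free a file that is still in use), and inode 5 claims a link but has no name at all, which is the kind of orphan fsck would reconnect to lost+found.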

Vnodes: An Architecture for Multiple File System Types in Sun UNIX by Steve Kleiman. The original UNIX file system interface had been designed to support exactly one kind of file system. With the advent of FFS and other file systems, operating systems now needed to support several different file systems. Several solutions were proposed, but the dominant solution ended up being the VFS (Virtual File System) interface, first proposed and implemented by Sun. This paper explains the rationale behind VFS and vnodes.
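
The core trick of the vnode approach is that generic code only ever calls through an abstract set of operations, and each file system type fills in its own implementations. A minimal sketch of that shape, assuming a hypothetical two-operation interface (real vnode interfaces have many more operations): a plain in-memory file system and a /proc-style "generated" file live in the same tree, and the path walker never knows which is which.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class Vnode(ABC):
    """File-system-independent handle (hypothetical, much-reduced interface)."""

    @abstractmethod
    def lookup(self, name: str) -> "Vnode": ...

    @abstractmethod
    def read(self) -> bytes: ...


# File system type 1: plain in-memory files and directories.
class MemDirVnode(Vnode):
    def __init__(self, children: Dict[str, Vnode]):
        self.children = children

    def lookup(self, name):
        return self.children[name]

    def read(self):
        raise IsADirectoryError()


class MemFileVnode(Vnode):
    def __init__(self, data: bytes):
        self.data = data

    def lookup(self, name):
        raise NotADirectoryError(name)

    def read(self):
        return self.data


# File system type 2: contents are computed on every read, /proc-style.
class GeneratedFileVnode(Vnode):
    def __init__(self, generate: Callable[[], bytes]):
        self.generate = generate

    def lookup(self, name):
        raise NotADirectoryError(name)

    def read(self):
        return self.generate()


def vfs_read_path(root: Vnode, path: str) -> bytes:
    """Generic layer: walks the path using only the Vnode interface."""
    vnode = root
    for component in path.strip("/").split("/"):
        if component:
            vnode = vnode.lookup(component)
    return vnode.read()
```

For example, with `root = MemDirVnode({"etc": MemDirVnode({"motd": MemFileVnode(b"hello")}), "uptime": GeneratedFileVnode(lambda: b"42")})`, both `vfs_read_path(root, "/etc/motd")` and `vfs_read_path(root, "/uptime")` go through the same generic code.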

Design and Implementation of the Sun Network Filesystem by Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, and Bob Lyon. Once upon a time (1985, specifically), people weren't really clear on why you would want a network file system (as opposed to, for example, a network disk or copying files around via rcp). This paper explains the needs and requirements that resulted in the invention of NFS, the network file system everyone loves to hate but uses all the time anyway. It also discusses the design of the VFS. A fun quote from the paper: "One of the advantages of the NFS was immediately obvious: as the df output below shows, a diskless workstation can have access to more than a Gigabyte of disk!"

Slaying the fsck dragon

One of the major problems in file systems is keeping the on-disk data consistent in the event that a file system is interrupted in the middle of an update (for example, if the system loses power). The original FFS solved this problem by running fsck on the file system after a crash or other unclean unmount, but this took a really long time and could lose data. Many smart people thought about this problem and came up with four major approaches: journaling, log-structured file systems, soft updates, and copy-on-write. Each method provided a way of quickly recovering the file system after a crash. The most popular approach was journaling, since it was both relatively simple and easy to "bolt on" to existing FFS-style file systems.

Journaling file systems solve the fsck problem by first writing an entry describing each update to an on-disk journal - a record of file system operations. Once the journal entry is complete, the main file system is updated; if the operation is interrupted, the journal entry is replayed on the next mount, completing any half-finished operations in progress at the time of the crash. Most production file systems (including ext3, XFS, VxFS, logging UFS, and reiserfs) use journaling to avoid fsck after a crash. No canonical journaling paper exists outside the database literature (whence the idea was lifted wholesale), but Journaling the Linux ext2fs Filesystem by Stephen Tweedie is a good choice for learning both journaling techniques in general and the details of ext3 in particular.
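
To make the write-journal-then-replay idea concrete, here's a toy write-ahead journal. It assumes a made-up model in which the disk is a dict of block numbers and the journal is a list of records ending each transaction with a commit record; real journals (ext3's included) deal with block layout, checksums, and wraparound that this sketch ignores.

```python
class JournaledDisk:
    """Toy write-ahead journal (hypothetical model, not ext3's format).

    home:    block number -> contents (the "main" file system)
    journal: ("write", txid, block, data) and ("commit", txid) records
    """

    def __init__(self):
        self.home = {}
        self.journal = []

    def log_transaction(self, txid, writes):
        """Step 1: record every block write, then the commit record."""
        for block, data in writes.items():
            self.journal.append(("write", txid, block, data))
        self.journal.append(("commit", txid))

    def replay(self):
        """Apply only transactions whose commit record reached the journal.

        Replaying twice gives the same result, so a crash during replay
        (or during checkpointing) is harmless.
        """
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "write" and rec[1] in committed:
                _, _, block, data = rec
                self.home[block] = data

    def checkpoint(self):
        """Step 2: copy committed writes home and free the journal.
        In this sketch it is only called between transactions."""
        self.replay()
        self.journal.clear()
```

Simulating a crash is just appending a transaction's write records without its commit record and then calling `replay()` as the next mount would: the committed transaction lands in `home`, and the half-logged one is discarded as if it never started.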

The Design and Implementation of a Log-Structured File System by Mendel Rosenblum and John K. Ousterhout. Journaling file systems have to write each operation to disk twice: once in the log, and once in the final location. What would happen if we only wrote the data to disk once - in the journal? A log-structured file system does exactly that: it treats the whole disk as a log and appends all new data and metadata to it. While the log-structured architecture was an electrifying new idea, it ultimately turned out to be impractical for production use, despite the concerted efforts of many computer science researchers. Today, no major production file system is log-structured. (Note that a log-structured file system is not the same as a logging file system - logging is another name for journaling.)
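
A toy model of the log-structured idea, assuming a "disk" that is just an append-only Python list and an inode map recording where the newest copy of each file lives. Every update is written exactly once, but old copies stay behind in the log; the segment cleaner that has to reclaim that space is deliberately left out here.

```python
class LogFS:
    """Toy log-structured store (hypothetical; no segments, no cleaner)."""

    def __init__(self):
        self.log = []        # the whole "disk": append-only
        self.inode_map = {}  # inode number -> index of its latest copy

    def write(self, ino, data):
        """Every update is a single append; nothing is overwritten."""
        self.log.append((ino, data))
        self.inode_map[ino] = len(self.log) - 1

    def read(self, ino):
        return self.log[self.inode_map[ino]][1]

    def live_fraction(self):
        """Share of the log still referenced; the rest is stale copies
        that a cleaner would eventually have to reclaim."""
        return len(self.inode_map) / len(self.log) if self.log else 1.0
```

After `write(1, "v1")`, `write(2, "a")`, and `write(1, "v2")`, reading inode 1 returns `"v2"`, but a third of the log is already dead weight, which gives a feel for why cleaning cost figures so prominently in the debates over LFS.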

If you're looking for cocktail party gossip, Margo Seltzer and several colleagues published papers critiquing and comparing log-structured file systems to variations of FFS-style file systems, in which LFS usually came out rather the worse for wear. This led to a semi-famous flame war in the form of web pages, archived here.

Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem by Marshall Kirk McKusick and Greg Ganger. Soft updates carefully orders writes to a file system such that in the event of a crash, the only inconsistencies are relatively harmless ones - leaked blocks and inodes. After a crash, the file system is mounted immediately and fsck runs in the background. The performance of soft updates is excellent, but the complexity is very high - as in, soft updates has been implemented only once (on BSD) to my knowledge. Personally, it took me about 5 years to thoroughly understand soft updates, and I haven't met anyone other than the authors who claimed to understand it well enough to implement it. The paper is pretty understandable up to about page 5, at which point your head will explode. Don't feel bad about this; it happens to everyone.
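
The ordering rule underneath soft updates can be shown without any of the head-exploding machinery: if the writes for creating a file reach the disk in the right order, a crash at any point leaves at worst a leaked inode, never a name pointing at garbage. This sketch assumes three hypothetical writes ("bitmap" marks the inode allocated, "inode" initializes it, "dirent" adds the directory entry) and simply tries a crash after every prefix of a given order. It checks the rule only; the real implementation tracks these dependencies at a much finer grain and is far more involved.

```python
def crash_outcome(done):
    """Classify the on-disk state if only the writes in `done` survived."""
    if "dirent" in done and not {"inode", "bitmap"} <= done:
        # A name points at an inode that is uninitialized or marked free.
        return "corrupt"
    if "bitmap" in done and "dirent" not in done:
        # Inode allocated but unreachable: a harmless leak that a
        # background fsck can reclaim.
        return "leak"
    return "ok"


def worst_cases(order):
    """Crash after each prefix of `order`; collect every outcome seen."""
    return {crash_outcome(set(order[:i])) for i in range(len(order) + 1)}
```

With the safe order `["bitmap", "inode", "dirent"]` the only outcomes are `"ok"` and `"leak"`; writing the directory entry first, as in `["dirent", "inode", "bitmap"]`, can produce `"corrupt"`, which is exactly what a full fsck at mount time exists to catch.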

File System Design for an NFS File Server Appliance by Dave Hitz, James Lau, and Michael Malcolm. This paper describes the file system used inside NetApp file servers, Write-Anywhere File Layout (WAFL), as of 1994 (it's been improved in many ways since then). WAFL was the first major use of a copy-on-write file system - one in which "live" (in-use) metadata is never overwritten in place; instead, the modified copy is written elsewhere on disk. Once a consistent set of updates has been written to disk, the "superblock" is rewritten to point to the new set of metadata. Copy-on-write has an interesting set of trade-offs all its own, but has now been implemented in a production file system twice: Solaris's ZFS is also a copy-on-write file system.
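
Here's a toy copy-on-write store, assuming a hypothetical two-level layout (a root block mapping file names to data block addresses). An update writes a new data block and a new copy of the root, and only then flips the superblock pointer. A crash before that last step leaves the old, still fully consistent tree in place; keeping an old root address around is also a cheap way to get a snapshot.

```python
class CowFS:
    """Toy copy-on-write store (hypothetical layout, not WAFL's)."""

    def __init__(self):
        self.blocks = {}      # address -> contents; never modified once written
        self.next_addr = 0
        self.superblock = self._alloc({})  # root block: name -> data address

    def _alloc(self, contents):
        """Write contents to a fresh block and return its address."""
        addr = self.next_addr
        self.blocks[addr] = contents
        self.next_addr += 1
        return addr

    def write_file(self, name, data):
        data_addr = self._alloc(data)                   # 1. new data block
        new_root = dict(self.blocks[self.superblock])   # 2. copy the root...
        new_root[name] = data_addr                      #    ...and change the copy
        root_addr = self._alloc(new_root)
        self.superblock = root_addr                     # 3. one pointer switch

    def read_file(self, name, root=None):
        root = self.superblock if root is None else root
        return self.blocks[self.blocks[root][name]]
```

Saving `old = fs.superblock` before an update and later calling `fs.read_file(name, root=old)` still returns the old contents, since nothing reachable from the old root was overwritten.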

File system performance

Each of these papers focuses on file system performance, but also introduces more than one interesting idea and makes a good starting point for exploring several areas of file system design and implementation.

Extent-like Performance from a UNIX File System by Larry McVoy and Steve Kleiman. This 1991 paper describes optimizations to FFS that doubled file system bandwidth for sequential I/O workloads. While the optimizations described in this paper are considered old hat these days (ever heard of readahead?), it's a good introduction to file system performance.
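
A common way to get extent-like sequential performance out of a block-based file system is to cluster: when a file's blocks happen to be physically contiguous, issue one large I/O instead of many small ones. A minimal sketch of the coalescing step, where `max_cluster` is a hypothetical stand-in for the largest transfer the system will issue at once:

```python
def cluster_requests(block_numbers, max_cluster=16):
    """Coalesce a file's block numbers (in file order) into
    (start_block, length) runs of physically contiguous blocks,
    capped at max_cluster blocks per I/O."""
    runs = []
    for block in block_numbers:
        if runs:
            start, length = runs[-1]
            if block == start + length and length < max_cluster:
                runs[-1] = (start, length + 1)  # extend the current run
                continue
        runs.append((block, 1))                 # start a new run
    return runs
```

For a file laid out in blocks 100, 101, 102, 200, 201, and 7, this turns six single-block requests into three I/Os. The better the allocator is at keeping a file contiguous, the bigger the win.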

Sidebar: Where are they now?

You might have recognized some of the names in the author lists of the papers in this article - and chances are, it's not because of their file system work. What else did these people do? Here's a totally non-scientific selection.

Bill Joy - co-founded Sun Microsystems
Larry McVoy - wrote BitKeeper, co-founded BitMover
Steve Kleiman - CTO, Network Appliance
Mendel Rosenblum - co-founder, VMware
John Ousterhout - wrote Tcl/Tk, co-founded several companies
Margo Seltzer - co-founder, Sleepycat Software
Dave Hitz - co-founder, Network Appliance

Obviously, anyone wanting to found a successful company and make millions of dollars should consider writing a file system first.

Scalability in the XFS File System by Adam Sweeney, Doug Doucette, Wei Hu, Curtis Anderson, Mike Nishimoto, and Geoff Peck. This paper describes the motivation for and implementation of XFS, a 64-bit file system using extents, B+ trees, dynamically allocated inodes, and journaling. XFS is by no means an FFS-style file system, and reading this paper will give you the basics of most extent-based file systems. It also describes quite a few useful optimizations for avoiding fragmentation, scaling to multiple threads, and the like.
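
An extent describes a whole run of blocks with one record (logical start, physical start, length), instead of one pointer per block as in FFS-style indirect blocks. Mapping a file offset to a disk block then becomes a search for the extent that covers it. In this sketch a sorted Python list searched with `bisect` stands in for the B+ tree that XFS uses once a file has many extents; the `Extent` type is hypothetical, not XFS's on-disk format.

```python
import bisect
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    file_offset: int  # first logical block covered by this extent
    disk_block: int   # physical block where the run starts
    length: int       # number of blocks in the run


def map_block(extents: List[Extent], logical: int) -> Optional[int]:
    """Translate a logical file block to a physical block.

    `extents` must be sorted by file_offset and non-overlapping.
    Returns None if the block falls in a hole (sparse region).
    """
    starts = [e.file_offset for e in extents]
    i = bisect.bisect_right(starts, logical) - 1  # last extent starting <= logical
    if i >= 0:
        e = extents[i]
        if logical < e.file_offset + e.length:
            return e.disk_block + (logical - e.file_offset)
    return None
```

Three records like `[Extent(0, 1000, 8), Extent(8, 5000, 4), Extent(20, 9000, 2)]` describe 14 blocks of data plus a hole at logical blocks 12-19; an FFS-style block map would need a separate pointer for each data block.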

The Utility of File Names by Daniel Ellard, Jonathan Ledlie, and Margo Seltzer. File system performance and on-disk layout can be vastly improved if the file system can predict (with reasonable accuracy) the size and access pattern of a file before it writes it to disk. The obvious solution is to add a new set of file system interfaces allowing the application to give explicit hints about the size and properties of a new file. Unfortunately, the history of file systems is littered with unused per-file interfaces like this (how often do you set the noatime flag on a file?). However, it turns out that applications are already giving these hints - in the form of file names, permissions, and other per-file properties. This paper is the first in a series demonstrating that a file system can make useful predictions about the future of a file based on its name and other properties.
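
To get a feel for the idea, here's a deliberately crude, hypothetical predictor. It is not the method from the paper: it just remembers the final sizes of files it has seen, grouped by file-name suffix, and guesses the median for a new file with the same suffix before any data has been written. A file system could use a guess like this to decide, for instance, where to place a new file.

```python
import os
from collections import defaultdict
from statistics import median


class NameBasedPredictor:
    """Hypothetical toy: predict a new file's final size from its suffix."""

    def __init__(self):
        self.sizes = defaultdict(list)  # suffix -> observed final sizes

    @staticmethod
    def _key(name):
        return os.path.splitext(name)[1] or "<none>"

    def observe(self, name, final_size):
        """Record the final size of a file once it has been written."""
        self.sizes[self._key(name)].append(final_size)

    def predict(self, name, default=None):
        """Median size seen for this suffix, or `default` if unseen."""
        seen = self.sizes.get(self._key(name))
        return median(seen) if seen else default
```

After observing `a.log`, `b.log`, and `c.log` at 100, 300, and 200 bytes, `predict("new.log")` guesses 200, while a name with no history falls back to whatever `default` the caller supplies.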