Fault Tolerance

The HPFS's extensive use of lazy writes makes it imperative for the HPFS to be able to recover gracefully from write errors under any but the most dire circumstances. After all, by the time a write is known to have failed, the application has long since gone on its way under the illusion that it has safely shipped the data into disk storage. The errors may be detected by the hardware (such as a "sector not found" error returned by the disk adapter), or they may be detected by the disk driver in spite of the hardware during a read-after-write verification of the data. The primary mechanism for handling write errors is called a hot fix. When an error is detected, the file system takes a free block out of a reserved hot fix pool, writes the data to that block, and updates the hot fix map. (The hot fix map is simply a series of pairs of double words, with each pair containing the number of a bad sector associated with the number of its hot fix replacement. A pointer to the hot fix map is maintained in the Spare Block.) A copy of the hot fix map is then written to disk, and a warning message is displayed to let the user know that all is not well with the disk device.

Each time the file system requests a sector read or write from the disk driver, it scans the hot fix map and replaces any bad sector members with the corresponding good sector holding the actual data. This look aside translation of sector numbers is not as expensive as it sounds, since the hot fix list need only be scanned when a sector is physically read or written, not each time it is accessed in the cache. One of CHKDSK's duties is to empty the hot fix map. For each replacement block on the hot fix map, it allocates a new sector that is in a favorable location for the file that owns the data, moves the data from the hot fix block to tile newly allocated sector, and updates the file's allocation information which may involve rebalancing allocation trees and other elaborate operations). It then adds the bad sector to the bad block list, releases the replacement sector back to the hot fix pool, deletes the hot fix entry from the hot fix map, and writes the updated hot fix map to disk. of course, write errors that can be detected and fixed on the fly are not the only calamity that can befall a file system. The HPFS designers also had to consider the inevitable damage to be wreaked by power failures, program crashes, malicious viruses and Trojan horses, and those users who turn off the machine without selecting Shut-down in the Presentation Manager Shell. (Shutdown notifies the file system to flush the disk cache, update directories, and do whatever else is necessary to bring the disk to a consistent state.)

The HPFS defends itself against the user who is too abrupt with the Big Red Switch by maintaining a Dirty FS flag in the Spare Block of each HPFS volume. The flag is only cleared when all files on the volume have been closed and all dirty buffers in the cache have been written out or, in the case of the boot volume since OS2.INI and the swap file are never closed), when Shutdown has been selected and has completed its work. During the OS/2 boot sequence, the file system inspects the Dirty FS flag on each HPFS volume and, if the flag is set, will not allow further access to that volume until CHKDSK has been run. If the Dirty FS flag is set on the boot volume, the system will refuse to boot the user must boot OS/2 in maintenance mode from a diskette and run CHKDSK to check and possibly repair the boot volume. In the event of a truly major catastrophe, such as loss of the Super Block or the root directory, the HPFS is designed to give data recovery the best possible chance of success. Every type of crucial file objects including Fnodes, allocation sectors, and directory blocks is doubly linked to both its parent and its children and contains a unique 32-bit signature. Fnodes also contain the initial pofiion of the name of their file or directory. Consequently, CHKDSK can rebuild an entire volume by methodically scanning the disk for Fnodes, allocation sectors, and directory blocks, using them to reconstruct the files and directories and finally regenerating the freespace bitmaps.


< [Performance Issues] | [HPFS Home] | [Application Programs and the HPFS] >
Html'ed by Hartmut Frommert