Summary

The HPFS solves all of the historical problems of the FAT file system. it achieves excellent throughput even in extreme casses--many very small files or a few very large files--by means of advanced data structures and techniques such as intelligent caching read-ahead and write-behind. Disk space is used economically because it is managed on a sector basis. Existing application programs will need modification to take advantage of the HPF'S's support for extended attributes and long filenames but these changes will not be difficult. All application programs will benefit from the HPFS's improved performance and decreased CPU use whether they are modified or not. This article is based on a prerelease version of the HPFS that was still undergoing modification and tuning. Therefore the final release of the HPFS may differ in some details from the description given here.

Most programmers are at at least passingly famiIiar with the data structure known as a binary tree. Binary trees are a technique for imposing a logical ordering on a collection of data items by means of pointers, without regard to the physical order of the data. In a simple binary tree, each node contains some data, including a key value that determines the node's logical position in the tree, as well as pointers to the node's left and right sub trees. The node that begins the tree is known as the root: the nodes that sit at the ends of the tree's branches are sometimes called the leaves. To find a particular piece of data, the binary tree is traversed from the root. At each node, the desired key is compared with the node's key: if they don't match, one branch of the node's sub tree or another is selected based on whether the desired key is less than or greater than the node's key. This process continues until a match is found or an empty sub tree is encountered (see Figure A). Such simple binary trees, although easy to understand and implement, have disadvantages in practice. If keys are not well distributed or are added to the tree in a non-random fashion, the tree can become quite asymmetric, leading to wide variations in tree traversal times. In order to make access times uniform, many programmers prefer a particular type of balanced tree known as a B- Tree. For the purposes of this discussion, the important points about a B-Tree are that data is stored in all nodes, more than one data item might be stored in a node, and all of the branches of the tree are of identical length (see Figure B). The worst-case behavior of a B-Tree is predictable and much better than that of a simple binary tree, but the maintenance of a B-Tree is correspondingly more complex. Adding a new data item, changing a key value, or deleting a data item may result in the splitting or merging of a node, which in turm forces a cascade of other operations on the tree to rebalance it. A B+ Tree is a specialized form of B-Tree that has two types of nodes: internal, which only point to other nodes, and external, which contain the actual data (see Figure C). The advantage of a B+ Tree over a B- Tree is that the internal nodes of the B+Tree can hold many more decision values than the intenmediate-level nodes of a B-Tree, so the fan out of the tree is faster and the average length of a branch is shorter. This makes up for the fact that yell must always follow a B+ Tree branch to its end to get the data for which you are looking, whereas in a B-Tree you may discover the data at an interme-diate code or even at the root.
FAT File System High Performance File System
Maximum filename length 11(in 8.3 format) 254
Number of dot (.) delimeters allowed One Multiple
File Attributes Bit flags Bit flags plus up to 64Kb of free-form ASCll of binary information
Maximium Path Length 64 260
Miniumum disk space overhead per file Directory entry (32 bytes) Directory entry (length varies) + Fnode (512 bytes)
Average wasted space per file 1/2 cluster (typically 2Kb or more) 1/2 sector (256 bytes)
Minimum alocation unit Cluster (typically 4Kb or more) Sector (512 bytes)
Allocation info for files Centralized in FAT on home track Located nearby each file in its Fnode
Free disk space info Centralized in FAT on home track Located near free space in bitmaps
Free disk space described per byte 2Kb ( 1/2 cluster at 8 sectors /clustor) 4Kb (8 sectors)
Directory structure Unsorted linear list, must be searched exhaustivily Sorted B-Tree
Directory Location Root directory on home track, others scattered Localized near seek center of volume
Cache replacement strategy Simple LRU Modified LRU, sensitive to data type and usage history
Read ahead None in MS-DOS 3.3 or earlier, primitive read-ahead optional in MS-DOS 4 Always present, sensitive to data type and usage history
Write behind Not available Used by default, but can be defeated on per-handle basis

[Fig. A]

FIGURE A: To find a piece of data, the binary tree is traversed from the root untill the data is found or an empty subtree is encountered.

[Fig. B]

FIGURE B: In a balanced B-Tree, data is stored in nodes, more than one data item can be stored in a node, and all branches of the tree are the same length.

[Fig. C]

FIGURE C: A B+ Tree has internal nodes that point to other nodes and external nodes that contain actual data.


< [Application Programs and HPFS] | [HPFS Home]
Html'ed by Hartmut Frommert