Adventures in Luserland: Finding Disk Sectors Associated with File Records.

来源：互联网发布：影像系统制作软件编辑：程序博客网时间：2024/05/09 13:30

Much to our dismay, we kernel programmers sometimes find ourselves up in user mode writing test utilities and diagnostic programs.

Recently a client was having a problem with data corruption while testing new Fibre Channel Adapters. A proprietary disk stress test program was detecting data corruption under extreme stress conditions. Part of the problem of getting at the root cause of the corruption was the inability to determine if the disk media was corrupted, if there was a disk cache issue, or if the Fiber Channel HBAs were malfunctioning. Analysis was hampered by the difficulty of correlating high level file IO operations with low level SCSI requests. I suggested that the client could modify the disk test program to collect physical disk address (Logical Block Address, or LBA)1 information and display this information when it detected data corruption. From the blank stares I realized that nobody knew what I was talking about, so naturally I volunteered to share the information on how to do this, code I have written several times in the past, with my client. As the process ends up being a bit more complicated than one would suspect, I thought I would share the technique with the WD-3 community.

A while ago Microsoft exposed several file system APIs that, while intended for defragmentation operations, can be exploited for other purposes. (For more information about defragmentation, see the Sysinternals article Inside Windows NT Disk Defragmenting.) Translation from file offset and length to disk LBA, with certain caveats and restrictions, is one of those possible exploits.

FSCTL_GET_RETRIEVAL_POINTERS
This file system control operation is the basis for performing translation. Given a file handle, a program performs a DeviceIoControl request, providing as input a STARTING_VCN_INPUT_BUFFER, and getting back as output a RETRIEVAL_POINTERS_BUFFER. What this request does is to perform a mapping between file address space and volume address space, in the form of clusters. A cluster is an allocation unit that's specific to a file system. Generally, a cluster is the smallest allocation unit that the file system will use when allocating physical storage for a file. In Win32 terminology, a file has Virtual Clusters, where each cluster represents a block of data starting at a specific byte offset into the file data stream. Volumes have Logical Clusters, each representing a block of data starting at a specific byte offset on the volume.

While file systems usually try to allocate contiguous chunks of data for storage, over time as the volume storage is reused, fragmentation of the volume storage space may make it impossible to provide contiguous storage for files. Naively one might think that if you can discover the LBA of offset zero in a file, all other LBAs can be found through a simple addition process. Instead, on-disk file storage can be scattered across the volume. FSCTL_GET_RETRIEVAL_POINTERS is used to obtain, from the file system driver, a list of Virtual Clusters that make up the file, and a corresponding set of Logical Clusters that represent the volume storage for these Virtual Clusters.

FSCTL_GET_RETRIEVAL_POINTERS returns what amounts to the following information:

Given a file offset in bytes, the corresponding Virtual Cluster offset.
The Logical Cluster that corresponds to this Virtual Cluster.
The next Virtual Cluster offset.
The Logical Cluster returned is really a set of Logical Clusters representing a logically contiguous storage allocation.2 Using the next Virtual Cluster offset, the length of this storage allocation can be determined as well as the start of the next set of virtual clusters.

The following diagram illustrates the first three levels of translation, from file byte offset to virtual cluster block and then to volume logical cluster block. Given a file byte offset and the file system virtual cluster size, the file virtual cluster containing the file byte offset can easily be computed. Given a file virtual cluster and a map from file virtual cluster to volume logical cluster, the volume logical cluster that corresponds to the file virtual cluster that contains the file byte offset can easily be calculated.

--------------------------------------------------------------------------------

The input to FSCTL_GET_RETRIEVAL_POINTER is a STARTING_VCN_INPUT_BUFFER, defined as follows:

typedef struct {
LARGE_INTEGER StartingVcn;
} STARTING_VCN_INPUT_BUFFER, *PSTARTING_VCN_INPUT_BUFFER;
Note that the StartingVcn field is not a byte offset into the file. Instead it is a virtual cluster number in the file. This article will describe how to convert from byte offset to virtual cluster number, but for now it is sufficient to know that the file byte offset address space is partitioned into equal sized segments called virtual clusters. The first virtual cluster is cluster number 0 (zero), so if you want to obtain the first virtual cluster in the file, use the value 0.

The output from FSCTL_GET_RETRIEVAL_POINTER is a RETRIEVAL_POINTERS_BUFFER, defined as follows:

typedef struct RETRIEVAL_POINTERS_BUFFER {
DWORD ExtentCount;
LARGE_INTEGER StartingVcn;
struct {
LARGE_INTEGER NextVcn;
LARGE_INTEGER Lcn;
} Extents[1];
} RETRIEVAL_POINTERS_BUFFER, *PRETRIEVAL_POINTERS_BUFFER;
This is a typical Microsoft variable length data structure, where the Extents field defines a variable length array of structs, with the ExtentCount field indicating how many extents in this array contain valid data. The caller is responsible for allocating the storage, so as usual this presents the programmer with the choice between calling the interface twice, once to learn how much storage to allocate and once to retrieve the data, and using a fixed allocation scheme. This article uses the fixed allocation scheme with the simplest possible allocation strategy: one extent.

The caller provides as input a virtual cluster number. The output RETRIEVAL_POINTERS_BUFFER returns a StartingVcn value which might be different than the input VCN.. The returned StartingVcn is set to the start of an allocation extent, a logically contiguous allocation of storage space on the volume providing the storage for the file data. So if you request a translation of VCN 1, but the contiguous allocation of logical clusters on this volume for VCN 1 starts at VCN 0, then the value returned in StartingVcn will be VCN 0.

On return, the Extents array will hold ExtentCount elements, where each element has a NextVcn and an Lcn field. In our programming examples, we are never going to ask for more than one extent, so our calls to FSCTL_GET_RETRIEVAL_POINTER will return either one or zero as values for ExtentCount.

If ExtentCount equals one on return, then the Extents NextVcn field contains the virtual cluster number for the next contiguous allocation of clusters (extent) in the file. The length of the current extent, the one we asked for, is simply (NextVcn - StartingVcn.) Of course this length is expressed in units of clusters, which are file system and volume specific values.

Finally, the returned Extents contains an Lcn field, which is the volume logical cluster number that corresponds the file virtual cluster number contained in StartingVcn in the returned RETRIEVAL_POINTERS_BUFFER .

To iterate all of the mappings from virtual cluster to logical cluster, simply start at virtual cluster 0, and then proceed to NextVcn, until there are no more clusters to map.

The following code fragment illustrates how to iterate through a file, producing the mapping of virtual to logical clusters for the entire file, and calling a caller supplied function for each extent returned by FSCTL_GET_RETRIEVAL_POINTER . (This callback function could, for example, simply output the values for display purposes, or perhaps store them for later use.)

BOOL iterateAllClusters(
    HANDLE fileHandle,
    IterateAction callback)
{
    STARTING_VCN_INPUT_BUFFER inputVcn;
    RETRIEVAL_POINTERS_BUFFER rpBuf;
    inputVcn.StartingVcn.QuadPart = 0; // start at the beginning
    DWORD error = NO_ERROR;
    BOOL result = false;
    do {
        DWORD dwBytesReturned;
        result = DeviceIoControl(fileHandle,
            FSCTL_GET_RETRIEVAL_POINTERS,
            &inputVcn,
            sizeof(STARTING_VCN_INPUT_BUFFER),
            &rpBuf,
            sizeof(RETRIEVAL_POINTERS_BUFFER),
            &dwBytesReturned,
            NULL);
        error = GetLastError();
        switch (error) {
        case ERROR_HANDLE_EOF:
            //
            // done, no data.
            //
            result = true;
            break;

        case ERROR_MORE_DATA:
            //
            // more to do, valid data
            //
            inputVcn.StartingVcn = rpBuf.Extents[0].NextVcn;
            //
            // fall through
            //
        case NO_ERROR:
            //
            // done, valid data
            //
            callback(rpBuf.StartingVcn.QuadPart, rpBuf.Extents[0].Lcn.QuadPart,
                rpBuf.Extents[0].NextVcn.QuadPart - rpBuf.StartingVcn.QuadPart);
            result = true;
            break;
        default:
            //
            // user error?
            //
            break;
        }
    } while (error == ERROR_MORE_DATA);
    return result;
}
Note that, as FSCTL_GET_RETRIEVAL_POINTER is really an iterator, the return value from DeviceIoControl can essentially be ignored. Instead the error code should be examined for the values: NO_ERROR, ERROR_MORE_DATA, and ERROR_HANDLE_EOF. Any other returned error code represents a programmer error.

If the error code is NO_ERROR then the returned data is valid but no more virtual to logical cluster mappings are available. If the error code is ERROR_MORE_DATA then the returned data is valid and more virtual to logical cluster mappings are available. If the error code returned is ERROR_HANDLE_EOF the returned data is not valid, and there are no more cluster mappings available.

Cluster's Last Stand
So far what we have seen is straight forward, but we also have not managed to move from the virtual-to-logical cluster mappings provided by FSCTL_GET_RETRIEVAL_POINTER to our goal, which is the LBA on a physical disk for a byte offset in a file. So lets complicate things a little bit more, by adding another level of translation to our diagram above:

--------------------------------------------------------------------------------

Our goal is to obtain the LBA on a physical disk for an offset in a file. So far all we have is a logical cluster on a volume. The question is how to get from logical cluster to physical LBA. This question begs another question: what exactly is a volume logical cluster? The answer is the infamous it depends.

First of all the number of bytes in a cluster is determined by the file system. Luckily there is a defined API to ask any file system what a specific volume's cluster size is. Secondly, exactly where the volume cluster address space starts on a volume is once again entirely up to the file system. On some file systems, the first sector of a volume corresponds to the start of the first cluster on the volume. On other file systems the start of the first cluster on the volume may be some sector offset past the first sector of the volume. Unfortunately the mapping from volume LCN to volume logical sector is not provided by any public API.

It just so happens that one file system, the NTFS file system, currently maps the start of cluster 0 to volume logical sector 0. Another file system, the FAT32 file system does not map the start of cluster 0 to volume logical sector 0. We will explore how to deal with mapping LCN to volume logical sector for both of these file system in this article.

We are also making an assumption that the logical cluster address space is virtually contiguous on the volume. That is, for example, that volume cluster 1 is logically adjacent to clusters 0 and 2. This is an assumption that can be freely violated by any file system, as what a cluster means is entirely up to the file system. The only requirement is that the file system provide a mapping from virtual to logical cluster, and that that mapping works as specified for the defragging interface that uses it. We are walking outside of the defined functionality, so we have no right to complain if these techniques do not work on all file systems.

There is a third complication implied by the above diagram. Note that the lowest block in the diagram, labeled Physical Disk Allocations, has a shaded section labeled Simple Volume Allocation on Disk. Note also that in the previous sentence the word simple is in boldface type. There are complex volume allocations that map volumes to more than one physical disk. Spanned, striped, and mirrored volumes are not simple volume allocations. The techniques documented in this article will not work on spanned striped or mirrored volumes, except for the case where each plex of a mirrored volume is allocated at the same offset on its physical disk.

There is a fourth complication: none of this works on compressed files or volumes or NTFS sparse allocated files.

Four Easy Pieces
Enough of limitations, lets get back to what we have to do with clusters in order to translate them to physical disk LBAs. We need four additional pieces of information:

The volume sector size
The volume cluster size
The starting offset of the SIMPLE volume on its physical disk
The starting offset on the volume of the first cluster.
The volume sector size and volume cluster size (in sectors) can be obtained from the Win32 GetDiskFreeSpace API. Of course this requires the volume name as input, and that can be obtained from the Win32 GetVolumePathName API, given a file name.

char volumeName[MAX_PATH];
BOOL result = GetVolumePathName(filename, volumeName, sizeof(volumeName));
if (!result) {
    // do something to error out;
}
result = GetDiskFreeSpace(volumeName,
            &SectorsPerCluster,
            &BytesPerSector,
            &NumberOfFreeClusters,
            &TotalNumberOfClusters);
if (!result) {
     // do something to error out;
}
This takes care of items one and two in our list above. We now have the volume sector size and the volume cluster size. Note that the underlying file system must consider volume logical cluster size to be the same as file virtual cluster size.

Item three requires us to know where our volume is located on a physical disk. Given this information, we should be able to compute the LBA of our file offset. As we would soon do, if only it were quite this simple...

There are two documented ways to find the volume offset on a disk. The first method is to use IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS to directly obtain the volume offset information. This IOCTL returns a VOLUME_DISK_EXTENTS variable length structure defined as follows:

typedef struct _VOLUME_DISK_EXTENTS {
DWORD NumberOfDiskExtents;
DISK_EXTENT Extents[1];
} VOLUME_DISK_EXTENTS, *PVOLUME_DISK_EXTENTS;
NumberOfDiskExtents indicates, on return, the number of different physical disks that are used to store this volume's data. For example a mirrored disk would have two disk extents, so NumberOfDiskExtents would have the value 2. The DISK_EXTENT array at Extents must be large enough to handle the total number of extents for the target volume. We are not going to deal with anything other than simple volumes, so we should not need more than one extent.

The DISK_EXTENT structure is defined as follows:

typedef struct _DISK_EXTENT {
    DWORD DiskNumber;
    LARGE_INTEGER StartingOffset;
    LARGE_INTEGER ExtentLength;
} DISK_EXTENT, *PDISK_EXTENT;
Each extent has a StartingOffset expressed in bytes, and a length also expressed in bytes. The disk containing this extent is indicated by DiskNumber, an unsigned integer value. The name that can be used to open this physical disk (for example to read or write the data directly from the physical disk,) can be generated by appending DiskNumber to the string //./PhysicalDrive, as in //./PhysicalDrive0 or //./PhysicalDrive1.

The second method uses IOCTL_VOLUME_LOGICAL_TO_PHYSICAL. This IOCTL doesn't produce the volume offset on a disk, instead it produces the physical offset on a disk for a logical offset on a volume. This IOCTL returns yet another variable length data structure that contains a list of physical disks and the corresponding physical offset on the disk for input volume logical offset. This allows it to support complex volumes such as mirrored or striped volumes. Note that IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS, while it will give you a list of physical disks that host a volume, does not tell you which of those disks contains the data of interest. It cannot support our efforts for striped volumes.

The input for IOCTL_VOLUME_LOGICAL_TO_PHYSICAL is a VOLUME_LOGICAL_OFFSET structure, which is really just a container for a 64-bit file byte offset. The output is a VOLUME_PHYSICAL_OFFSETS variable length structure, defined as follows:

typedef struct _VOLUME_PHYSICAL_OFFSET {
    ULONG DiskNumber;
    LONGLONG Offset;
} VOLUME_PHYSICAL_OFFSET, *PVOLUME_PHYSICAL_OFFSET;
typedef struct _VOLUME_PHYSICAL_OFFSETS {
    ULONG NumberOfPhysicalOffsets;
    VOLUME_PHYSICAL_OFFSET PhysicalOffset[1];
} VOLUME_PHYSICAL_OFFSETS, *PVOLUME_PHYSICAL_OFFSETS;
As usual the number of VOLUME_PHYSICAL_OFFSETS is indicated in the returned NumberOfPhysicalOffsets, and the caller must provide an output buffer large enough to contain the indicated number of physical offsets, typically through an iterative process. The array of VOLUME_PHYSICAL_OFFSETS contains a list of physical disks and offsets, one for each physical disk that contains the volume logical offset of interest. Each VOLUME_PHYSICAL_OFFSET in turn contains a DiskNumber and an Offset field. The DiskNumber field can be used to construct the physical device name, as in the previous example. The Offset can be used to find the correct LBA on a physical disk that contains the input volume logical address.

The following code fragment illustrates how to use IOCTL_VOLUME_LOGICAL_TO_PHYSICAL. Earlier we found the volume logical cluster for a file offset and the volume cluster size. We can use this information to obtain the volume logical offset. We are only going to deal with simple volumes, so we don't need to worry about the number of disks that contain our data.

VOLUME_LOGICAL_OFFSET logicalOffset;
VOLUME_PHYSICAL_OFFSETS physicalOffsets;
logicalOffset.LogicalOffset = Extents[0].Lcn.QuadPart * volumeData.BytesPerCluster;

result = DeviceIoControl(translation->hVolume,
    IOCTL_VOLUME_LOGICAL_TO_PHYSICAL,
    &logicalOffset,
    sizeof(VOLUME_LOGICAL_OFFSET),
    &physicalOffsets,
    sizeof(physicalOffsets),
    &dwBytesReturned,
    NULL);
if (!result) {
    error = GetLastError();
    // error out here
}
startSector = outputBuffer.physical.PhysicalOffset[0].Offset / volumeData.BytesPerSector;
There is a small problem with using this IOCTL: it isn't defined in the PlatformSDK. It is however defined in the w2k/XP/W2K3 DDKs, so you have to include a path to the appropriate DDK include file directory to use this in a program.

Bug, Not A Bug?
There is a big problem with both IOCTL_VOLUME_LOGICAL_TO_PHYSICAL and IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS: they are broken with respect to FAT32 volumes. Well, it isn't clear that they are broken; perhaps FAT32 is broken. Maybe it's just the documentation that is broken. I'm pretty sure something is broken, as the results returned for FAT32 volumes, in both cases, do not locate the data on the physical disk, while the results returned for NTFS volumes do in fact locate the data on the physical disk. More explicitly, if you read the indicated sector off of the physical disk and the corresponding file offset in the original file, and compare the returned data buffers, for FAT32 you will get frequent miscompares while for NTFS everything will be just fine.

What is going on here? There is an assumption in the translation process that started with FSCTL_GET_RETRIEVAL_POINTERS that the retrieval pointers returned are relative to the start of the volume. That assumption is simply false for FAT32 volumes, as this file system starts its data cluster allocations at some magic offset into the volume, but neglects to use that fact when responding to FSCTL_GET_RETRIEVAL_POINTERS requests. NTFS, on the other hand, thinks that cluster 0 is co-located with volume sector zero. To be fair, it is not documented anywhere that I can find that the values returned by FSCTL_GET_RETRIEVAL_POINTERS can be used for anything other than input to FSCTL_MOVE_FILE, and as far as I know the defragmentation APIs work correctly. So we can gripe, moan, complain and whine, but as we are using FSCTL_GET_RETRIEVAL_POINTERS for undocumented porpoises, all this carping is to no avail. We have to deal with the peculiar nature of the FAT32 file system. I take back all the bad things I said in the first paragraph of this section. What we have here is a feature, not a bug.

PHAT32 ON DISK STRUCTURES
Okay, so it's a feature. Well restricting the usefulness of our 'get the physical sector for a file offset' utility even further by saying 'NTFS only' starts to make our application more useless than utilitarian, so we will not take that particular cop out. Instead we will just use the official documented on disk data structures for the infamous FAT32 file system in order to work around (as in, "hack") this particular problem.

An interesting idea, however, as there appears to be no such thing as the 'official documented on disk data structures for FAT32', we have to go off on another tangent here. Rather than use the unavailable documented FAT32 on disk data structures, we will have to abuse the web and use whatever consensus GOOGLE turns up for the unofficial documented FAT32 on disk data structures. It turns out that all we are really interested in is what is known as the FAT32 Boot Parameter Block, or BPB.

What follows here is my best guess at the FAT32 BPB structure, verified within the context of the utility we are describing in this article, and collated from your friend and mine: google.com.

//
// This is the on disk layout of the FAT32 BPB. You can find
// many versions of it defined on the web. I think this one is
// correct. Too bad there doesn't appear to be a spec for this.
//
struct OndiskFat32Bpb {
    UCHAR   BytesPerSector[2];
    UCHAR   SectorsPerCluster[1];
    UCHAR   ReservedSectors[2];
    UCHAR   NumberOfFats[1];
    UCHAR   RootEntries[2];
    UCHAR   NumberOfSectors[2];
    UCHAR   MediaDescriptor[1];
    UCHAR   SectorsPerFat[2];
    UCHAR   SectorsPerTrack[2];
    UCHAR   HeadsPerCylinder[2];
    UCHAR   HiddenSectors[4];
    UCHAR   BigNumberOfSectors[4];
    UCHAR   BigSectorsPerFat[4];
    UCHAR   ExtFlags[2];
    UCHAR   FsVersion[2];
    UCHAR   RootDirectoryStart[4];
    UCHAR   FsInfoSector[2];
    UCHAR   BackupBootSector[2];
    UCHAR   Reserved[12];
};
Note that this version of the FAT32 BPB is defined completely in terms of UCHARs, and is such is rather unfriendly for use within an application. So here is a perhaps overly-clever c++ structure that redefines the on-disk version into a programming friendly version, and provides a constructor that automatically converts from the on-disk format. The idea is that your program reads the on-disk version and then constructs the programmer friendly version, and then uses the latter.

//
// this is a processor friendly version of the FAT32 BPB
//
struct Fat32Bpb {
    USHORT BytesPerSector;
    UCHAR   SectorsPerCluster;
    USHORT ReservedSectors;
    UCHAR   NumberOfFats;
    USHORT RootEntries;
    USHORT NumberOfSectors;
    UCHAR   MediaDescriptor;
    USHORT SectorsPerFat;
    USHORT SectorsPerTrack;
    USHORT HeadsPerCylinder;
    ULONG   HiddenSectors;
    ULONG   BigNumberOfSectors;
    ULONG   BigSectorsPerFat;
    USHORT ExtFlags;
    USHORT FsVersion;
    ULONG   RootDirectoryStart;
    USHORT FsInfoSector;
    USHORT BackupBootSector;
    //
    // copy constructor from ondisk format to in-memory format
    //
    Fat32Bpb(const OndiskFat32Bpb& rhs)
    {
        unalignedCopy( &BytesPerSector,     rhs.BytesPerSector );
        unalignedCopy( &SectorsPerCluster, rhs.SectorsPerCluster );
        unalignedCopy( &ReservedSectors,    rhs.ReservedSectors );
        unalignedCopy( &NumberOfFats,       rhs.NumberOfFats );
        unalignedCopy( &RootEntries,        rhs.RootEntries);
        unalignedCopy( &NumberOfSectors,    rhs.NumberOfSectors);
        unalignedCopy( &MediaDescriptor,    rhs.MediaDescriptor);
        unalignedCopy( &SectorsPerFat,      rhs.SectorsPerFat);
        unalignedCopy( &SectorsPerTrack,    rhs.SectorsPerTrack);
        unalignedCopy( &HeadsPerCylinder,   rhs.HeadsPerCylinder);
        unalignedCopy( &HiddenSectors,      rhs.HiddenSectors);
        unalignedCopy( &BigNumberOfSectors, rhs.BigNumberOfSectors);
        unalignedCopy( &BigSectorsPerFat,   rhs.BigSectorsPerFat);
        unalignedCopy( &ExtFlags,           rhs.ExtFlags);
        unalignedCopy( &FsVersion,          rhs.FsVersion);
        unalignedCopy( &RootDirectoryStart, rhs.RootDirectoryStart);
        unalignedCopy( &FsInfoSector,       rhs.FsInfoSector);
        unalignedCopy( &BackupBootSector,   rhs.BackupBootSector);
    }
};
So that is all very well and nice, but what exactly is this unalignedCopy function?

//
// template for creating aligned data structures
// for unaligned arrays of bytes.
//
template <typename T> struct UNPACK_TYPE {
    union {
        UCHAR byteArray[sizeof(T)];
        T     alignedType;
    };
};
//
// macro to instantiate an instance of an UNPACK_TYPE
// type and use it to copy from an unaligned array of bytes
// to an aligned integral type
//
#define COPY_OP(U_T, DST, SRC) /
    U_T * ucD = reinterpret_cast<U_T *>(DST);               /
    const U_T * ucS = reinterpret_cast<const U_T *>(SRC);   /
    *ucD = *ucS;
//
// template function to perform a copy from an unaligned array of
// bytes into an integral type.
//
template <typename T> void unalignedCopy(T * Dst, const UCHAR* Src)
{
    COPY_OP(UNPACK_TYPE<T>, Dst, Src);
}
Lost? Me too. C++ can be so confusing sometimes. It is however massively neat. The unalignedCopy template uses the COPY_OP macro to create, on the (first use) fly, templated UNPACK_TYPE structures for whatever data type happens to be specified as the destination type in the specific invocation of unalignedCopy.

So for example, the constructor's call to unalignedCopy( &BytesPerSector, rhs.BytesPerSector ); results in the creation of a UNPACK_TYPE structure that looks like this:

struct UNPACK_TYPEsome_c++_mangling_stuff_here {
union {
UCHAR byteArray[sizeof(USHORT)];
USHORT alignedType;
}
};
And of course a templated instantiation of unalignedCopy specialized to copy only the appropriate bytes from the raw on-disk format to the unpacked programmer friendly format.

Having access to the FAT32 BPB is only part of the solution. The rest of the solution is to use the BPB to work around the fact that the extent translation process for FAT32 file systems does not give us the correct physical disk LBA. What is going on with FAT32 is that the translated extent is not relative to volume logical sector zero, but instead it is relative to some offset within the volume. That offset can be computed using the FAT32 BPB structure. So all we have to do is, for FAT32 file systems, is compute the offset translation.

Fat32Bpb bpb(bootSector.bpb.ondiskBpb);
//
// ok now figure the offset to the first data cluster
//
translation->rootStart = bpb.ReservedSectors + (bpb.NumberOfFats * bpb.BigSectorsPerFat);
translation->clusterStart = translation->rootStart + ((bpb.RootEntries * 32) / 512);
if ((bpb.RootEntries * 32) % 512) {
translation->clusterStart++;
}
For FAT32, the extents are relative to a computed offset into the volume. That offset is computed by finding the root directory offset, which is in turn found by adding the ReservedSectors value from the BPB to the BPB NumberOfFats multiplied by the BPB BigSectorsPerFat values. This produces what the program refers to as the rootStart. The actual data allocations begin at the clusterStart offset, which is computed based on the rootStart, plus the size of the root directory.

Note that for file systems other than FAT32 translation->clusterStart is always zero.

The translation structure will be used later on to add the clusterStart value to our computed logical to physical disk address as follows:

VOLUME_LOGICAL_OFFSET logicalOffset;
struct {
VOLUME_PHYSICAL_OFFSETS physical;
VOLUME_PHYSICAL_OFFSET plex2;
} outputBuffer;
logicalOffset.LogicalOffset = translation->rpBuf.Extents[0].Lcn.QuadPart * translation->volumeData.BytesPerCluster;

result = DeviceIoControl(translation->hVolume,
            IOCTL_VOLUME_LOGICAL_TO_PHYSICAL,
            &logicalOffset,
            sizeof(VOLUME_LOGICAL_OFFSET),
            &outputBuffer,
            sizeof(outputBuffer),
            &dwBytesReturned,
            NULL);
if (!result) {
    error = GetLastError();
    break;
}
startSector = outputBuffer.physical.PhysicalOffset[0].Offset /
                       translation->volumeData.BytesPerSector;
startSector += translation->clusterStart;
One application is worth a whole lot of words.
To wrap this all up and put it all together, I've included a sample application that demonstrates how to use the methods discussed here to read physical disk sectors directly for a given file offset. The application, and the source code, are of course provided 'as is' and 'with all faults', and are intended only for use in test and debugging situations on non-production systems.

The application is called getFileExtents and is a command line application used as follows:

Usage: getFileExtents filename [fileOffset length]

There are two ways to use getFileExtents. If you just supply a filename, the program prints out all of the extent to LBA translations for the file. If instead you provide a filename and a fileOffset /length pair, getfileExtents will find the LBA for the specified offset.

Code Tour
A simple API for performing file offset to physical disk LBA translation is defined in the header file fileTranslation.h. The API consists of the following functions:

//
/// call initFileTranslation once before calling any other translation functions
///
/// filename is the file of interest.
///
/// verify requests that each call to getNextTranslation validate the translation.
///   If set, not all files can be opened.
///
/// initFileTranslation returns an opaque pointer value that is a token that
/// must be used on all calls to getNextTranslation and closeTranslation.
/// The pointer remains valid until a call to closeTranslation is made.
///
/// If initFileTranslation returns NULL GetLastError can be used to determine
/// what went wrong.
//
PVOID initFileTranslation(
    char * filename,
    BOOL verify);
//
/// resetTranslation must be called before starting
/// a translation interation using getNextTranslation.
/// Note that calls to resetTranslation and getLBAandLengthByOffset
/// will result in the next call to getNextTranslation starting from
/// the beginning of the file.
//
void resetTranslation(
    PVOID translationToken);
//
/// use this to iteratively fetch the disk extents for the file
/// returns a win32 error code.
/// There are three normal errors returned:
///
/// NO_ERROR - iterationm is complete and the returned data
/// is valid.
///
/// ERROR_HANDLE_EOF - iterationm is complete and the returned
/// data is NOT valid.
///
/// ERROR_MORE_DATA - iterationm is incomplete and
/// the returned data is valid
///
/// fileOffset returns the byte offset in the file of this translation
///
/// startSector returns the physical disk logical block address that corresponds
/// to fileOffset.
///
/// nSectors returns the number of contiguous sectors that constitute this disk extent.
//
DWORD getNextTranslation(
        PVOID translationToken,
        LONGLONG& fileOffset,
        LONGLONG& startSector,
        LONGLONG& nSectors);
//
/// pass in a file offset (byte offset in file),
/// and the length in bytes of the data at that offset,
/// and get back the physical LBA (sector offset) and
/// byte length of this on disk run.
///
/// returns true if the operation succeeded, false otherwise.
/// if runLength < recordLength, this record is split across more
/// than one physical run. runLength should be added to fileOffset,
/// and subtracted from recordLength, and getLBAandLengthByOffset
/// should be called again to get the next LBA.
/// RecordLength mod BytesPerSector must be zero!
//
BOOL getLBAandLengthByOffset(
    PVOID translationToken,
    LONGLONG fileOffset,
    LONGLONG recordLength,
    LONGLONG& startSectorLBA,
    LONGLONG& runLength);
//
/// diagnostic interface to debug print to stdout
/// the raw virtual to logical cluster mapping
//
BOOL printAllClusters(
    PVOID translationToken);
//
/// end operations on this file.
//
void closeTranslation(
    PVOID translationToken);
The file filetranslation.cpp implements the API defined above.

The file getFileExtents.cpp implements a sample application that uses the fileTranslation API.

Installing the supplied code samples and program.
The source code and a sample application are provided as a zip file that you can download from http://www.wd-3.com/downloads/getFileExtents.zip. The source is packaged as a Visual Studio .Net 2003 project, although there is nothing particularly VS.N 2003 specific about the code, so you can manually convert it to any other IDE or makefile build environment. You may have to alter include file paths for any build environment to include both sdk and ddk header files.

Install by unzipping to your favorite source sample directory.

The sample application is located in the debug subdirectory, which is also where the fishy smell is coming from.

About the author:

Mark Roddy is an independent consultant specializing in Windows NT kernel software development. Mark has been working exclusively in the NT kernel since 1994, with a focus on storage subsystems and highly reliable computer platforms. In addition to software development, he has been training developers since 1996, and currently works with Azius to provide Windows NT device driver training.

--------------------------------------------------------------------------------

I'll use the term Logical Block Address (LBA) to refer to the physical sectors on the disk. This is the more formal and correct way to refer to the on-disk location of data, and reflects the rather complex internal hardware and software architecture of modern disk drives.
Logically contiguous, as there are other translations underneath the volume level that complicate matters. We discuss this and the limitations of our translation method in this article. The article also uses the term extent to mean a logically or physically contiguous block of storage.