Why Recovering a Deleted Ext3 File Is Difficult . . .



and why you should back up important files

http://linux.sys-con.com/node/117909

BY BRIAN CARRIER  AUGUST 12, 2005 03:00 PM EDT

We have all done it before. You accidentally type in the wrong argument to rm or select the wrong file for deletion. As you hit enter, you notice your mistake and your stomach drops. You reach for the backup of the system and realize that there isn't one.

There are many undelete tools for FAT and NTFS file systems, but there are few for Ext3, which is currently the default file system for most Linux distributions. This is because of the way that Ext3 files are deleted. Crucial information that stores where the file content is located is cleared during the deletion process.

In this article, we take a low-level look at why recovery is difficult and look at some approaches that are sometimes effective. We will use some open source tools for the recovery, but the techniques are not completely automated.

What Is a File?
Before we can see how to recover files, we need to look at how files are stored. Typically, file systems are located inside of a disk partition. The partition is usually organized into 512-byte sectors. When the partition is formatted as Ext3, consecutive sectors are grouped into blocks, whose size can range from 1,024 to 4,096 bytes. The blocks are grouped together into block groups, whose size will be tens of thousands of blocks. Each file has data stored in three major locations: blocks, inodes, and directory entries. The file content is stored in blocks, which are allocated for the exclusive use of the file. A file is allocated as many blocks as it needs. Ideally, the file will be allocated consecutive blocks, but this is not always possible.
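
As a quick sanity check on these sizes, the arithmetic can be sketched in Python. The 4,096-byte block and 32,768-blocks-per-group figures match the example system used later in this article, but they are one common configuration, not a fixed rule:

```python
# Back-of-the-envelope Ext3 layout math. The block and group sizes
# below are assumptions matching this article's example system.
SECTOR_SIZE = 512
BLOCK_SIZE = 4096          # Ext3 blocks range from 1,024 to 4,096 bytes
BLOCKS_PER_GROUP = 32768   # typical for a 4,096-byte block size

sectors_per_block = BLOCK_SIZE // SECTOR_SIZE   # 8 consecutive sectors per block
group_bytes = BLOCKS_PER_GROUP * BLOCK_SIZE     # 134,217,728 bytes per group
print(sectors_per_block, group_bytes // 2**20)  # 8 128
```

So each block group on such a system covers 128MB, a number that will matter when we restrict a search to a single group.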

The metadata for the file is stored in an inode structure, which is located in an inode table at the beginning of a block group. There are a finite number of inodes and each is assigned to a block group. File metadata includes the temporal data such as the last modified, last accessed, last changed, and deleted times. Metadata also includes the file size, user ID, group ID, permissions, and block addresses where the file content is stored.

The addresses of the first 12 blocks are saved in the inode and additional addresses are stored externally in blocks, called indirect blocks. If the file requires many blocks and not all of the addresses can fit into one indirect block, a double indirect block is used, whose address is given in the inode. The double indirect block contains addresses of single indirect blocks, which contain addresses of blocks with file content. There is also a triple indirect address in the inode that adds one more layer of pointers.
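
To get a feel for how far each level of indirection reaches, here is a rough sketch. The 12 direct addresses come from the description above; the 4-byte block address and 4,096-byte block are assumptions (standard for Ext2/Ext3 with the largest block size):

```python
BLOCK_SIZE = 4096               # assumed block size
ADDR_SIZE = 4                   # Ext2/Ext3 block addresses are 32-bit
ptrs = BLOCK_SIZE // ADDR_SIZE  # 1,024 addresses fit in one indirect block

direct = 12          # addresses stored in the inode itself
single = ptrs        # one indirect block full of addresses
double = ptrs ** 2   # double indirect: a block of pointers to indirect blocks
triple = ptrs ** 3   # triple indirect: one more layer of pointers
max_blocks = direct + single + double + triple

# Rough upper bound on addressable content with this pointer layout.
print(max_blocks)    # 1074791436 blocks, on the order of 4TB
```

The point for recovery is that everything past the first 12 blocks is reachable only through indirect blocks, which is exactly the data that gets mixed into a carved file if a tool does not skip it.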

Last, the file's name is stored in a directory entry structure, which is located in a block allocated to the file's parent directory. An Ext3 directory is similar to a file and its blocks contain a list of directory entry structures, each containing the name of a file and the inode address where the file metadata is stored. When you use the ls -i command, you can see the inode address that corresponds to each file name. We can see the relationship between the directory entry, the inode, and the blocks in Figure 1.

When a new file is created, the operating system (OS) gets to choose which blocks and inode it will allocate for the file. Linux will try to allocate the blocks and inode in the same block group as the file's parent directory. This causes files in the same directory to be close together. Later we'll use this fact to restrict where we search for deleted data.

The Ext3 file system has a journal that records updates to the file system metadata before the update occurs. In case of a system crash, the OS reads the journal and will either reprocess or roll back the transactions in the journal so that recovery is faster than examining each metadata structure, which is the old and slow way. Example metadata structures include the directory entries that store file names and inodes that store file metadata. The journal contains the full block that is being updated, not just the value being changed. When a new file is created, the journal should contain the updated versions of the blocks containing the directory entry and the inode.

Deletion Process
Several things occur when an Ext3 file is deleted from Linux. Keep in mind that the OS gets to choose exactly what occurs when a file is deleted, and this article assumes a general Linux system.

At a minimum, the OS must mark each of the blocks, the inode, and the directory entry as unallocated so that later files can use them. This minimal approach is what occurred several years ago with the Ext2 file system. In this case, the recovery process was relatively simple because the inode still contained the block addresses for the file content, and tools such as debugfs and e2undel could easily re-create the file. This worked as long as the blocks had not been allocated to a new file and the original content was not overwritten.

With Ext3, there is an additional step that makes recovery much more difficult. When the blocks are unallocated, the file size and block addresses in the inode are cleared; therefore we can no longer determine where the file content was located. We can see the relationship between the directory entry, the inode, and the blocks of an unallocated file in Figure 2.

Recovery Approaches
Now that we know the components involved with files and which ones are cleared during deletion, we can examine two approaches to file recovery (besides using a backup). The first approach uses the application type of the deleted file and the second approach uses data in the journal. Regardless of the approach, you should stop using the file system because you could create a file that overwrites the data you are trying to recover. You can power the system off and put the drive in another Linux computer as a slave drive, or boot from a Linux CD.

The first step for both techniques is to determine the deleted file's inode address. This can be determined with debugfs or The Sleuth Kit (TSK); I'll give the debugfs method here. debugfs comes with most Linux distributions and is a file system debugger. To start debugfs, you'll need to know the device name for the partition that contains the deleted file. In my example, I have booted from a CD and the file is located on /dev/hda5:

# debugfs /dev/hda5
debugfs 1.37 (21-Mar-2005)
debugfs:

We can then use the cd command to change to the directory of the deleted file:

debugfs: cd /home/carrier/

The ls -d command will list the allocated and deleted files in the directory. Remember that the directory entry structure stores the name and the inode of the file, and this listing will give us both values because neither is cleared during the deletion process. The deleted files have their inode address surrounded by "<" and ">":

debugfs: ls -d
415848 (12) . 376097 (12) .. 415864 (16) .bashrc
[...]
<415926> (28) oops.dat

 

The file we are trying to recover is /home/carrier/oops.dat, and we can see that it was previously allocated inode 415,926. The "(28)" shows us that the directory entry structure is 28 bytes long, but we don't care about that.

File Carving Recovery
The first recovery technique, called file carving, uses signatures from the deleted file. Many file types have standard values in the first bytes of the file header, and this recovery technique looks for the header value of the deleted file to determine where the file may have started. For example, JPEG files start with 0xffd8 and end with 0xffd9. To recover a deleted JPEG file, we would look at the first two bytes of each block and look for one with 0xffd8 in the first two bytes. When we find such a block, we look for a following block that has 0xffd9 in it. The data in between is assumed to be the file. Unfortunately, not all file types have a standard footer signature, so determining where to end is difficult. An example of an open source tool that does file carving is foremost, and there are several commercial options as well.
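
The header/footer scan just described can be sketched in a few lines. This is a deliberately naive carver, not how foremost is implemented; it assumes files start on a block boundary and are stored contiguously, which is exactly where real carvers run into trouble:

```python
BLOCK_SIZE = 4096
SOI = b"\xff\xd8"   # JPEG start-of-image marker
EOI = b"\xff\xd9"   # JPEG end-of-image marker

def carve_jpegs(data):
    """Naive carver: return (start, end) byte ranges that look like JPEGs.

    Assumes each file starts at a block boundary and is contiguous;
    fragmentation or an indirect block in the middle of the run would
    corrupt the carved file, as noted above."""
    found = []
    for off in range(0, len(data), BLOCK_SIZE):
        if data[off:off + 2] == SOI:
            end = data.find(EOI, off + 2)
            if end != -1:
                found.append((off, end + 2))
    return found

# Toy dump: one fake "JPEG" sitting at the start of the second block.
dump = bytearray(4 * BLOCK_SIZE)
dump[BLOCK_SIZE:BLOCK_SIZE + 2] = SOI
dump[BLOCK_SIZE + 100:BLOCK_SIZE + 102] = EOI
print(carve_jpegs(bytes(dump)))   # [(4096, 4198)]
```

A real tool also has to guard against false positives, since 0xffd9 can appear by chance inside unrelated data.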

We can run a tool like foremost on the full file system, but we'll probably end up with way too many files, including allocated ones. We therefore want to run it on as little data as possible. The first way we can restrict the data size is to examine only the block group where the file was located. Remember that the inode and blocks for a file are allocated in the same block group, if there is room. In our case, we know which inode the file used and therefore we can examine only the blocks in the same group. The imap command in debugfs will tell us which block group an inode belongs to:

debugfs: imap <415926>
Inode 415926 is part of block group 25
    located at block 819426, offset 0x0a80

The output of the fsstat command in TSK would also tell us this:

# fsstat /dev/hda5
[...]
Group: 25:
   Inode Range: 408801 - 425152
   Block Range: 819200 - 851967

We next need to determine the blocks that are in the block group of the deleted file. We can see them in the previous fsstat output, but if we're using debugfs, we'll need to calculate the range. The stats command gives us the number of blocks in each group:

debugfs: stats
[...]
Blocks per group: 32768
[...]

Since we are looking at block group 25, the block range is from 819,200 (25 * 32,768) to 851,967 (26 * 32,768 - 1). By focusing on only these blocks, we are looking at 128MB instead of the full file system. However, if we can't find the file in these blocks, we'll still need to search the full file system.
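
The group-to-range calculation is simple enough to express as a helper (the function name is mine, not a tool from the article; it assumes group 0 starts at block 0, as on this example system):

```python
BLOCKS_PER_GROUP = 32768   # from `debugfs: stats` on the example system

def group_block_range(group):
    """Return the (first, last) block of a block group, inclusive.

    Assumes group 0 begins at block 0; file systems with 1,024-byte
    blocks place the first data block at 1 instead."""
    first = group * BLOCKS_PER_GROUP
    last = (group + 1) * BLOCKS_PER_GROUP - 1
    return first, last

print(group_block_range(25))   # (819200, 851967)
```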

The next step to reduce the data we analyze is to extract the unallocated blocks from the file system, because that is where our deleted file will be located. debugfs does not currently allow us to extract the unallocated space from only a specific block group, so we will need to use the dls tool from TSK:

# dls /dev/hda5 819200-851967 > /mnt/unalloc.dat

The above command will save the unallocated blocks in block group 25 to a file named /mnt/unalloc.dat. Make sure that this file is on a different file system, because otherwise you may end up overwriting your deleted file.
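
For intuition, the offset arithmetic behind extracting a block range looks like this. Note the difference from dls: this sketch keeps every block in the range, while dls writes out only the unallocated ones:

```python
BLOCK_SIZE = 4096

def extract_blocks(image, first, last):
    """Return blocks first..last (inclusive) from a raw image held in memory.

    Unlike TSK's dls, this keeps every block in the range, allocated or
    not; it only illustrates the byte-offset math of the extraction."""
    return image[first * BLOCK_SIZE : (last + 1) * BLOCK_SIZE]

image = bytes(4 * BLOCK_SIZE)        # a tiny 4-block "partition image"
chunk = extract_blocks(image, 1, 2)  # blocks 1 and 2
print(len(chunk))                    # 8192
```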


Now we can run the foremost tool on the unallocated data. foremost can recover only file types for which it has been configured. If foremost doesn't have the header signature for the type of the deleted file, you'll need to examine some similar files and customize the configuration file. We can run it as follows:

# foremost -d -i /mnt/unalloc.dat -o /mnt/output/

The -d option will try to detect which blocks are indirect blocks and won't include them in the final output file. The /mnt/output/ directory will contain the files that could be recovered. If your file is not in there, you can expand your search to all unallocated blocks in the file system instead of only the blocks in the block group.

Journal-Based Recovery
The second method for trying to recover the file is to use the journal. We already saw that inode updates are first recorded in the journal, but the important concept here is that the entire block in which an inode is located is recorded in the journal. Therefore, when one inode is updated, the journal will contain copies of other inodes stored in the same block. A previous version of our deleted file's inode may exist in the journal because another file was updated before the deletion.

The easiest way to look for previous versions of the inode is with the logdump -i command in debugfs:

debugfs: logdump -i <415926>
Inode 415926 is at group 25, block 819426, offset 2688
Journal starts at block 1, transaction 104588
  FS block 819426 logged at sequence 104940, journal block 2687
   (inode block for inode 415926):
   Inode: 415926 Type: regular Mode: 0664 Flags: 0x0
   User: 500 Group: 500 Size: 2048000
   [...]
   Blocks: (0+12): 843274 (IND): 843286
[...]

In this case, we found a previous copy of the inode, and the file content blocks are listed on the last line. It shows that the file's 12 direct blocks are consecutive, starting at block 843,274. The file is large and requires an indirect block, which is located in block 843,286. So far, all blocks are consecutive and there was no fragmentation. Block 843,286 contains the rest of the block addresses, so we should try to find a previous version of it to learn where the rest of the file is located. We can see if there is a copy in the journal using logdump -b:

debugfs: logdump -b 843286 -c

Unfortunately, we don't find a copy of the block that contains the original list of block pointers, so if we want to recover the file, we need to assume that the remaining file content is stored in block 843,287 and onward. A more advanced approach would also consider which blocks are currently allocated and skip over those. The data can be extracted with tools such as dd or the Linux Disk Editor. The journal can also be searched using the jls and jcat tools from TSK.
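
Under that no-fragmentation assumption, stitching the block list back together is plain offset arithmetic. Here is an illustrative sketch (the helper name is mine; the numbers are the ones recovered from the journal above):

```python
BLOCK_SIZE = 4096

def contiguous_file_blocks(direct_start, ind_block, total_blocks, n_direct=12):
    """List a file's content blocks, assuming no fragmentation.

    The 12 direct blocks start at direct_start; the indirect block
    itself holds pointers rather than content, so it is skipped and
    the remaining content blocks are assumed to follow right after it."""
    blocks = list(range(direct_start, direct_start + n_direct))
    blocks += list(range(ind_block + 1, ind_block + 1 + total_blocks - n_direct))
    return blocks

# Inode 415926: Size 2,048,000 bytes -> 500 blocks of 4,096 bytes.
total = 2048000 // BLOCK_SIZE                  # 500
blks = contiguous_file_blocks(843274, 843286, total)
print(blks[0], blks[12], len(blks))            # 843274 843287 500
```

Feeding such a list to dd (one block per seek/skip) would reassemble the candidate file, which then needs to be verified by opening it.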

Conclusion
File recovery with Ext3 is not a trivial matter, which reinforces the importance of making backups of important files. If the file was not fragmented, then searching for its header signature can be useful, but the tool needs to know to ignore the indirect blocks and where to stop copying (not all files have a standard footer signature). Restricting the search to the local block group can help save time. The journal can be useful if files near the deleted file were recently updated and a previous version of the inode exists, but this is not guaranteed, and a copy of the file's indirect block may not exist.

References and Bibliography

  • Carrier, B. "The Sleuth Kit": www.sleuthkit.org
  • Carrier, B. (2005). File System Forensic Analysis. Addison-Wesley.
  • Crane, A. "Linux Ext2fs Undeletion mini-HOWTO." February 1999: http://tldp.org/HOWTO/Ext2fs-Undeletion.html
  • Diedrich, O. "e2undel": http://e2undel.sourceforge.net/
  • Farmer, D., and Venema, W. (2004). Forensic Discovery. Addison-Wesley.
  • Heavner, S.D. "Linux Disk Editor": http://lde.sourceforge.net/
  • Kendall, K.; Kornblum, J.; and Mikus, N. "Foremost": http://foremost.sourceforge.net/
  • Ts'o, T. "E2fsprogs": http://e2fsprogs.sourceforge.net/
  • Tweedie, S. "EXT3, Journaling Filesystem." July 2000: http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html
