Android Security Attack: OOM in Object Serialization

Source: Internet | Publisher: 未闻花名网络歌手歌词 | Editor: 程序博客网 | Time: 2024/06/06 04:38

Preface

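Recently, while using ObjectInputStream/ObjectOutputStream to serialize and deserialize objects in a project, I ran into an OOM (OutOfMemoryError). While fixing it, I took a brief look at how objects are serialized and deserialized through the Serializable interface, and this post records what I found. Along the way I discovered a security risk in persisting serialized data: the stored bytes can be maliciously modified so that the app hits an OOM every time it deserializes them.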

Usage scenario

1 Data handling scheme

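Persistence: while the app runs, it first serializes the object into bytes with ObjectOutputStream.writeObject, then encrypts the serialized bytes, and finally writes the encrypted data to a file in the app's data directory.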

Reading and parsing: the data is read back from the file, decrypted with the matching algorithm, and finally parsed back into the object with ObjectInputStream.readObject.
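The two pipelines above can be sketched as follows. The post does not say which cipher it uses, so this sketch assumes AES-GCM from the standard javax.crypto API; the class name EncryptedStore and the file layout (a 12-byte IV followed by the ciphertext) are illustrative assumptions, and key storage (on Android, for example, the Keystore) is left out.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;

// Hypothetical helper: serialize -> encrypt -> persist, and the reverse.
final class EncryptedStore {
    private static final String TRANSFORMATION = "AES/GCM/NoPadding";
    private static final int IV_LEN = 12;     // common GCM nonce size
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    static SecretKey newKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    static void save(Serializable obj, SecretKey key, Path file) throws Exception {
        // 1. Serialize with ObjectOutputStream.writeObject.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        // 2. Encrypt the serialized bytes under a fresh random IV.
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(bos.toByteArray());
        // 3. Persist IV || ciphertext.
        byte[] out = new byte[IV_LEN + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ciphertext, 0, out, IV_LEN, ciphertext.length);
        Files.write(file, out);
    }

    static Object load(SecretKey key, Path file) throws Exception {
        // 1. Read the file.
        byte[] in = Files.readAllBytes(file);
        // 2. Decrypt; GCM throws AEADBadTagException if the bytes were modified.
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_LEN));
        byte[] plain = cipher.doFinal(in, IV_LEN, in.length - IV_LEN);
        // 3. Deserialize with ObjectInputStream.readObject.
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(plain))) {
            return ois.readObject();
        }
    }
}
```

Choosing an authenticated mode such as GCM means a modified file fails at decryption time, before readObject ever sees it.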

2 Problems encountered

While using the scheme above, we hit the following two OOM crashes:

(1) OOM 1

    java.lang.OutOfMemoryError: Failed to allocate a 942137073 byte allocation with 4194240 free bytes and 487MB until OOM
        at java.io.ObjectInputStream.readBlockDataLong(ObjectInputStream.java:569)
        at java.io.ObjectInputStream.readContent(ObjectInputStream.java:699)
        at java.io.ObjectInputStream.discardData(ObjectInputStream.java:636)
        at java.io.ObjectInputStream.readNewClassDesc(ObjectInputStream.java:1662)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:657)
        at java.io.ObjectInputStream.readNewObject(ObjectInputStream.java:1782)
        at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:761)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1983)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1940)

(2) OOM 2

    java.lang.OutOfMemoryError: Failed to allocate a 789137073 byte allocation with 2317152 free bytes and 456MB until OOM
        at java.io.DataInputStream.decodeUTF
        at java.io.DataInputStream.decodeUTF
        at java.io.ObjectInputStream.readContent(ObjectInputStream.java:699)
        at java.io.ObjectInputStream.discardData(ObjectInputStream.java:636)
        at java.io.ObjectInputStream.readNewClassDesc(ObjectInputStream.java:1662)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:657)
        at java.io.ObjectInputStream.readNewObject(ObjectInputStream.java:1782)
        at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:761)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1983)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1940)

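Roughly speaking, both stacks say that while ObjectInputStream.readObject was deserializing an object, it tried to make a single allocation of about 942 MB (OOM 1) or 789 MB (OOM 2), which caused the OOM. The maximum heap an app's Java layer can use is set by system properties: dalvik.vm.heapgrowthlimit for ordinary apps, and dalvik.vm.heapsize for apps that declare android:largeHeap="true" (heapsize is the absolute upper bound). These values differ between vendors and devices.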
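The limit that actually applies can also be checked at runtime. A minimal sketch using the standard java.lang.Runtime API; the helper names are hypothetical, and on Android Runtime.maxMemory() typically reflects the per-app limit described above rather than physical RAM.

```java
// Hypothetical helper built on the standard java.lang.Runtime API.
final class HeapBudget {
    /** The maximum heap the VM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * Rough check: is there room for a single allocation of {@code bytes}
     * between the heap currently in use and the maximum? GC timing and
     * fragmentation make this an estimate, not a guarantee.
     */
    static boolean wouldLikelyFit(long bytes) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return bytes <= rt.maxMemory() - used;
    }
}
```

On a device whose limit is 256 MB, wouldLikelyFit(942137073L), the size requested in OOM 1, can only return false.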

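On my test device heapsize is set to 256 MB, so no app VM on that device can grow its heap beyond 256 MB; when the VM needs more than that, an OutOfMemoryError is thrown. A side note: many people catch Exception to handle "all" exceptions, but that does not catch OutOfMemoryError. Its inheritance chain is java.lang.Object → Throwable → Error → VirtualMachineError → OutOfMemoryError.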

As the inheritance chain shows, OutOfMemoryError derives from Error, a separate branch from Exception. To catch everything, Errors included, you have to catch Throwable.
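A minimal demonstration of the difference. The class and method names are hypothetical, and the error is thrown directly so the heap is never actually exhausted.

```java
// Minimal demo; throwing the error directly avoids really exhausting the heap.
final class CatchDemo {
    static void failLikeOom() {
        throw new OutOfMemoryError("simulated");
    }

    /** Reports which handler caught the error. */
    static String whoCatches() {
        try {
            try {
                failLikeOom();
            } catch (Exception e) {
                // Never reached: OutOfMemoryError is not a subclass of Exception.
                return "Exception";
            }
        } catch (Throwable t) {
            // Reached: Error, and therefore OutOfMemoryError, extends Throwable.
            return "Throwable";
        }
        return "none";
    }
}
```

whoCatches() returns "Throwable": the inner catch (Exception) block is skipped entirely.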

3 Analyzing the problem

3.1 Stack trace analysis

The two OOMs above have the same root cause. Below, OOM 1 is analyzed in detail, that is, the OOM that ends in ObjectInputStream.readBlockDataLong. First, the function itself:

    /**
     * Reads and returns an array of raw bytes with primitive data. The array
     * will have up to 255 bytes. The primitive data will be in the format
     * described by {@code DataOutputStream}.
     *
     * @return The primitive data read, as raw bytes
     *
     * @throws IOException
     *             If an IO exception happened when reading the primitive data.
     */
    private byte[] readBlockData() throws IOException {
        byte[] result = new byte[input.readByte() & 0xff];
        input.readFully(result);
        return result;
    }

    /**
     * Reads and returns an array of raw bytes with primitive data. The array
     * will have more than 255 bytes. The primitive data will be in the format
     * described by {@code DataOutputStream}.
     *
     * @return The primitive data read, as raw bytes
     *
     * @throws IOException
     *             If an IO exception happened when reading the primitive data.
     */
    private byte[] readBlockDataLong() throws IOException {
        byte[] result = new byte[input.readInt()];
        input.readFully(result);
        return result;
    }

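Two functions are shown above: readBlockData and readBlockDataLong. Judging by their names they do similar work, with readBlockDataLong apparently meant for larger blocks. The comments confirm this: readBlockData reads a block of at most 255 bytes (its length is a single unsigned byte), while readBlockDataLong reads blocks longer than 255 bytes (its length is a 4-byte int). Note that readBlockDataLong allocates new byte[input.readInt()] directly, without checking whether that length is plausible.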
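A minimal sketch of the defensive alternative: validate the length before allocating. The class name and the maxLen parameter are assumptions for illustration, not part of any ObjectInputStream implementation; a cap of 1024 would match the largest block the drain() function analyzed in section 3.2 writes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.StreamCorruptedException;

// Hypothetical hardened variant of readBlockDataLong().
final class BoundedBlockReader {
    /**
     * Reads a 4-byte big-endian length and then that many bytes, but rejects
     * lengths outside [0, maxLen] before allocating anything.
     */
    static byte[] readBlockDataLong(DataInputStream in, int maxLen) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > maxLen) {
            throw new StreamCorruptedException("block length out of range: " + len);
        }
        byte[] result = new byte[len];
        in.readFully(result);
        return result;
    }
}
```

With this check, a forged length fails fast with an IOException subclass that ordinary catch (Exception) code already handles, instead of an Error.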

Moving up the stack, the next frame is ObjectInputStream.readContent:

    /**
     * Reads the content of the receiver based on the previously read token
     * {@code tc}.
     *
     * @param tc
     *            The token code for the next item in the stream
     * @return the object read from the stream
     *
     * @throws IOException
     *             If an IO exception happened when reading the class
     *             descriptor.
     * @throws ClassNotFoundException
     *             If the class corresponding to the object being read could not
     *             be found.
     */
    private Object readContent(byte tc) throws ClassNotFoundException,
            IOException {
        switch (tc) {
            case TC_BLOCKDATA:
                return readBlockData();
            case TC_BLOCKDATALONG:
                return readBlockDataLong();
            case TC_CLASSDESC:
                return readNewClassDesc(false);
            case TC_OBJECT:
                return readNewObject(false);
            case TC_LONGSTRING:
                return readNewLongString(false);
            case TC_EXCEPTION:
                Exception exc = readException();
                throw new WriteAbortedException("Read an exception", exc);
            case TC_RESET:
                resetState();
                return null;
            default:
                throw corruptStream(tc);
        }
    }

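This function decides how to read the data that follows based on the tc (token code) passed in. That suggests serialization with ObjectInputStream/ObjectOutputStream follows a specific format, or standard. A quick search turns up the specification of the serialization stream format: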

Java Object Serialization Specification, "Grammar for the Stream Format"

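The specification defines the order in which each part is written during Serializable serialization, along with the corresponding tc values. This post focuses on the problem rather than the format itself; interested readers can study the specification on their own. The cause of the OOM can now be roughly located: the data being deserialized contains a TC_BLOCKDATALONG token, so deserialization ends up in readBlockDataLong. Going one level further up the stack, here are ObjectInputStream.readNewClassDesc and ObjectInputStream.discardData: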

    /**
     * Reads a new class descriptor from the receiver. It is assumed the class
     * descriptor has not been read yet (not a cyclic reference). Return the
     * class descriptor read.
     *
     * @param unshared
     *            read the object unshared
     * @return The {@code ObjectStreamClass} read from the stream.
     *
     * @throws IOException
     *             If an IO exception happened when reading the class
     *             descriptor.
     * @throws ClassNotFoundException
     *             If a class for one of the objects could not be found
     */
    private ObjectStreamClass readNewClassDesc(boolean unshared)
            throws ClassNotFoundException, IOException {
        ObjectStreamClass newClassDesc = readClassDescriptor();
        registerObjectRead(newClassDesc, descriptorHandle, unshared);
        descriptorHandle = oldHandle;
        primitiveData = emptyStream;
        //load class...
        // Consume unread class annotation data and TC_ENDBLOCKDATA
        discardData();
        checkedSetSuperClassDesc(newClassDesc, readClassDesc());
        return newClassDesc;
    }

    /**
     * Reads and discards block data and objects until TC_ENDBLOCKDATA is found.
     *
     * @throws IOException
     *             If an IO exception happened when reading the optional class
     *             annotation.
     * @throws ClassNotFoundException
     *             If the class corresponding to the class descriptor could not
     *             be found.
     */
    private void discardData() throws ClassNotFoundException, IOException {
        primitiveData = emptyStream;
        boolean resolve = mustResolve;
        mustResolve = false;
        do {
            byte tc = nextTC();
            if (tc == TC_ENDBLOCKDATA) {
                mustResolve = resolve;
                return; // End of annotation
            }
            readContent(tc);
        } while (true);
    }

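From the comment on readNewClassDesc and the surrounding code, its main job is to read the class descriptor from the serialized data and load the corresponding class with the class loader. It then calls discardData, which, according to the comment at the call site, reads and consumes data that is not needed (possibly class annotation data) until it reaches TC_ENDBLOCKDATA. Every other token it meets on the way is handed to readContent, which is how a TC_BLOCKDATALONG token reaches readBlockDataLong. Here is the definition of TC_BLOCKDATALONG: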

    /**
     * Tag to mark a long block of data. The long following this tag
     * indicates the size of the block.
     */
    public static final byte TC_BLOCKDATALONG = (byte) 0x7A;

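This tc means that a large data block follows, and the int (4 bytes) right after the tc gives the block's length. Although the comment says "long", readBlockDataLong reads this length with readInt.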
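To see how large such an allocation can get, here is a small sketch that decodes a token and its length the same way readInt does (big-endian). The helper name is hypothetical. Five bytes are enough: 7A 7A 7A 67 67 decodes to TC_BLOCKDATALONG with a length of 2,054,842,215 bytes (about 1.9 GiB), and the length bytes 38 27 DE F1 would produce exactly the 942137073-byte request seen in OOM 1.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper: decode "token + 4-byte length" the way
// readContent() followed by readBlockDataLong() would.
final class HeaderDecoder {
    /** Returns {token, length}; DataInputStream.readInt is big-endian. */
    static int[] decode(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int tc = in.readUnsignedByte();
        int length = in.readInt();
        return new int[] {tc, length};
    }
}
```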

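The cause can therefore be summarized as follows: when ObjectInputStream.readObject deserializes an object, after it has read the class data and loaded the class with the class loader, ObjectInputStream.discardData tries to consume the unneeded data up to TC_ENDBLOCKDATA. When it encounters a TC_BLOCKDATALONG token, it reads the following 4-byte length and calls readBlockDataLong, which allocates a byte array of that size, and that allocation throws the OOM.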
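The token values involved are also exposed as constants in java.io.ObjectStreamConstants. The sketch below (helper names hypothetical, a desktop JVM assumed) names a token the way readContent dispatches on it, and can be used to inspect a freshly serialized object: the stream starts with the header AC ED 00 05, followed by TC_OBJECT and TC_CLASSDESC.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;

// Hypothetical helper for looking at raw serialization tokens.
final class StreamPeek {
    static final class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 7;
    }

    /** Maps a token byte to its ObjectStreamConstants name. */
    static String tokenName(byte tc) {
        switch (tc) {
            case ObjectStreamConstants.TC_NULL:          return "TC_NULL";
            case ObjectStreamConstants.TC_CLASSDESC:     return "TC_CLASSDESC";
            case ObjectStreamConstants.TC_OBJECT:        return "TC_OBJECT";
            case ObjectStreamConstants.TC_STRING:        return "TC_STRING";
            case ObjectStreamConstants.TC_BLOCKDATA:     return "TC_BLOCKDATA";
            case ObjectStreamConstants.TC_ENDBLOCKDATA:  return "TC_ENDBLOCKDATA";
            case ObjectStreamConstants.TC_BLOCKDATALONG: return "TC_BLOCKDATALONG";
            case ObjectStreamConstants.TC_LONGSTRING:    return "TC_LONGSTRING";
            default: return String.format("0x%02X", tc & 0xff);
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }
}
```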

3.2 Analyzing the abnormal TC_BLOCKDATALONG data

To see when TC_BLOCKDATALONG is written under normal conditions, we have to look for clues on the serialization side, in ObjectOutputStream. Searching ObjectOutputStream.java for TC_BLOCKDATALONG shows that it is used only in drain:

    /**
     * Writes buffered data to the target stream. This is similar to {@code
     * flush} but the flush is not propagated to the target stream.
     *
     * @throws IOException
     *             if an error occurs while writing to the target stream.
     */
    protected void drain() throws IOException {
        if (primitiveTypes == null || primitiveTypesBuffer == null) {
            return;
        }

        // If we got here we have a Stream previously created
        int offset = 0;
        byte[] written = primitiveTypesBuffer.toByteArray();
        // Normalize the primitive data
        while (offset < written.length) {
            int toWrite = written.length - offset > 1024 ? 1024
                    : written.length - offset;
            if (toWrite < 256) {
                output.writeByte(TC_BLOCKDATA);
                output.writeByte((byte) toWrite);
            } else {
                output.writeByte(TC_BLOCKDATALONG);
                output.writeInt(toWrite);
            }

            // write primitive types we had and the marker of end-of-buffer
            output.write(written, offset, toWrite);
            offset += toWrite;
        }

        // and now we're clean to a state where we can write an object
        primitiveTypes = null;
        primitiveTypesBuffer = null;
    }

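This function shows that the TC_BLOCKDATALONG marker and the int length after it are written to the output stream together, and that the length never exceeds 1024: when there is a lot of data, it is split into several blocks of at most 1024 bytes each. In other words, in a stream produced normally by this ObjectOutputStream, the length after TC_BLOCKDATALONG can never exceed 1024. We can therefore conclude that in the OOM cases above, the data fed to deserialization was itself corrupt, most likely because something went wrong while the data was stored or decrypted.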
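The framing rule in drain() is easy to check in isolation. This is a standalone re-implementation for experimentation (class and method names are hypothetical, and it writes to a plain DataOutputStream rather than a real serialization stream); it confirms that no declared block length exceeds 1024.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Standalone copy of the framing loop in drain().
final class DrainSketch {
    static final int TC_BLOCKDATA = 0x77;
    static final int TC_BLOCKDATALONG = 0x7A;
    static final int MAX_BLOCK = 1024;

    /** Frames buffered primitive data the way drain() does. */
    static byte[] frame(byte[] buffered) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        int offset = 0;
        while (offset < buffered.length) {
            int toWrite = Math.min(MAX_BLOCK, buffered.length - offset);
            if (toWrite < 256) {
                out.writeByte(TC_BLOCKDATA);     // 1-byte token
                out.writeByte(toWrite);          // 1-byte length
            } else {
                out.writeByte(TC_BLOCKDATALONG); // 1-byte token
                out.writeInt(toWrite);           // 4-byte length, at most 1024
            }
            out.write(buffered, offset, toWrite);
            offset += toWrite;
        }
        out.flush();
        return bos.toByteArray();
    }

    /** Walks framed data and returns the largest block length it declares. */
    static int maxDeclaredLength(byte[] framed) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        int max = 0;
        while (in.available() > 0) {
            int tc = in.readUnsignedByte();
            int len = (tc == TC_BLOCKDATA) ? in.readUnsignedByte() : in.readInt();
            max = Math.max(max, len);
            in.readFully(new byte[len]); // skip the payload
        }
        return max;
    }
}
```

For 2500 buffered bytes, frame() emits three TC_BLOCKDATALONG blocks of 1024, 1024 and 452 bytes (15 bytes of headers in total).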

3.3 Reproducing the crash

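The analysis above shows that the data being deserialized is corrupt, and that this causes the OOM. Following this line of thought, let's look directly at ObjectOutputStream.writeClassDesc: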

    /**
     * Write a class descriptor {@code classDesc} (an
     * {@code ObjectStreamClass}) to the stream.
     *
     * @param classDesc
     *            The class descriptor (an {@code ObjectStreamClass}) to
     *            be dumped
     * @param unshared
     *            Write the object unshared
     * @return the handle assigned to the class descriptor
     *
     * @throws IOException
     *             If an IO exception happened when writing the class
     *             descriptor.
     */
    private int writeClassDesc(ObjectStreamClass classDesc, boolean unshared) throws IOException {
        if (classDesc == null) {
            writeNull();
            return -1;
        }
        // ... (lines omitted)
        output.writeByte(TC_CLASSDESC);
        writeClassDescriptor(classDesc);
        // ... (lines omitted)
        annotateClass(classToWrite);
        drain(); // flush primitive types in the annotation
        output.writeByte(TC_ENDBLOCKDATA);
        writeClassDesc(classDesc.getSuperclass(), unshared);
        // ... (lines omitted)
        return handle;
    }

    /**
     * Writes optional information for class {@code aClass} to the output
     * stream. This optional data can be read when deserializing the class
     * descriptor (ObjectStreamClass) for this class from an input stream. By
     * default, no extra data is saved.
     *
     * @param aClass
     *            the class to annotate.
     * @throws IOException
     *             if an error occurs while writing to the target stream.
     * @see ObjectInputStream#resolveClass(ObjectStreamClass)
     */
    protected void annotateClass(Class<?> aClass) throws IOException {
        // By default no extra info is saved. Subclasses can override
    }

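In this function, writeClassDescriptor writes the class descriptor to output, then annotateClass is called, and finally TC_ENDBLOCKDATA is written to mark the end of the class descriptor's annotation. On the reading side, ObjectInputStream.readNewClassDesc reads the class descriptor and then calls discardData, which consumes whatever tokens appear between the descriptor and that TC_ENDBLOCKDATA.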

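Following this idea, we can subclass ObjectOutputStream and, in annotateClass, write a (TC_BLOCKDATALONG, length) pair. If the written length is large, the OOM occurs every time. The code is as follows: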

    import android.util.Log;

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.OutputStream;
    import java.lang.reflect.Field;

    public class AnObjectOutputStream extends ObjectOutputStream {
        private static final String TAG = "AnObjectOutputStream";

        /**
         * Reproduces the stack java.io.ObjectInputStream.readBlockDataLong
         * (reproduced by default).
         * 0x7a = TC_BLOCKDATALONG, followed by the forged 4-byte length 0x7a7a6767.
         */
        private static final byte[] DISCARD_BYTES_LONG_DATA = new byte[] {
                0x7a, 0x7a, 0x7a, 0x67, 0x67
        };

        /**
         * Reproduces the stack java.io.DataInputStream.decodeUTF
         *                     java.io.DataInputStream.decodeUTF
         *                     java.io.ObjectInputStream.readNewLongString
         * 0x7c = TC_LONGSTRING.
         */
        private static final byte[] DISCARD_BYTES_LONG_STRING = new byte[] {
                0x7c, 0x7a, 0x7a, 0x67, 0x67
        };

        private DataOutputStream mInnerOutput;

        private boolean mStackBlockData = true;

        public AnObjectOutputStream(OutputStream out) throws IOException {
            super(out);
        }

        /**
         * Calling setStackBlockData(false) reproduces the stack
         *     java.io.DataInputStream.decodeUTF
         *     java.io.DataInputStream.decodeUTF
         *     java.io.ObjectInputStream.readNewLongString
         */
        public void setStackBlockData(boolean blockData) {
            mStackBlockData = blockData;
        }

        @Override
        protected void annotateClass(Class<?> aClass) throws IOException {
            Log.i(TAG, "annotateClass aClass:" + aClass);
            installOutputStream();
            if (mInnerOutput == null) {
                return;
            }
            // Write the forged token + length straight into the underlying stream,
            // bypassing the primitive-data buffer, so they land right before TC_ENDBLOCKDATA.
            if (mStackBlockData) {
                mInnerOutput.write(DISCARD_BYTES_LONG_DATA);
            } else {
                mInnerOutput.write(DISCARD_BYTES_LONG_STRING);
            }
            Log.i(TAG, "annotateClass write success");
        }

        private void installOutputStream() {
            if (mInnerOutput != null) {
                return; // already resolved
            }
            Object obj = null;
            try {
                // "output" is a private field of the ObjectOutputStream implementation shown
                // above; look it up on ObjectOutputStream itself so that further subclassing
                // does not break the lookup. Other implementations may not have this field.
                Field field = ObjectOutputStream.class.getDeclaredField("output");
                field.setAccessible(true);
                obj = field.get(this);
            } catch (Exception e) {
                e.printStackTrace();
            }
            if (!(obj instanceof DataOutputStream)) {
                Log.i(TAG, "installOutputStream failed");
                return;
            }
            mInnerOutput = (DataOutputStream) obj;
        }
    }

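Because the output field of ObjectOutputStream is private, reflection is needed. Sure enough, replacing the regular ObjectOutputStream with AnObjectOutputStream reproduces the OOM every time. The complete calling code: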

    import com.example.testpopupwindow.stream.AnObjectOutputStream;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class SerializeThread extends Thread {
        private static final String TAG = "SerializeThread";

        private Employee mEmployee;

        @Override
        public void run() {
            mEmployee = Employee.create("test");
            Object obj = null;
            try {
                byte[] serializeRes = serialize();
                obj = unserialize(serializeRes);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private byte[] serialize() throws IOException {
            ByteArrayOutputStream arrOs = new ByteArrayOutputStream();
            // The tampering stream; a plain ObjectOutputStream here would round-trip normally.
            ObjectOutputStream oos = new AnObjectOutputStream(arrOs);
            oos.writeObject(mEmployee);
            oos.flush();
            byte[] outArr = arrOs.toByteArray();
            oos.close();
            return outArr;
        }

        private Object unserialize(byte[] serializedata) throws IOException {
            ByteArrayInputStream byteArrayInputStream = null;
            ObjectInputStream objectInputStream = null;
            try {
                byteArrayInputStream = new ByteArrayInputStream(serializedata);
                objectInputStream = new ObjectInputStream(byteArrayInputStream);
                return objectInputStream.readObject();
            } catch (Exception e) {
                // OutOfMemoryError is an Error, not an Exception, so it is NOT caught
                // here; it propagates and crashes the thread.
            }
            return null;
        }

        /**
         * test error....
         */
        public static class Employee implements Serializable {
            String mName;

            /**
             * test error....
             */
            private Employee(String name) {
                mName = name;
            }

            @Override
            public String toString() {
                return "Employee mName:" + mName;
            }

            public static Employee create(String name) {
                return new Employee(name);
            }
        }
    }

Simply calling new SerializeThread().start() triggers the OOM, with a stack ending in ObjectInputStream.readBlockDataLong, as in OOM 1.
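For contrast, annotation data written through the public ObjectOutputStream API is harmless. In the sketch below (class names hypothetical, a desktop JVM assumed), annotateClass writes a small int; the stream frames it as block data with a correct, small length, and the default ObjectInputStream skips it while reading the class descriptor, so the round trip still works. The attack differs only in that the framing and length are forged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

final class AnnotationRoundTrip {
    static final class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Item(String name) { this.name = name; }
    }

    // Writes 4 bytes of annotation for every class descriptor, via the public API.
    static final class AnnotatingOutputStream extends ObjectOutputStream {
        AnnotatingOutputStream(OutputStream out) throws IOException {
            super(out);
        }

        @Override
        protected void annotateClass(Class<?> cl) throws IOException {
            writeInt(42); // framed as block data before TC_ENDBLOCKDATA
        }
    }

    static byte[] write(Object o, boolean annotate) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos =
                     annotate ? new AnnotatingOutputStream(bos) : new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }
}
```

The annotated stream is slightly longer than the plain one (a short block-data header plus the 4 bytes), yet it deserializes to the same object.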

3.4 The security issue

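The OOM above exposes a security issue in Serializable serialization with ObjectInputStream/ObjectOutputStream. Suppose serialized data produced by a standard ObjectOutputStream is saved locally. If someone maliciously writes fields like the ones above at the right position, the app will crash every time it deserializes the modified data. Take the Employee object above, serialized to a file, as an example.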

The insertion code is as follows:

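    // 0x7a = TC_BLOCKDATALONG, followed by the forged 4-byte length 0x7a7a6767
    private byte[] mDiscardBytes = new byte[] {
            0x7a, 0x7a, 0x7a, 0x67, 0x67
    };

    private byte[] modifyBlockDataSize(byte[] content) {
        for (int i = 0; i < content.length; i++) {
            // Heuristic: treat the first 0x78 (TC_ENDBLOCKDATA) as the end of the class
            // annotation. A 0x78 byte inside the class name or serialVersionUID would
            // make this pick the wrong position.
            if (content[i] == (byte) 0x78) {
                return insertLongBlockData(content, i);
            }
        }
        return null;
    }

    private byte[] insertLongBlockData(byte[] data, int insertPos) {
        byte[] newArray = new byte[data.length + mDiscardBytes.length];
        // bytes before the insert position, unchanged
        System.arraycopy(data, 0, newArray, 0, insertPos);
        // the forged token + length
        System.arraycopy(mDiscardBytes, 0, newArray, insertPos, mDiscardBytes.length);
        // the rest, starting with the original 0x78
        System.arraycopy(data, insertPos, newArray, insertPos + mDiscardBytes.length, data.length - insertPos);
        return newArray;
    }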

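After this processing, the five inserted bytes (7A 7A 7A 67 67) sit immediately before the first 0x78 of the serialized data, and deserializing the modified data crashes the app with an OOM every time.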

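As for why the code looks for 0x78, compare ObjectOutputStream.writeClassDesc with ObjectInputStream.readNewClassDesc: after reading the class descriptor, readNewClassDesc calls discardData to read annotation-like information terminated by TC_ENDBLOCKDATA (0x78), and discardData reads any TC_BLOCKDATALONG or TC_LONGSTRING it finds along the way. So it is enough to insert a TC_BLOCKDATALONG or TC_LONGSTRING tc plus length data just before that 0x78.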
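The whole attack can be replayed on any JVM without the reflection trick, by tampering with the bytes of a normally serialized object. The sketch below reuses the insertion idea with hypothetical class names. Note that the exact failure depends on the ObjectInputStream implementation: the Android code analyzed above allocates the forged length and throws OutOfMemoryError, whereas a desktop OpenJDK reads block data in bounded chunks and will most likely report a StreamCorruptedException instead. Catching Throwable covers both.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TamperDemo {
    static final class Employee implements Serializable {
        // Fixed UID so its bytes (00 ... 01) cannot contain 0x78 and confuse the scan.
        private static final long serialVersionUID = 1L;
        String mName;
        Employee(String name) { mName = name; }
        @Override public String toString() { return "Employee mName:" + mName; }
    }

    // 0x7A = TC_BLOCKDATALONG, then a forged 4-byte length of 0x7A7A6767.
    private static final byte[] DISCARD_BYTES = {0x7a, 0x7a, 0x7a, 0x67, 0x67};

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    /** Same idea as modifyBlockDataSize(): insert DISCARD_BYTES before the first 0x78. */
    static byte[] tamper(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == (byte) 0x78) {
                byte[] out = new byte[data.length + DISCARD_BYTES.length];
                System.arraycopy(data, 0, out, 0, i);
                System.arraycopy(DISCARD_BYTES, 0, out, i, DISCARD_BYTES.length);
                System.arraycopy(data, i, out, i + DISCARD_BYTES.length, data.length - i);
                return out;
            }
        }
        return data; // no 0x78 found: leave unchanged
    }

    /** Deserializes and reports the outcome instead of crashing. */
    static String tryRead(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return "ok: " + in.readObject();
        } catch (Throwable t) { // Throwable also covers OutOfMemoryError
            return "failed: " + t.getClass().getName();
        }
    }
}
```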

3.5 Summary

(1) OOMs during serialization and deserialization with ObjectInputStream/ObjectOutputStream are generally caused by corrupt data being deserialized.

(2) Using ObjectInputStream/ObjectOutputStream carries a certain security risk; at the very least, encrypt the serialized data.

(3) When deserializing with ObjectInputStream, catch Throwable so that Errors are caught too, letting the app keep running after an OOM.
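Points (2) and (3) can be combined. Encryption hides the content, but unless the cipher mode is authenticated it does not by itself detect modification; an integrity check does. The sketch below assumes HMAC-SHA256 from the standard javax.crypto API (class name and byte layout are hypothetical): it appends a tag when writing and verifies it before readObject runs, so tampered bytes never reach the deserializer, and it still catches Throwable as a last line of defense.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical helper: serialized || HMAC tag on write, verify before readObject.
final class VerifiedStore {
    private static final String ALG = "HmacSHA256";
    private static final int TAG_LEN = 32; // HMAC-SHA256 output size in bytes

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    private static byte[] hmac(byte[] key, byte[] data, int len) throws Exception {
        Mac mac = Mac.getInstance(ALG);
        mac.init(new SecretKeySpec(key, ALG));
        mac.update(data, 0, len);
        return mac.doFinal();
    }

    /** Returns serialized || tag. */
    static byte[] seal(byte[] serialized, byte[] key) throws Exception {
        byte[] tag = hmac(key, serialized, serialized.length);
        byte[] out = Arrays.copyOf(serialized, serialized.length + TAG_LEN);
        System.arraycopy(tag, 0, out, serialized.length, TAG_LEN);
        return out;
    }

    /** Returns the object, or null if the tag is wrong or deserialization fails in any way. */
    static Object open(byte[] sealed, byte[] key) {
        try {
            if (sealed.length < TAG_LEN) {
                return null;
            }
            int n = sealed.length - TAG_LEN;
            byte[] expected = hmac(key, sealed, n);
            byte[] actual = Arrays.copyOfRange(sealed, n, sealed.length);
            if (!MessageDigest.isEqual(expected, actual)) {
                return null; // modified data never reaches readObject
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(sealed, 0, n))) {
                return in.readObject();
            }
        } catch (Throwable t) { // per point (3): also covers Errors such as OutOfMemoryError
            return null;
        }
    }
}
```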

0 0