Spark Q&A: Kryo serialization failed: Buffer overflow


Q1. A Spark job fails at runtime with the following error:

org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 2, required: 4. To avoid this, increase spark.kryoserializer.buffer.max value
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:299)
    ...

A1: Work backward through the code, starting from the error message.
First, look at the source of the serialize method in org.apache.spark.serializer.KryoSerializerInstance:

override def serialize[T: ClassTag](t: T): ByteBuffer = {
  output.clear()
  val kryo = borrowKryo()
  try {
    kryo.writeClassAndObject(output, t)
  } catch {
    case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
      throw new SparkException(s"Kryo serialization failed: ${e.getMessage}. To avoid this, " +
        "increase spark.kryoserializer.buffer.max value.")
  } finally {
    releaseKryo(kryo)
  }
  ByteBuffer.wrap(output.toBytes)
}

The exception originates in the writeClassAndObject call inside the try-catch block, so follow that method next:

public void writeClassAndObject (Output output, Object object) {
    if (output == null) throw new IllegalArgumentException("output cannot be null.");
    beginObject();
    try {
        if (object == null) {
            writeClass(output, null);
            return;
        }
        Registration registration = writeClass(output, object.getClass());
        if (references && writeReferenceOrNull(output, object, false)) return;
        if (TRACE || (DEBUG && depth == 1)) log("Write", object);
        registration.getSerializer().write(this, output, object);
    } finally {
        if (--depth == 0 && autoReset) reset();
    }
}

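No detailed log shows exactly where the failure occurs, so I followed every method this one calls. My understanding is that the error comes from writeReferenceOrNull. Following that method: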

boolean writeReferenceOrNull (Output output, Object object, boolean mayBeNull) {
    if (object == null) {
        if (TRACE || (DEBUG && depth == 1)) log("Write", null);
        output.writeByte(Kryo.NULL);
        return true;
    }
    if (!referenceResolver.useReferences(object.getClass())) {
        if (mayBeNull) output.writeByte(Kryo.NOT_NULL);
        return false;
    }
    // Determine if this object has already been seen in this object graph.
    int id = referenceResolver.getWrittenId(object);
    // If not the first time encountered, only write reference ID.
    if (id != -1) {
        if (DEBUG) debug("kryo", "Write object reference " + id + ": " + string(object));
        output.writeInt(id + 2, true); // + 2 because 0 and 1 are used for NULL and NOT_NULL. // Q!
        return true;
    }
    // Otherwise write NOT_NULL and then the object bytes.
    id = referenceResolver.addWrittenObject(object);
    output.writeByte(NOT_NULL);
    if (TRACE) trace("kryo", "Write initial object reference " + id + ": " + string(object));
    return false;
}

In the same way, follow the writeInt method:

public int writeInt (int value, boolean optimizePositive) throws KryoException {
    if (!optimizePositive) value = (value << 1) ^ (value >> 31);
    if (value >>> 7 == 0) {
        require(1);
        buffer[position++] = (byte)value;
        return 1;
    }
    if (value >>> 14 == 0) {
        require(2);
        buffer[position++] = (byte)((value & 0x7F) | 0x80);
        buffer[position++] = (byte)(value >>> 7);
        return 2;
    }
    if (value >>> 21 == 0) {
        require(3);
        buffer[position++] = (byte)((value & 0x7F) | 0x80);
        buffer[position++] = (byte)(value >>> 7 | 0x80);
        buffer[position++] = (byte)(value >>> 14);
        return 3;
    }
    if (value >>> 28 == 0) {
        require(4);
        buffer[position++] = (byte)((value & 0x7F) | 0x80);
        buffer[position++] = (byte)(value >>> 7 | 0x80);
        buffer[position++] = (byte)(value >>> 14 | 0x80);
        buffer[position++] = (byte)(value >>> 21);
        return 4;
    }
    require(5);
    buffer[position++] = (byte)((value & 0x7F) | 0x80);
    buffer[position++] = (byte)(value >>> 7 | 0x80);
    buffer[position++] = (byte)(value >>> 14 | 0x80);
    buffer[position++] = (byte)(value >>> 21 | 0x80);
    buffer[position++] = (byte)(value >>> 28);
    return 5;
}
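
To make the branch structure of writeInt easier to reason about, here is a minimal sketch that only counts how many bytes a value would need. The class name VarIntLength and the method encodedLength are hypothetical names introduced for illustration; the thresholds are copied from the method above, and nothing here calls Kryo itself.

```java
// Simplified model of the branches in writeInt(value, optimizePositive).
// It only reports the encoded length; it does not write to any buffer.
final class VarIntLength {
    static int encodedLength(int value, boolean optimizePositive) {
        // Same zig-zag step as writeInt: maps small negative values to small positive ones.
        if (!optimizePositive) value = (value << 1) ^ (value >> 31);
        if (value >>> 7 == 0) return 1;   // fits in 7 bits
        if (value >>> 14 == 0) return 2;  // fits in 14 bits
        if (value >>> 21 == 0) return 3;  // fits in 21 bits
        if (value >>> 28 == 0) return 4;  // fits in 28 bits
        return 5;                         // everything else
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16_383, 16_384, 2_097_151, 2_097_152, 268_435_455, 268_435_456};
        for (int v : samples) {
            System.out.println(v + " -> " + encodedLength(v, true) + " byte(s)");
        }
    }
}
```

Under this model, a 4-byte encoding with optimizePositive set to true corresponds to values from 2,097,152 (2^21) up to 268,435,455 (2^28 - 1).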

This finally calls the require method of com.esotericsoftware.kryo.io.Output:

private boolean require(int required) throws KryoException {
    if(this.capacity - this.position >= required) {
        return false;
    } else if(required > this.maxCapacity) {
        throw new KryoException("Buffer overflow. Max capacity: " + this.maxCapacity + ", required: " + required);
    } else {
        this.flush();
        while(this.capacity - this.position < required) {
            if(this.capacity == this.maxCapacity) {
                throw new KryoException("Buffer overflow. Available: " + (this.capacity - this.position) + ", required: " + required);
            }
            this.capacity = Math.min(this.capacity * 2, this.maxCapacity);
            if(this.capacity < 0) {
                this.capacity = this.maxCapacity;
            }
            byte[] newBuffer = new byte[this.capacity];
            System.arraycopy(this.buffer, 0, newBuffer, 0, this.position);
            this.buffer = newBuffer;
        }
        return true;
    }
}
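
The growth logic in require can be reproduced with a small stand-alone model. The sketch below is not Kryo code: GrowableBufferModel is a hypothetical class that tracks only capacity and position, skips the flush() call (on the assumption that it does nothing for a purely in-memory output), and throws IllegalStateException in place of KryoException. The tiny sizes (8 and 16 bytes) are chosen only to make the behavior visible.

```java
// Stand-alone model of the capacity bookkeeping in require(); no real byte array is allocated.
final class GrowableBufferModel {
    private int capacity;
    private final int maxCapacity;
    private int position;

    GrowableBufferModel(int initialCapacity, int maxCapacity) {
        this.capacity = initialCapacity;
        this.maxCapacity = maxCapacity;
    }

    // Double the capacity until the request fits, or fail once maxCapacity is reached.
    void require(int required) {
        if (capacity - position >= required) return;
        if (required > maxCapacity) {
            throw new IllegalStateException(
                "Buffer overflow. Max capacity: " + maxCapacity + ", required: " + required);
        }
        while (capacity - position < required) {
            if (capacity == maxCapacity) {
                throw new IllegalStateException(
                    "Buffer overflow. Available: " + (capacity - position) + ", required: " + required);
            }
            capacity = Math.min(capacity * 2, maxCapacity);
            if (capacity < 0) capacity = maxCapacity; // doubling overflowed int
        }
    }

    // Reserve space and advance the write position.
    void write(int byteCount) {
        require(byteCount);
        position += byteCount;
    }

    int capacity() {
        return capacity;
    }

    // Fill a 16-byte-max buffer to 14 bytes, then ask for 4 more.
    static String demo() {
        GrowableBufferModel buffer = new GrowableBufferModel(8, 16);
        buffer.write(14);
        try {
            buffer.write(4);
            return "no overflow";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the second write fails with the same "Available: 2, required: 4" wording as the original error: the buffer has already reached its maximum size, and only 2 bytes are left when 4 are requested.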

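The key point is that when output writes the reference id, the value id + 2 is large enough to fall into the value >>> 28 == 0 branch, so 4 bytes are requested through require(4).
As the code shows, maxCapacity is the upper limit on the buffer size in bytes (not on the id itself). Once the buffer has already grown to maxCapacity and fewer bytes remain than are requested (here 2 available, 4 required), the exception is thrown.
maxCapacity is set by the following logic (read from the bottom up):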

private lazy val output = ks.newKryoOutput() // create the KryoOutput
def newKryoOutput(): KryoOutput = new KryoOutput(bufferSize, math.max(bufferSize, maxBufferSize)) // second argument becomes the maximum capacity
private val bufferSize = ByteUnit.KiB.toBytes(bufferSizeKb).toInt // initial buffer size
private val bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k")
private val maxBufferSize = ByteUnit.MiB.toBytes(maxBufferSizeMb).toInt // maximum buffer size
val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", "64m").toInt
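
As a rough illustration of what these lines amount to, the sketch below redoes the arithmetic in plain Java. KryoBufferDefaults and effectiveMaxCapacity are hypothetical names introduced here; the only things taken from the listing are the unit conversions, the math.max call, and the 64k and 64m defaults.

```java
// Plain-Java rendition of the size arithmetic; it does not read any Spark configuration.
final class KryoBufferDefaults {
    static final long KIB = 1024L;
    static final long MIB = 1024L * 1024L;

    // Mirrors math.max(bufferSize, maxBufferSize), with sizes given in KiB and MiB.
    static int effectiveMaxCapacity(long bufferSizeKb, long maxBufferSizeMb) {
        int bufferSize = (int) (bufferSizeKb * KIB);       // initial buffer size in bytes
        int maxBufferSize = (int) (maxBufferSizeMb * MIB); // configured maximum in bytes
        return Math.max(bufferSize, maxBufferSize);
    }

    public static void main(String[] args) {
        // Defaults from the listing: 64k initial buffer, 64m maximum.
        System.out.println(effectiveMaxCapacity(64, 64)); // 67108864
    }
}
```

With the defaults, the buffer can therefore grow to 67,108,864 bytes (64 MiB) before require gives up.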

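Conclusion: spark.kryoserializer.buffer.max is too small for the data being serialized (the default is 64m). Increase it, for example by passing --conf spark.kryoserializer.buffer.max=512m to spark-submit. The value must be less than 2048m.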
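
One plausible reason for the limit near 2048m is that the buffer size is converted to an int with toInt, as in the listing above. The sketch below only checks that arithmetic; MaxBufferLimit, toBytes and fitsInInt are hypothetical helpers, and this is my reading of the constraint rather than a statement of how Spark validates the setting.

```java
// Checks whether a size given in MiB still fits in a signed 32-bit int once converted to bytes.
final class MaxBufferLimit {
    static final long MIB = 1024L * 1024L;

    static long toBytes(long mib) {
        return mib * MIB;
    }

    static boolean fitsInInt(long mib) {
        return toBytes(mib) <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        for (long mib : new long[] {64, 2047, 2048}) {
            long bytes = toBytes(mib);
            // The (int) cast shows what a narrowing conversion would produce.
            System.out.println(mib + "m = " + bytes + " bytes, fits in int: " + fitsInInt(mib)
                + ", as int: " + (int) bytes);
        }
    }
}
```

2047 MiB is 2,146,435,072 bytes and still fits, while 2048 MiB is exactly 2^31 bytes, one more than Integer.MAX_VALUE, so the narrowing conversion would wrap to a negative number.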
