Spark 中的序列化
来源:互联网 发布:双色球预测软件下载 编辑:程序博客网 时间:2024/06/05 14:12
1.序列化常用于网络传输和数据持久化以便于存储和传输,Spark通过两种方式来创建序列化器
val serializer = instantiateClassFromConf[Serializer](
"spark.serializer", "org.apache.spark.serializer.JavaSerializer")
logDebug(s"Using serializer: ${serializer.getClass}")
//暂时在blockManager中没有用到该序列化方式
val closureSerializer = instantiateClassFromConf[Serializer](
"spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer")
2.Spark中两种典型的序列化场景
序列化场景A:执行map等RDD操作时,首先执行cleanF,内部左F解析和F序列化
private def ensureSerializable(func: AnyRef) {
try {
if (SparkEnv.get != null) {
SparkEnv.get.closureSerializer.newInstance().serialize(func)
}valclosureSerializer = instantiateClassFromConf[Serializer](结论:
"spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer")spark.closure.serializer配置决定了函数序列化的方式
序列化场景B:blockManager中
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
serializer, conf, memoryManager, mapOutputTracker, shuffleManager,
blockTransferService, securityManager, numUsableCores)val serializer = instantiateClassFromConf[Serializer](
"spark.serializer", "org.apache.spark.serializer.JavaSerializer")结论:spark.serializer决定了BlockManager工作中序列化的方式3.Spark默认采用Java的序列化器,本人建议采用Kryo序列化提高性能,下面分析下这两种序列化机制的异同
A.首先看看java的序列化机制
override def serialize[T: ClassTag](t: T): ByteBuffer = {
val bos = new ByteArrayOutputStream()
val out = serializeStream(bos)
out.writeObject(t)
out.close()
ByteBuffer.wrap(bos.toByteArray)
}
B.再看看Kyro的序列化机制
//创建两个延迟执行的工作流(Kyro输入输出流)
private lazy val output = ks.newKryoOutput()
private lazy val input = new KryoInput()
override def serialize[T: ClassTag](t: T): ByteBuffer = {//设置position=0 && total=0
output.clear()//获取Kryo对象
val kryo = borrowKryo()
try {
//开始对t函数进行序列化
kryo.writeClassAndObject(output, t)
} catch {
....
} finally {//将当前Kryo作为缓存
releaseKryo(kryo)
}
ByteBuffer.wrap(output.toBytes)
}===================================Kryo分析华丽丽的分界线============================================
A:获取Kryo对象并初始化KryoSerializer的过程首先查看是否有缓存的Kryo对象,如果有就rest清除掉kryo的状态,将缓存设置为null后返回
如果缓存中没有就创建一个新的Kryo,新建Kryo是一个繁琐的过程,包括新建Kryo对象,
public Kryo () {
this(new DefaultClassResolver(), new MapReferenceResolver());
}public Kryo (ClassResolver classResolver, ReferenceResolver referenceResolver) {
if (classResolver == null) throw new IllegalArgumentException("classResolver cannot be null.");
this.classResolver = classResolver;
classResolver.setKryo(this);
this.referenceResolver = referenceResolver;
if (referenceResolver != null) {
referenceResolver.setKryo(this);
references = true;
}
addDefaultSerializer(byte[].class, ByteArraySerializer.class);
addDefaultSerializer(char[].class, CharArraySerializer.class);
addDefaultSerializer(short[].class, ShortArraySerializer.class);
addDefaultSerializer(int[].class, IntArraySerializer.class);
addDefaultSerializer(long[].class, LongArraySerializer.class);
addDefaultSerializer(float[].class, FloatArraySerializer.class);
addDefaultSerializer(double[].class, DoubleArraySerializer.class);
addDefaultSerializer(boolean[].class, BooleanArraySerializer.class);
addDefaultSerializer(String[].class, StringArraySerializer.class);
addDefaultSerializer(Object[].class, ObjectArraySerializer.class);
addDefaultSerializer(BigInteger.class, BigIntegerSerializer.class);
addDefaultSerializer(BigDecimal.class, BigDecimalSerializer.class);
addDefaultSerializer(Class.class, ClassSerializer.class);
addDefaultSerializer(Date.class, DateSerializer.class);
addDefaultSerializer(Enum.class, EnumSerializer.class);
addDefaultSerializer(EnumSet.class, EnumSetSerializer.class);
addDefaultSerializer(Currency.class, CurrencySerializer.class);
addDefaultSerializer(StringBuffer.class, StringBufferSerializer.class);
addDefaultSerializer(StringBuilder.class, StringBuilderSerializer.class);
addDefaultSerializer(Collections.EMPTY_LIST.getClass(), CollectionsEmptyListSerializer.class);
addDefaultSerializer(Collections.EMPTY_MAP.getClass(), CollectionsEmptyMapSerializer.class);
addDefaultSerializer(Collections.EMPTY_SET.getClass(), CollectionsEmptySetSerializer.class);
addDefaultSerializer(Collections.singletonList(null).getClass(), CollectionsSingletonListSerializer.class);
addDefaultSerializer(Collections.singletonMap(null, null).getClass(), CollectionsSingletonMapSerializer.class);
addDefaultSerializer(Collections.singleton(null).getClass(), CollectionsSingletonSetSerializer.class);
addDefaultSerializer(Collection.class, CollectionSerializer.class);
addDefaultSerializer(TreeMap.class, TreeMapSerializer.class);
addDefaultSerializer(Map.class, MapSerializer.class);
addDefaultSerializer(KryoSerializable.class, KryoSerializableSerializer.class);
addDefaultSerializer(TimeZone.class, TimeZoneSerializer.class);
addDefaultSerializer(Calendar.class, CalendarSerializer.class);
lowPriorityDefaultSerializerCount = defaultSerializers.size();
// Primitives and string. Primitive wrappers automatically use the same registration as primitives.
register(int.class, new IntSerializer());
register(String.class, new StringSerializer());
register(float.class, new FloatSerializer());
register(boolean.class, new BooleanSerializer());
register(byte.class, new ByteSerializer());
register(char.class, new CharSerializer());
register(short.class, new ShortSerializer());
register(long.class, new LongSerializer());
register(double.class, new DoubleSerializer());
}注册Spark应用中必须使用的类型,比如
kryo.register(classOf[HttpBroadcast[_]], new KryoJavaSerializer())kryo.register(classOf[Array[Tuple2[Any, Any]]])
kryo.register(classOf[GenericRecord], new GenericAvroSerializer(avroSchemas))结论:register方法有很多种重载,能够给同一个class注册不同的序列化器,序列化器决定了执行序列化(读,写)的方法,register过程实际上是从ObjectMap<Class, Registration>中根据Class获取对应的Registratio然后更新Registratio中的Serializer,若不存在Value则使用class,Serializer,创建一个新的Registratio,然后完成注册给IntMap<Registration>和ObjectMap<Class, Registration>B:序列化函数F的过程
Registration registration = writeClass(output, object.getClass());
//获取到序列化器后,最终使用 Kryo定制化的Serializer进行序列化
registration.getSerializer().write(this, output, object);
writeClass的过程,实际上调用classResolver的writeClass方法,这个classResolver就是实例化Kryo时所用的DefaultClassResolver,然后调用DefaultClassResolver的writeClass方法
classResolver.writeClass(output, type);
DefaultClassResolver的writeClass方法
Registration registration = kryo.getRegistration(type);根据不同的类型来获取Registration,如果原来的
ObjectMap<Class, Registration>中没有Registration,则会进行类型变幻
if (registration == null) {
if (Proxy.isProxyClass(type)) {
// If a Proxy class, treat it like an InvocationHandler because the concrete class for a proxy is generated.
registration = getRegistration(InvocationHandler.class);
} else if (!type.isEnum() && Enum.class.isAssignableFrom(type)) {
// This handles an enum value that is an inner class. Eg: enum A {b{}};
registration = getRegistration(type.getEnclosingClass());
} else if (EnumSet.class.isAssignableFrom(type)) {
registration = classResolver.getRegistration(EnumSet.class);
}
if (registration == null) {
if (registrationRequired) {
throw new IllegalArgumentException("Class is not registered: " + className(type)
+ "\nNote: To register this class use: kryo.register(" + className(type) + ".class);");
}
registration = classResolver.registerImplicit(type);
}
}========================Spark 序列化 精华分界线==================================================
通上面的过程,已经弄清楚了spark序列化机制的来龙去脉,问道解惑,神明自得
0 0
- Spark 中的序列化
- Spark中的序列化机制
- spark中的序列化器
- spark 未序列化
- spark序列化问题解决
- spark序列化溢出
- spark Task序列化问题
- Spark之 KryoSerializer序列化
- spark未序列化问题
- [Spark优化]在Spark中使用Kryo序列化
- spark学习-33-Spark的SerializerManager序列化管理器
- Spark性能优化第四季-序列化
- Spark闭包与序列化
- Spark Task序列化代码分析
- Spark闭包与序列化
- Spark性能优化第四季-序列化
- Spark Task序列化代码分析
- spark压缩和序列化相关
- SQL Server执行计划的理解
- 农村调查笔记:勤劳未必致富 癌症病人越来越多
- 移动网页设计9大原则——第1部分
- 趋势科技搭建工控蜜罐系统
- spring整合Redis 入门例子
- Spark 中的序列化
- iOS开发之--- NSURLProtocol
- <context-param>与<init-param>的区别与作用?
- UILabel设置多种字体、颜色
- Android studio 导入项目,界面停在"Building 'project name' gradle project info"卡死
- CJSON学习笔记(二)
- Spark中的Driver本质
- 在systemd(CentOS7)自启动zookeeper
- static函数与普通函数