
Properties of RDDs

Source: Internet. Editor: 程序博客网. Time: 2024/06/03 18:47

RDD: Resilient Distributed Dataset

Properties:

Immutable
Lazily evaluated
Cacheable
Type inferred

What is immutability?

Once created, the data never changes.
Big data is immutable by nature.
Immutability helps with: (1) parallelization; (2) caching.

Why is big data immutable?

Parallelization comes for free, with no need for locks.
Caching is safe, with no worry that another party changes the data.
Immutability is about the value, not the reference.

Immutability in collections

Immutable collections express change through transformations, e.g. map.
A transformation creates a new copy of the collection and leaves the original intact.
Mutable collections, by contrast, are updated in place with a loop.
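
The contrast between a transformation and an in-place loop can be sketched in a few lines. This is plain Python rather than Spark code, and a tuple is assumed here as a stand-in for an immutable collection.

```python
# Plain Python, not Spark: a tuple stands in for an immutable collection.
numbers = (1, 2, 3, 4)

# Transformation: map builds a new collection and leaves the original intact.
doubled = tuple(map(lambda x: x * 2, numbers))

# In-place update: a loop overwrites the elements of a mutable list.
mutable = [1, 2, 3, 4]
for i in range(len(mutable)):
    mutable[i] = mutable[i] * 2

print(numbers)  # (1, 2, 3, 4)
print(doubled)  # (2, 4, 6, 8)
print(mutable)  # [2, 4, 6, 8]
```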

Challenges of immutability

Immutability is good for parallelism but costly in space.
Multiple transformations result in: (1) multiple copies of the data; (2) multiple passes over the data.
The extra copies and passes lead to poor performance.
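
A small illustration of the cost, again in plain Python rather than Spark: each eager step below makes one full pass and materializes one more copy. The variable names are made up for the example.

```python
data = tuple(range(1, 6))  # (1, 2, 3, 4, 5)

# Each eager transformation makes a full pass and materializes a new copy.
plus_one = tuple(x + 1 for x in data)             # pass 1, copy 1
doubled = tuple(x * 2 for x in plus_one)          # pass 2, copy 2
evens = tuple(x for x in doubled if x % 4 == 0)   # pass 3, copy 3

# Two intermediate copies exist only to feed the next step.
intermediate_copies = [plus_one, doubled]
print(evens)  # (4, 8, 12)
```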

Get lazy to meet the challenges

Do not compute transformations until the result is needed.
Defer evaluation.
Separate execution from evaluation.
Combine multiple transformations into one.
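
Deferred evaluation can be sketched with Python generators, which are assumed here as a rough analogy for lazy transformations (this is not how Spark is implemented). The `visited` list is a hypothetical helper that records each read of a source element, so the single pass is observable.

```python
visited = []  # hypothetical helper: records every read of a source element

def source():
    for x in (1, 2, 3, 4, 5):
        visited.append(x)
        yield x

# Defining the transformations does no work; generators are lazy.
plus_one = (x + 1 for x in source())
doubled = (x * 2 for x in plus_one)
evens = (x for x in doubled if x % 4 == 0)

reads_before = len(visited)  # nothing has been read yet

# Evaluation happens only when a result is demanded. The three steps run
# fused: each element flows through all of them in a single pass.
result = list(evens)

print(reads_before)  # 0
print(result)        # [4, 8, 12]
print(visited)       # [1, 2, 3, 4, 5]
```

No intermediate collection is materialized here; only the final list is built.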

Laziness and immutability

You can be lazy only if the underlying data is immutable.
You cannot combine transformations if a transformation has side effects.
Combining laziness and immutability gives better performance and enables distributed processing.
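
Why laziness depends on immutability can be shown with a short sketch in plain Python (not Spark). If the underlying data can change between defining a lazy transformation and evaluating it, the result depends on when evaluation happens.

```python
# Mutable source: the deferred result depends on when it is evaluated.
mutable_source = [1, 2, 3]
lazy_mutable = (x * 10 for x in mutable_source)   # deferred
mutable_source.append(4)                          # data changes before evaluation
result_mutable = list(lazy_mutable)

# Immutable source: "changing" it creates a new tuple; the old one is untouched.
immutable_source = (1, 2, 3)
lazy_immutable = (x * 10 for x in immutable_source)  # deferred
immutable_source = immutable_source + (4,)           # rebinds the name only
result_immutable = list(lazy_immutable)

print(result_mutable)    # [10, 20, 30, 40]
print(result_immutable)  # [10, 20, 30]
```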

Challenges of laziness: type inference

Laziness poses challenges in terms of data types.
If laziness defers execution, determining the type of a variable becomes challenging.
If we cannot determine the right type, semantic errors can slip through.
Running big data programs and getting semantic errors is no fun.

Type inference

Type inference is the part of the compiler that determines a type from the value.
As all transformations are free of side effects, the type can be determined from the operation; for example, v1.count() on a Spark RDD is inferred as Long.
Every transformation has a specific return type; for example, map over an array returns an array.
Type inference relieves you from thinking about the data representation across many transformations.
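
The idea that each operation fixes its return type can be sketched with Python type hints. This is only an analogy: Python does no compile-time inference itself, so the comments describe what a static checker such as mypy would be expected to infer. `Dataset` is a hypothetical class for illustration, not a Spark type.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

@dataclass(frozen=True)
class Dataset(Generic[T]):
    """Hypothetical immutable collection, not a Spark class."""
    items: Tuple[T, ...]

    def map(self, f: Callable[[T], U]) -> "Dataset[U]":
        # The return type follows from the function passed in.
        return Dataset(tuple(f(x) for x in self.items))

    def count(self) -> int:
        # count always returns an integer, whatever T is.
        return len(self.items)

words = Dataset(("spark", "rdd"))  # a checker would infer Dataset[str]
lengths = words.map(len)           # expected to be inferred as Dataset[int]
n = lengths.count()                # int

print(lengths.items)  # (5, 3)
print(n)              # 2
```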

Caching

Immutable data allows you to cache data for a long time.
Lazy transformations allow data to be recreated on failure from its lineage.
The transformations themselves can also be saved, as lineage.
Caching data improves the performance of the execution engine.
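
Caching and recovery from lineage can be sketched with a toy model. `LineageDataset` and its methods are hypothetical names for illustration, not Spark's implementation; unlike Spark, this toy caches every result automatically to keep the example short.

```python
class LineageDataset:
    """Hypothetical toy model of lineage-based recovery, not Spark's implementation."""

    def __init__(self, parent=None, fn=None, source=None):
        self.parent = parent      # the dataset this one was derived from
        self.fn = fn              # the transformation applied to the parent
        self.source = source      # raw data, set only on the root dataset
        self._cache = None
        self.computations = 0     # how many times the data was (re)built

    def map(self, fn):
        # Record the transformation as lineage; compute nothing yet.
        return LineageDataset(parent=self, fn=fn)

    def collect(self):
        if self._cache is not None:
            return self._cache    # immutable data is safe to reuse
        if self.parent is None:
            result = tuple(self.source)
        else:
            result = tuple(self.fn(x) for x in self.parent.collect())
        self.computations += 1
        self._cache = result
        return result

    def evict(self):
        # Simulate losing the cached data, e.g. after a failure.
        self._cache = None

base = LineageDataset(source=(1, 2, 3))
squared = base.map(lambda x: x * x)

first = squared.collect()   # computed from lineage, then cached
second = squared.collect()  # served from the cache
squared.evict()
third = squared.collect()   # rebuilt from lineage

print(first, third)          # (1, 4, 9) (1, 4, 9)
print(squared.computations)  # 2
```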

An RDD is a big collection of data with the properties above.

0 0

松下触摸屏软件说明书

松下触摸屏软件说明书

原创粉丝点击

热门问题 老师的惩罚人脸识别我在镇武司摸鱼那些年重生之率土为王我在大康的咸鱼生活盘龙之生命进化天生仙种凡人之先天五行春回大明朝姑娘不必设防，我是瞎子竹马弄青梅涟涟不见复关泣涕涟涟涟源涟源市涟源邮编涟源红网涟源市邮编涟源龙山湖南涟源娄底到涟源涟源人流涟源在线涟源市属于哪个市涟源一中校花许倩照片涟源钢铁集团有限公司湖南涟源麻辣豆腐干涟源邮政编码涟源属于哪个地区株洲到涟源火车时刻表涟漪涟漪读音叶涟漪涟漪的读音涟漪的意思涟漪怎么读爱的涟漪涟漪吧涟漪水站涟漪涟漪涟漪效应涟漪饮用水涟漪纯净水涟漪电话涟漪app 指尖的涟漪泛起涟漪命起涟漪涟漪下载涟漪桶装水清月涟漪涟漪往事