Spark2 DataFrameStatFunctions: Exploratory Statistical Data Analysis
Source: Internet  Editor: 程序博客网  Date: 2024/06/05 18:13
For the data source (the DataFrame named data), see my blog post: http://blog.csdn.net/hadoop_spark_storm/article/details/53412598
import org.apache.spark.sql.DataFrameStatFunctions
Finding the set of frequent items in each column
val colArray1=Array("affairs", "gender", "age", "yearsmarried")
data.stat.freqItems(colArray1).show(10,truncate=false)
+-------------------------------+----------------+------------------------------------------------------+-----------------------------------------------+
|affairs_freqItems              |gender_freqItems|age_freqItems                                         |yearsmarried_freqItems                         |
+-------------------------------+----------------+------------------------------------------------------+-----------------------------------------------+
|[2.0, 7.0, 1.0, 3.0, 12.0, 0.0]|[male, female]  |[32.0, 47.0, 22.0, 52.0, 37.0, 17.5, 27.0, 57.0, 42.0]|[0.75, 0.125, 1.5, 0.417, 4.0, 7.0, 10.0, 15.0]|
+-------------------------------+----------------+------------------------------------------------------+-----------------------------------------------+

val colArray2=Array("children", "religiousness", "education", "occupation", "rating")
data.stat.freqItems(colArray2).show(10,truncate=false)
+------------------+-------------------------+-----------------------------------------+-----------------------------------+-------------------------+
|children_freqItems|religiousness_freqItems  |education_freqItems                      |occupation_freqItems               |rating_freqItems         |
+------------------+-------------------------+-----------------------------------------+-----------------------------------+-------------------------+
|[no, yes]         |[2.0, 5.0, 4.0, 1.0, 3.0]|[17.0, 20.0, 14.0, 16.0, 9.0, 18.0, 12.0]|[2.0, 5.0, 4.0, 7.0, 1.0, 3.0, 6.0]|[2.0, 5.0, 4.0, 1.0, 3.0]|
+------------------+-------------------------+-----------------------------------------+-----------------------------------+-------------------------+
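freqItems is approximate: it uses a streaming frequent-element algorithm, may return false positives, and applies a default support of 1% unless a support value is passed as a second argument. To make the idea of "support" concrete, here is a minimal sketch in plain Scala (no Spark required) that computes the exact set of values whose relative frequency meets a threshold. The names FreqItemsSketch and exactFreqItems, and the sample column, are hypothetical and only for illustration; this is not how Spark implements freqItems.

```scala
// Exact frequent-item count in plain Scala, as a reference point for what
// the approximate freqItems call is estimating.
// `FreqItemsSketch` and `exactFreqItems` are hypothetical names.
object FreqItemsSketch {

  /** Returns the distinct values whose relative frequency is at least `support`. */
  def exactFreqItems[T](values: Seq[T], support: Double): Set[T] = {
    require(support > 0.0 && support <= 1.0, "support must be in (0, 1]")
    val n = values.size
    values
      .groupBy(identity) // value -> all of its occurrences
      .collect {
        case (value, occurrences) if occurrences.size.toDouble / n >= support => value
      }
      .toSet
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample column, not taken from the blog's data set.
    val gender = Seq("male", "female", "female", "male", "female", "other")
    // "other" occurs in 1 of 6 rows, which is below the 0.3 threshold.
    println(exactFreqItems(gender, support = 0.3))
  }
}
```

An exact count like this needs to hold every distinct value in memory, which is the cost the approximate algorithm avoids on large columns.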
Correlation coefficient
// Both imports are already in scope in spark-shell; they are needed in a standalone program.
import spark.implicits._
import org.apache.spark.sql.functions.rand

val df = Range(0,10,step=1).toDF("id").withColumn("rand1", rand(seed=10)).withColumn("rand2", rand(seed=27))
df: org.apache.spark.sql.DataFrame = [id: int, rand1: double ... 1 more field]

df.show
+---+-------------------+-------------------+
| id|              rand1|              rand2|
+---+-------------------+-------------------+
|  0|0.41371264720975787|  0.714105256846827|
|  1| 0.7311719281896606| 0.8143487574232506|
|  2| 0.9031701155118229| 0.5282207324381174|
|  3|0.09430205113458567| 0.4420100497826609|
|  4|0.38340505276222947| 0.9387162206758006|
|  5| 0.5569246135523511| 0.6398126862647711|
|  6| 0.4977441406613893| 0.9895498513115722|
|  7| 0.2076666106201438| 0.3398720242725498|
|  8| 0.9571919406508957|0.15042237695815963|
|  9| 0.7429395461204413| 0.7302723457066639|
+---+-------------------+-------------------+

df.stat.corr("rand1", "rand2", "pearson")
res24: Double = -0.10993962467082698
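To see what the "pearson" method computes, the following sketch evaluates the Pearson correlation coefficient by hand in plain Scala, using the ten rand1/rand2 rows printed by df.show. PearsonSketch and pearson are hypothetical names for illustration; this is the textbook formula (covariance divided by the product of the standard deviations), not Spark's internal implementation, and the result should agree with the corr value up to floating-point rounding.

```scala
// Pearson correlation computed directly from its definition.
// `PearsonSketch` and `pearson` are hypothetical names.
object PearsonSketch {

  /** r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2)) */
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.size == ys.size, "inputs must be non-empty and of equal length")
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    val dx = xs.map(_ - meanX) // deviations from the mean
    val dy = ys.map(_ - meanY)
    val covariance = dx.zip(dy).map { case (a, b) => a * b }.sum
    val sumSqX = dx.map(a => a * a).sum
    val sumSqY = dy.map(b => b * b).sum
    covariance / math.sqrt(sumSqX * sumSqY)
  }

  // The ten rows shown in the df.show output.
  val rand1: Seq[Double] = Seq(
    0.41371264720975787, 0.7311719281896606, 0.9031701155118229,
    0.09430205113458567, 0.38340505276222947, 0.5569246135523511,
    0.4977441406613893, 0.2076666106201438, 0.9571919406508957,
    0.7429395461204413)
  val rand2: Seq[Double] = Seq(
    0.714105256846827, 0.8143487574232506, 0.5282207324381174,
    0.4420100497826609, 0.9387162206758006, 0.6398126862647711,
    0.9895498513115722, 0.3398720242725498, 0.15042237695815963,
    0.7302723457066639)

  def main(args: Array[String]): Unit =
    println(pearson(rand1, rand2))
}
```

A value this close to zero is what one would expect for two columns generated independently with different seeds.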