Spark pseudo-distributed setup
Source: Internet | Editor: 程序博客网 | Time: 2024/06/08 09:33
I. Environment preparation (hadoop-2.8.0 / spark-2.1.0 / scala-2.12.1)
Install Hadoop and Scala first.
II. Installation and configuration
1. Check the settings in /etc/profile
export JAVA_HOME=/opt/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export SCALA_HOME=/home/sulei/文档/scala-2.12.1
export PATH=${JAVA_HOME}/bin:$PATH
export PATH="$SCALA_HOME/bin:$PATH"
2. Edit conf/spark-env.sh
export JAVA_HOME=/opt/jdk
export SCALA_HOME=/home/sulei/文档/scala-2.12.1
export SPARK_WORKER_MEMORY=1G
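Before starting Spark it can help to confirm that the variables exported above are visible to new processes (changes to /etc/profile only take effect after `source /etc/profile` or a new login). The following is a minimal Python sketch, chosen because step 4 launches bin/pyspark; the helper name `missing_vars` is hypothetical, and the variable list simply mirrors the exports shown in this section.

```python
import os

# Variables exported in /etc/profile and conf/spark-env.sh in this guide.
REQUIRED_VARS = ("JAVA_HOME", "SCALA_HOME")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        for name in REQUIRED_VARS:
            # Also report whether each value points at an existing directory.
            path = os.environ[name]
            state = "exists" if os.path.isdir(path) else "directory not found"
            print(name, "=", path, "(" + state + ")")
```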
3. View the web UI
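The original does not state the web UI address. Assuming Spark's default settings, the standalone master serves its UI on port 8080, and a running application such as spark-shell or pyspark serves its own UI on port 4040. Below is a small sketch for checking that a UI answers; `ui_url` and `ui_is_up` are hypothetical helper names, and `localhost` is an assumption for a single-machine setup.

```python
import urllib.error
import urllib.request


def ui_url(host="localhost", port=8080):
    """Build the address of a Spark web UI (8080 is the default master port)."""
    return "http://{}:{}".format(host, port)


def ui_is_up(url, timeout=3.0):
    """Return True if the page at url answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

For example, `ui_is_up(ui_url())` probes the master UI and `ui_is_up(ui_url(port=4040))` probes the UI of a running shell session.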
4. Run bin/pyspark
III. Testing a simple program
**Supplement: mounted file systems**
sulei@sulei:/opt/spark-2.1.0$ df -lh
Filesystem      Size  Used Avail Use% Mounted on
udev            3.4G     0  3.4G   0% /dev
tmpfs           694M  9.4M  685M   2% /run
/dev/sda11       40G   16G   22G  42% /
tmpfs           3.4G  588K  3.4G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           3.4G     0  3.4G   0% /sys/fs/cgroup
/dev/sda2       256M   33M  224M  13% /boot/efi
tmpfs           694M   76K  694M   1% /run/user/1000
/dev/sda9       310G  272G   38G  88% /media/sulei/32B03CC6B03C9279
scala> val textFile=sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:24
# Nothing shows up in the web UI yet, because evaluation is lazy
# This attempt failed, so it is repeated as follows
scala> val textFile=sc.textFile("../README.md")
textFile: org.apache.spark.rdd.RDD[String] = ../README.md MapPartitionsRDD[7] at textFile at <console>:24
scala> textFile.count()
res4: Long = 104
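The lazy-evaluation remark in this transcript can be illustrated with an analogy in plain Python. This is not Spark's implementation, only a sketch of the same idea using generators: building the pipeline reads nothing, and the work happens when a result is requested (the counterpart of an action such as `count()`). The names `tracked_lines`, `build_pipeline`, and `run_count` are made up for the illustration.

```python
def tracked_lines(lines, reads):
    """Yield lines one by one, recording each line that is actually read."""
    for line in lines:
        reads.append(line)
        yield line


def build_pipeline(lines, reads):
    """Counterpart of a transformation: describe the work, do none of it."""
    return (line for line in tracked_lines(lines, reads) if "Spark" in line)


def run_count(pipeline):
    """Counterpart of an action: consume the pipeline and return a count."""
    return sum(1 for _ in pipeline)


sample = ["# Apache Spark", "", "Spark is a fast and general cluster computing system"]
reads = []
pipeline = build_pipeline(sample, reads)
lines_read_before_action = len(reads)  # nothing has been read at this point
matches = run_count(pipeline)          # the reading happens here
lines_read_after_action = len(reads)
```

In Spark the same behaviour explains why `sc.textFile(...)` returns immediately and why no job appears in the web UI until an action such as `count()` runs. It is also why a wrong input path is typically reported only when the action executes, not when the RDD is defined.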
[Figure: result shown in the web UI]
scala> textFile.first()
res5: String = # Apache Spark
scala> textFile.take(10)
res6: Array[String] = Array(# Apache Spark, "", Spark is a fast and general cluster computing system for Big Data. It provides, high-level APIs in Scala, Java, Python, and R, and an optimized engine that, supports general computation graphs for data analysis. It also supports a, rich set of higher-level tools including Spark SQL for SQL and DataFrames,, MLlib for machine learning, GraphX for graph processing,, and Spark Streaming for stream processing., "", <http://spark.apache.org/>)
scala> textFile.filter(line => line.contains("Spark")).count()
res7: Long = 20
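Since step 4 launches bin/pyspark, the same checks can be sketched in Python. The function below assumes it receives the `sc` object that the pyspark shell predefines; the function name `readme_stats` and the dictionary keys are hypothetical. `textFile`, `count`, `first`, `take`, and `filter` are standard RDD methods.

```python
def readme_stats(sc, path="../README.md", needle="Spark"):
    """Run the checks from the Scala session on the file at path."""
    lines = sc.textFile(path)  # lazy: no job is started here
    return {
        "count": lines.count(),    # total number of lines
        "first": lines.first(),    # first line of the file
        "head": lines.take(10),    # first ten lines as a list
        # number of lines that contain the search string
        "matching": lines.filter(lambda line: needle in line).count(),
    }
```

In the pyspark shell this would be called as `readme_stats(sc)`. For the README used in the Scala session, the transcript reports 104 lines in total and 20 lines containing "Spark".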
IV. WordCount program
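The original gives no listing for this section. What follows is a minimal PySpark sketch of the classic word count, not the author's program. It assumes the `sc` object of the pyspark shell and simple whitespace tokenization; `split_words` and `word_count` are hypothetical names. `flatMap`, `map`, `reduceByKey`, and `collect` are standard RDD methods.

```python
from operator import add


def split_words(line):
    """Split a line on whitespace; empty lines give an empty list."""
    return line.split()


def word_count(sc, path="../README.md"):
    """Return a dict mapping each word in the file to its number of occurrences."""
    counts = (
        sc.textFile(path)
        .flatMap(split_words)         # one record per word
        .map(lambda word: (word, 1))  # pair each word with a count of 1
        .reduceByKey(add)             # sum the counts for each word
    )
    # collect() is the action that triggers the job.
    return dict(counts.collect())
```

In the pyspark shell, `word_count(sc)` would return the counts for the README used in the previous section. Calling `collect()` brings all results to the driver, which is acceptable for a small file like this one.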