Spark pseudo-distributed setup


I. Environment preparation (hadoop-2.8.0 / spark-2.1.0 / scala-2.12.1)
Hadoop and Scala should already be installed.
II. Installation and configuration
1. Check the configuration in /etc/profile

export JAVA_HOME=/opt/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export SCALA_HOME=/home/sulei/文档/scala-2.12.1
export PATH=${JAVA_HOME}/bin:$PATH
export PATH="$SCALA_HOME/bin:$PATH"
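After editing /etc/profile, apply it to the current shell with source /etc/profile, then verify the toolchain with java -version and scala -version.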

2. Edit conf/spark-env.sh

export JAVA_HOME=/opt/jdk
export SCALA_HOME=/home/sulei/文档/scala-2.12.1
export SPARK_WORKER_MEMORY=1G
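If conf/spark-env.sh does not exist yet, create it from the template that ships with Spark: cp conf/spark-env.sh.template conf/spark-env.sh. SPARK_WORKER_MEMORY caps the total memory a worker may hand out to executors on this machine; 1G is enough for a single-node test.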

3. Check the web UI
(Screenshot: Spark web UI)
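A note on getting the UI up, assuming Spark's bundled standalone scripts are used: sbin/start-all.sh starts a master and a worker on this machine, and the master's web UI is then served at http://localhost:8080 by default.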
4. Launch the shell: bin/spark-shell (the transcripts below are from the Scala shell; bin/pyspark would start the Python equivalent)
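bin/spark-shell starts a Scala REPL with a SparkContext already bound to sc, which the transcripts below rely on; to attach it to the standalone master instead of running in local mode, add --master spark://localhost:7077.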

III. Testing a simple program
Supplement: mount points
sulei@sulei:/opt/spark-2.1.0$ df -lh
Filesystem      Size  Used Avail Use% Mounted on
udev            3.4G     0  3.4G   0% /dev
tmpfs           694M  9.4M  685M   2% /run
/dev/sda11       40G   16G   22G  42% /
tmpfs           3.4G  588K  3.4G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           3.4G     0  3.4G   0% /sys/fs/cgroup
/dev/sda2       256M   33M  224M  13% /boot/efi
tmpfs           694M   76K  694M   1% /run/user/1000
/dev/sda9       310G  272G   38G  88% /media/sulei/32B03CC6B03C9279

scala> val textFile=sc.textFile("README.md")textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:24#界面中没有出现效果,原因是:懒加载#此处出错,如下scala> val textFile=sc.textFile("../README.md")textFile: org.apache.spark.rdd.RDD[String] = ../README.md MapPartitionsRDD[7] at textFile at <console>:24scala> textFile.count()res4: Long = 104                                                                

(Screenshot: job result in the web UI)

scala> textFile.first()
res5: String = # Apache Spark
scala> textFile.take(10)
res6: Array[String] = Array(# Apache Spark, "", Spark is a fast and general cluster computing system for Big Data. It provides, high-level APIs in Scala, Java, Python, and R, and an optimized engine that, supports general computation graphs for data analysis. It also supports a, rich set of higher-level tools including Spark SQL for SQL and DataFrames,, MLlib for machine learning, GraphX for graph processing,, and Spark Streaming for stream processing., "", <http://spark.apache.org/>)
scala> textFile.filter(line => line.contains("Spark")).count()
res7: Long = 20

IV. The wordcount program
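A minimal word-count sketch in the spark-shell, reusing the textFile RDD built in section III; the output directory name wordcount-out is an arbitrary choice here.

scala> val words = textFile.flatMap(line => line.split(" "))   // split each line into words
scala> val pairs = words.map(word => (word, 1))                // pair every word with a count of 1
scala> val counts = pairs.reduceByKey(_ + _)                   // sum the counts for each word
scala> counts.take(10).foreach(println)                        // print a sample of (word, count) pairs
scala> counts.saveAsTextFile("wordcount-out")                  // persist the full result as text files

Each step up to take() is again a lazy transformation; the actions take() and saveAsTextFile() are what launch jobs you can watch in the web UI.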
