使用Eclipse IDE搭建Apache Spark的Java开发环境
来源:互联网 发布:在线重装mac系统 编辑:程序博客网 时间:2024/06/14 07:18
前言
本文介绍如何使用Eclipse IDE搭建Apache Spark的Java开发环境。
由于本文不会涉及具体的软件安装过程以及Eclipse相关功能的说明,所以需要具备一定的使用Eclipse IDE开发Java应用的经验。
实验环境
软件安装
Apache Spark
从Apache Spark官网下载Spark 2.1.0安装包。
下载地址:http://spark.apache.org/downloads.html
注1:在下载选项中“2. Choose a package type”我们选择“Pre-built for Hadoop 2.6”,是因为将来需要跟我的Spark 2.1.0 Docker运行环境进行对接实验。如果无此需求可以自由选择相应的选项。
注2:Spark 2.1.0 Docker运行环境请参照我的另一篇文章:《 创建Spark 2.1.0 Docker镜像》
将下载的压缩包解压到合适的目录下。(本例中解压至D:\)
D:\spark-2.1.0-bin-hadoop2.6> dir
2017/01/15 23:32 <DIR> .2017/01/15 23:32 <DIR> ..2017/01/15 23:31 <DIR> bin2017/01/15 23:31 <DIR> conf2017/01/15 23:31 <DIR> data2017/01/15 23:31 <DIR> examples2017/01/15 23:31 <DIR> jars2017/01/15 23:31 <DIR> licenses2016/12/16 10:26 17,811 LICENSE2016/12/16 10:26 24,645 NOTICE2017/01/15 23:32 <DIR> python2017/01/15 23:32 <DIR> R2016/12/16 10:26 3,818 README.md2016/12/16 10:26 128 RELEASE2017/01/15 23:32 <DIR> sbin2017/01/15 23:32 <DIR> yarn 4 个文件 46,402 字节 12 个目录 62,461,882,368 可用字节
JDK 8
Java 8提供了对Lambda表达式的支持,推荐下载最新版本的JDK。
下载地址:http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
下载后运行安装包进行安装即可。
Eclipse IDE
最新的Eclipse IDE版本是Neon.2,推荐下载Eclipse IDE for Java EE Developers的发行版本,比 for Java Developers的多了一些必要的Package。不过两者都已经捆绑了Apache Maven Plugin,所以对于开发简单的Spark Java应用的来说,两者没有任何区别。
下载后运行安装包,根据提示安装Eclipse IDE。
winutils
Windows上运行Hadoop/Spark需要hadoop.dll和winutils.exe,但是官网提供的binary中并不包括这两个文件,利用源代码编译可以生成它们。如果感兴趣可以源代码编译试试,当然如果想省事的话,网上已经有其他人编译好的文件可供下载使用。
问题描述
http://stackoverflow.com/questions/35652665/java-io-ioexception-could-not-locate-executable-null-bin-winutils-exe-in-the-ha
http://stackoverflow.com/questions/19620642/failed-to-locate-the-winutils-binary-in-the-hadoop-binary-path从源代码编译请参考:
Windows 10编译Hadoop 2.6.0源码直接下载地址:
Hadoop 2.6.0 Windows 64-bit Binaries
Hadoop 2.6.0在Windows 10 64bit下的hadoop.dll和winutils.exe将下载的压缩包解压到合适的目录下。(本例中解压至D:\hadoop-2.6.0\bin)
D:\hadoop-2.6.0\bin> dir
2017/01/16 12:03 <DIR> .2017/01/16 12:03 <DIR> ..2016/08/23 13:24 5,479 hadoop2016/08/23 13:24 8,298 hadoop.cmd2016/08/23 13:24 96,256 hadoop.dll2016/08/23 13:24 21,457 hadoop.exp2016/08/23 13:24 541,112 hadoop.iobj2016/08/23 13:24 158,272 hadoop.ipdb2016/08/23 13:24 35,954 hadoop.lib2016/08/23 13:24 782,336 hadoop.pdb2016/08/23 13:24 11,142 hdfs2016/08/23 13:24 6,923 hdfs.cmd2016/08/23 13:24 13,312 hdfs.dll2016/08/23 13:24 224,220 hdfs.lib2016/08/23 13:24 1,452,126 libwinutils.lib2016/08/23 13:25 5,205 mapred2016/08/23 13:25 5,949 mapred.cmd2016/08/23 13:24 1,776 rcc2016/08/23 13:24 114,176 winutils.exe2016/08/23 13:24 510,194 winutils.iobj2016/08/23 13:24 199,872 winutils.ipdb2016/08/23 13:24 1,265,664 winutils.pdb2016/08/23 13:25 11,380 yarn2016/08/23 13:25 10,895 yarn.cmd 22 个文件 5,481,998 字节 2 个目录 62,457,643,008 可用字节
配置Spark运行环境
设置环境变量
配置SPARK_HOME环境变量
系统属性->环境变量…->新建将Spark运行文件加入到系统环境变量Path中
系统属性->环境变量…->双击”Path”环境变量配置HADOOP_HOME环境变量
系统属性->环境变量…->新建
测试Spark运行环境
D:\run-example SparkPi
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties17/01/17 12:48:37 INFO SparkContext: Running Spark version 2.1.0......17/01/17 12:48:38 INFO SparkEnv: Registering OutputCommitCoordinator17/01/17 12:48:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.17/01/17 12:48:38 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.10.104:404017/01/17 12:48:38 INFO SparkContext: Added JAR file:/D:/spark-2.1.0-bin-hadoop2.6/bin/../examples/jars/scopt_2.11-3.3.0.jar at spark://192.168.10.104:9476/jars/scopt_2.11-3.3.0.jar with timestamp 148462851866017/01/17 12:48:38 INFO SparkContext: Added JAR file:/D:/spark-2.1.0-bin-hadoop2.6/bin/../examples/jars/spark-examples_2.11-2.1.0.jar at spark://192.168.10.104:9476/jars/spark-examples_2.11-2.1.0.jar with timestamp 148462851866117/01/17 12:48:38 INFO Executor: Starting executor ID driver on host localhost17/01/17 12:48:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 9497.......17/01/17 12:48:40 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1128 bytes result sent to driver17/01/17 12:48:40 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1128 bytes result sent to driver17/01/17 12:48:40 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 852 ms on localhost (executor driver) (1/2)17/01/17 12:48:40 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 778 ms on localhost (executor driver) (2/2)17/01/17 12:48:40 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool17/01/17 12:48:40 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.886 s17/01/17 12:48:40 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.154531 s......Pi is roughly 3.139915699578498......17/01/17 12:48:40 INFO SparkUI: Stopped Spark web UI at http://192.168.10.104:404017/01/17 12:48:40 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!......
设置Eclipse IDE
新建项目
1) File菜单->New->Other
2) 新建Maven项目
3) 勾选”Create a simple project”
4) 填写Maven project的相关信息
5) 生成项目
编辑pom.xml
在Eclipse项目文件中双击pom.xml文件,选择“pom.xml”标签页。
将pom.xml替换为以下内容:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>fz.spark</groupId> <artifactId>spark-examples</artifactId> <version>0.0.1</version> <name>SparkExamples</name> <packaging>jar</packaging> <dependencies> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.11</artifactId> <version>2.1.0</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-launcher_2.11</artifactId> <version>2.1.0</version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql_2.11</artifactId> <version>2.1.0</version> </dependency> </dependencies> </project>
保存pom.xml文件之后,Maven开始自动获取依赖包。依赖包保存的位置在“%USERPROFILE%.m2\repository”目录下。
注:Maven缺省从主站点(http://repo1.maven.org/maven2/)下载依赖包,由于我们的网络受到了良好的保护,所以基本上全程保持龟速。性子急的兄弟们请改用国内的镜像站点。
参考:
eclipse设置maven加载国内镜像
maven国内镜像(maven下载慢的解决方法)
测试在Eclipse中运行Spark应用
1) 新建Example1类
package fz.spark;import java.util.Arrays;import java.util.List;import java.util.Map;import org.apache.spark.api.java.JavaSparkContext;import org.apache.spark.sql.SparkSession;import scala.Tuple2;public class Example1 { /** * @param args */ public static void main(String[] args) { // Create the Java Spark Session by setting application name // and master node "local". try(final SparkSession spark = SparkSession .builder() .master("local") .appName("JavaLocalWordCount") .getOrCreate()) { // Create list of content (the summary of apache spark's README.md) final List<String> content = Arrays.asList( "Spark is a fast and general cluster computing system for Big Data. It provides", "high-level APIs in Scala, Java, Python, and R, and an optimized engine that", "supports general computation graphs for data analysis. It also supports a", "rich set of higher-level tools including Spark SQL for SQL and DataFrames,", "MLlib for machine learning, GraphX for graph processing,", "and Spark Streaming for stream processing." ); // Split the content into words, convert words to key, value with // key as word and value 1, and finally count the occurrences of a word @SuppressWarnings("resource") final Map<String, Long> wordsCount = new JavaSparkContext(spark.sparkContext()) .parallelize(content) .flatMap((x) -> Arrays.asList(x.split(" ")).iterator()) .mapToPair((x) -> new Tuple2<String, Integer>(x, 1)) .countByKey(); System.out.println(wordsCount); } }}
2) 运行Example1
在Eclipse项目文件中右键单击Example1.java文件,选择“Run As”->”Java Application”,或者快捷键“Alt + Shift + X, J”,运行Example1.class。
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties17/01/17 14:35:03 INFO SparkContext: Running Spark version 2.1.0............{GraphX=1, for=6, in=1, is=1, Data.=1, system=1, R,=1, Streaming=1, Scala,=1, processing,=1, data=1, set=1, optimized=1, a=2, provides=1, tools=1, rich=1, computing=1, APIs=1, MLlib=1, that=1, learning,=1, cluster=1, also=1, general=2, machine=1, processing.=1, supports=2, stream=1, Java,=1, high-level=1, Spark=3, It=2, fast=1, an=1, graph=1, Big=1, Python,=1, DataFrames,=1, graphs=1, analysis.=1, including=1, of=1, and=5, computation=1, higher-level=1, engine=1, SQL=2}............17/01/17 14:35:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!17/01/17 14:35:05 INFO SparkContext: Successfully stopped SparkContext17/01/17 14:35:05 INFO ShutdownHookManager: Shutdown hook called17/01/17 14:35:05 INFO ShutdownHookManager: Deleting directory C:\Users\farawayzheng\AppData\Local\Temp\spark-e9048a79-b48a-4f58-b8bc-4ce0fb65abd1
可以看到输出了WordCount结果,即表明单机运行Spark应用的环境配置成功。
(完)
- 使用Eclipse IDE搭建Apache Spark的Java开发环境
- Eclipse IDE搭建Spark+java开发环境
- java开发环境搭建、eclipse的使用
- 基于eclipse的spark开发环境搭建
- 使用eclipse IDE搭建C/C++开发环境
- eclipse ide for java ee developers 开发环境搭建
- eclipse ide for java ee developers 开发环境搭建
- Eclipse IDE for Java EE 搭建 Android开发环境
- spark-eclipse开发环境搭建
- 搭建eclipse+maven+scala-ide的scala web开发环境
- Scala IDE for Eclipse 之spark scala语言开发环境搭建------遇到问题记录
- spark本地java开发环境的搭建
- java开发spark应用程序的环境搭建
- Spark--01eclipse java spark环境搭建
- Java开发环境的搭建以及使用eclipse创建项目
- 使用Eclipse开发java web环境搭建
- 构建Spark的IDE开发环境
- Windows下基于eclipse的Spark应用开发环境搭建
- UIScrollView 的滑动研究
- 汇编语言(王爽)——第一次上机
- powerdesign 下ER模型中展示数据注释中文列
- 11.java字符char是否能装载一个汉字
- 小程序开发系列(二)九宫格
- 使用Eclipse IDE搭建Apache Spark的Java开发环境
- POJ 2778 DNA Sequence(AC自动机+矩阵建模)
- 深度学习最全优化方法总结比较(SGD,Adagrad,Adadelta,Adam,Adamax,Nadam)
- ResourceManager相关配置参数
- andorid studio 怎么添加ndk配置
- ORACLE中常用的知识点
- XSS学习笔记(入门篇)
- 5.1 Tomcat学习(servlet容器)
- Android studio异常Class not found using the boot class loader; no stack available