Getting Spark Setup in Eclipse
Source: Internet  Published: 昆仑万维 知乎  Editor: 程序博客网  Time: 2024/06/16 08:46
http://syndeticlogic.net/?p=311
Spark is a new distributed programming framework for analyzing large data sets. It took me a few steps to get the system set up in Eclipse, so I thought I’d write them down. Hopefully this post saves someone a few minutes.
Fair warning: the Spark project seems to be moving fast, so this could get out of date quickly…
Building from Source
First, clone the sources from the Git repository, then build them. The build requires a Maven profile (I used hadoop2). Below are the commands I used.
$ git clone git@github.com:mesos/spark.git
$ cd spark
$ mvn -U -Phadoop2 clean install -DskipTests
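Since the build and the later Eclipse steps depend on git, mvn and java being installed, it can save a failed run to check for them first. This is a minimal POSIX sh sketch; `require_tools` is a hypothetical helper name, not something provided by Spark or Maven.

```shell
#!/bin/sh
# Hypothetical helper: report any required tools that are missing from PATH.
# Prints "missing: <tool>" to stderr for each absent tool; returns 1 if any are missing.
require_tools() (
    # The body runs in a subshell, so "exit" only ends this function call.
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    exit "$missing"
)

# Usage before building (not run here):
#   require_tools git mvn java && mvn -U -Phadoop2 clean install -DskipTests
```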
Unfortunately, the build didn’t work out of the box for me. I have reason to believe the issue is environmental (see below), so it may work for you; if it does, move on to the next section. Below is the build error I received.
[ERROR] Failed to execute goal on project spark-core: Could not resolve dependencies for project org.spark-project:spark-core:jar:0.7.1-SNAPSHOT: The following artifacts could not be resolved: cc.spray:spray-can:jar:1.0-M2.1, cc.spray:spray-server:jar:1.0-M2.1, cc.spray:spray-json_2.9.2:jar:1.1.1: Could not find artifact cc.spray:spray-can:jar:1.0-M2.1 in jboss-repo (http://repository.jboss.org/nexus/content/repositories/releases/)
This error is a bit misleading: repository.jboss.org is just the last repository that was searched for the missing artifacts. After inspecting spark/pom.xml, I found that the real problem is that mvn cannot download the jars from repo.spray.cc. spark/pom.xml appears to be correct, and, surprisingly, repo.spray.cc seems to be up as well.
The spray docs indicate that repo.spray.io is the Maven repository, but both domains point to the same IP address. For sanity, I tried it anyway and had the same problem.
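To probe a repository by hand, it helps to know how Maven turns coordinates into a path: the groupId’s dots become slashes, followed by the artifactId, the version, and a file named artifactId-version.extension. A small sketch of that mapping (`maven_path` is a hypothetical helper; classifiers are ignored):

```shell
#!/bin/sh
# Hypothetical helper: map Maven coordinates to a repository-relative path.
# $1 = groupId, $2 = artifactId, $3 = version, $4 = extension (jar or pom)
maven_path() (
    # Only the groupId has its dots turned into directory separators.
    group_dir=$(printf '%s' "$1" | tr '.' '/')
    printf '%s/%s/%s/%s-%s.%s\n' "$group_dir" "$2" "$3" "$2" "$3" "$4"
)

# Usage: send a HEAD request to see whether a repository serves the file (not run here):
#   curl -I "http://repo.spray.io/$(maven_path cc.spray spray-can 1.0-M2.1 pom)"
```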
The workaround is to put the files in the local .m2 repository manually. Below is the script I used.
# Fetch the spray artifacts by hand into the local Maven repository (~/.m2).
for k in can io util server base; do
  dir="cc/spray/spray-$k/1.0-M2.1"
  mkdir -p ~/.m2/repository/$dir
  cd ~/.m2/repository/$dir
  wget http://repo.spray.io/$dir/spray-$k-1.0-M2.1.pom
  wget http://repo.spray.io/$dir/spray-$k-1.0-M2.1.jar
done

# spray-json, built for Scala 2.9.2
dir="cc/spray/spray-json_2.9.2/1.1.1"
mkdir -p ~/.m2/repository/$dir
cd ~/.m2/repository/$dir
wget http://repo.spray.io/$dir/spray-json_2.9.2-1.1.1.jar
wget http://repo.spray.io/$dir/spray-json_2.9.2-1.1.1.pom

# twirl-api, built for Scala 2.9.2
dir="cc/spray/twirl-api_2.9.2/0.5.2"
mkdir -p ~/.m2/repository/$dir
cd ~/.m2/repository/$dir
wget http://repo.spray.io/$dir/twirl-api_2.9.2-0.5.2.jar
wget http://repo.spray.io/$dir/twirl-api_2.9.2-0.5.2.pom
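Before re-running mvn, it may be worth confirming that every jar and pom actually landed in the local repository and is non-empty. A sketch under the assumption that the repository lives at ~/.m2/repository; `check_local_artifact` and the `LOCAL_REPO` override are hypothetical names used only here, not anything Maven reads.

```shell
#!/bin/sh
# Hypothetical check: confirm the jar and pom for one set of coordinates exist
# and are non-empty in the local Maven repository.
# $1 = groupId, $2 = artifactId, $3 = version
# LOCAL_REPO is an assumption of this sketch (defaults to ~/.m2/repository).
check_local_artifact() (
    repo=${LOCAL_REPO:-$HOME/.m2/repository}
    group_dir=$(printf '%s' "$1" | tr '.' '/')
    status=0
    for ext in jar pom; do
        file="$repo/$group_dir/$2/$3/$2-$3.$ext"
        if [ ! -s "$file" ]; then
            echo "missing or empty: $file" >&2
            status=1
        fi
    done
    exit "$status"
)

# Usage for the artifacts fetched above (not run here):
#   for k in can io util server base; do check_local_artifact cc.spray "spray-$k" 1.0-M2.1; done
#   check_local_artifact cc.spray spray-json_2.9.2 1.1.1
#   check_local_artifact cc.spray twirl-api_2.9.2 0.5.2
```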
Copying the files by hand really sucks, but it works around this error. I found a Stack Overflow post about a similar mvn issue; one poster claimed that downgrading to Java 6 fixed it. It seems strange that it would be a Java 7 issue, but I’ve encountered stranger things. I didn’t test downgrading.
For reference, below is my environment.
james@minerva:~/spark$ mvn -version
Apache Maven 3.0.4
Maven home: /usr/share/maven
Java version: 1.7.0_17, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-oracle/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.2.0-38-generic", arch: "amd64", family: "unix"
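When comparing your environment with the one above (and with the Java 6 vs. Java 7 report), the version string needs a little care: older releases report "1.6.0_…" or "1.7.0_…", while Java 9 and later start with the major number. A hedged sketch; `java_major` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: extract the Java major version from a version string
# such as "1.7.0_17" (legacy 1.x scheme) or "11.0.2" (Java 9+ scheme).
java_major() (
    case "$1" in
        1.*) printf '%s\n' "$1" | cut -d. -f2 ;;  # 1.7.0_17 -> 7
        *)   printf '%s\n' "$1" | cut -d. -f1 ;;  # 11.0.2   -> 11
    esac
)

# Usage with the locally installed JVM (not run here):
#   v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
#   echo "Java major version: $(java_major "$v")"
```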
Eclipse Setup
The Eclipse setup is pretty straightforward, but if you’ve never done a Java/Scala Eclipse setup, it can take a couple of hours to figure out what needs to happen.
From within Eclipse, install EGit and the Scala IDE plugin. Pay attention to the versions of Eclipse and Scala: at the time of this writing, Spark is based on Scala 2.9.2, and I was running Eclipse Juno.
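Rather than guessing which Scala version a given checkout targets, you can read it out of the top-level pom.xml. This sketch assumes the pom declares it as a single-line `scala.version` property (an assumption; check your revision), and it is a sed one-liner, not a real XML parser. `pom_property` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a <name>value</name> property from a pom.xml.
# Assumes the property is written on one line, e.g. <scala.version>2.9.2</scala.version>.
# $1 = path to pom.xml, $2 = property name
pom_property() (
    pom=$1
    name=$2
    sed -n "s|.*<$name>\([^<]*\)</$name>.*|\1|p" "$pom" | head -n 1
)

# Usage from the spark checkout (not run here):
#   pom_property pom.xml scala.version
```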
I never, ever use the m2eclipse plugin. Some people I know use it successfully, but not me. Instead, I use mvn to generate the .project and .classpath files. I don’t know anyone who mixes these approaches.
Below is the command that I used to generate the project files.
$ mvn -Phadoop2 eclipse:clean eclipse:eclipse
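After eclipse:eclipse finishes, each module directory should contain a generated .project and .classpath. A quick sketch to catch a module that was skipped; `check_eclipse_files` is hypothetical, and the module directory names in the usage comment are an assumption about the checkout layout.

```shell
#!/bin/sh
# Hypothetical check: every given module directory should contain the
# .project and .classpath files generated by "mvn eclipse:eclipse".
# Prints each missing file to stderr; returns 1 if anything is missing.
check_eclipse_files() (
    status=0
    for module in "$@"; do
        for f in .project .classpath; do
            if [ ! -f "$module/$f" ]; then
                echo "missing: $module/$f" >&2
                status=1
            fi
        done
    done
    exit "$status"
)

# Usage from the top of the spark checkout (directory names assumed, not run here):
#   check_eclipse_files core bagel repl streaming examples
```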
Next, import the projects from Git (at this time, that includes spark-core, spark-bagel, spark-repl, spark-streaming and spark-examples). To do this, select File->Import->Projects from Git.
Next, we need to connect the Scala IDE plugin with each project that has Scala source files (spark-core, spark-bagel, spark-repl and spark-streaming). To do so, right-click on the project and select Configure->Add Scala Nature.
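Adding the Scala nature is recorded in each project’s .project file, so you can confirm it without opening every project’s properties. The sketch greps for the Scala IDE nature ID `org.scala-ide.sdt.core.scalanature`; `has_scala_nature` is a hypothetical helper, and the directory names in the usage comment are assumed.

```shell
#!/bin/sh
# Hypothetical check: succeed if the project's .project file lists the Scala IDE nature.
# $1 = project directory
has_scala_nature() (
    grep -q 'org\.scala-ide\.sdt\.core\.scalanature' "$1/.project"
)

# Usage (directory names assumed, not run here):
#   for p in core bagel repl streaming; do
#     has_scala_nature "$p" || echo "$p: Scala nature not added"
#   done
```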
Next, we need to add the Scala source folders to the build path (each src/main/scala and src/test/scala folder). To accomplish this, right-click on the folder and select Add to Build Path->Use As Source Folder.
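Each folder marked "Use As Source Folder" shows up as a kind="src" entry in .classpath. A grep sketch (not an XML parser; it assumes each classpathentry is on one line, as Eclipse writes it) that reports Scala folders present on disk but not registered; `check_scala_src_folders` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: for each of src/main/scala and src/test/scala that exists
# in the project, verify that .classpath has a kind="src" entry with that path.
# $1 = project directory
check_scala_src_folders() (
    status=0
    for folder in src/main/scala src/test/scala; do
        [ -d "$1/$folder" ] || continue   # skip folders this module does not have
        if ! grep 'kind="src"' "$1/.classpath" | grep -q "path=\"$folder\""; then
            echo "not a source folder: $1/$folder" >&2
            status=1
        fi
    done
    exit "$status"
)
```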
Spark mixes .java and .scala files in a non-standard way that can confuse the Scala IDE plugin, so we need to make sure that every source folder includes .scala files in the classpath. To check, look at the project’s .classpath file. It should have an entry like the following for every Scala source folder.
<classpathentry including="**/*.java|**/*.scala" kind="src" path="src/main/scala"/>
If there is no **/*.scala in the classpathentry of a source folder that contains Scala code, we need to add it. It can be added through the Eclipse GUI, or we can edit the .classpath file directly.
To add an inclusion filter from the Eclipse GUI, right-click on the source folder, select Build Path->Configure Inclusion/Exclusion Filters, and add **/*.scala.
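If you would rather edit .classpath directly than click through the filter dialog for every folder, a sed rewrite can handle the common case. This sketch only touches entries whose filter is exactly `including="**/*.java"`; anything else is left alone. It avoids `sed -i` (whose syntax differs between GNU and BSD sed) and keeps a .classpath.bak backup. `add_scala_inclusion` is a hypothetical helper; refresh the project in Eclipse afterwards.

```shell
#!/bin/sh
# Hypothetical fix: add **/*.scala to source entries whose inclusion filter is
# exactly **/*.java. The original file is kept as .classpath.bak.
# $1 = project directory
add_scala_inclusion() (
    cp "$1/.classpath" "$1/.classpath.bak" &&
    sed 's#including="\*\*/\*\.java"#including="**/*.java|**/*.scala"#g' \
        "$1/.classpath.bak" > "$1/.classpath"
)

# Usage (directory name assumed, not run here):
#   add_scala_inclusion core
```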
Finally, add spark-core to the build path of spark-repl and spark-streaming. To do this, right-click on the project, select Build Path->Configure Build Path, and on the Projects tab click Add (then select spark-core).
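Eclipse records a project dependency in .classpath as a kind="src" entry whose path is "/" followed by the project name, so the last step can be verified with grep. A sketch; `depends_on_project` is hypothetical, and the directory names in the usage comment are assumed.

```shell
#!/bin/sh
# Hypothetical check: succeed if the project's .classpath depends on another
# workspace project, i.e. has a kind="src" entry with path="/<name>".
# $1 = project directory, $2 = name of the required project
depends_on_project() (
    grep 'kind="src"' "$1/.classpath" | grep -q "path=\"/$2\""
)

# Usage (directory names assumed, not run here):
#   for p in repl streaming; do
#     depends_on_project "$p" spark-core || echo "$p: spark-core not on build path"
#   done
```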