Eclipse / IDEA + Maven + Scala + Spark setup: successfully running a SparkContext

Source: Internet · Editor: 程序博客网 (Programmer Blog Network) · Date: 2024/06/05 10:58

It took several days to get Eclipse working; all kinds of errors came up along the way and progress suffered. The archetype and parts of the pom are borrowed from others, but after so much debugging I failed to record the original authors: apologies, and thanks.

Environment:
Hadoop–>2.6.4
Scala–>2.11.8
Spark–>2.2.0

IDEs:
eclipseEE + scalaIDE plugin–>Oxygen: the pom reports errors, but it works.
scalaIDE–>4.7-RC: currently both local and cluster Spark runs work.
IDEA–>still has some issues; it runs, but not perfectly. Covered at the end; similar to eclipseEE.

Notes:
1. Keep versions consistent. The Scala and Spark versions must match, otherwise you may get errors such as `Product$class` not found, class-not-found errors, and many others with no obvious cause.
For example: spark.version, scala.version, and scala.binary.version in the pom below must match each other and the version in the Scala Library Container.
2. With the scala-maven-plugin configured in the pom, there is no need to add a separate Scala dependency unless you have special requirements; also watch the version of the imported Scala Library Container.
3. Make good use of Maven's Update Project, Project > Clean, and the features under the project's right-click Configure menu.
4. This is a beginner-level setup; some issues remain unsolved. If you use Eclipse, prefer the scalaIDE build. When IDEA works, editing Scala in it is very smooth, though the Tab-jump behavior past brackets and quotes feels awkward.
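Since most of the errors above trace back to version mismatches, a quick check is to print the Scala version actually on the classpath and compare it with the pom's `<scala.version>`. This is my own illustrative sketch, not part of the original setup:

```scala
// Print the Scala runtime version so it can be compared against the
// <scala.version> property in the pom (e.g. "2.11.8").
object VersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionNumberString)
  }
}
```

Inside a running Spark job, `sc.version` similarly reports the Spark version, which should match `<spark.version>`.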

Eclipse section:

The steps below apply to both eclipseEE and scalaIDE; differences are noted where relevant.

  1. Create a Maven Project.
    (screenshot)

  2. Next: configure the remote archetype catalog.
    (screenshot)
    http://repo1.maven.org/maven2/archetype-catalog.xml
    remote archetype

  3. After the project is created, rename
    src/main/java–>src/main/scala
    src/test/java–>src/test/scala

  4. Edit pom.xml. Using the remote archetype above would generate a pom automatically; that one is not used here.

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spark.version>2.2.0</spark.version>
        <scala.version>2.11.8</scala.version>
        <scala.binary.version>2.11</scala.binary.version>
        <hadoop.version>2.6.4</hadoop.version>
    </properties>

    <dependencies>
        <!-- ================================== spark ================================ -->
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming_2.10 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- ================================== hadoop ================================ -->
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <!-- ================================== other ================================ -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>

    <!-- Official Maven repositories:
         http://repo1.maven.org/maven2/ or http://repo2.maven.org/maven2/ (lower latency) -->
    <repositories>
        <repository>
            <id>central</id>
            <name>Maven Repository Switchboard</name>
            <layout>default</layout>
            <url>http://repo2.maven.org/maven2</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <!-- JDK version used by the Maven compiler -->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.3.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-make:transitive</arg>
                                <arg>-dependencyfile</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

If a "-make:transitive" error appears at run time, comment out that <arg>. Optionally, also add to the dependencies:

    <!-- https://mvnrepository.com/artifact/org.specs2/specs2-junit_2.11 -->
    <dependency>
        <groupId>org.specs2</groupId>
        <artifactId>specs2-junit_2.11</artifactId>
        <version>3.9.4</version>
        <scope>test</scope>
    </dependency>

Notes:
1. Even after installing the scalaIDE plugin from the Marketplace, eclipseEE still reports errors here; ignore them, it works.
2. Make sure the dependency versions in the pom match your own environment.
3. Use the project right-click Maven > Update Project, and the Project menu's Clean and Build Automatically, as needed; experiment.
4. Once Maven > Update Project has picked up the Scala plugin configuration in the pom, a Scala entry appears in the project's right-click menu. If nothing changes in eclipseEE after Update Project, use right-click > Configure > "Add Scala Nature" to add the Scala library container to the project. (screenshot)
5. With the scalaIDE plugin (or the latest scalaIDE), only the three built-in Scala versions shown in the screenshot are offered by default; to add your own version, go to Window > Preferences. (screenshot)
6. To change the Scala version: right-click the Scala Library Container > Build Path > Configure Build Path > Libraries tab, remove the current version, then Add Library > Scala Library and select the version you just added.

  5. Create an object and run a test. Note: when eclipseEE runs a Spark job, I had to set the heap memory, under the job's Scala Application entry in Run Configurations...
    (screenshot)

Test code:

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    object a {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
        //conf.setMaster("spark://mini2:7077")
        conf.setMaster("local[4]")
        conf.setAppName("test")
        val sc = new SparkContext(conf)
        val a = sc.parallelize(List(1, 2, 3), 2)
        println(a.count())    // 3
        sc.stop()
      }
    }
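If the count above runs, a slightly larger local smoke test also exercises a shuffle. This word count over an in-memory collection is my own sketch (not from the original post), using the same local[4] master:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// Tiny word count: flatMap/map plus a reduceByKey shuffle on local[4].
object WordCountTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("wordcount-test")
    val sc = new SparkContext(conf)
    val counts = sc.parallelize(Seq("a b a", "b c"))
      .flatMap(_.split("\\s+"))   // split lines into words
      .map(word => (word, 1))     // pair each word with a count of 1
      .reduceByKey(_ + _)         // sum counts per word (shuffle)
      .collectAsMap()
    println(counts)   // expect a -> 2, b -> 2, c -> 1
    sc.stop()
  }
}
```

If this runs without errors, the pom, the Scala library, and the Spark dependencies are all wired up correctly.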

IDEA section:
1. Create a Maven project, then rename

    src/main/java-->src/main/scala
    src/test/java-->src/test/scala

2. Edit the pom; the configuration differs somewhat from the one above.

I tried it: the two pom sections are not interchangeable; at least the Eclipse one does not work in this IDEA setup. If you understand what each setting means, adapt it; otherwise just copy this one, it at least runs.

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.0</spark.version>
        <hadoop.version>2.6.4</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <!--<arg>-make:transitive</arg>-->
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.1.0</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

===> I hit the -make:transitive problem here too; comment out that line. It also runs without adding the junit dependency.

3. In Project Structure > Global Libraries, set the project's Scala version, similar to adding it in Eclipse; see the Eclipse section.

4. Make good use of the refresh button at the top left of the Maven Projects side panel, and the Build / Rebuild actions in the project's right-click menu.

5. At run time I hit the same out-of-memory error as in eclipseEE; set the memory in Run Configurations..., as shown.
In Eclipse + plugin I found no similar global setting; each object has to be configured in its own run configuration.
scalaIDE has no memory problem.
(screenshot)
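To confirm the heap setting actually took effect, this small sketch of mine (not from the original post) prints the JVM's maximum heap; run it with the same VM options as the Spark job:

```scala
// Print the JVM's maximum heap in MB; compare it against the -Xmx value
// set in Run Configurations to confirm the option is being picked up.
object HeapCheck {
  def main(args: Array[String]): Unit = {
    val maxMb = Runtime.getRuntime.maxMemory / (1024L * 1024L)
    println(s"max heap: $maxMb MB")
  }
}
```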

6. Change the Scala compile order to "java then scala".
(screenshot)
