Spark 1.5.0 Remote Debugging

Author: 摇摆少年梦
WeChat ID: zhouzhihubeyond

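Prerequisites

  1. A Spark cluster is already installed; this example uses spark-1.5.0. For installation steps see: http://blog.csdn.net/lovehuangjiaju/article/details/48494737
  2. IntelliJ IDEA is already installed; this example uses IntelliJ IDEA 14.1.4. For installation steps see: http://blog.csdn.net/lovehuangjiaju/article/details/48577281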

Remote debugging procedure

  1. Open IntelliJ IDEA and choose File -> New -> Project.

  2. Select Scala, then click Next.

  3. Configure the JDK and Scala versions, enter a project name, then click Finish.

4. Import spark-assembly-1.5.0-hadoop2.4.0.jar

Open File -> Project Structure -> Libraries.

Click the "+" button and choose Java.
Browse to the spark-1.5.0 installation directory and select spark-assembly-1.5.0-hadoop2.4.0.jar. On my machine the jar is located at
/hadoopLearning/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar
Then click Finish.

Finally, click "OK" to complete the import.

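5. Attach the spark-1.5.0 source code
Under External Libraries, expand spark-assembly-1.5.0-hadoop2.4.0.jar and navigate to org -> apache -> spark.
Open any file in the packages below it; on my machine I chose SparkContext.class. By default IntelliJ IDEA decompiles the .class file, but the decompiled code has no comments, so click "Attach Sources" in the upper-right corner of the editor.

Select the source directory; on my machine the sources are unpacked in /hadoopLearning/spark-1.5.0. Then click "OK".
IDEA then lists the source roots it has detected.
Select all of them and click "OK". The editor now shows the attached source code instead of the decompiled code, and you will notice that many comments have appeared.

The source-reading environment is now ready.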

6. Start the spark-1.5.0 cluster
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/sbin# ./start-all.sh
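Before continuing, it can help to confirm that the standalone daemons actually came up. The following is a minimal sketch, assuming the JDK's `jps` tool is on the PATH and that the standalone master and worker processes are listed by `jps` as `Master` and `Worker`; the helper name `count_spark_daemons` is hypothetical and not part of Spark.

```shell
# Hypothetical helper: count Spark standalone daemons in jps-style output read from stdin.
# Each jps line is expected to look like "<pid> <main class name>".
count_spark_daemons() {
  grep -cE '^[0-9]+ (Master|Worker)$'
}

# Usage (assumes jps from the JDK is on the PATH):
#   jps | count_spark_daemons
```

A count of 0 on the master node suggests that start-all.sh did not start the daemons there, in which case the logs directory under the Spark installation is the first place to look.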

7. Modify the spark-class script
On this machine the spark-class script is in the /hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin directory.
Change the following line in the script

```bash
done < <("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")
```

to

```bash
done < <("$RUNNER" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main $JAVA_OPTS "$@")
```

Then run the following statement on the command line, in the same shell that will later run spark-submit:
export JAVA_OPTS="$JAVA_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005"
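The edit to spark-class can also be scripted, which makes it easy to revert. This is only a sketch, assuming the script still contains the exact original line quoted in this step; `patch_spark_class` is a hypothetical helper name, not something shipped with Spark.

```shell
# Hypothetical helper: insert $JAVA_OPTS after org.apache.spark.launcher.Main
# in the spark-class script given as the first argument.
# The untouched script is first saved as <script>.bak so the change can be undone.
patch_spark_class() {
  script="$1"
  cp "$script" "$script.bak" || return 1
  # Read from the backup and overwrite the original in place (file mode is kept).
  sed 's/org\.apache\.spark\.launcher\.Main "\$@"/org.apache.spark.launcher.Main $JAVA_OPTS "$@"/' \
    "$script.bak" > "$script"
}

# Usage:
#   patch_spark_class /hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin/spark-class
```

To undo the change, copy the `.bak` file back over spark-class. If the original line is not found, the script is left unchanged apart from the backup being written.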

8. Create a Spark application for testing
Right-click the src folder of the project and choose New -> Scala Class.
Then select Object as the kind.
Name it SparkWordCount, click OK, and enter the following code:

```scala
import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    // Both paths come from the command line, matching the usage message
    // and the spark-submit call in step 10.
    if (args.length < 2) {
      System.err.println("Usage: SparkWordCount <inputfile> <outputfile>")
      System.exit(1)
    }
    val conf = new SparkConf().setAppName("SparkWordCount")
    val sc = new SparkContext(conf)

    // Split each line on spaces, pair every word with 1, then sum the counts per word.
    val file = sc.textFile(args(0))
    val counts = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile(args(1))

    sc.stop()
  }
}
```

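9. Package the Spark application
Select the project and open File -> Project Structure.
Select Artifacts.
Click the "+" button, then choose "Jar" -> "From modules with dependencies".

Select SparkWordCount as the Main Class.

A Spark application automatically loads spark-assembly-1.5.0-hadoop2.4.0.jar and related jars at runtime. To keep the resulting jar small, you can remove spark-assembly-1.5.0-hadoop2.4.0.jar and the other such jars from the artifact so that they are not packaged into it.
When done, click "OK".

Then choose "Build" -> "Build Artifacts" and select "Build" under Action.

After the build, the generated jar file appears in the artifact output directory. On this machine the directory is:
/root/IdeaProjects/SparkRemoteDebugPeoject/out/artifacts/SparkRemoteDebugPeoject_jar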

10. Submit the code to the cluster with spark-submit

```
root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-submit --master spark://sparkmaster:7077 --class SparkWordCount --executor-memory 1g /root/IdeaProjects/SparkRemoteDebugPeoject/out/artifacts/SparkRemoteDebugPeoject_jar hdfs://ns1/README.md hdfs://ns1/SparkWordCountResult
// Note this line of output:
Listening for transport dt_socket at address: 5005
```
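Because JAVA_OPTS was exported with suspend=y, any later command in that shell that goes through the modified spark-class should also pause and wait for a debugger. One way to limit this is to set JAVA_OPTS only for the single spark-submit call. The sketch below prints such a command line for review instead of running it; `submit_debug_cmd` and the jar path in the usage line are hypothetical, while the other values are taken from this article.

```shell
# Sketch: print a spark-submit command line with the debug options scoped to that one command.
# SPARK_HOME, MASTER_URL and DEBUG_OPTS mirror values used in this article.
SPARK_HOME="/hadoopLearning/spark-1.5.0-bin-hadoop2.4"
MASTER_URL="spark://sparkmaster:7077"
DEBUG_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005"

# Hypothetical helper: first argument is the application jar, the rest are application arguments.
submit_debug_cmd() {
  app_jar="$1"
  shift
  # Print rather than execute, so the line can be checked before it is run.
  printf 'JAVA_OPTS="%s" %s/bin/spark-submit --master %s --class SparkWordCount --executor-memory 1g %s' \
    "$DEBUG_OPTS" "$SPARK_HOME" "$MASTER_URL" "$app_jar"
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Usage (the jar path is a placeholder for the jar built in step 9):
#   submit_debug_cmd /path/to/your-app.jar hdfs://ns1/README.md hdfs://ns1/SparkWordCountResult
```

Prefixing the command with `JAVA_OPTS="..."` sets the variable for that process only, so other Spark commands started from the same shell run without the debug agent.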

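11. Configure remote debugging in IntelliJ IDEA
Open Run -> Edit Configurations.
Find Remote.
Click the "+" button and name the configuration Spark_Remote_Debug. Leave the other settings at their defaults; IntelliJ IDEA has already filled them in for us.
When done, click OK.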

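12. Start remote debugging
Set a breakpoint in the source code; in this example the breakpoint is set in SparkSubmit.scala.

Then press F9 and select Spark_Remote_Debug.
The console shows: Connected to the target VM, address: 'localhost:5005', transport: 'socket'.
The Debugger tab shows that the program has stopped at the breakpoint set in the SparkSubmit source.

Remote debugging has now started. Enjoy exploring the Spark source code.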

Finally, a note on the debug parameters.
See: http://www.thebigdata.cn/QiTa/12370.html

```
-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005
```

Parameter description:
-Xdebug: enables debugging support.
-Xrunjdwp: loads the JDWP implementation; it takes several sub-options:
transport=dt_socket: the transport used between the JPDA front-end and back-end. dt_socket means a socket transport.
address=5005: the JVM listens for requests on port 5005; any port that does not conflict with another service will do.
server=y: the launched JVM is the debuggee and listens for a debugger to attach. With server=n, the JVM instead connects out to a debugger at the given address.
suspend=y: the launched JVM pauses at startup and continues only after a debugger has connected. With suspend=n, the JVM does not wait.
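To make the sub-options easier to vary, the option string can be assembled from its parts. This is a minimal sketch; `jdwp_opts` is a hypothetical helper name, and the defaults (port 5005, suspend=y) simply mirror the values used in this article.

```shell
# Hypothetical helper: build the JDWP option string.
#   $1 = port to listen on (default 5005)
#   $2 = y to pause until a debugger attaches, n to start immediately (default y)
jdwp_opts() {
  port="${1:-5005}"
  suspend="${2:-y}"
  printf '%s\n' "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=${suspend},address=${port}"
}

# Usage:
#   export JAVA_OPTS="$JAVA_OPTS $(jdwp_opts 5005 y)"
```

Using suspend=n is convenient when the breakpoint is hit long after startup; suspend=y is the safer choice for breakpoints in early code such as SparkSubmit. On newer JVMs the same sub-options are usually passed as `-agentlib:jdwp=...` instead of `-Xdebug -Xrunjdwp:...`.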