Submitting a Flink Program to a Remote Cluster from Code
Source: Internet | Editor: 程序博客网 | Time: 2024/05/16 05:42
While learning Flink I came across the following method, which returns an ExecutionEnvironment bound to a remote cluster. I tried it out, submitting a job from my local IDE to the cluster. The code follows:
def createRemoteEnvironment(host: String, port: Int, jarFiles: String*): ExecutionEnvironment
package com.daxin.batch

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.configuration.{ConfigConstants, Configuration}
// important: this import is needed to access the 'createTypeInformation' macro function
import org.apache.flink.api.scala._

/**
 * Created by Daxin on 2017/4/17.
 * https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/types_serialization.html#type-information-in-the-scala-api
 */
object RemoteJob {
  def main(args: Array[String]) {
    val env = ExecutionEnvironment.createRemoteEnvironment("node", 6123)
    val words = env.readTextFile("hdfs://node:9000/word/spark-env.sh")
    val data = words.flatMap(x => x.split(" ")).map(x => (x, 1)).groupBy(0).sum(1)
    println(data.count) // simply trigger the job by printing the element count
  }
}
Running it failed with an error. I searched Baidu, Google, and Bing for a long time without finding a solution. To make this error easier to find later, here is the full exception output:
Submitting job with JobID: 2e9a9550e8352e8f6cfd579b3522a732. Waiting for job completion.
Connected to JobManager at Actor[akka.tcp://flink@node:6123/user/jobmanager#950641914]
04/19/2017 19:37:21	Job execution switched to status RUNNING.
04/19/2017 19:37:21	CHAIN DataSource (at com.daxin.batch.RemoteJob$.main(RemoteJob.scala:25) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at com.daxin.batch.RemoteJob$.main(RemoteJob.scala:27)) -> Map (Map at com.daxin.batch.RemoteJob$.main(RemoteJob.scala:27)) -> Combine(SUM(1)) (1/1) switched to SCHEDULED
04/19/2017 19:37:21	CHAIN DataSource ... -> Combine(SUM(1)) (1/1) switched to DEPLOYING
04/19/2017 19:37:21	CHAIN DataSource ... -> Combine(SUM(1)) (1/1) switched to RUNNING
04/19/2017 19:37:21	CHAIN DataSource ... -> Combine(SUM(1)) (1/1) switched to FAILED
java.lang.RuntimeException: The initialization of the DataSource's outputs caused an error: The type serializer factory could not load its parameters from the configuration due to missing classes.
	at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:92)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: The type serializer factory could not load its parameters from the configuration due to missing classes.
	at org.apache.flink.runtime.operators.util.TaskConfig.getTypeSerializerFactory(TaskConfig.java:1145)
	at org.apache.flink.runtime.operators.util.TaskConfig.getOutputSerializer(TaskConfig.java:551)
	at org.apache.flink.runtime.operators.BatchTask.getOutputCollector(BatchTask.java:1216)
	at org.apache.flink.runtime.operators.BatchTask.initOutputs(BatchTask.java:1295)
	at org.apache.flink.runtime.operators.DataSourceTask.initOutputs(DataSourceTask.java:286)
	at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:90)
	... 2 more
Caused by: java.lang.ClassNotFoundException: com.daxin.batch.RemoteJob$$anon$2$$anon$1
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:270)
	at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:66)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:292)
	at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:250)
	at org.apache.flink.api.java.typeutils.runtime.RuntimeSerializerFactory.readParametersFromConfig(RuntimeSerializerFactory.java:76)
	at org.apache.flink.runtime.operators.util.TaskConfig.getTypeSerializerFactory(TaskConfig.java:1143)
	... 7 more
04/19/2017 19:37:21	Job execution switched to status FAILING.
java.lang.RuntimeException: The initialization of the DataSource's outputs caused an error: The type serializer factory could not load its parameters from the configuration due to missing classes. (identical stack trace to the one above)
04/19/2017 19:37:21	Reduce (SUM(1)) (1/1) switched to CANCELED
04/19/2017 19:37:21	DataSink (org.apache.flink.api.java.Utils$CountHelper@516be40f) (1/1) switched to CANCELED
04/19/2017 19:37:21	Job execution switched to status FAILED.
Exception in thread "main" org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed.
	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
	at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:362)
	at org.apache.flink.client.RemoteExecutor.executePlanWithJars(RemoteExecutor.java:211)
	at org.apache.flink.client.RemoteExecutor.executePlan(RemoteExecutor.java:188)
	at org.apache.flink.api.java.RemoteEnvironment.execute(RemoteEnvironment.java:172)
	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
	at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:672)
	at org.apache.flink.api.scala.DataSet.count(DataSet.scala:529)
	at com.daxin.batch.RemoteJob$.main(RemoteJob.scala:29)
	at com.daxin.batch.RemoteJob.main(RemoteJob.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: The initialization of the DataSource's outputs caused an error: The type serializer factory could not load its parameters from the configuration due to missing classes. (same cause chain as above, ending in the ClassNotFoundException)
Note this line in the exception output:
java.lang.RuntimeException: The initialization of the DataSource's outputs caused an error: The type serializer factory could not load its parameters from the configuration due to missing classes.
I kept assuming this was a serialization problem and went through the documentation repeatedly without finding a fix. Finally I went back to the API docs and noticed that the third parameter of createRemoteEnvironment is a varargs parameter, not a parameter with a default value; Scala functions so often supply default arguments that I had fallen into the habit of assuming one here. After passing the job's jar as the third argument, the job submitted to the remote cluster and ran correctly.
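The trap can be reproduced in plain Scala with no Flink dependency. In the hypothetical sketch below, createEnv merely mirrors the shape of createRemoteEnvironment(host, port, jarFiles: String*): because jarFiles is varargs, a call with zero jars still compiles, so the mistake only surfaces at runtime on the cluster.

```scala
object VarargsDemo {
  // Mirrors the shape of createRemoteEnvironment(host, port, jarFiles*).
  // Returns the number of jars that would be shipped to the cluster.
  def createEnv(host: String, port: Int, jarFiles: String*): Int =
    jarFiles.length

  def main(args: Array[String]): Unit = {
    // Both calls type-check; the first silently ships no jar at all.
    println(createEnv("node", 6123))            // 0 jars shipped
    println(createEnv("node", 6123, "job.jar")) // 1 jar shipped
  }
}
```

Had jarFiles been a required third parameter, the jar-less call would have been a compile error instead of a cluster-side ClassNotFoundException.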
The corrected code is as follows:
package com.daxin.batch

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.configuration.{ConfigConstants, Configuration}
// important: this import is needed to access the 'createTypeInformation' macro function
import org.apache.flink.api.scala._

/**
 * Created by Daxin on 2017/4/17.
 * https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/types_serialization.html#type-information-in-the-scala-api
 */
object RemoteJob {
  def main(args: Array[String]) {
    val env = ExecutionEnvironment.createRemoteEnvironment("node", 6123, "C://logs//flink-lib//flinkwordcount.jar")
    val words = env.readTextFile("hdfs://node:9000/word/spark-env.sh")
    val data = words.flatMap(x => x.split(" ")).map(x => (x, 1)).groupBy(0).sum(1)
    println(data.count) // simply trigger the job by printing the element count
  }
}
A final note: if you are packaging local code to run on the cluster, keep the code and the jar consistent. In other words, after changing the code, rebuild the jar before submitting again.
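One way to catch at least a missing jar before it fails on the cluster is a small client-side guard. The helper below is a hypothetical addition, not part of the original post:

```scala
import java.io.File

object JarCheck {
  // Hypothetical guard: fail fast on the client if the jar passed to
  // createRemoteEnvironment does not exist, instead of failing later
  // on the cluster with a ClassNotFoundException.
  def requireJar(path: String): String = {
    require(new File(path).isFile, s"job jar not found: $path, rebuild/package it first")
    path
  }
}
```

It could wrap the jar argument, e.g. createRemoteEnvironment("node", 6123, JarCheck.requireJar("C://logs//flink-lib//flinkwordcount.jar")). It cannot detect a jar that exists but is out of date, so rebuilding after every code change remains the rule.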