Spark Components: SparkR Study Notes 2 -- Submitting the R Script dataframe.R to a Cluster with spark-submit
More code is available at: https://github.com/xubo245/SparkLearning
Environment:
Spark 1.5.2, R 3.2.1
1. Example 1: dataframe.R
1.1 File source: see reference 【1】
Running the code from ./bin/spark-submit examples/src/main/r/dataframe.R fails with an error:
hadoop@Master:~/cloud/testByXubo/spark/R$ spark-submit dataframe.R
WARNING: ignoring environment value of R_HOME
Loading required package: methods
Attaching package: ‘SparkR’
The following objects are masked from ‘package:stats’:
    filter, na.omit
The following objects are masked from ‘package:base’:
    intersect, rbind, sample, subset, summary, table, transform
root
 |-- name: string (nullable = true)
 |-- age: double (nullable = true)
16/04/20 11:14:25 ERROR RBackendHandler: jsonFile on 1 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  java.io.IOException: No input paths specified in job
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2
Calls: jsonFile -> callJMethod -> invokeJava
Execution halted
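The post does not diagnose this failure. Since the exception is raised while jsonFile is listing its input, one thing worth ruling out before submitting is a JSON path that does not resolve to a readable file; this is a guess at a possible cause, not a confirmed one. The following is a minimal shell sketch of such a pre-flight check. It assumes the data file sits on the local filesystem under the Spark home directory (a file stored in HDFS would need a different check), and the function name check_people_json is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: confirm people.json is readable before submitting.
# Assumption: the file is on the local filesystem under the given Spark home.
check_people_json() {
    spark_home="$1"
    if [ -z "$spark_home" ]; then
        echo "Spark home is empty" >&2
        return 1
    fi
    json="$spark_home/examples/src/main/resources/people.json"
    if [ ! -r "$json" ]; then
        echo "missing input: $json" >&2
        return 1
    fi
    # Print the resolved path so the caller can log or reuse it.
    echo "$json"
}
```

It could be chained in front of the submission, for example: check_people_json "$SPARK_HOME" && spark-submit dataframe.R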
1.2 Code:
hadoop@Master:~/cloud/testByXubo/spark/R$ cat dataframe.R
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-DataFrame-example")
sqlContext <- sparkRSQL.init(sc)

# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))

# Convert local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)

# Print its schema
printSchema(df)
# root
# |-- name: string (nullable = true)
# |-- age: double (nullable = true)

# Create a DataFrame from a JSON file
path <- file.path("/examples/src/main/resources/people.json")
peopleDF <- jsonFile(sqlContext, path)
printSchema(peopleDF)

# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")

# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)

# Print the teenagers in our dataset
print(teenagersLocalDF)

# Stop the SparkContext now
sparkR.stop()
1.3 Run command:
spark-submit --master spark://Master:7077 dataframe.R
or
spark-submit dataframe.R
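The two commands above differ only in whether --master is passed. As a small illustration, the choice can be made explicit in a wrapper that prints the command it would run. This is only a sketch; the function name submit_cmd is hypothetical and not part of Spark:

```shell
#!/bin/sh
# Hypothetical helper: print the spark-submit command for a script,
# adding --master only when a master URL is supplied.
submit_cmd() {
    script="$1"
    master="$2"
    if [ -n "$master" ]; then
        echo "spark-submit --master $master $script"
    else
        echo "spark-submit $script"
    fi
}

submit_cmd dataframe.R spark://Master:7077
submit_cmd dataframe.R
```

Printing the command instead of executing it makes it easy to check which form will be used before anything is submitted.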
1.4 Output:
hadoop@Master:~/cloud/testByXubo/spark/R$ spark-submit --master spark://Master:7077 dataframe.R
WARNING: ignoring environment value of R_HOME
Loading required package: methods
Attaching package: ‘SparkR’
The following objects are masked from ‘package:stats’:
    filter, na.omit
The following objects are masked from ‘package:base’:
    intersect, rbind, sample, subset, summary, table, transform
root
 |-- name: string (nullable = true)
 |-- age: double (nullable = true)
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
    name
1 Justin
Or with the default master:
hadoop@Master:~/cloud/testByXubo/spark/R$ spark-submit dataframe.R
WARNING: ignoring environment value of R_HOME
Loading required package: methods
Attaching package: ‘SparkR’
The following objects are masked from ‘package:stats’:
    filter, na.omit
The following objects are masked from ‘package:base’:
    intersect, rbind, sample, subset, summary, table, transform
root
 |-- name: string (nullable = true)
 |-- age: double (nullable = true)
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
    name
1 Justin
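1.5 Analysis
1.5.1 By default the job runs locally:
App ID | App Name | Started | Completed | Duration | Spark User | Last Updated
local-1461125367768 | SparkR-DataFrame-example | 2016/04/20 12:09:25 | 2016/04/20 12:09:32 | 7 s | hadoop | 2016/04/20 12:09:32
app-20160420111855-0007 | SparkR-DataFrame-example | 2016/04/20 11:18:52 | 2016/04/20 11:19:10 | 17 s | hadoop | 2016/04/20 11:19:10
The local- prefix on the first App ID marks a local-mode run; the app- prefix on the second is the form of ID assigned by the standalone master.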
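The App ID column is enough to tell the two runs apart. A minimal shell sketch that classifies an ID by its prefix follows; the function name run_mode is hypothetical, and any ID format other than the two shown in the listing is simply reported as unknown:

```shell
#!/bin/sh
# Hypothetical helper: classify a Spark application ID by its prefix.
run_mode() {
    case "$1" in
        local-*) echo "local" ;;       # local-mode run
        app-*)   echo "standalone" ;;  # ID assigned by a standalone master
        *)       echo "unknown" ;;     # any other format is not handled here
    esac
}

run_mode local-1461125367768      # prints "local"
run_mode app-20160420111855-0007  # prints "standalone"
```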
1.5.2 The job runs in 3 stages
References:
【1】 https://github.com/apache/spark/tree/master/R