Spark Study Notes (1): SparkSQL's registerAsTable vs. registerTempTable
Today, while working through a SparkSQL tutorial, I followed the sample code and registered a table with the registerAsTable function.
Tutorial source code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
case class Person(name: String, age: Int)
val people = sc.textFile("File:/home/hadoop/examples/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim().toInt))
  .toDF()
people.registerAsTable("people")
val teenagers = sqlContext.sql("select name,age from people")
teenagers.map { t => t(0) + " " + t(1) } collect() foreach { println }
But when execution reached people.registerAsTable("people"), an error was reported.
After digging through related articles, I finally found the solution, so I am recording it here for future reference.
Cause analysis:
(1) The function name changed:
The tutorial was written for a Spark version earlier than 1.3, while I am running Spark 1.6, so the corresponding API is different. Since Spark 1.3, a table is registered with registerTempTable. Looking this function up under the DataFrame class in the Spark 1.6.1 documentation gives its description.
The description makes clear that registerTempTable replaced registerAsTable as of Spark 1.3.
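As a side note that goes beyond the Spark 1.6 setup used here: in Spark 2.0 and later, registerTempTable was in turn deprecated in favor of createOrReplaceTempView, and SparkSession replaced SQLContext as the entry point. A minimal sketch of the same example under Spark 2.x (the appName, master setting, and inline data are illustrative, not from this article):

```scala
// Sketch assuming Spark 2.x on the classpath; not runnable on the
// Spark 1.6 setup this article uses.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()
import spark.implicits._

case class Person(name: String, age: Int)
val people = Seq(Person("Michael", 29), Person("Andy", 30), Person("Justin", 19)).toDF()

// people.registerTempTable("people") still compiles in 2.x but is deprecated;
// the replacement is:
people.createOrReplaceTempView("people")
spark.sql("select name, age from people").show()
```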
Having found the cause, I changed the call to registerTempTable and ran again. The source now reads:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
case class Person(name: String, age: Int)
val people = sc.textFile("File:/home/hadoop/examples/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim().toInt))
  .toDF()
people.registerTempTable("people")
val teenagers = sqlContext.sql("select name,age from people")
teenagers.map { t => t(0) + " " + t(1) } collect() foreach { println }
But when execution reached people.registerTempTable("people"), it again reported that the function could not be found.
Continuing to investigate: since the error was a missing function, could the import be wrong? I tried changing the imported package.
(2) The import changed:
Looking up the function's package shows that the new API requires importing sqlContext.implicits._. Updating the source:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
case class Person(name: String, age: Int)
val people = sc.textFile("File:/home/hadoop/examples/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim().toInt))
  .toDF()
people.registerTempTable("people")
val teenagers = sqlContext.sql("select name,age from people")
teenagers.map { t => t(0) + " " + t(1) } collect() foreach { println }
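Why does changing an import fix a "function not found" error? In Spark 1.3+, toDF() is not defined on RDD itself; importing sqlContext.implicits._ brings an implicit conversion into scope that adds it, so without the import the compiler reports the method as missing. The same mechanism can be shown in plain Scala (the names MyImplicits, RichSeq, and toRows below are illustrative stand-ins, not Spark's actual implementation):

```scala
object ImplicitsDemo {
  case class Person(name: String, age: Int)

  object MyImplicits {
    // Like Spark's sqlContext.implicits._, this adds an extra method to an
    // existing type -- but only where the implicit class is in scope.
    implicit class RichSeq(rows: Seq[Person]) {
      // Illustrative stand-in for toDF(): renders each row as a string.
      def toRows: Seq[String] = rows.map(p => s"${p.name} ${p.age}")
    }
  }

  def main(args: Array[String]): Unit = {
    // Without `import MyImplicits._`, `people.toRows` would not compile --
    // the same kind of "value toDF is not a member" error hit above.
    import MyImplicits._
    val people = Seq(Person("Michael", 29), Person("Andy", 30), Person("Justin", 19))
    println(people.toRows.mkString(", "))
  }
}
```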
The program now ran successfully with no errors. The output:
scala> teenagers.map { t => t(0)+" "+t(1) } collect() foreach { println }
16/04/17 08:20:03 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/17 08:20:04 INFO spark.SparkContext: Starting job: collect at <console>:35
16/04/17 08:20:04 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:35) with 1 output partitions
16/04/17 08:20:04 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (collect at <console>:35)
16/04/17 08:20:04 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/04/17 08:20:04 INFO scheduler.DAGScheduler: Missing parents: List()
16/04/17 08:20:04 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[7] at map at <console>:35), which has no missing parents
16/04/17 08:20:04 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.4 KB, free 89.4 KB)
16/04/17 08:20:04 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.8 KB, free 93.2 KB)
16/04/17 08:20:04 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:34132 (size: 3.8 KB, free: 517.4 MB)
16/04/17 08:20:04 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/04/17 08:20:04 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[7] at map at <console>:35)
16/04/17 08:20:04 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/04/17 08:20:04 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2139 bytes)
16/04/17 08:20:04 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
16/04/17 08:20:04 INFO rdd.HadoopRDD: Input split: file:/home/hadoop/examples/people.txt:0+32
16/04/17 08:20:04 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/04/17 08:20:04 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/04/17 08:20:04 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/04/17 08:20:04 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/04/17 08:20:04 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/04/17 08:20:05 INFO codegen.GenerateUnsafeProjection: Code generated in 348.055303 ms
16/04/17 08:20:05 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2224 bytes result sent to driver
16/04/17 08:20:05 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 788 ms on localhost (1/1)
16/04/17 08:20:05 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/04/17 08:20:05 INFO scheduler.DAGScheduler: ResultStage 0 (collect at <console>:35) finished in 0.851 s
16/04/17 08:20:05 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:35, took 1.190281 s
Michael 29
Andy 30
Justin 19
With that, the problem was solved.