Spark shell: converting an RDD to a DataFrame fails with "Relative path in absolute URI" on Windows 7


While playing around with Spark's shell to test reading and writing the Parquet format, I found that running personRDD.toDF threw the following error:

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
        ......

It turned out to be caused by this underlying error:

Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/Users/Administrator/spark-warehouse
  at java.net.URI.checkPath(URI.java:1823)
  at java.net.URI.<init>(URI.java:745)
  at org.apache.hadoop.fs.Path.initialize(Path.java:203)
  ... 96 more

A look at the relevant source (Hadoop's org.apache.hadoop.fs.Path.initialize, the last frame in the trace above):

  private void initialize(String scheme, String authority, String path,
      String fragment) {
    try {
      this.uri = new URI(scheme, authority, normalizePath(scheme, path), null, fragment)
          .normalize();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }
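To see why that constructor blows up here: on Windows the warehouse path arrives as a drive-letter path like C:/Users/... while the scheme is file, and java.net.URI rejects any scheme-qualified URI whose path does not start with "/". A minimal Scala sketch (not from the original post) that reproduces the exact message from the stack trace:

import java.net.{URI, URISyntaxException}

try {
  // Roughly the arguments Path.initialize ends up passing on Windows:
  // scheme "file", no authority, and a drive-letter path.
  new URI("file", null, "C:/Users/Administrator/spark-warehouse", null, null)
} catch {
  case e: URISyntaxException =>
    // Prints: Relative path in absolute URI: file:C:/Users/Administrator/spark-warehouse
    println(e.getMessage)
}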

After a few minutes of digging, I found that the problem is how ${system:java.io.tmpdir} and ${system:user.name} get substituted when hive-site.xml is read. The fix is to configure absolute paths in hive-site.xml (and create the directories beforehand):

  <property>
    <name>system:java.io.tmpdir</name>
    <value>C:/Users</value>
    <description/>
  </property>
  <property>
    <name>system:user.name</name>
    <value>Administrator</value>
    <description/>
  </property>

After re-running personRDD.toDF, the problem was solved :)
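For what it's worth, file:C:/Users/Administrator/spark-warehouse in the trace is Spark's default warehouse location, so a related workaround often reported for this error on Windows is pointing spark.sql.warehouse.dir at a well-formed file:/// URI (the leading /C:/ makes the URI path absolute). A hedged sketch, assuming Spark 2.x and that C:/tmp/spark-warehouse already exists:

import org.apache.spark.sql.SparkSession

// spark.sql.warehouse.dir is a standard Spark 2.x setting; the file:/// form
// hands java.net.URI an absolute path, so checkPath no longer rejects it.
val spark = SparkSession.builder()
  .appName("warehouse-dir-workaround")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "file:///C:/tmp/spark-warehouse")
  .getOrCreate()

From spark-shell, the equivalent is passing --conf spark.sql.warehouse.dir=file:///C:/tmp/spark-warehouse at launch.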


Here is the exact test session from the shell:

scala> case class Person(firstName: String, lastName: String, age: Int)
scala> val personRDD = sc.textFile("hdfs://localhost:9000/person").map(line => line.split(",")).map(p => Person(p(0), p(1), p(2).toInt))
scala> val personDF = personRDD.toDF
scala> personDF.registerTempTable("person")
scala> val people = sql("select * from person")
scala> people.collect.foreach(println)
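Outside spark-shell, the same test needs an explicit SparkSession and the implicits import that the shell provides automatically (and createOrReplaceTempView is the non-deprecated Spark 2.x replacement for registerTempTable). A minimal standalone sketch; the object name and master setting are illustrative, not from the original post:

import org.apache.spark.sql.SparkSession

object PersonTest {
  // Defined at object level (not inside main) so Spark can derive an encoder.
  case class Person(firstName: String, lastName: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("person-test")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings the rdd.toDF conversion into scope

    val personDF = spark.sparkContext
      .textFile("hdfs://localhost:9000/person")
      .map(_.split(","))
      .map(p => Person(p(0), p(1), p(2).toInt))
      .toDF()

    personDF.createOrReplaceTempView("person")
    spark.sql("select * from person").collect.foreach(println)
    spark.stop()
  }
}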


The test file in my HDFS is person.txt, and its contents are simple:

C:\Users\Administrator>hdfs dfs -cat /person/person.txt
Barack,Obama,53
George,Bush,68
Bill,Clinton,68

The output of scala> people.collect.foreach(println) is:

scala> people.collect.foreach(println)
[Stage 2:> (0 + 0) / 2]
[Barack,Obama,53]
[George,Bush,68]
[Bill,Clinton,68]

