Spark Cookbook Reading Notes

Source: Internet. Editor: 程序博客网. Time: 2024/05/17 01:37

Creating a HiveContext

val sc: SparkContext // an existing SparkContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

To enable Hive functionality, make sure that the Hive-enabled (-Phive) assembly JAR
is available on all worker nodes; also, copy hive-site.xml into the conf directory of the
Spark installation. It is important that Spark has access to hive-site.xml; otherwise, it will
create its own Hive metastore and will not connect to your existing Hive warehouse.
In short: copy the hive-site.xml from your Hive configuration into the spark/conf directory.

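By default, tables created through Spark SQL are managed by Hive; that is, Hive controls the table's whole life cycle, including its deletion with DROP.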

spark-shell --driver-memory 1G
scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
scala> hc.sql("create table if not exists person(first_name string,last_name string,age int)row format delimited fields terminated by ','")

This fails with the following error:

scala> hc.sql("create table if not exists person(first_name string,last_name string,age int)row format delimited fields terminated by ','")
15/08/27 13:56:01 INFO parse.ParseDriver: Parsing command: create table if not exists person(first_name string,last_name string,age int)row format delimited fields terminated by ','
15/08/27 13:56:01 INFO parse.ParseDriver: Parse Completed
15/08/27 13:56:01 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
15/08/27 13:56:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/27 13:56:02 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/08/27 13:56:02 INFO metastore.ObjectStore: ObjectStore, initialize called
15/08/27 13:56:02 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/08/27 13:56:02 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/08/27 13:56:02 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:116)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:172)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:168)
    at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:213)
    at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:176)
    at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:371)
    at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:371)
    at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:370)
    at org.apache.spark.sql.hive.HiveContext$$anon$1.<init>(HiveContext.scala:383)
    at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:383)
    at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:382)
    at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:931)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
    at $iwC$$iwC$$iwC.<init>(<console>:37)
    at $iwC$$iwC.<init>(<console>:39)
    at $iwC.<init>(<console>:41)
    at <init>(<console>:43)
    at .<init>(<console>:47)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
    ... 65 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
    ... 70 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
    at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
    at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:497)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:475)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
    ... 75 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
    at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
    at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
    at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
    at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
    at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
    at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
    ... 104 more
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "dbcp-builtin" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:259)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
    ... 122 more
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
    at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
    at org.datanucleus.store.rdbms.connectionpool.DBCPBuiltinConnectionPoolFactory.createConnectionPool(DBCPBuiltinConnectionPoolFactory.java:49)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
    ... 124 more
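
The innermost "Caused by" in the trace says that the metastore's JDBC driver class, com.mysql.jdbc.Driver, was not found on the classpath. As a rough way to confirm this before retrying, a small check like the one below could be pasted into spark-shell. This is only a sketch: the object name DriverCheck and the helper isOnClasspath are made up for illustration and are not part of Spark or Hive, and the check only sees the classloader of the shell session it runs in.

```scala
import scala.util.Try

// Hypothetical helper for illustration; not part of Spark or Hive.
object DriverCheck {
  // Returns true if the named class can be located by the current thread's
  // context classloader. initialize = false means the class is only looked
  // up; its static initializers are not run.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className, false, Thread.currentThread().getContextClassLoader)).isSuccess

  def main(args: Array[String]): Unit = {
    // Driver class name copied from the error message in the trace.
    val driver = "com.mysql.jdbc.Driver"
    if (isOnClasspath(driver)) println(s"$driver is on the classpath")
    else println(s"$driver is missing from the classpath")
  }
}
```

If DriverCheck.isOnClasspath("com.mysql.jdbc.Driver") returns false, one likely fix is to restart spark-shell with the MySQL connector JAR on the driver classpath, for example through spark-shell's --driver-class-path option (or --jars). Where that JAR lives depends on your installation, so no path is given here.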