Installing Anaconda3 on Windows and Running Spark
1. Preparation
1.1 Required software:
Anaconda3-5.0.0-Windows-x86_64
hadoop-2.7.4
jdk1.8+
spark-2.2.0-bin-hadoop2.7
1.2 Downloading the software
Anaconda official download page: https://www.continuum.io/downloads
The current release ships with Python 3.6, which is also the default download. A Baidu Netdisk mirror is available at http://pan.baidu.com/s/1jIePjPc (extraction code: robu). Of course, you can also download the latest Anaconda3 from the official site and set it up with Python 3.6 as needed.
Hadoop official download page: http://hadoop.apache.org/releases.html
Spark official download page: http://spark.apache.org/downloads.html
JDK official download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Installing and Configuring Environment Variables on Windows
Installing Anaconda is straightforward: it is mostly a matter of clicking Next. To avoid unnecessary trouble, it is best to keep the default installation path. The steps are:
Double-click the installer to launch the setup wizard, click "I Agree" to accept the license, then click "Next" through the following screens.
If the system has only one user, keep the default first option; if there are multiple users who all need Anaconda, choose the second option instead.
To avoid trouble later on, installing to the default path is recommended; the installation takes roughly 1.8 GB of disk space.
The installation takes a while; wait for it to finish.
That completes the installation. You can uncheck "Learn more about Anaconda Cloud" and "Learn more about Anaconda Support", then click "Finish".
Install JDK 1.8+ to its default path as well; hadoop-2.7.4 and spark-2.2.0-bin-hadoop2.7 can be extracted to any drive.
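Once the JDK is installed, it is worth opening a command prompt and confirming it is picked up. A quick check (the version string shown here, 1.8.0_144, is just an example; yours will reflect the update you installed):

C:\>java -version
java version "1.8.0_144"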
Configure the environment variables on Windows (Hadoop/Spark/Java); example values for each are sketched after this list.
Java environment variable:
Hadoop environment variable:
Spark environment variable:
Configure Path:
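The dialogs are shown as screenshots in the original post; as a sketch, the variables typically end up looking like this (the JDK folder name and the E: drive locations are examples; adjust them to where you actually installed things):

JAVA_HOME   = C:\Program Files\Java\jdk1.8.0_144
HADOOP_HOME = E:\hadoop-2.7.4
SPARK_HOME  = E:\spark-2.2.0-bin-hadoop2.7
Path        = ...existing entries...;%JAVA_HOME%\bin;%HADOOP_HOME%\bin;%SPARK_HOME%\bin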
After the above, all that remains is to keep clicking "OK", and the environment variables are configured.
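To verify, open a new command prompt (windows opened before the change will not see the new variables) and echo each variable; each command should print the path you just configured:

C:\>echo %JAVA_HOME%
C:\>echo %HADOOP_HOME%
C:\>echo %SPARK_HOME%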
3. Starting Spark
Before starting, you must place the winutils.exe binary in hadoop-2.7.4's bin directory; otherwise spark-shell fails with the following error:
E:\>spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/05 21:34:43 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2327)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:365)
    at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:991)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:92)
    at $line3.$read$$iw$$iw.<init>(<console>:15)
    at $line3.$read$$iw.<init>(<console>:42)
    at $line3.$read.<init>(<console>:44)
    at $line3.$read$.<init>(<console>:48)
    at $line3.$read$.<clinit>(<console>)
    at $line3.$eval$.$print$lzycompute(<console>:7)
    at $line3.$eval$.$print(<console>:6)
    at $line3.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
    at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
    at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
    at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
    at org.apache.spark.repl.Main$.doMain(Main.scala:69)
    at org.apache.spark.repl.Main$.main(Main.scala:52)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
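A sketch of the fix, assuming hadoop-2.7.4 was extracted to E:\hadoop-2.7.4 (an example location) and that you have a winutils.exe built for Hadoop 2.7 (commonly taken from the steveloughran/winutils repository on GitHub):

:: copy the binary into Hadoop's bin directory (paths are examples)
copy winutils.exe E:\hadoop-2.7.4\bin\
:: HADOOP_HOME must point at the directory that contains bin\winutils.exe
setx HADOOP_HOME "E:\hadoop-2.7.4"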
Alternatively, you can skip installing Hadoop entirely and just provide a standalone winutils.exe, but the environment variable must still be configured (HADOOP_HOME has to point at the directory containing bin\winutils.exe). Next, find "Jupyter Notebook" in the Start menu and double-click to run it.
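If you prefer the command line, the notebook server can also be started from an Anaconda Prompt:

jupyter notebook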
After it starts, you will see the notebook server console:
A page will then open in the browser:
At the top right of the page, click "New" and create a "Python 3" notebook:
Then enter the following code to start Spark:
import os
import sys

# locate the Spark installation via the SPARK_HOME environment variable
spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# put pyspark and its bundled py4j on the Python path
# (the py4j version must match your Spark distribution; 0.10.4 ships with Spark 2.2.0)
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))

comm = os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip')
print('start spark....', comm)

# run pyspark's interactive-shell bootstrap, which creates `sc` and `spark`
exec(open(os.path.join(spark_home, 'python/pyspark/shell.py')).read())
The Spark shell banner prints in the notebook output, as shown in the figure. At this point, Spark has started successfully.
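As a quick sanity check, you can run a trivial job in the next cell using the `sc` SparkContext that shell.py just created (a minimal sketch):

# sum the integers 0..99 with a small local-mode job
rdd = sc.parallelize(range(100))
print(rdd.sum())  # expect 4950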
4. Startup Errors and Fixes
1. The error looks like this:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
    ... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
    ... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
    at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
    ... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
    ... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
    ... 76 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
    ... 84 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 85 more
The error message says that the temporary directory /tmp/hive is not writable:
The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
So we need to fix the permissions on E:/tmp/hive (I ran the spark-shell command from drive E:; if you run it from another drive, use that drive letter + /tmp/hive instead). Run the following command:
E:\>C:\winutils\bin\winutils.exe chmod 777 E:\tmp\hive
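To confirm the change took effect, you can list the directory's permissions again with winutils' ls subcommand (same winutils path as in the command above); the permission bits should now read drwxrwxrwx:

E:\>C:\winutils\bin\winutils.exe ls E:\tmp\hive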
Run spark-shell again, and Spark starts successfully. You can now open the Spark UI at http://localhost:4040.
Reference posts for resolving this error:
https://yq.aliyun.com/articles/96424?t=t1
http://www.cnblogs.com/czm1032851561/p/5751722.html