Hadoop Development and Debugging Environment


I. Goal

  Install Ubuntu 14.04 (64-bit) in a virtual machine, then install Hadoop 2.6.0 (pseudo-distributed) along with Pig, Hive, and Mahout, for use as a development and debugging environment.

II. Installation

1. Configure SSH

ssh-keygen -t rsa
cd ~/.ssh
cat id_rsa.pub >> ~/.ssh/authorized_keys
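
To confirm that passwordless login works (assuming the openssh-server package is installed and sshd is running):

# should open a shell without prompting for a password
ssh localhost
exit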

2. Software preparation

Install the JDK and mysql-server directly with apt-get:

sudo apt-get install openjdk-7-jre
sudo apt-get install openjdk-7-jdk
sudo apt-get install mysql-server

Also download the following packages and test data file:

hadoop-2.6.0.tar.gz
pig-0.15.0.tar.gz
apache-hive-1.1.1-bin.tar.gz
apache-mahout-distribution-0.9.tar.gz
mysql-connector-java-5.1.39.tar.gz
synthetic_control.data
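
The Apache packages can be fetched from the Apache archive; the URLs below follow the archive's usual layout and are an assumption that may go stale (the MySQL connector is downloaded separately from dev.mysql.com):

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
wget https://archive.apache.org/dist/pig/pig-0.15.0/pig-0.15.0.tar.gz
wget https://archive.apache.org/dist/hive/hive-1.1.1/apache-hive-1.1.1-bin.tar.gz
wget https://archive.apache.org/dist/mahout/0.9/apache-mahout-distribution-0.9.tar.gz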

3. Set environment variables

Unpack the archives, copy them to the /usr/local directory, and add the following settings to .bashrc:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export PIG_HOME=/usr/local/pig
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/
export HIVE_HOME=/usr/local/hive
export HIVE_CLASSPATH=$HADOOP_HOME/etc/hadoop/
export MAHOUT_HOME=/usr/local/mahout
export MAHOUT_CONF_DIR=/usr/local/mahout/conf
export PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PIG_HOME/bin:$HIVE_HOME/bin:$MAHOUT_HOME/bin:$PATH
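
As a sketch of the unpack-and-copy step for Hadoop (the other archives follow the same pattern; the directory names match the variables above):

sudo tar -xzf hadoop-2.6.0.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-2.6.0 /usr/local/hadoop
# apply the new variables to the current shell and verify
source ~/.bashrc
echo $HADOOP_HOME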

Check the MySQL installation:

sudo /etc/init.d/mysql status

Check that Java runs:

java -version

4. Configure Hadoop in pseudo-distributed mode

core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
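
Hadoop creates these directories on demand, but creating them up front avoids permission surprises (paths taken from the two files above):

mkdir -p /usr/local/hadoop/tmp/dfs/name /usr/local/hadoop/tmp/dfs/data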

Initialize HDFS by formatting the NameNode:

hadoop namenode -format

Start HDFS and YARN:

start-dfs.sh and start-yarn.sh

Check the running processes:

jps
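
On a healthy pseudo-distributed setup, jps should list roughly the following daemons (process IDs omitted):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps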

View the NameNode web UI in a browser:

http://localhost:50070
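
The YARN ResourceManager web UI is likewise available at http://localhost:8088.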

5. Configure Pig

No dedicated configuration is needed; verify it works with the following commands:

hdfs dfs -put /etc/passwd /user/oliver/passwd
pig -x mapreduce
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
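
To keep the result in HDFS instead of printing it to the console, a store statement can replace the dump in the same grunt session (the output path is an arbitrary choice):

store B into 'passwd_ids';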

6. Configure Hive

Generate the configuration files:

cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml

Edit hive-env.sh:

export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf

Edit hive-site.xml:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
</property>
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive</value>
    <description>Local scratch space for Hive jobs</description>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/hive</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
</property>
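
Since hive.exec.local.scratchdir and hive.downloaded.resources.dir both point at /tmp/hive, make sure that directory exists and is writable; Hive also expects its warehouse directory in HDFS (a standard-setup sketch):

mkdir -p /tmp/hive
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /user/hive/warehouse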

Copy the jar files:

cp mysql-connector-java-5.1.39-bin.jar /usr/local/hive/lib
cp jline-2.12.jar /usr/local/hadoop/share/hadoop/yarn/lib

Pig cannot work with jline-2.12.jar, so switch back to the original jar before running Pig.
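
One way to make the swap easy to reverse is to rename whichever jar should be inactive (jline-0.9.94.jar is an assumption for the jline version Hadoop 2.6 ships in yarn/lib):

cd /usr/local/hadoop/share/hadoop/yarn/lib
# before running Hive: set the original jar aside so only jline-2.12.jar is on the classpath
mv jline-0.9.94.jar jline-0.9.94.jar.off
# before running Pig: reverse the swap
mv jline-0.9.94.jar.off jline-0.9.94.jar
mv jline-2.12.jar jline-2.12.jar.off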

Create the metastore database:

insert into mysql.user(Host,User,Password) values("localhost","hive",password("hive"));
create database hive;
grant all on hive.* to hive@'%' identified by 'hive';
grant all on hive.* to hive@'localhost' identified by 'hive';
flush privileges;
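
A quick check, run as the MySQL root user, that the account and grants took effect:

mysql -uroot -p -e "select Host, User from mysql.user where User='hive'; show grants for 'hive'@'localhost';"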

Initialize the database with Hive's schematool command:

schematool -dbType mysql -initSchema

Check the database:

mysql -uhive -phive
use hive;
show tables;

Start the metastore service; if it starts without errors, the installation is good:

hive --service metastore
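
With the metastore running, a minimal smoke test from another terminal (the table name is arbitrary):

hive -e "create table smoke_test (id int); show tables; drop table smoke_test;"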

7. Configure Mahout

Unpack the archive, copy it to /usr/local/mahout, and set the environment variables (already done in step 3); nothing else is needed.

Download the test data and run a test job:

wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
hdfs dfs -mkdir /testdata
hdfs dfs -put ./synthetic_control.data /testdata
hadoop jar /usr/local/mahout/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
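
The job writes its results to the output directory under the HDFS user home; the final cluster directory is named clusters-N-final, where N depends on how many k-means iterations ran, so the name below is an assumption:

hdfs dfs -ls output
# dump the final clusters to a local text file
mahout clusterdump -i output/clusters-10-final -p output/clusteredPoints -o clusters.txt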

At this point, the development and debugging environment is fully installed.
