Trying out Hadoop


Version: 1.2.1

http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html

That page has a detailed walkthrough.

1. Unpack the Hadoop distribution.

2. Edit hadoop/conf/hadoop-env.sh and add the line export JAVA_HOME=<path to your Java installation>.

3. Run the hadoop script under bin to see its usage help (see the sketch after this list).
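A minimal sketch of those three steps as shell commands, assuming the release tarball is named hadoop-1.2.1.tar.gz and the JDK lives at /usr/lib/jvm/java-6-openjdk (adjust both paths to your machine):

$ tar xzf hadoop-1.2.1.tar.gz
$ cd hadoop-1.2.1
$ echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk' >> conf/hadoop-env.sh
$ bin/hadoop    # with no arguments, prints the usage help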


The first Hadoop program.

From inside the hadoop directory:

1. mkdir input

2. cp conf/*.xml input

3. bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Hadoop runs the job for a while here...

4. cat output/*


The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
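Each part file in output lists a matched string and how many times it occurred, one per line. The exact matches depend on what is in your conf/*.xml files, so the lines below only illustrate the format and are not real results:

$ cat output/*
1	dfs.replication
2	dfs.data.dir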


Next, try pseudo-distributed mode.

The original page spells it out clearly; just follow it step by step.

1. Edit the configuration files:

conf/core-site.xml

conf/hdfs-site.xml

conf/mapred-site.xml

Change their contents to the following:

conf/core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>


conf/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


conf/mapred-site.xml:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>

2. Set up passwordless SSH login.

First check whether it already works:
$ ssh localhost

If the connection is not refused and you can log in, skip the following two commands.


$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

This generates the key files and authorizes the public key for local login.
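To confirm it worked, ssh to localhost again; this time it should log you in without asking for a password:

$ ssh localhost
$ exit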

Start Hadoop

Format the NameNode:
$ bin/hadoop namenode -format

Start the daemons:
$ bin/start-all.sh

Check the running Java processes:
$ jps

If NameNode (along with the other daemons) shows up in the list, the services are running.
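For reference, a healthy pseudo-distributed setup shows all five daemons plus jps itself; the process IDs below are made up, only the names matter:

$ jps
4212 NameNode
4378 DataNode
4554 SecondaryNameNode
4630 JobTracker
4801 TaskTracker
4950 Jps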

Problem!




The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
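When a daemon fails to come up, its log is the first thing to check. The log file names follow the pattern hadoop-<user>-<daemon>-<hostname>.log, so a glob like the one below should find the NameNode log:

$ tail -n 50 logs/hadoop-*-namenode-*.log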

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

  • NameNode - http://localhost:50070/
  • JobTracker - http://localhost:50030/
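A quick way to confirm both web interfaces are up, assuming curl is installed (opening the URLs in a browser works just as well); each command should print 200:

$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030/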

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
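To verify the upload, list the directory on the distributed filesystem:

$ bin/hadoop fs -ls input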

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh

