My Hadoop Learning Notes, Part 2: Running WordCount


First, start all the Hadoop daemons:

sh /usr/local/hadoop/sbin/start-all.sh

Import the data into HDFS (all of the following commands are run from the Hadoop installation root, /usr/local/hadoop). Note that on Hadoop 2.x the "hadoop dfs" form still works but prints a deprecation warning; "hdfs dfs" is the preferred spelling.

1. Create a working directory in HDFS

./bin/hadoop dfs -mkdir -p /user/guoyakui/hadoopfile

General form: ./bin/hadoop dfs -mkdir -p /user/<username>/<directory>
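The -p flag matters here: it creates every missing parent directory in one call, just like the local mkdir -p. A quick local sketch of the difference (the /tmp path is arbitrary, chosen only for illustration):

```shell
# Without -p, mkdir fails when an intermediate directory is missing;
# with -p, the whole path is created at once. HDFS's -mkdir -p behaves
# the same way on HDFS paths.
rm -rf /tmp/mkdir-demo
mkdir /tmp/mkdir-demo/a/b 2>/dev/null || echo "plain mkdir fails: parent missing"
mkdir -p /tmp/mkdir-demo/a/b && echo "mkdir -p succeeds"
```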

2. Copy the data into that directory

./bin/hadoop dfs -copyFromLocal  /Users/guoyakui/Desktop/hadoop/data  /user/guoyakui/hadoopfile

General form: ./bin/hadoop dfs -copyFromLocal <local path> <HDFS destination (the directory created above)>

3. When the copy finishes, verify it

./bin/hadoop dfs -ls /user/guoyakui/hadoopfile

General form: ./bin/hadoop dfs -ls /user/<username>/<directory>

Output:

┌─[guoyakui@guoyakuideMBP] - [/usr/local/hadoop] - [  5 23, 15:47]
└─[$] <> ./bin/hadoop dfs -ls /user/guoyakui/hadoopfile/data
-rw-r--r--   1 guoyakui supergroup    1580879 2017-05-23 14:56 /user/guoyakui/hadoopfile/data/4300-0.txt
-rw-r--r--   1 guoyakui supergroup    1428841 2017-05-23 14:56 /user/guoyakui/hadoopfile/data/5000-8.txt
-rw-r--r--   1 guoyakui supergroup     674570 2017-05-23 14:56 /user/guoyakui/hadoopfile/data/pg20417.txt

4. Run the examples WordCount job

./bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.8.0-sources.jar org.apache.hadoop.examples.WordCount  /user/guoyakui/hadoopfile/data /user/guoyakui/hadoopfile/data-output

General form: ./bin/hadoop jar <path to examples jar> <class/program to run> <input path> <output path>

(The examples can also be run from the compiled jar, share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar, using the short program name wordcount instead of the fully qualified class name.)

Output (long; only the tail is shown):

17/05/23 15:50:16 INFO mapred.LocalJobRunner: reduce > reduce
17/05/23 15:50:16 INFO mapred.Task: Task 'attempt_local1414386995_0001_r_000000_0' done.
17/05/23 15:50:16 INFO mapred.LocalJobRunner: Finishing task: attempt_local1414386995_0001_r_000000_0
17/05/23 15:50:16 INFO mapred.LocalJobRunner: reduce task executor complete.
17/05/23 15:50:17 INFO mapreduce.Job:  map 100% reduce 100%
17/05/23 15:50:17 INFO mapreduce.Job: Job job_local1414386995_0001 completed successfully
17/05/23 15:50:17 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=4121088
        FILE: Number of bytes written=8782066
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=11959179
        HDFS: Number of bytes written=879197
        HDFS: Number of read operations=33
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=6
    Map-Reduce Framework
        Map input records=78096
        Map output records=629882
        Map output bytes=6091113
        Map output materialized bytes=1454541
        Input split bytes=403
        Combine input records=629882
        Combine output records=100609
        Reduce input groups=81942
        Reduce shuffle bytes=1454541
        Reduce input records=100609
        Reduce output records=81942
        Spilled Records=201218
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=14
        Total committed heap usage (bytes)=1789919232
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=3684290
    File Output Format Counters
        Bytes Written=879197
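For intuition about what the job just did, the three phases WordCount runs (map: emit one word per record; shuffle: group identical words; reduce: sum each group) can be approximated locally with a coreutils pipeline. This is only a sketch of the idea, not Hadoop's actual execution path:

```shell
# map    -> split text into one word per line (tr)
# shuffle-> bring identical words together  (sort)
# reduce -> count each group                (uniq -c)
printf 'hello world\nhello hadoop\n' \
  | tr -s '[:space:]' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}'
# prints:
# hadoop  1
# hello   2
# world   1
```

The real job does the same thing, but the map and reduce steps run as tasks over HDFS blocks, and the counters above (Map output records, Reduce input groups, etc.) report the volume flowing through each phase.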

5. View the results

a. List the output files

./bin/hadoop dfs -ls /user/guoyakui/hadoopfile/data-output

┌─[guoyakui@guoyakuideMBP] - [/usr/local/hadoop] - [  5 23, 15:50]
└─[$] <> ./bin/hadoop dfs -ls /user/guoyakui/hadoopfile/data-output
Found 2 items
-rw-r--r--   1 guoyakui supergroup          0 2017-05-23 15:21 /user/guoyakui/hadoopfile/data-output/_SUCCESS
-rw-r--r--   1 guoyakui supergroup     879197 2017-05-23 15:21 /user/guoyakui/hadoopfile/data-output/part-r-00000

_SUCCESS is an empty marker file written when the job completes; part-r-00000 holds the actual word counts.
b. View the file contents

./bin/hadoop dfs -cat /user/guoyakui/hadoopfile/data-output/part-r-00000

Output (long; only a short excerpt):

—A	40
—About	2
—Adiutorium	1
—Afraid	2
After	7
After,	1
—Afterwits,	1
—Again,	1
—Agonising	1
—Ah	3
—Ah,	10
—Aha!	1
—Aha...	1
—Ahem!	1
—Alas,	1
All	8
—Am	2
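Each line of part-r-00000 is a tab-separated word/count pair, so finding the most frequent words is just a numeric sort on the second field. Sketched below against a small made-up sample file (the filename and counts are illustrative, not the real job output):

```shell
# Rank a WordCount output file by frequency: sort numerically on
# field 2, descending. Against the real output you would instead pipe:
#   ./bin/hadoop dfs -cat /user/guoyakui/hadoopfile/data-output/part-r-00000 | sort -k2,2 -nr | head
printf 'and\t6542\nthe\t13600\na\t5842\nof\t8127\n' > /tmp/sample-part-r-00000
sort -k2,2 -nr /tmp/sample-part-r-00000 | head -3
# prints:
# the     13600
# of      8127
# and     6542
```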