Hadoop Notes 1: Minimal Hadoop Setup


1. Installation

Pseudo-Distributed Operation

Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>
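
These are the Hadoop 1.x property names (they were renamed in later releases). As a quick sanity check once the daemons are running (see Execution below) — a sketch assuming the values above — you can address the NameNode with the exact URI configured in core-site.xml; if the listing succeeds, the configuration took effect:

$ bin/hadoop fs -ls hdfs://localhost:9000/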

Set up passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
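
If ssh still prompts for a password after this, file permissions are the usual culprit: sshd ignores authorized_keys when it is group- or world-writable. Tightening the permissions (a standard sshd configuration assumed) normally fixes it:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys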

Execution

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
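
Because each daemon runs in its own JVM in pseudo-distributed mode, jps (bundled with the JDK) gives a quick health check. On a Hadoop 1.x single node you would expect the five daemons below; the PIDs shown are illustrative only:

$ jps
4471 NameNode
4562 DataNode
4655 SecondaryNameNode
4739 JobTracker
4831 TaskTracker
4920 Jps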

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

  • NameNode - http://localhost:50070/
  • JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
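
Before fetching anything you can list the job's output directory in HDFS; results land in part files (part-00000 when a single reducer runs):

$ bin/hadoop fs -ls output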

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh




2. Create a test MapReduce class

package org.myorg;

import java.io.IOException; 
import java.util.*; 

import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.conf.*; 
import org.apache.hadoop.io.*; 
import org.apache.hadoop.mapred.*; 
import org.apache.hadoop.util.*; 
 
public class WordCount { 
 
    // Mapper: tokenizes each input line and emits a (word, 1) pair per token.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1); 
      private Text word = new Text(); 
 
      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { 
        String line = value.toString(); 
        StringTokenizer tokenizer = new StringTokenizer(line); 
        while (tokenizer.hasMoreTokens()) { 
          word.set(tokenizer.nextToken()); 
          output.collect(word, one); 
        } 
      } 
    } 
 
    // Reducer: sums all the counts emitted for each word.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { 
        int sum = 0; 
        while (values.hasNext()) { 
          sum += values.next().get(); 
        } 
        output.collect(key, new IntWritable(sum)); 
      } 
    } 
 
    public static void main(String[] args) throws Exception { 
      JobConf conf = new JobConf(WordCount.class); 
      conf.setJobName("wordcount"); 
 
      conf.setOutputKeyClass(Text.class); 
      conf.setOutputValueClass(IntWritable.class); 
 
      conf.setMapperClass(Map.class); 
      // Reduce doubles as the combiner: summing is associative and
      // commutative, so partial map-side sums are safe.
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class); 
 
      conf.setInputFormat(TextInputFormat.class); 
      conf.setOutputFormat(TextOutputFormat.class); 
 
      FileInputFormat.setInputPaths(conf, new Path(args[0])); 
      FileOutputFormat.setOutputPath(conf, new Path(args[1])); 
 
      JobClient.runJob(conf); 
    }
}

Package the class into a JAR

Usage

Assuming HADOOP_HOME is the root of the installation and HADOOP_VERSION is the Hadoop version installed, compile WordCount.java and create a jar:

1. $ mkdir wordcount_classes
2. $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
3. $ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .
4. $ bin/hadoop fs -mkdir /usr/joe/wordcount/input (upload commands for the sample files are shown after the directory list below)

  • /usr/joe/wordcount/input - input directory in HDFS
  • /usr/joe/wordcount/output - output directory in HDFS
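
The input directory can be populated straight from local files. For example, the two sample files listed next can be created and uploaded like this (contents match the listing below):

$ echo "Hello World Bye World" > file01
$ echo "Hello Hadoop Goodbye Hadoop" > file02
$ bin/hadoop fs -put file01 file02 /usr/joe/wordcount/input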

Sample text-files as input:

$ bin/hadoop dfs -ls /usr/joe/wordcount/input/
/usr/joe/wordcount/input/file01
/usr/joe/wordcount/input/file02

$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file01
Hello World Bye World

$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file02
Hello Hadoop Goodbye Hadoop

Run the application:

$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output

Output:

$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
