HDPCD-Java Review Notes (3): Lab
Java Lab Booklet
Lab: Understanding Block Storage
1. The block size must be at least 1,048,576 bytes (1 MB), as enforced by the dfs.namenode.fs-limits.min-block-size property.
2. The block size must be a multiple of 512 bytes (the checksum chunk size).
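As an illustration of both limits, a client can request a per-file block size when creating a file. A minimal sketch using the FileSystem.create overload that takes a block size (the file name and replication factor below are arbitrary choices for illustration):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // 1 MB is the smallest block size the NameNode will accept
        // (dfs.namenode.fs-limits.min-block-size), and it is a multiple of 512.
        long blockSize = 1048576L;
        FSDataOutputStream out = fs.create(new Path("blockdemo.txt"),
                true,                                      // overwrite
                conf.getInt("io.file.buffer.size", 4096),  // buffer size
                (short) 1,                                 // replication
                blockSize);
        out.writeUTF("block size demo");
        out.close();
    }
}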
View the number of blocks in a file:
hdfs fsck /user/root/stocks.csv -files -blocks -locations
-files --- lists each file checked, with its size and status.
-blocks --- lists the block IDs that make up the file.
-locations --- lists the IP addresses of the DataNodes storing each block.
Lab: Configuring a Hadoop Development Environment
hdfs dfsadmin -report --- verify the DataNodes in the cluster.
yarn node -list --- verify the NodeManagers in the cluster.
Lab: Putting Files in HDFS with Java
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(configuration);
Path path = new Path("counties");
Path localPath = null;
Path destPath = null;
// Create the counties directory in HDFS if it does not already exist.
if (!fs.exists(path)) {
    fs.mkdirs(path);
}
String filename = null;
for (int i = 1; i <= 4; i++) {
    filename = "counties_" + i + ".csv";
    // Source is the local counties/ directory; destination is the counties/
    // directory in HDFS (the relative path strings happen to match).
    localPath = new Path("counties/" + filename);
    destPath = new Path("counties/" + filename);
    fs.copyFromLocalFile(localPath, destPath);
}
build.gradle:
project.ext.mainclass = 'hdfs.InputCounties'
project.ext.archiveName = 'inputcounties.jar'
apply from: '/root/java/labs/build.gradle'
yarn jar inputcounties.jar hdfs.InputCounties
Demo: Understanding MapReduce
Why are the words sorted alphabetically?
The words are the keys, and keys get sorted during the shuffle/sort phase.
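For example, if the mappers emit (the, 1), (cat, 1), (the, 1), the reducer sees the keys grouped and in sorted order: (cat, [1]) and then (the, [1, 1]).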
Lab: Word Count
WordCountMapper
package wordcount;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private Text outputKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on spaces and emit (word, 1) for each word.
        String lineStr = value.toString();
        String[] words = StringUtils.split(lineStr, ' ');
        for (String word : words) {
            outputKey.set(word);
            context.write(outputKey, ONE);
        }
    }
}
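A mapper like this can be verified in isolation with MRUnit's MapDriver. A minimal sketch, assuming the MRUnit and JUnit libraries are on the test classpath (the test class name and input line are illustrative):
WordCountMapperTest
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {
    @Test
    public void testMap() throws Exception {
        // One input line in; one (word, 1) pair out per word, in map order.
        MapDriver.newMapDriver(new WordCountMapper())
                .withInput(new LongWritable(0), new Text("the cat the"))
                .withOutput(new Text("the"), new IntWritable(1))
                .withOutput(new Text("cat"), new IntWritable(1))
                .withOutput(new Text("the"), new IntWritable(1))
                .runTest();
    }
}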
WordCountReducer
package wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable outputValue = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the counts for this word and emit (word, total).
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        outputValue.set(sum);
        context.write(key, outputValue);
    }
}
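Because the summation is associative and commutative, this same class can also be registered as a combiner (job.setCombinerClass(WordCountReducer.class)) to pre-aggregate counts on the map side and cut shuffle traffic.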
WordCountJob
package wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountJob extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int result = ToolRunner.run(new Configuration(), new WordCountJob(), args);
        System.exit(result);
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "WordCountJob");
        Configuration configuration = job.getConfiguration();
        job.setJarByClass(getClass());

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        // Delete the output directory if it exists, so the job can be rerun.
        out.getFileSystem(configuration).delete(out, true);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }
}
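To build and run the job, the same Gradle pattern from the earlier lab applies; the jar name and the input/output paths below are illustrative:
build.gradle:
project.ext.mainclass = 'wordcount.WordCountJob'
project.ext.archiveName = 'wordcount.jar'
apply from: '/root/java/labs/build.gradle'
yarn jar wordcount.jar wordcount.WordCountJob input output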