Hadoop

Source: Internet. Published by: 网易数据广州. Editor: 程序博客网. Time: 2024/06/05 18:50

Class InputFormat<K,V>

The Map-Reduce framework relies on the InputFormat to split the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
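The idea of a logical split can be sketched in plain Java with no Hadoop dependency. The names below (LogicalSplitSketch, Split, split) are hypothetical, and the fixed-size chunking is a simplifying assumption; Hadoop's own InputFormat implementations take more into account than this does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Hadoop's InputSplit. */
final class LogicalSplitSketch {

    /** A logical split is just a byte range of the input, not a copy of the data. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Split[start=" + start + ", length=" + length + "]";
        }
    }

    /** Cuts a file of fileLength bytes into ranges of at most splitSize bytes. */
    static List<Split> split(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last range may be shorter than splitSize.
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Each printed range would be handed to one Mapper.
        for (Split s : split(250, 100)) {
            System.out.println(s);
        }
    }
}
```

Because a split only records an offset and a length, creating splits is cheap; the bytes are read later by whichever task processes the range.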

The InputFormat also provides the RecordReader implementation used to glean input records from the logical InputSplit for processing by the Mapper.
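A split boundary can fall in the middle of a record, so a reader has to decide which split owns each record. The following is a simplified, Hadoop-free sketch for newline-delimited records. LineRecordSketch and readRecords are hypothetical names, and the ownership convention used here (a reader for a non-first split skips its first line, and every reader keeps reading while a line starts at or before the end of its range) is an assumption chosen for illustration, not a copy of Hadoop's RecordReader code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Hadoop's RecordReader. */
final class LineRecordSketch {

    /** Returns the lines owned by the byte range [start, start + length). */
    static List<String> readRecords(byte[] data, long start, long length) {
        long end = start + length;
        int pos = (int) start;
        if (start != 0) {
            // Skip the first (possibly partial) line: the reader of the
            // previous range is responsible for it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> records = new ArrayList<>();
        // Read every line that starts at or before the end of the range,
        // running past the boundary if needed to finish the last line.
        while (pos < data.length && pos <= end) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        long splitSize = 8; // deliberately cuts "beta" in half
        for (long start = 0; start < data.length; start += splitSize) {
            long length = Math.min(splitSize, data.length - start);
            System.out.println("split at " + start + ": " + readRecords(data, start, length));
        }
    }
}
```

With this convention every line is read exactly once across all splits, even though no split boundary is aligned to a line boundary.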

For file-based input formats, the FileSystem block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapreduce.input.fileinputformat.split.minsize.
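How the two bounds interact can be shown with a small calculation. The sketch below assumes the split size is chosen as max(minSize, min(maxSize, blockSize)); maxSize is a third setting that these notes do not mention (assumed here to correspond to mapreduce.input.fileinputformat.split.maxsize), and SplitSizeSketch is a hypothetical class name.

```java
/** Hypothetical illustration of how the split size bounds combine. */
final class SplitSizeSketch {

    /**
     * Assumed rule: the block size caps the split size, unless the configured
     * minimum is larger, in which case the minimum wins.
     */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;

        // Defaults: no effective minimum or maximum, so the block size is used.
        System.out.println(computeSplitSize(blockSize, 1, Long.MAX_VALUE) / mb);

        // A minimum above the block size produces splits larger than one block.
        System.out.println(computeSplitSize(blockSize, 256 * mb, Long.MAX_VALUE) / mb);

        // A maximum below the block size produces splits smaller than one block.
        System.out.println(computeSplitSize(blockSize, 1, 64 * mb) / mb);
    }
}
```

Under this rule the block size is an upper bound only while the minimum stays at or below it; raising the minimum past the block size is the usual way to get fewer, larger splits.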

Class Job

It allows the user to configure the job, submit it, control its execution, and query its state.

The Java abstract class org.apache.hadoop.fs.FileSystem defines Hadoop's filesystem interface.

FileCopyWithProgress: copies a local file to a Hadoop filesystem.

FileSystemCat / FileSystemDoubleCat: display files from a Hadoop filesystem on standard output by using the FileSystem directly.

URLCat: displays files from a Hadoop filesystem on standard output using a URLStreamHandler.

Hadoop's FileStatus class can be used to inspect the metadata of a file or directory in HDFS.

FileStatus[] status = fs.listStatus(paths);
