HDPCD-Java Review Notes (1)
Source: Internet  Time: 2024/05/21 16:58
1. Understand Hadoop HDFS
Pig -- A scripting language that simplifies the creation of MapReduce jobs and excels at exploring and transforming data.
Hive -- Provides SQL-like access to your Big Data.
HBase -- A Hadoop database.
Accumulo -- A robust, scalable, high performance data storage and retrieval system built on Hadoop and Zookeeper.
Ambari -- For provisioning, managing, and monitoring Apache Hadoop clusters.
Sqoop -- For efficiently transferring bulk data between Hadoop and relational databases.
Falcon -- A data processing and management solution for Hadoop, designed for data motion, coordination of data pipelines, life cycle management, and data discovery.
Oozie -- A workflow scheduler system to manage Apache Hadoop jobs.
Solr -- A standalone enterprise search server with a REST-like API.
Flume -- For efficiently collecting, aggregating, and moving large amounts of log data.
ZooKeeper -- An open-source server which enables highly reliable distributed coordination.
Mahout -- An Apache project whose goal is to build scalable machine learning libraries.
The Apache Hadoop 2.x project consists of the following modules:
Hadoop Common -- The utilities that provide support for the other Hadoop modules.
HDFS -- The Hadoop Distributed File System.
YARN -- A framework for job scheduling and cluster resource management.
MapReduce -- For processing large data sets in a scalable and parallel fashion.
YARN splits up the functionality of the JobTracker in Hadoop 1.x into two separate processes:
ResourceManager -- A daemon process that allocates cluster resources to applications.
ApplicationMaster -- A per-application process that provides the runtime for executing applications.
Putting a file into HDFS involves the following steps:
1) A client application sends a request to the NameNode that specifies where it wants to put the file in the file system.
2) The NameNode determines how the data is broken down into blocks and which DataNodes will be used to store those blocks. That information is given to the client application.
3) The client application communicates directly with each DataNode, writing the blocks onto the DataNode.
4) The DataNode then replicates the newly created block to two other DataNodes (assuming the replication factor is 3).
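The block and replica bookkeeping in the steps above can be sketched in plain Java. This is a toy model, not HDFS code: the class name BlockPlanSketch, the DataNode names, the 128 MB block size used in main, and the round-robin placement are all assumptions made for illustration. Real HDFS placement is rack-aware and is decided by the NameNode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of how a file is cut into blocks and how replicas are assigned. */
class BlockPlanSketch {

    /** Number of blocks needed for a file: ceil(fileSize / blockSize). */
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("invalid file or block size");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /**
     * Picks `replication` distinct DataNodes for one block, round-robin.
     * Hypothetical policy for illustration only; HDFS itself is rack-aware.
     */
    static List<String> replicasFor(long blockIndex, List<String> dataNodes, int replication) {
        if (replication > dataNodes.size()) {
            throw new IllegalArgumentException("not enough DataNodes for the replication factor");
        }
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < replication; i++) {
            int idx = (int) ((blockIndex + i) % dataNodes.size());
            chosen.add(dataNodes.get(idx));
        }
        return chosen;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4"); // made-up node names
        long blocks = blockCount(300 * mb, 128 * mb);                   // assumed sizes
        for (long b = 0; b < blocks; b++) {
            // The first node in each list plays the role of the DataNode the client writes to;
            // the remaining two receive the replicas.
            System.out.println("block " + b + " -> " + replicasFor(b, nodes, 3));
        }
    }
}
```

With these assumed sizes, a 300 MB file needs three blocks, and with a replication factor of 3 each block ends up on three distinct DataNodes.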
The NameNode has the following characteristics:
It is the master of the DataNodes and executes file system namespace operations like opening, closing, and renaming files and directories.
It determines the mapping of blocks to DataNodes and maintains the file system namespace.
The NameNode performs these tasks by maintaining two files:
fsimage_N -- Contains the entire file system namespace, including the mapping of blocks to files and file system properties.
edits_N -- A transaction log that persistently records every change that occurs to file system metadata.
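The relationship between the two files can be sketched as a snapshot plus a replayed log: start from the namespace stored in the image, then apply each logged transaction in order. The sketch below is a simplified model under stated assumptions. The class name EditLogReplaySketch and the text operations (MKDIR, CREATE, DELETE, RENAME) are hypothetical; the real fsimage and edits files are binary formats with many more operation types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model: rebuild a namespace from a snapshot plus a transaction log. */
class EditLogReplaySketch {

    /** Applies edits such as "CREATE /a/f" or "RENAME /a/f /a/g" to a copy of the snapshot. */
    static Set<String> replay(Set<String> snapshot, List<String> edits) {
        Set<String> namespace = new TreeSet<>(snapshot);
        for (String edit : edits) {
            String[] parts = edit.split(" ");
            switch (parts[0]) {
                case "MKDIR":
                case "CREATE":
                    namespace.add(parts[1]);
                    break;
                case "DELETE":
                    namespace.remove(parts[1]);
                    break;
                case "RENAME":
                    // Only rename paths that actually exist in the namespace.
                    if (namespace.remove(parts[1])) {
                        namespace.add(parts[2]);
                    }
                    break;
                default:
                    throw new IllegalArgumentException("unknown edit: " + edit);
            }
        }
        return namespace;
    }

    public static void main(String[] args) {
        Set<String> fsimage = new TreeSet<>(Arrays.asList("/data")); // stands in for fsimage_N
        List<String> edits = Arrays.asList(                          // stands in for edits_N
                "CREATE /data/a.txt",
                "RENAME /data/a.txt /data/b.txt",
                "MKDIR /tmp",
                "DELETE /tmp");
        System.out.println(replay(fsimage, edits)); // prints [/data, /data/b.txt]
    }
}
```

Keeping the log separate from the image means each metadata change is a small append rather than a rewrite of the whole namespace.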
The DataNodes are responsible for:
Handling read and write requests from application clients.
Performing block creation, deletion, and replication upon instruction from the NameNode.
Sending heartbeats to the NameNode.
Sending a Blockreport to the NameNode.
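Heartbeats let the NameNode tell which DataNodes are still alive. A minimal sketch of that idea, assuming a single fixed timeout: the class name HeartbeatTrackerSketch, the 10-second threshold in main, and the node names are hypothetical and do not reflect HDFS defaults or its internal classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a master tracking worker liveness from heartbeats. */
class HeartbeatTrackerSketch {
    private final long timeoutMillis; // hypothetical threshold, not an HDFS default
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatTrackerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that a DataNode reported in at the given time. */
    void heartbeat(String dataNode, long nowMillis) {
        lastSeen.put(dataNode, nowMillis);
    }

    /** Returns the nodes whose last heartbeat is older than the timeout. */
    Set<String> deadNodes(long nowMillis) {
        Set<String> dead = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatTrackerSketch tracker = new HeartbeatTrackerSketch(10_000L);
        tracker.heartbeat("dn1", 0L);
        tracker.heartbeat("dn2", 0L);
        tracker.heartbeat("dn1", 9_000L); // dn1 keeps reporting, dn2 goes silent
        System.out.println(tracker.deadNodes(12_000L)); // prints [dn2]
    }
}
```

Once a node is considered dead, its blocks are under-replicated, which is when the NameNode would instruct other DataNodes to create new replicas.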
Overview of HDFS High Availability (NameNode HA)
Quorum Journal Manager
All namespace modifications are logged durably to a majority of the JournalNode daemons (hence the name quorum).
As the Standby Node sees the edits in the JournalNodes, it applies them to its own namespace.
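The "majority" rule can be made concrete with a little arithmetic. The sketch below is only a model of the quorum idea; the class and method names (QuorumSketch, majority, isDurable, tolerableFailures) are hypothetical and are not part of the Hadoop API.

```java
/** Toy model of majority-quorum acknowledgement for a logged edit. */
class QuorumSketch {

    /** Smallest number of nodes that forms a majority of `total`. */
    static int majority(int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return total / 2 + 1;
    }

    /** An edit counts as durable once a majority of JournalNodes has acknowledged it. */
    static boolean isDurable(int acks, int totalJournalNodes) {
        return acks >= majority(totalJournalNodes);
    }

    /** Number of node failures that still leaves a majority reachable. */
    static int tolerableFailures(int total) {
        return total - majority(total);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5}) {
            System.out.println(n + " JournalNodes: majority=" + majority(n)
                    + ", tolerable failures=" + tolerableFailures(n));
        }
    }
}
```

With three JournalNodes, two acknowledgements are enough and one node may fail. Four nodes still tolerate only one failure, which is why odd counts are the usual choice.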
Configuring Automatic Failover
ZKFailoverController(ZKFC) -- A new component that is a ZooKeeper client that monitors and manages the state of a NameNode.
HDFS Commands
ls, du, count, chgrp, chown, chmod, stat, cat, text, tail, get, copyFromLocal, put, copyToLocal, getmerge, mv, cp, mkdir, rm, rm -R, touchz
test -- Checks if a file exists.
expunge -- Empties the user’s Trash folder.
The Hadoop Filesystem API
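import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Loads Hadoop settings such as core-site.xml from the classpath.
Configuration conf = new Configuration();
Path dir = new Path("results");
FileSystem fs = FileSystem.get(conf);
// Create the directory only if it does not already exist.
if (!fs.exists(dir)) {
    fs.mkdirs(dir);
}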