2016/10/19


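The MapReduce process: the input data is split into blocks; the map function is applied to the records one at a time to produce key-value pairs; the pairs are grouped by key into sets; and those sets are then sorted.
The developer defines four stages: input -> key-value pairs, map, reduce, key-value pairs -> output.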
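
The flow described above can be sketched in plain Python, without Hadoop itself. This is only an illustration of the map, group-by-key, sort, and reduce steps on a single machine; the word-count task and the names `map_record`, `reduce_group`, and `run_job` are assumptions made for the example, not part of any Hadoop API.

```python
from collections import defaultdict


def map_record(record):
    """Map step: turn one input record (a line of text) into (key, value) pairs."""
    for word in record.split():
        yield word.lower(), 1


def reduce_group(key, values):
    """Reduce step: collapse all values collected for one key into one result."""
    return key, sum(values)


def run_job(records):
    # Map: apply the map function to the records one by one.
    pairs = [pair for record in records for pair in map_record(record)]

    # Group the pairs by key, then sort the groups by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    ordered = sorted(groups.items())

    # Reduce: one call per key.
    return [reduce_group(key, values) for key, values in ordered]


result = run_job(["Hadoop is not a database", "Hadoop is a framework"])
print(result)
```

In a real cluster the map calls run in parallel on the nodes that hold the blocks, and the grouping and sorting happen while data moves between the map and reduce tasks; here everything runs in one process for clarity.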

炼数成金

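  • Hadoop is not a database, because it does not provide the basic functions of a database
  • Hadoop is not suited to real-time computing, because of the latency involved? What counts as real-time computing? Examples: analyzing stock quotes to trigger actions, traffic-light scheduling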

Pro Apache Hadoop

  • NameNode: metadata
  • Configuration file
    • default file and site file
  • Secondary NameNode: not a backup, does housekeeping
    • merges edits and fsimage
    • edits: accumulates the changes since the last checkpoint
    • fsimage: the last checkpoint
    • fstime: contains the timestamp of the last checkpoint
  • TaskTracker:
    • accepts requests for tasks such as map, reduce and shuffle
    • number of slots = number of cores on the machine
    • ??? What is the difference between multiprocessor and multicore ???
    • heartbeat: tells whether the node is healthy and how many free slots are available
  • JobTracker:
    • scheduling: places tasks close to the data blocks
    • determines the number of tasks
  • YARN
    • the idea is to have a global resource manager and a per-application Application Master.
    • components
      • global resource manager
        • primarily a scheduler
        • ensures optimal cluster utilization
      • node manager
        • local resource manager
        • slave service
        • takes requests from the resource manager and allocates containers to applications
        • each node has its own node manager
      • application-specific application master
        • is the key differentiator between the older MapReduce v1 framework and YARN
        • each application type has its own application master
        • improved scalability
        • a more generic framework
      • scheduler
      • container
        • CPU and memory
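
The division of labour in the YARN notes above can be pictured with a toy model. This is a rough sketch under heavy assumptions: the class names mirror the components listed, but the methods (`allocate`, `launch`, `request_containers`), the first-fit placement rule, and the node sizes are all made up for illustration and are not the real YARN API or scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A slice of one node's resources: CPU (vcores) and memory."""
    node: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    """Per-node service that tracks local resources and hosts containers."""
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def can_fit(self, memory_mb, vcores):
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores

    def launch(self, memory_mb, vcores):
        # Reserve the resources locally and record the new container.
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        container = Container(self.name, memory_mb, vcores)
        self.containers.append(container)
        return container


class ResourceManager:
    """Global scheduler: picks a node for each request (first fit, an assumption)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb, vcores):
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                return node.launch(memory_mb, vcores)
        return None  # no node has enough free resources right now


class ApplicationMaster:
    """Per-application coordinator that asks the resource manager for containers."""

    def __init__(self, app_name, resource_manager):
        self.app_name = app_name
        self.rm = resource_manager
        self.granted = []

    def request_containers(self, count, memory_mb, vcores):
        for _ in range(count):
            container = self.rm.allocate(memory_mb, vcores)
            if container is not None:
                self.granted.append(container)
        return self.granted


nodes = [NodeManager("node1", 4096, 2), NodeManager("node2", 8192, 4)]
rm = ResourceManager(nodes)
am = ApplicationMaster("wordcount", rm)
granted = am.request_containers(count=4, memory_mb=2048, vcores=1)
print([c.node for c in granted])
```

Because the application master is a separate object per application, a different kind of application could supply its own master and reuse the same resource manager and node managers, which is the sense in which the notes call YARN a more generic framework.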
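
Going back to the Secondary NameNode notes (edits, fsimage, fstime), the checkpoint merge can be sketched as follows. The real files are on-disk formats managed by HDFS; the dictionary image, the tuple-based edit records, and the operation names `create`, `delete`, and `rename` used here are hypothetical stand-ins chosen only to show the idea of replaying accumulated changes onto the last checkpoint.

```python
def apply_edit(image, edit):
    """Apply one logged namespace change to an in-memory image (hypothetical format)."""
    op, path = edit[0], edit[1]
    if op == "create":
        image[path] = {"replication": edit[2]}
    elif op == "delete":
        image.pop(path, None)
    elif op == "rename":
        image[edit[2]] = image.pop(path)
    else:
        raise ValueError(f"unknown edit operation: {op}")


def checkpoint(fsimage, edits, now):
    """Merge the edit log into a copy of the last image.

    Returns the new image, an empty edit log, and the checkpoint timestamp
    (the role the notes give to fstime).
    """
    new_image = {path: dict(meta) for path, meta in fsimage.items()}
    for edit in edits:
        apply_edit(new_image, edit)
    return new_image, [], now


fsimage = {"/data/a.txt": {"replication": 3}}
edits = [
    ("create", "/data/b.txt", 2),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt"),
]
new_image, new_edits, fstime = checkpoint(fsimage, edits, now=1000)
print(new_image, new_edits, fstime)
```

The old image is left untouched and a fresh one is returned, which matches the point in the notes that this is housekeeping to keep the edit log short, not a backup of the NameNode.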