Big Data Ecosystem and Components
Source: Internet  Posted by: 网络喷子  Edited by: 程序博客网  Date: 2024/04/30 02:42
- Apache Spark Components
  - Spark Core component
    - special data structure: RDD (Resilient Distributed Dataset)
    - basic I/O functionality
    - job and task scheduling and monitoring
    - memory management
    - fault recovery
    - interaction with storage systems
    - and so on
  - Spark SQL component
  - Spark Streaming
  - GraphX
  - MLlib
    - clustering
    - classification
    - decomposition
    - regression
    - collaborative filtering
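The RDD and fault-recovery items in Spark Core go together: transformations are lazy and only record lineage, and a lost partition is rebuilt by replaying that lineage. The following is a toy, single-process sketch of that idea; `ToyRDD` is a hypothetical class, not Spark's API, and unlike Spark it keeps every computed partition in memory (Spark only does that for RDDs you persist or cache).

```python
# Toy model of RDD-style lazy transformations and lineage-based fault recovery.
# ToyRDD is hypothetical and for illustration only; it is not Spark's API.
from typing import Callable, Dict, List, Optional


class ToyRDD:
    def __init__(self, source: Optional[List[List[int]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None):
        self.source = source          # input partitions (base RDD only)
        self.parent = parent          # lineage: the RDD this one derives from
        self.fn = fn                  # per-partition transformation
        # Materialized partitions; unlike Spark, this toy caches everything it computes.
        self.cache: Dict[int, List[int]] = {}

    def num_partitions(self) -> int:
        if self.source is not None:
            return len(self.source)
        return self.parent.num_partitions()

    # Transformations are lazy: they only record lineage, nothing runs yet.
    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, p: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if p(x)])

    def compute(self, i: int) -> List[int]:
        """Return partition i, rebuilding it from lineage if it is missing."""
        if i not in self.cache:
            if self.source is not None:
                self.cache[i] = list(self.source[i])
            else:
                self.cache[i] = self.fn(self.parent.compute(i))
        return self.cache[i]

    # Actions trigger the actual computation.
    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def lose_partition(self, i: int) -> None:
        """Simulate a failed executor: drop partition i at every lineage level."""
        self.cache.pop(i, None)
        if self.parent is not None:
            self.parent.lose_partition(i)


base = ToyRDD(source=[[1, 2, 3], [4, 5, 6]])
even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())    # [4, 16, 36]
even_squares.lose_partition(1)   # partition 1 is gone at every level
print(even_squares.collect())    # [4, 16, 36], recomputed from lineage
```

Recording lineage instead of replicating intermediate data is what lets recovery stay cheap: only the lost partition is recomputed, not the whole dataset.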
- ZooKeeper
Distributed coordination
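One classic coordination task is leader election. ZooKeeper's documented recipe has every candidate create an ephemeral, sequential znode; the lowest sequence number leads, and when the leader's session ends its znode disappears and the next one takes over. The sketch below simulates that recipe in one process; `ToyElection` and its methods are hypothetical and stand in for real znode creation and watches.

```python
# Toy, single-process simulation of ZooKeeper's leader-election recipe.
# ToyElection is hypothetical: a real client would create EPHEMERAL_SEQUENTIAL
# znodes through the ZooKeeper API and set watches instead.
from typing import Dict, Optional


class ToyElection:
    def __init__(self) -> None:
        self.counter = 0
        self.znodes: Dict[str, str] = {}   # znode name -> candidate id

    def join(self, candidate: str) -> str:
        # Sequential znodes get a monotonically increasing, zero-padded 10-digit suffix.
        name = f"n_{self.counter:010d}"
        self.counter += 1
        self.znodes[name] = candidate
        return name

    def session_expired(self, name: str) -> None:
        # An ephemeral znode is deleted when its owner's session ends.
        self.znodes.pop(name, None)

    def leader(self) -> Optional[str]:
        # The candidate holding the lowest sequence number is the leader.
        return self.znodes[min(self.znodes)] if self.znodes else None

    def predecessor(self, name: str) -> Optional[str]:
        # Each follower watches only the next-lower znode, so one departure
        # wakes a single candidate instead of all of them.
        lower = [n for n in self.znodes if n < name]
        return max(lower) if lower else None


election = ToyElection()
a = election.join("server-a")
b = election.join("server-b")
c = election.join("server-c")
print(election.leader())   # server-a
election.session_expired(a)
print(election.leader())   # server-b
```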
- Oozie
Workflow and Scheduling
- Pig
Scripting, data access
- Mahout
Machine learning library.
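Clustering appears in both machine learning libraries above (it is the first item under MLlib). As a minimal illustration of what such a library computes, here is plain k-means (Lloyd's algorithm) on 2-D points in pure Python; the data and starting centroids are made up, and a distributed library would run the same assignment and update steps in parallel over partitions.

```python
# Minimal k-means (Lloyd's algorithm) on 2-D points, pure Python.
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 20):
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points.
        updated = [
            (sum(x for x, _ in members) / len(members),
             sum(y for _, y in members) / len(members))
            if members else c          # keep an empty cluster's centroid in place
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:       # converged: centroids stopped moving
            break
        centroids = updated
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, groups = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centers)   # roughly [(1.17, 1.5), (8.5, 8.5)]
```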
- Hive
Query: SQL-like queries (HiveQL) over data stored in Hadoop.
- HBase
NoSQL Database
- Ambari
Management and monitoring: provides deployment, management, and monitoring of Hadoop clusters, with a powerful web interface that helps operators manage them.
- MapReduce
Distributed Processing
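The MapReduce model splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. A single-process sketch of the classic word count shows that data flow; the function names are illustrative, not Hadoop's Java API, and a real cluster runs many map and reduce tasks in parallel.

```python
# Single-process sketch of the MapReduce word-count data flow.
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine every value for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))


print(word_count(["the quick fox", "the lazy dog", "The end"]))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```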
- Sqoop
Data integration: importing data from relational databases into Hadoop, or exporting it back.
- Mesos
Open-source cluster manager.
- Hadoop
- Cassandra
A free, open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
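Cassandra avoids a single point of failure partly by placing data on a hash ring: each node owns token ranges, a key's hash picks a position on the ring, and replicas go to the next distinct nodes clockwise (the placement used by its SimpleStrategy). The sketch below assumes MD5 from the standard library as the hash and 8 virtual nodes per server; both are illustrative choices, and `HashRing` is a hypothetical class rather than Cassandra's code.

```python
# Sketch of consistent hashing with virtual nodes and clockwise replica placement.
# HashRing is hypothetical; MD5 stands in for Cassandra's partitioner.
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 8):
        # Each physical node owns several tokens (virtual nodes) on the ring.
        self.ring: List[Tuple[int, str]] = sorted(
            (self._token(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(key: str) -> int:
        # First 8 bytes of the MD5 digest as an unsigned ring position.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def replicas(self, key: str, rf: int = 3) -> List[str]:
        """Walk clockwise from the key's token and collect up to rf distinct nodes."""
        start = bisect.bisect_right(self.tokens, self._token(key))
        owners: List[str] = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == rf:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42"))   # three distinct nodes; which ones depends on the hashes
```

Because only the tokens adjacent to a joining or leaving node change owners, most keys stay where they are when the cluster grows or shrinks.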
- Hadoop YARN
Open-source cluster manager.
- Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud.
- Flume
Gathers and aggregates large amounts of data.
- Simba
A distributed in-memory spatial analytics engine based on Apache Spark.
- Alluxio
Open-source, memory-speed virtual distributed storage.
- Airflow
Airflow is a platform to programmatically author, schedule and monitor workflows.
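Workflow engines such as Airflow and Oozie model a pipeline as a directed acyclic graph (DAG) of tasks and run a task only after everything it depends on has finished. The sketch below shows that scheduling rule with the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the task names are hypothetical, and this is not Airflow's API (in Airflow itself, dependencies are declared between tasks, for example with the `>>` operator).

```python
# Sketch of DAG-based task scheduling: run tasks in "waves" once their
# dependencies are done. Task names are hypothetical.
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks it depends on.
pipeline: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_users": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_users"},
    "load_warehouse": {"join"},
    "send_report": {"load_warehouse"},
}


def run_in_waves(dag: Dict[str, Set[str]]) -> List[List[str]]:
    ts = TopologicalSorter(dag)
    ts.prepare()                      # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # tasks whose dependencies have all finished
        waves.append(ready)
        for task in ready:
            # A real scheduler would dispatch the task to a worker here.
            ts.done(task)
    return waves


for wave in run_in_waves(pipeline):
    print(wave)
```

Tasks in the same wave have no dependency on each other, so a scheduler is free to run them in parallel.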
- Apache Oozie
Oozie: a workflow engine for Hadoop.
- Apache Kafka
Publish-subscribe messaging system.
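In Kafka's flavor of publish-subscribe, a topic is an append-only log and each consumer group tracks its own read offset, so several independent subscribers can read the same messages at their own pace. The sketch below models a single-partition topic in memory; `ToyTopic` and its methods are hypothetical, and real Kafka topics are split into replicated partitions.

```python
# Toy single-partition topic: an append-only log plus per-consumer-group offsets.
# ToyTopic is hypothetical; it is not the Kafka client API.
from collections import defaultdict
from typing import DefaultDict, List


class ToyTopic:
    def __init__(self) -> None:
        self.log: List[str] = []                           # messages are only appended
        self.offsets: DefaultDict[str, int] = defaultdict(int)  # group -> next offset

    def publish(self, message: str) -> int:
        self.log.append(message)
        return len(self.log) - 1                           # offset of the new message

    def poll(self, group: str, max_messages: int = 10) -> List[str]:
        start = self.offsets[group]
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)           # "commit" the new offset
        return batch


topic = ToyTopic()
for msg in ["a", "b", "c"]:
    topic.publish(msg)
print(topic.poll("billing"))        # ['a', 'b', 'c']
print(topic.poll("analytics", 2))   # ['a', 'b']
topic.publish("d")
print(topic.poll("billing"))        # ['d']
print(topic.poll("analytics"))      # ['c', 'd']
```

Because reading does not delete anything, adding a new subscriber group does not affect the existing ones.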
- Tachyon
Now renamed Alluxio.
- BlinkDB
A massively parallel, approximate query engine for running interactive SQL queries on large volumes of data.
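Approximate query engines trade exactness for speed: they answer from a sample and report an error bound alongside the result. BlinkDB builds and selects stratified samples; the sketch below simplifies that to one uniform random sample and a normal-approximation 95% confidence interval for an `AVG`. The data, seed, and function name are assumptions for illustration only.

```python
# Sketch of an approximate AVG with an error bound, computed from a uniform sample.
# Simplification: BlinkDB itself uses stratified samples; this is not its API.
import math
import random
import statistics
from typing import Sequence, Tuple


def approx_mean(population: Sequence[float], sample_size: int, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)             # read only a fraction of the rows
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, 1.96 * stderr                                # ~95% interval half-width


population = list(range(1_000_000))   # hypothetical column values
estimate, margin = approx_mean(population, sample_size=10_000)
print(f"AVG ~ {estimate:.1f} +/- {margin:.1f} (exact: {statistics.fmean(population):.1f})")
```

A larger sample shrinks the margin roughly with the square root of its size, which is the knob such engines expose as a time or error budget.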
- Shark
No longer updated: development has stopped, and it was superseded by Spark SQL.
- RabbitMQ
An open-source message broker (message-oriented middleware) that implements the Advanced Message Queuing Protocol (AMQP).
- Impala
High-level language: massively parallel SQL queries over data stored in Hadoop.
- RHadoop
Machine learning library: a collection of R packages for working with Hadoop from R.
- Flume
Data transfer tool that can be used to collect, process, and transport log data. It is similar in function to Chukwa, but smaller and more practical.
- Avro
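Data serialization system for high-volume, real-time, dynamic data exchange. It is a newer data serialization and transport tool and is expected to gradually replace Hadoop's original RPC mechanism.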
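One concrete piece of Avro's compact binary format: the Avro specification encodes `int` and `long` values with zig-zag encoding followed by a variable-length (7 bits per byte) encoding, so small magnitudes, negative or positive, take a single byte. The functions below are a hand-written sketch of that rule for 64-bit longs, not the Avro library's API.

```python
# Sketch of Avro's binary encoding for `long`: zig-zag, then base-128 varint.


def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3, ...
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_long(n: int) -> bytes:
    z = zigzag(n)
    out = bytearray()
    while True:
        byte = z & 0x7F            # low 7 bits first
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_long(data: bytes) -> int:
    z, shift = 0, 0
    for b in data:
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1)     # undo the zig-zag mapping


for n in [0, -1, 1, -2, 2, -64, 64]:
    print(n, encode_long(n).hex())  # 00, 01, 02, 03, 04, 7f, 8001
```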
- Chukwa
Data transfer tool that collects many different types of data and imports them into Hadoop.
- Sqoop
Data transfer tool that imports data from a relational database (MySQL, Oracle, Postgres, etc.) into Hadoop's HDFS, and can also export data from HDFS into a relational database.
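Sqoop parallelizes an import by dividing the table on a splitting column (chosen with `--split-by`, by default the primary key) across several map tasks (`--num-mappers`): it reads the column's low and high values and gives each task a slice of that range. The sketch below shows a simplified version of that split; the exact arithmetic and the query text are assumptions, and the table and column names are hypothetical.

```python
# Simplified sketch of dividing a table into per-mapper ranges on a numeric split column.
# The split arithmetic and SQL text are illustrative; table/column names are hypothetical.
from typing import List, Tuple


def split_ranges(min_id: int, max_id: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [min_id, max_id] into at most num_mappers contiguous slices."""
    size, extra = divmod(max_id - min_id + 1, num_mappers)
    ranges: List[Tuple[int, int]] = []
    lo = min_id
    for i in range(num_mappers):
        hi = lo + size - 1 + (1 if i < extra else 0)  # first `extra` slices get one more id
        if hi >= lo:                                  # skip empty slices when ids < mappers
            ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def mapper_queries(table: str, column: str, ranges: List[Tuple[int, int]]) -> List[str]:
    # One bounded query per map task; each task writes its own output file.
    return [f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
            for lo, hi in ranges]


for q in mapper_queries("orders", "id", split_ranges(1, 10, 4)):
    print(q)
```

An evenly distributed split column matters here: if ids are heavily skewed, equal-width ranges give some mappers far more rows than others.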
- Hue
Web-based editing tool for Hadoop and its ecosystem components, providing web-based operation of HDFS, YARN, MapReduce, HBase, Hive, Pig, and more.
- BigTop
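Packaging, distribution, and testing tool for Hadoop and its surrounding components. It resolves version dependencies and conflicts between components; in fact, when users deploy via rpm or yum, the deployment scripts use it internally.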