Apache Hive Translation (1): Introduction

Source: Internet  Editor: 程序博客网  Time: 2024/06/15 05:55
Apache Hive

Original address: https://cwiki.apache.org/confluence/display/Hive/Home

The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop™, it provides:
  • Tools to enable easy data extract/transform/load (ETL)
  • A mechanism to impose structure on a variety of data formats
  • Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
  • Query execution via MapReduce

Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language allows programmers who are familiar with the MapReduce framework to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. QL can also be extended with custom scalar functions (UDFs), aggregations (UDAFs), and table functions (UDTFs).

(Translator's note: a scalar function takes zero or more arguments and returns a single scalar value as its result.)
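The custom mappers mentioned above are often plain scripts that read rows on standard input and write rows on standard output, one tab-separated row per line. The following is a minimal sketch of such a script's per-row logic, not Hive's own code. The column names (userid, movieid, rating, unixtime) and the function names are hypothetical and chosen only for illustration.

```python
import sys
from datetime import datetime, timezone
from typing import Iterable, TextIO


def map_row(line: str) -> str:
    """Turn one tab-separated input row into one tab-separated output row.

    Hypothetical input columns: userid, movieid, rating, unixtime.
    The timestamp is replaced by the ISO weekday (1 = Monday, 7 = Sunday).
    """
    userid, movieid, rating, unixtime = line.rstrip("\n").split("\t")
    moment = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
    return "\t".join([userid, movieid, rating, str(moment.isoweekday())])


def run(lines: Iterable[str], out: TextIO) -> None:
    """Apply map_row to every input line and write the result to `out`."""
    for line in lines:
        out.write(map_row(line) + "\n")


# When saved as a standalone script, the file would end with:
#     run(sys.stdin, sys.stdout)
```

In QL, a script like this is typically shipped with ADD FILE and invoked through a SELECT TRANSFORM (...) USING '...' AS (...) query. The exact query depends on the table, so it is not shown here.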
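About table functions:
Table functions were introduced in SQL:2003. A table function is an SQL-invoked function that returns a table. According to the standard, its return type is a multi-row multiset (a collection that allows duplicate elements); it is not a true table, but it can be queried like one.
Table function example: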
CREATE FUNCTION weather() 
  RETURNS TABLE ( 
    CITY VARCHAR(25), 
    TEMP_IN_F INTEGER, 
    HUMIDITY INTEGER, 
    WIND VARCHAR(5), 
    FORECAST CHAR(25) ) 
  NOT DETERMINISTIC 
  NO SQL 
  LANGUAGE C 
  EXTERNAL 
  PARAMETER STYLE SQL; 
Oracle's documentation on table functions:
http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dcitblfns.htm
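The three kinds of extension function differ mainly in how many rows they consume and produce. The sketch below illustrates only those shapes in Python. Real Hive UDFs, UDAFs, and UDTFs are written against Hive's Java interfaces, and the function names here are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def upper_case(value: str) -> str:
    """Scalar function (UDF shape): one input value in, one value out."""
    return value.upper()


def average(values: Iterable[float]) -> Optional[float]:
    """Aggregate function (UDAF shape): many rows in, one value out.

    Returns None for an empty group, mirroring SQL's NULL result.
    """
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total / count if count else None


def explode_words(sentence: str) -> Iterator[str]:
    """Table function (UDTF shape): one input row in, zero or more rows out."""
    for word in sentence.split():
        yield word
```

A scalar function is applied once per row, an aggregate collapses a group of rows into a single value, and a table function can turn one row into several, which is why its result is queried like a table.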

Hive does not mandate that the data it reads or writes be in a "Hive format"; there is no such thing. Hive works equally well on Thrift, control delimited, or your specialized data formats. Please see File Format and SerDe in the Developer Guide for details.

Thrift is a software framework for cross-language development:
http://thrift.apache.org/
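"Control delimited" refers to text rows whose fields are separated by control characters; Hive's default text format uses Ctrl-A (\001) between fields. The sketch below shows what imposing structure at read time means for such a row: the raw line carries no schema, and the column names are supplied by the reader. It is an illustration in Python with hypothetical column names, not Hive's SerDe implementation, which is written in Java.

```python
from typing import Dict, List

FIELD_DELIM = "\x01"  # Ctrl-A, the field separator assumed in this sketch


def parse_row(line: str, columns: List[str]) -> Dict[str, str]:
    """Split one control-delimited line and label its fields with `columns`.

    Raises ValueError when the number of fields does not match the schema.
    """
    values = line.rstrip("\n").split(FIELD_DELIM)
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields, found {len(values)}"
        )
    return dict(zip(columns, values))


# Hypothetical schema for a web-log row.
row = parse_row("42\x01GET /index.html\x01200\n", ["id", "request", "status"])
```

The same bytes could be read with a different column list or a different delimiter, which is the loose coupling between Hive and its input formats that the text describes.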

Hive is not designed for OLTP workloads and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of append-only data (like web logs). What Hive values most are scalability (scale out with more machines added dynamically to the Hadoop cluster), extensibility (with the MapReduce framework and UDF/UDAF/UDTF), fault tolerance, and loose coupling with its input formats.

(Translator's note: OLTP stands for On-Line Transaction Processing.)



  • Getting Started
  • Presentations and Papers about Hive
  • A List of Sites and Applications Powered by Hive
  • FAQ
  • hive-users Mailing List
  • Hive IRC Channel: #hive on irc.freenode.net
  • About This Wiki

User Documentation

  • Hive Tutorial
  • HiveQL Language Manual (Queries, DML, DDL, and CLI)
  • Hive Operators and Functions
  • Hive Web Interface
  • Hive Client (JDBC, ODBC, Thrift, etc)
  • HiveServer2 Client
  • Hive Change Log
  • Avro SerDe

Administrator Documentation

  • Installing Hive
  • Configuring Hive
  • Setting Up Metastore
  • Setting Up Hive Web Interface
  • Setting Up Hive Server (JDBC, ODBC, Thrift, etc.)
  • Hive on Amazon Web Services
  • Hive on Amazon Elastic MapReduce

Resources for Contributors

  • Hive Developer FAQ
  • How to Contribute
  • Hive Contributors Meetings
  • Hive Developer Guide
  • Plugin Developer Kit
  • Unit Test Parallel Execution
  • Hive Performance
  • Hive Architecture Overview
  • Hive Design Docs
  • Roadmap/Call to Add More Features
  • Full-Text Search over All Hive Resources
  • Becoming a Committer
  • How to Commit
  • How to Release
  • Build Status on Jenkins (Formerly Hudson)
  • Project Bylaws

For more information, please see the official Hive website.

Apache Hive, Apache Hadoop, Apache HBase, Apache HDFS, Apache, the Apache feather logo, and the Apache Hive project logo are trademarks of The Apache Software Foundation.
