Basic Hive Operations


Before using Hive, start Hadoop first.

Go to the Hadoop installation path: /opt/hadoop/hadoop-2.7.3/sbin

Run the following commands:

./start-dfs.sh

./start-yarn.sh

./hadoop-daemon.sh start datanode

./mr-jobhistory-daemon.sh start historyserver

After the commands finish, check the running processes with jps:

[root@tiancunPC sbin]# jps
4992 org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
16066 JobHistoryServer
15237 DataNode
12837 ResourceManager
16102 Jps
12427 NameNode
12620 SecondaryNameNode
15503 RunJar


To start Hive, go to the bin directory of the Hive installation and run:

./hive

This drops you into the Hive command line:

hive>

1. List the databases:

hive> show databases;
OK
default
Time taken: 0.016 seconds, Fetched: 1 row(s)
hive>
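As an aside, you can also create a database of your own; the name test_db below is just a hypothetical example:

hive> create database if not exists test_db;

It will then show up in the show databases listing.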


2. Switch to a database:

hive> use default;
OK
Time taken: 0.041 seconds
hive>


3. List the tables:

hive> show tables;
OK
Time taken: 0.256 seconds
hive>


4. Create a table:

hive> create table student(id int,name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 5.467 seconds
hive>

The ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' clause tells Hive how to split each row of the data file into fields; here the fields are separated by tab characters.
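For reference, here is a sketch of the same DDL with the optional line terminator written out (the table name student_copy is hypothetical; LINES TERMINATED BY defaults to '\n', so this is equivalent to the statement above):

hive> create table student_copy(id int, name string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > LINES TERMINATED BY '\n';

You can then check the resulting schema with describe student_copy;.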


5. Load a few rows into the table

Create a directory /opt/hadoop/datas to hold the data for the student table, and create a file student.txt in it.

Edit student.txt with the following contents:

1001    zhangsan
1002    lisi
1003    wangwu
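Because the table's delimiter is '\t', the separators in this file must be real tab characters, not spaces. A quick way to write the file with explicit tabs (a sketch, using the /opt/hadoop/datas path from the load command below):

printf '1001\tzhangsan\n1002\tlisi\n1003\twangwu\n' > /opt/hadoop/datas/student.txt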

Now load the data from student.txt into the student table:

hive> load data local inpath '/opt/hadoop/datas/student.txt' into table student;

Running this command may throw an error like:

Failed with exception File /tmp/hive-root/hive_2016-11-12_15-03-17_165_7992996511195552405-1/-ext-10000/student.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)


This means no datanode is running. Rerun ./hadoop-daemon.sh start datanode to start one, then retry the load.
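To confirm that a datanode has actually registered before retrying, you can check the HDFS report (a sketch, using the hdfs binary from the same installation):

/opt/hadoop/hadoop-2.7.3/bin/hdfs dfsadmin -report | grep "Live datanodes"

A count of at least 1 means the load should now succeed.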


6. Queries

select * from student; works fine, but select id from student; fails:

Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1478941270145_0012, Tracking URL = http://tiancunPC:8088/proxy/application_1478941270145_0012/
Kill Command = /opt/hadoop/hadoop-2.7.3/bin/hadoop job  -kill job_1478941270145_0012
Interrupting... Be patient, this might take some time.
Press Ctrl+C again to kill JVM
killing job with: job_1478941270145_0012
Exiting the JVM
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2016-11-12 22:59:56,214 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1478941270145_0012 with errors
Error during job, obtaining debugging information...
killing job with: job_1478941270145_0012
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0:  HDFS Read: 0 HDFS Write: 0 FAIL


Solution (adapted from https://www.iteblog.com/archives/831):

Explanation:

If you query specific columns of a table, Hive by default launches a MapReduce job to carry out the query, as shown below:

hive> SELECT id, money FROM m limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (=235105473) is larger than
hive.exec.mode.local.auto.inputbytes.max (=134217728)
Starting Job = job_1384246387966_0229, Tracking URL =
http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job
-kill job_1384246387966_0229
Hadoop job information for Stage-1: number of mappers: 1;
number of reducers: 0
2013-11-13 11:35:16,167 Stage-1 map = 0%,  reduce = 0%
2013-11-13 11:35:21,327 Stage-1 map = 100%,  reduce = 0%,
 Cumulative CPU 1.26 sec
2013-11-13 11:35:22,377 Stage-1 map = 100%,  reduce = 0%,
 Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1384246387966_0229
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.26 sec
HDFS Read: 8388865 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
1       122
1       185
1       231
1       292
1       316
1       329
1       355
1       356
1       362
1       364
Time taken: 16.802 seconds, Fetched: 10 row(s)

  As we all know, launching a MapReduce job incurs overhead. To address this, Hive 0.10.0 introduced fetch tasks: a simple, non-aggregating query of the form SELECT <col> FROM <table> LIMIT n no longer needs a MapReduce job; the data is fetched directly by a fetch task. This can be enabled in any of the following ways:
  Method 1:

hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
OK
1       122
1       185
1       231
1       292
1       316
1       329
1       355
1       356
1       362
1       364
Time taken: 0.138 seconds, Fetched: 10 row(s)

The statement set hive.fetch.task.conversion=more; above enables the fetch task, so a simple column query like this no longer launches a MapReduce job.
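This setting only lasts for the current session. You can inspect the current value, and set it back to minimal (the default in Hive releases of this era) if needed:

hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=more
hive> set hive.fetch.task.conversion=minimal;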
  Method 2:

bin/hive --hiveconf hive.fetch.task.conversion=more

  Method 3:
Both methods above enable the fetch task, but only take effect temporarily (for the current session or invocation). To enable it permanently, add the following to ${HIVE_HOME}/conf/hive-site.xml:

<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task
    minimizing latency. Currently the query should be single
    sourced not having any subquery and should not have
    any aggregations or distincts (which incurs RS),
    lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
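As the description says, minimal only converts SELECT *, partition-column filters, and LIMIT, while more also covers column projection and general filters. So once more is in effect, even a filtered single-table query (reusing the student table from step 4; a sketch, not from the original article) should be answered by a fetch task instead of a MapReduce job:

hive> select name from student where id >= 1002;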
