Two Hive Problems

Source: Internet  Editor: 程序博客网  Time: 2024/05/21 07:59
Problem 1: Too Many Small Partitions
It can be tempting to partition your data into many small partitions to try to increase speed and concurrency. 
However, Hive functions best when data is partitioned into larger partitions. 
For example, consider partitioning a 100 TB table into 10,000 partitions, each 10 GB in size. 
In addition, do not use more than 10,000 partitions per table. 
Having too many small partitions puts significant strain on the Hive MetaStore and does not improve performance.
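
The sizing arithmetic in the example above can be checked with a small sketch. It assumes decimal units (1 TB = 1000 GB) and an evenly split table. The helper names partition_size_gb and within_partition_limit are hypothetical and are not part of Hive.

```python
# Rough sizing check for a partitioning plan.
# Assumption: decimal units (1 TB = 1000 GB) and evenly sized partitions.
MAX_PARTITIONS = 10_000  # per-table upper bound recommended in the text


def partition_size_gb(table_size_tb: float, num_partitions: int) -> float:
    """Average partition size in GB when the table is split evenly."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return table_size_tb * 1000 / num_partitions


def within_partition_limit(num_partitions: int) -> bool:
    """True if the plan stays within the recommended partition count."""
    return num_partitions <= MAX_PARTITIONS


if __name__ == "__main__":
    print(partition_size_gb(100, 10_000))   # 10.0
    print(within_partition_limit(10_000))   # True
    print(within_partition_limit(50_000))   # False
```

A plan that yields partitions far smaller than the 10 GB in the example, or that needs more than 10,000 partitions, is a sign the partition key is too fine-grained.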


Problem 2: Hive Queries Fail with "Too many counters" Error

Hive operations use various counters while executing MapReduce jobs. 
These per-operator counters are enabled by the configuration setting hive.task.progress. 
This is disabled by default; if it is enabled, Hive may create a large number of counters (4 counters per operator, plus another 20).


Note:
If dynamic partitioning is enabled, Hive implicitly enables the counters during data load.
By default, CDH restricts the number of MapReduce counters to 120. 
Hive queries that require more counters will fail with the "Too many counters" error.
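
As a rough illustration of the numbers above, the sketch below estimates the counter count from the "4 counters per operator, plus another 20" figure and compares it with the default limit of 120. The function names are hypothetical. The estimate is an upper bound taken from the text, and treating "more than the limit" as the failure condition is an assumption.

```python
# Back-of-the-envelope estimate of Hive counters for one job.
# Figures come from the text; the exact failure threshold is an assumption.
DEFAULT_COUNTER_LIMIT = 120  # default MapReduce counter limit in CDH
COUNTERS_PER_OPERATOR = 4
FIXED_COUNTERS = 20


def estimated_counters(num_operators: int) -> int:
    """Estimate counters as 4 per operator plus 20 fixed counters."""
    if num_operators < 0:
        raise ValueError("num_operators must be non-negative")
    return COUNTERS_PER_OPERATOR * num_operators + FIXED_COUNTERS


def exceeds_limit(num_operators: int, limit: int = DEFAULT_COUNTER_LIMIT) -> bool:
    """True if the estimate is larger than the configured counter limit."""
    return estimated_counters(num_operators) > limit


if __name__ == "__main__":
    print(estimated_counters(25))  # 120
    print(exceeds_limit(25))       # False
    print(exceeds_limit(26))       # True
```

Under these assumptions, a query plan with more than 25 operators would already need more counters than the default limit allows.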
What To Do
If you run into this error, set mapreduce.job.counters.max in mapred-site.xml to a higher value.
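
For reference, the sketch below renders the property element that would go inside the configuration block of mapred-site.xml. The value 500 is only an illustrative choice, not a recommendation from the source, and the helper counters_max_property is hypothetical.

```python
import xml.etree.ElementTree as ET


def counters_max_property(value: int) -> str:
    """Render a <property> element for mapreduce.job.counters.max."""
    if value <= 0:
        raise ValueError("value must be positive")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "mapreduce.job.counters.max"
    ET.SubElement(prop, "value").text = str(value)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # 500 is an assumed example value; pick one that fits your queries.
    print(counters_max_property(500))
```

The printed element can be pasted into mapred-site.xml by hand. How the new value is rolled out to the cluster depends on your deployment and is not covered here.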