Mobile Traffic Statistics with Hive
Source: Internet. Editor: 程序博客网. Time: 2024/04/29 03:36
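In recent job interviews I noticed that many companies use Hive to process their data.
Hive is a member of the Hadoop family. It is a framework that parses SQL-like statements: it wraps common MapReduce jobs so that you can work with tables stored on HDFS as if you were running SQL.
Hive has two kinds of tables: internal tables and external tables.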
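When you load data into an internal table, Hive moves the data into the directory of the data warehouse. When you create an external table, Hive only records the path where the data lives and leaves the data where it is.
When a table is dropped, an internal table loses both its metadata and its data, whereas an external table loses only its metadata and the data is kept. External tables are therefore somewhat safer, allow the data to be organized more flexibly, and make it easy to share the source data.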
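As a minimal sketch of the difference, the statements below declare an external table over a directory that already holds the data and then drop it. The table name `ll_ext` and the directory `/data/http_logs` are hypothetical; the columns mirror the table used later in this article.

```sql
-- Hypothetical external table; /data/http_logs is an assumed HDFS directory
-- that already contains the tab-delimited log files.
CREATE EXTERNAL TABLE IF NOT EXISTS ll_ext (
  reportTime  STRING,
  msisdn      STRING,
  apmac       STRING,
  acmac       STRING,
  host        STRING,
  siteType    STRING,
  upPackNum   BIGINT,
  downPackNum BIGINT,
  upPayLoad   BIGINT,
  downPayLoad BIGINT,
  httpStatus  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/http_logs';

-- Dropping an external table removes only its metadata;
-- the files under /data/http_logs stay where they are.
DROP TABLE ll_ext;
```

Running the same `DROP TABLE` against an internal table would also delete the table's files from the warehouse directory.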
Besides internal and external tables, Hive also has partitions, which avoid full table scans and speed up lookups. A later article will cover them.
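Until then, here is a rough sketch of what a partitioned table can look like. Everything in it is an assumption made for illustration: the table `ll_part`, the partition column `dt`, and the file `/logs/2013-03-13.dat`, which is assumed to contain exactly the three tab-delimited columns declared below.

```sql
-- Hypothetical partitioned table; dt is an assumed partition column (one value per day).
CREATE TABLE IF NOT EXISTS ll_part (
  msisdn      STRING,
  upPayLoad   BIGINT,
  downPayLoad BIGINT
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Load one day's file into its own partition (the path is an assumption).
LOAD DATA INPATH '/logs/2013-03-13.dat'
INTO TABLE ll_part PARTITION (dt = '2013-03-13');

-- Filtering on the partition column lets Hive read only that partition's
-- directory instead of scanning the whole table.
SELECT msisdn, SUM(upPayLoad), SUM(downPayLoad)
FROM ll_part
WHERE dt = '2013-03-13'
GROUP BY msisdn;
```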
Now for the actual topic: mobile traffic statistics with Hive.
Raw data:
1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200
1363157995052 13826544101 5C-0E-8B-C7-F1-E0:CMCC 120.197.40.4 4 0 264 0 200
1363157991076 13926435656 20-10-7A-28-CC-0A:CMCC 120.196.100.99 2 4 132 1512 200
1363154400022 13926251106 5C-0E-8B-8B-B1-50:CMCC 120.197.40.4 4 0 240 0 200
1363157993044 18211575961 94-71-AC-CD-E6-18:CMCC-EASY 120.196.100.99 iface.qiyi.com 视频网站 15 2 1527 2106 200
1363157995074 84138413 5C-0E-8B-8C-E8-20:7DaysInn 120.197.40.4 122.72.52.12 20 16 4116 1432 200
1363157993055 13560439658 C4-17-FE-BA-DE-D9:CMCC 120.196.100.99 18 15 1116 954 200
1363157995033 15920133257 5C-0E-8B-C7-BA-20:CMCC 120.197.40.4 sug.so.360.cn 信息安全 20 20 156 2936 200
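Steps:
-- After Hive is configured, start it with the hive command. Hive starts lazily, so startup is fairly slow.
hive
-- Use the show databases command to list the current databases.
hive> show databases;
OK
default
hive
Time taken: 3.389 seconds
-- Use the use command to switch to a database. The hive database is one I created earlier.
use hive;
-- Create the table. This one is an internal table. Loading HDFS data into an internal table moves the loaded file away from its original location.
-- An external table that points at the data's existing location does not have this problem, so external tables are recommended in real production environments.
create table ll(
  reportTime string,
  msisdn string,
  apmac string,
  acmac string,
  host string,
  siteType string,
  upPackNum bigint,
  downPackNum bigint,
  upPayLoad bigint,
  downPayLoad bigint,
  httpStatus string
)
row format delimited fields terminated by '\t';
-- Load the data. Here it is loaded from HDFS. It can also be loaded from the local Linux file system, which requires the local keyword.
load data inpath '/HTTP_20130313143750.dat' into table ll;
-- After the data has been loaded, the HDFS ...
-- Run Hive's SQL-like statement to compute the statistics.
select msisdn, sum(uppacknum), sum(downpacknum), sum(uppayload), sum(downpayload) from ll group by msisdn;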
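The steps mention that loading from the local Linux file system needs the `local` keyword. A minimal sketch of that variant follows; the local path `/root/HTTP_20130313143750.dat` is an assumption, so substitute wherever the file actually sits on the client machine.

```sql
-- LOCAL tells Hive to read the file from the client machine's file system
-- and copy it into the table's directory. The local path is an assumption.
LOAD DATA LOCAL INPATH '/root/HTTP_20130313143750.dat' INTO TABLE ll;

-- Quick sanity check that the rows were split into columns as expected.
SELECT msisdn, upPayLoad, downPayLoad FROM ll LIMIT 5;
```

Unlike a load from HDFS, a local load copies the file, so the original stays in place on the local disk.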
The output is as follows:
hive> select msisdn,sum(uppacknum),sum(downpacknum),sum(uppayload),sum(downpayload) from ll group by msisdn;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201307160252_0006, Tracking URL = http://hadoop0:50030/jobdetails.jsp?jobid=job_201307160252_0006
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=hadoop0:9001 -kill job_201307160252_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-07-17 19:51:42,599 Stage-1 map = 0%, reduce = 0%
2013-07-17 19:52:40,474 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:41,690 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:42,693 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:43,698 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:44,702 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:45,707 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:46,712 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:47,715 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:48,721 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:49,758 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:50,763 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 48.5 sec
2013-07-17 19:52:51,772 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 50.0 sec
2013-07-17 19:52:52,775 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 50.0 sec
2013-07-17 19:52:53,779 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 50.0 sec
MapReduce Total cumulative CPU time: 50 seconds 0 msec
Ended Job = job_201307160252_0006
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 50.0 sec HDFS Read: 2787075 HDFS Write: 16518 SUCCESS
Total MapReduce CPU Time Spent: 50 seconds 0 msec
OK
13402169727171108112861302301341580747720671683169668199418113416127574150110941619638027561341617182011399106303212013417106524160128186881308813418002498240256221368689613418090588456351989346747013418117364264152294364996613418173218376804834822612867315972213418666750224322648213956483973555213420637670202014801480
......
Time taken: 75.24 seconds
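The job output names three properties that control how many reducers are used. As an illustrative sketch, the session below fixes the reducer count before rerunning the same query. The value 2 is arbitrary and chosen only for demonstration; the property name is taken from the log above and may be reported as deprecated on newer Hadoop versions.

```sql
-- Request a constant number of reducers for the following queries.
-- 2 is an arbitrary, illustrative value.
SET mapred.reduce.tasks=2;

-- Same aggregation as before, now run with the reducer count set above.
SELECT msisdn,
       SUM(upPackNum),
       SUM(downPackNum),
       SUM(upPayLoad),
       SUM(downPayLoad)
FROM ll
GROUP BY msisdn;
```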
Mobile traffic statistics with MapReduce: http://www.maoxiangyi.cn/index.php/archives/256