Migrating partitioned tables between Hadoop clusters
Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 10:29
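The partitioned tables here are Hive/Impala tables stored in Parquet format.
Migration here simply means copying the data files. Below I walk through one example. If there are many tables to migrate, you can write a Java program that copies them on multiple threads.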
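The multi-threaded Java approach mentioned above could look roughly like the sketch below. It is an illustration, not the author's program, and rests on assumptions: one machine can reach both clusters' HDFS by full URI (the `hdfs://source-nn:8020` and `hdfs://target-nn:8020` prefixes are hypothetical placeholders), each table is copied with the same `hadoop fs -get` / `hadoop fs -put` pair used in the manual steps, and one table runs per worker thread. `TableMigrator`, `CommandRunner` and `ProcessRunner` are illustrative names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Runs one external command and returns its exit code (hypothetical helper interface). */
interface CommandRunner {
    int run(List<String> command) throws IOException, InterruptedException;
}

/** Default runner: starts the command with ProcessBuilder and waits for it to finish. */
final class ProcessRunner implements CommandRunner {
    @Override
    public int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}

/** Copies whole table directories, one table per task: hadoop fs -get, then hadoop fs -put. */
final class TableMigrator {
    private final String sourceDb;     // placeholder, e.g. hdfs://source-nn:8020/user/hive/warehouse/prestat.db
    private final String targetDb;     // placeholder, e.g. hdfs://target-nn:8020/user/hive/warehouse/prestat.db
    private final String localStaging; // local scratch directory, e.g. /root
    private final CommandRunner runner;

    TableMigrator(String sourceDb, String targetDb, String localStaging, CommandRunner runner) {
        this.sourceDb = sourceDb;
        this.targetDb = targetDb;
        this.localStaging = localStaging;
        this.runner = runner;
    }

    /** The two shell commands for one table, mirroring the manual get/put steps. */
    List<List<String>> commandsFor(String table) {
        String local = localStaging + "/" + table;
        List<List<String>> cmds = new ArrayList<>();
        cmds.add(Arrays.asList("hadoop", "fs", "-get", sourceDb + "/" + table, local));
        cmds.add(Arrays.asList("hadoop", "fs", "-put", local, targetDb + "/"));
        return cmds;
    }

    /** Migrates all tables on a fixed-size thread pool; returns table -> success flag. */
    Map<String, Boolean> migrateAll(List<String> tables, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, Future<Boolean>> pending = new LinkedHashMap<>();
        for (String table : tables) {
            pending.put(table, pool.submit(() -> {
                for (List<String> cmd : commandsFor(table)) {
                    if (runner.run(cmd) != 0) {
                        return false; // skip the put if the get failed
                    }
                }
                return true;
            }));
        }
        pool.shutdown(); // accept no new tasks; submitted ones keep running
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, Future<Boolean>> e : pending.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().get()); // waits for that table's task
            } catch (ExecutionException ex) {
                results.put(e.getKey(), false); // the runner threw an exception
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                results.put(e.getKey(), false);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        TableMigrator m = new TableMigrator(
                "hdfs://source-nn:8020/user/hive/warehouse/prestat.db",
                "hdfs://target-nn:8020/user/hive/warehouse/prestat.db",
                "/root", new ProcessRunner());
        System.out.println(m.migrateAll(Arrays.asList("dt_differ_users_pre_xdr"), 4));
    }
}
```

Injecting `CommandRunner` keeps the threading logic testable without a cluster; the pool size bounds how many copies hit the NameNodes and local disk at once.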
1. Check the table's location on the source cluster
[root@slave01 ~]# hadoop fs -du -h /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr
299.0 K  896.9 K  /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20170601
[root@slave01 ~]#
2. Download the files from the source cluster's HDFS to the local disk of a source server
[root@slave01 ~]# hadoop fs -get /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20170601/minute=0000 /root
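3. Transfer the files to your own machine via FTP, or scp them to the target cluster, then upload them into the target cluster's HDFS: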
[root@slave01 ~]# hadoop fs -put /root/dt_differ_users_pre_xdr/ /user/hive/warehouse/prestat.db/
[root@slave01 ~]# hadoop fs -du -h /user/hive/warehouse/prestat.db^C
[root@slave01 ~]# hadoop fs -du -h /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr
159.7 M  159.7 M  /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20170901
0        0        /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20171001
0        0        /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20171101
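When many tables are involved, picking out the partitions that actually hold data from a `-du` listing like this one can be scripted. A minimal sketch, assuming the format shown above: the size is the first token, the path is the last token (so paths must not contain spaces), and with `-h` a zero size is printed as a bare `0`. `DuParser` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Lists partition directories that contain data, from `hadoop fs -du` or `hadoop fs -du -h` output. */
final class DuParser {
    static List<String> nonEmptyPartitions(List<String> duLines) {
        List<String> result = new ArrayList<>();
        for (String line : duLines) {
            // A line looks like "SIZE [UNIT] ... PATH": size first, path last.
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue; // blank or malformed line
            }
            double size;
            try {
                size = Double.parseDouble(tokens[0]);
            } catch (NumberFormatException e) {
                continue; // not a du line, e.g. a shell prompt
            }
            if (size > 0) {
                String path = tokens[tokens.length - 1];
                result.add(path.substring(path.lastIndexOf('/') + 1)); // e.g. "day=20170901"
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> du = Arrays.asList(
                "159.7 M  159.7 M  /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20170901",
                "0        0        /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20171001",
                "0        0        /user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20171101");
        System.out.println(nonEmptyPartitions(du)); // prints [day=20170901]
    }
}
```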
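4. Only day=20170901 contains data, so we create the partition for that month (I ran this directly in Impala so that Impala's metadata would not fall out of sync):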
[slave01:21000] > alter table prestat.dt_differ_users_pre_xdr add IF NOT EXISTS partition(day=20170901,minute=cast('0000' as char(4)));
Query: alter table prestat.dt_differ_users_pre_xdr add IF NOT EXISTS partition(day=20170901,minute=cast('0000' as char(4)))
Fetched 0 row(s) in 1.63s
[slave01:21000] > show partitions prestat.dt_differ_users_pre_xdr ;
Query: show partitions prestat.dt_differ_users_pre_xdr
+----------+--------+-------+--------+----------+--------------+-------------------+---------+-------------------+---------------------------------------------------------------------------------------------+
| day      | minute | #Rows | #Files | Size     | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                                    |
+----------+--------+-------+--------+----------+--------------+-------------------+---------+-------------------+---------------------------------------------------------------------------------------------+
| 20170901 | 0000   | -1    | 2      | 159.72MB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://myha/user/hive/warehouse/prestat.db/dt_differ_users_pre_xdr/day=20170901/minute=0000 |
| Total    |        | -1    | 2      | 159.72MB | 0B           |                   |         |                   |                                                                                             |
+----------+--------+-------+--------+----------+--------------+-------------------+---------+-------------------+---------------------------------------------------------------------------------------------+
Fetched 2 row(s) in 0.03s
[slave01:21000] >
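For many partitions, the ALTER TABLE statement above can be generated from each copied partition directory. A sketch under stated assumptions: keys declared as CHAR(n) are passed in explicitly and wrapped in `cast('...' as char(n))` as in the statement above; any other all-digit value is assumed to belong to an integer key and left unquoted; everything else is quoted. Hive escapes special characters in directory names (e.g. `%3A`), and this sketch does not unescape them. `PartitionDdl` is an illustrative name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Builds "alter table ... add IF NOT EXISTS partition(...)" for a partition directory path. */
final class PartitionDdl {
    /** partitionPath: relative path such as "day=20170901/minute=0000"; charKeys: CHAR key -> length. */
    static String addPartition(String table, String partitionPath, Map<String, Integer> charKeys) {
        StringJoiner spec = new StringJoiner(",");
        for (String segment : partitionPath.split("/")) {
            if (segment.isEmpty()) {
                continue; // tolerate leading or doubled slashes
            }
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("not a key=value segment: " + segment);
            }
            String key = segment.substring(0, eq);
            String value = segment.substring(eq + 1);
            if (charKeys.containsKey(key)) {
                spec.add(key + "=cast('" + value + "' as char(" + charKeys.get(key) + "))");
            } else if (value.matches("\\d+")) {
                spec.add(key + "=" + value); // assumed integer partition key
            } else {
                spec.add(key + "='" + value.replace("'", "\\'") + "'"); // string literal
            }
        }
        return "alter table " + table + " add IF NOT EXISTS partition(" + spec + ");";
    }

    public static void main(String[] args) {
        Map<String, Integer> charKeys = new LinkedHashMap<>();
        charKeys.put("minute", 4);
        System.out.println(addPartition("prestat.dt_differ_users_pre_xdr", "day=20170901/minute=0000", charKeys));
        // prints: alter table prestat.dt_differ_users_pre_xdr add IF NOT EXISTS partition(day=20170901,minute=cast('0000' as char(4)));
    }
}
```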
5. Check the data
[slave01:21000] > select day,minute,count(*) from prestat.dt_differ_users_pre_xdr group by day,minute;
Query: select day,minute,count(*) from prestat.dt_differ_users_pre_xdr group by day,minute
Query submitted at: 2017-12-18 16:07:34 (Coordinator: http://slave01:25000)
Query progress can be monitored at: http://slave01:25000/query_plan?query_id=124365f0388d9b90:2af6db6900000000
+----------+--------+----------+
| day      | minute | count(*) |
+----------+--------+----------+
| 20170901 | 0000   | 3235946  |
+----------+--------+----------+
Fetched 1 row(s) in 0.14s
[slave01:21000] >
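To confirm that a copy is complete, the same per-partition count could be run on both clusters and compared. A sketch under these assumptions: `impala-shell` is on the PATH, its `-B` option (plain tab-separated output) is used so the rows are easy to parse, and the table is partitioned by `day` and `minute` as here. `CountCheck` is a hypothetical helper name, and running the command itself is left to a `ProcessBuilder`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.TreeSet;

/** Compares per-partition row counts between the source and target clusters (illustrative helper). */
final class CountCheck {
    /** impala-shell call for the count query; -B prints tab-separated rows instead of a bordered table. */
    static List<String> countCommand(String impalad, String table) {
        return Arrays.asList("impala-shell", "-i", impalad, "-B", "-q",
                "select day,minute,count(*) from " + table + " group by day,minute");
    }

    /** Parses "day<TAB>minute<TAB>count" rows into a map keyed by "day/minute". */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split("\t");
            if (f.length != 3) {
                continue; // blank or unexpected line
            }
            try {
                counts.put(f[0].trim() + "/" + f[1].trim(), Long.parseLong(f[2].trim()));
            } catch (NumberFormatException e) {
                // skip a header row such as "day minute count(*)" if headers are enabled
            }
        }
        return counts;
    }

    /** Partition keys whose counts differ, or that exist on only one side. */
    static List<String> mismatches(Map<String, Long> source, Map<String, Long> target) {
        TreeSet<String> keys = new TreeSet<>(source.keySet());
        keys.addAll(target.keySet());
        List<String> bad = new ArrayList<>();
        for (String k : keys) {
            if (!Objects.equals(source.get(k), target.get(k))) {
                bad.add(k);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", countCommand("slave01:21000", "prestat.dt_differ_users_pre_xdr")));
        // The target-cluster row from the query above, in -B form:
        System.out.println(parse(Arrays.asList("20170901\t0000\t3235946"))); // prints {20170901/0000=3235946}
    }
}
```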
6. If the query returns no data, run invalidate metadata xxx; and refresh xxx; (where xxx is the table name):
[slave01:21000] > invalidate metadata prestat.dt_differ_users_pre_xdr;
Query: invalidate metadata prestat.dt_differ_users_pre_xdr
Query submitted at: 2017-12-18 16:09:17 (Coordinator: http://slave01:25000)
Query progress can be monitored at: http://slave01:25000/query_plan?query_id=2242885efbd4c93d:a0ea048500000000
Fetched 0 row(s) in 0.42s
[slave01:21000] > refresh prestat.dt_differ_users_pre_xdr;
Query: refresh prestat.dt_differ_users_pre_xdr
Query submitted at: 2017-12-18 16:09:24 (Coordinator: http://slave01:25000)
Query progress can be monitored at: http://slave01:25000/query_plan?query_id=f5483ec91c70dd59:fc2a589f00000000
Fetched 0 row(s) in 0.67s
[slave01:21000] >
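With many tables, these statements can be generated per table. In Impala, REFRESH reloads the file and block metadata of a table Impala already knows about, while INVALIDATE METADATA marks the cached metadata stale so it is reloaded on the next access (needed, for example, for tables created through Hive). The sketch below only builds `impala-shell -i ... -q ...` command lines, one statement per invocation (an assumption made for simplicity); each list can be handed to a `ProcessBuilder`. `MetadataRefresh` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds impala-shell commands that reload Impala's metadata for migrated tables (illustrative). */
final class MetadataRefresh {
    /** For each table: optionally "invalidate metadata t", then "refresh t", one -q call each. */
    static List<List<String>> commands(String impalad, List<String> tables, boolean invalidateFirst) {
        List<List<String>> cmds = new ArrayList<>();
        for (String t : tables) {
            if (invalidateFirst) {
                cmds.add(Arrays.asList("impala-shell", "-i", impalad, "-q", "invalidate metadata " + t));
            }
            cmds.add(Arrays.asList("impala-shell", "-i", impalad, "-q", "refresh " + t));
        }
        return cmds;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("prestat.dt_differ_users_pre_xdr");
        for (List<String> cmd : commands("slave01:21000", tables, true)) {
            System.out.println(cmd); // each list is one process invocation
        }
    }
}
```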