Oracle 的数据批量读取
来源:互联网 发布:excel拆分工作簿 软件 编辑:程序博客网 时间:2024/06/05 03:01
有这么一个大表假设数亿条记录,纯数据文本百Gb以上,批量读取的话咋整?
SQLLoader,算是一个选择吧。嗯,开并行,数据表本身也许已经partition过了。可是如果自己批量unload数据应该怎么做呢?多线程并行是个好主意,但是怎么并行呢?如果你在想什么数字主键均分的想法,那你也许不会有好的结果。原因,主键的值域范围不一定是均匀分布的,这会造成你某些工作线程很忙,有些却草草结束了。深层次的原因,你也许会碰到不同线程同时争用同一块数据文件的情况,这不是也许,可以说是肯定会发生的。那还有更直接的方法吗?不幸的是,还真有!
思路就是,Oracle存放的数据文件都是有据可查的,包括文件编号,blocks个数等等。你要做的就是指定相关数据文件的起止ROWID就可以了。一般来说表的扩展都是自动均匀的,比如20000条记录一个等等。这样我们基本上可以达到均匀分配任务,而且又避免了同一个数据文件的IO争用问题。
给出两个SQL,一个是取自Quest 的 Sqoop Oracle 插件程序,另一个取自前人的经验,(你知道我在干嘛了)。
SELECT data_object_id, file_id, relative_fno, file_batch, MIN (start_block_id) start_block_id, MAX (end_block_id) end_block_id, SUM (blocks) blocks FROM (SELECT o.data_object_id, e.file_id, e.relative_fno, e.block_id start_block_id, e.block_id + e.blocks - 1 end_block_id, e.blocks, CEIL ( SUM ( e.blocks) OVER (PARTITION BY o.data_object_id, e.file_id ORDER BY e.block_id ASC) / (SUM (e.blocks) OVER (PARTITION BY o.data_object_id, e.file_id) / :numchunks)) file_batch FROM dba_extents e, dba_objects o, dba_tab_subpartitions tsp WHERE o.owner = :owner AND o.object_name = :object_name AND e.owner = :owner AND e.segment_name = :object_name AND o.owner = e.owner AND o.object_name = e.segment_name AND (o.subobject_name = e.partition_name OR (o.subobject_name IS NULL AND e.partition_name IS NULL)) AND o.owner = tsp.table_owner(+) AND o.object_name = tsp.table_name(+) AND o.subobject_name = tsp.subpartition_name(+) ) GROUP BY data_object_id, file_id, relative_fno, file_batch ORDER BY data_object_id, file_id, relative_fno, file_batch ;
select grp, dbms_rowid.rowid_create( 1, data_object_id, lo_fno, lo_block, 0 ) min_rid, dbms_rowid.rowid_create( 1, data_object_id, hi_fno, hi_block, 10000 ) max_rid from (select distinct grp, first_value(relative_fno) over (partition by grp order by relative_fno, block_id rows between unbounded preceding and unbounded following) lo_fno, first_value(block_id ) over (partition by grp order by relative_fno, block_id rows between unbounded preceding and unbounded following) lo_block, last_value(relative_fno) over (partition by grp order by relative_fno, block_id rows between unbounded preceding and unbounded following) hi_fno, last_value(block_id+blocks-1) over (partition by grp order by relative_fno, block_id rows between unbounded preceding and unbounded following) hi_block, sum(blocks) over (partition by grp) sum_blocks from (select relative_fno, block_id, blocks, trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) / (sum(blocks) over ()/:numchunks) ) grp from dba_extents where segment_name = upper(:object_name) and owner = :owner order by block_id ) ), (select data_object_id from dba_objects where owner=:owner and object_name = upper(:object_name) ) order by grp;
http://agstamy.blogspot.sg/2011/11/spliting-very-large-table-to-pieces.html
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:29986713177238
http://www.quest.com/data-connector-for-oracle-and-hadoop/
https://github.com/QuestSoftwareTCD/OracleSQOOPconnector
- Oracle 的数据批量读取
- Oracle数据的批量更新
- Oracle 批量数据的导入
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据【转】
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据(转载)
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用 fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用fetch bulk collect into 批量效率的读取游标数据
- Oracle 中使用fetch bulk collect into 批量效率的读取游标数据【mark还没看】
- 读取Excel数据、批量导入到Oracle数据库
- 批量导入oracle数据的bat文件
- oracle 批量数据修改的监控
- 树形DP总结
- 微信根据openId查询微信用户信息,这个功能需要认证服务号
- 为什么世界需要OpenStreetMap
- Ubuntu12.04下安装mysql5.5(系统包安装)
- Linux驱动模块编译进内核中
- Oracle 的数据批量读取
- 如何成为一个偷懒又高效的Android开发人员
- Active Diretory 全攻略(一)--目录服务
- Oracle_day1(小知识点)
- 常用PWM控制函数
- 针对Android系统的骁龙 SDK开发工具包
- 简单三步,实现LINUX 下虚拟光驱
- adsf
- js中得到rich:calendar的值