数据仓库的隐患-标量子查询
来源:互联网 发布:2016电话数据资源购买 编辑:程序博客网 时间:2024/05/23 17:28
首先,来理解一下标量子查询:处于select之后from之前的子查询称为标量子查询 .比如:select num1,cal,(select name from t2 where t2.id = t1.id)from t1
标量子查询的缺点十分明显:驱动表固定是外表t1, t1返回的结果集传值t2得到结果。所以如果t1表过大。将会引起很大的性能问题【数据仓库跑批流程应该禁用标量子查询】
今天这条SQL运行了5个小时没出结果。。。。。。。(不得不说耐心真好,我一般最多等十分钟)
SELECT /*+ NO_USE_HASH(C,B)*/ C.CUST_ACCT_NO, C.PRIM_ACCT, ACCOUNT_SYSTEM, CUSTOMER_TYPE, CUSTOMER_STATUS, CREATE_DT, HOME_BRANCH_NO, COMPANY_SIZE, NOTICE_IND, NOTICE_CUST_NO, STMT_FREQUENCY, STMT_CYCLE, STMT_DAY, ID_NO, ID_TYPE, SHORT_NAME, EMAIL_ADD1, EMAIL_ADD2, CREDIT_RANKING, TITLE_CODE, NAME1, ADD1, POSTCODE, PHONE_NO_RES, PHONE_RES_EXT, PHONE_NO_BUS, PHONE_BUS_EXT, FAX_NO, TELEX_NO, PCODE_RGSTER, REGSTR_ADD1, REGSTR_ADD2, PHONE_RGSTR_NO, PHONE_RGSTR_EXT, BIRTH_DATE_1, SEX_CODE, EMPLOYER_NAME, EMPLOYED_FROM, EMPLOYER_ADDR, OCCUP_DESCRIP, OCCUPATION_CODE, INCOME, INCOME_WMY, COMPANY_NO, BUSINESS_NO, LICENCE_NO, BOSS_NAME, BOSS_BDAY, BUS_RGSTR_DATE, CAPITAL_AMT, CONTACT_REL_1, PHONE_NO_1, ADD2, ADD3, ADD4, MOBILE_NO, FXSP_TYPE, INDUSTRY_CODE, BUS_SECTOR_CODE, CUST_SUB_TYPE, DEP_STMT_TYPE, ID_ISSUE_DATE, ID_EXP_DATE, REGISTRY_ADD, ID_ISSUE_PLAC, LST_MNT_DATE, B.BRANCH_NO FROM CUSM_T C INNER JOIN (SELECT DISTINCT CUSTOMER_NO, (SELECT SJJGM FROM JGDY H WHERE H.JGM = CB_ACCT.BRANCH_NO) BRANCH_NO FROM CB_ACCT) B ON C.CUST_ACCT_NO = B.CUSTOMER_NO;SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 2079508004 ---------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |---------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 18M| 14G| | 1793K (1)| 00:04:41 ||* 1 | INDEX SKIP SCAN | JGDY_IDX3 | 1 | 8 | | 1 (0)| 00:00:01 || 2 | MERGE JOIN | | 18M| 14G| | 1793K (1)| 00:04:41 || 3 | SORT JOIN | | 18M| 397M| | 147K (1)| 00:00:24 || 4 | VIEW | | 18M| 397M| | 147K (1)| 00:00:24 || 5 | HASH UNIQUE | | 18M| 380M| 1107M| 147K (1)| 00:00:24 || 6 | TABLE ACCESS STORAGE FULL| CB_ACCT2 | 36M| 760M| | 71431 (1)| 00:00:12 ||* 7 | SORT JOIN | | 19M| 14G| 35G| 1645K (1)| 00:04:18 || 8 | TABLE ACCESS STORAGE FULL | CUSM_T | 19M| 14G| | 306K (1)| 00:00:48 |--------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id):--------------------------------------------------- 1 - access("H"."JGM"=:B1) filter("H"."JGM"=:B1) 7 - access("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO") filter("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO")通过SQL和PLAN都可以很容易的找出标量子查询
1.select之后from之前,当然这个SQL的标量隐藏在了内联视图里面了
(SELECT DISTINCT CUSTOMER_NO, (SELECT SJJGM FROM JGDY H WHERE H.JGM = CB_ACCT.BRANCH_NO) BRANCH_NO FROM CB_ACCT) B
2.PLAN的id=1和2那两步,缩进一样,而且没有连接方式的父亲节点
|* 1 | INDEX SKIP SCAN | JGDY_IDX3 | 1 | 8 | | 1 (0)| 00:00:01 || 2 | MERGE JOIN | | 18M| 14G| | 1793K (1)| 00:04:41 |
通过这两点均可以判断SQL里面包含标量.如果SQL特别长就直接看PLAN就行了
标量是否产生性能问题,注意取决于主表(外表)返回的行数.其实我们都知道这种数据仓库跑批的表不可能小。象征性的查一下
我之前的博客里面发过这个脚本 http://blog.csdn.net/skybig1988/article/details/71125223 也可以自己定制,很简单
可以看出表的行数很大,不适合走标量(>10000行)
对于标量子查询,只能通过改写【标量子查询可以等价改写为外连接】
当然此处的标量改写十分简单.有些复杂的比如 聚合类、不等值、树形查询的标量千万需要注意改写前后是否等价
SELECT /*+ NO_USE_HASH(C,B)*/ C.CUST_ACCT_NO, C.PRIM_ACCT, ACCOUNT_SYSTEM, CUSTOMER_TYPE, CUSTOMER_STATUS, CREATE_DT, HOME_BRANCH_NO, COMPANY_SIZE, NOTICE_IND, NOTICE_CUST_NO, STMT_FREQUENCY, STMT_CYCLE, STMT_DAY, ID_NO, ID_TYPE, SHORT_NAME, EMAIL_ADD1, EMAIL_ADD2, CREDIT_RANKING, TITLE_CODE, NAME1, ADD1, POSTCODE, PHONE_NO_RES, PHONE_RES_EXT, PHONE_NO_BUS, PHONE_BUS_EXT, FAX_NO, TELEX_NO, PCODE_RGSTER, REGSTR_ADD1, REGSTR_ADD2, PHONE_RGSTR_NO, PHONE_RGSTR_EXT, BIRTH_DATE_1, SEX_CODE, EMPLOYER_NAME, EMPLOYED_FROM, EMPLOYER_ADDR, OCCUP_DESCRIP, OCCUPATION_CODE, INCOME, INCOME_WMY, COMPANY_NO, BUSINESS_NO, LICENCE_NO, BOSS_NAME, BOSS_BDAY, BUS_RGSTR_DATE, CAPITAL_AMT, CONTACT_REL_1, PHONE_NO_1, ADD2, ADD3, ADD4, MOBILE_NO, FXSP_TYPE, INDUSTRY_CODE, BUS_SECTOR_CODE, CUST_SUB_TYPE, DEP_STMT_TYPE, ID_ISSUE_DATE, ID_EXP_DATE, REGISTRY_ADD, ID_ISSUE_PLAC, LST_MNT_DATE, B.BRANCH_NO FROM CUSM_T C INNER JOIN (SELECT DISTINCT CUSTOMER_NO, sjjgm BRANCH_NO FROM CB_ACCT LEFT JOIN jgdy ON cb_acct.branch_no=jgm ) B ON C.CUST_ACCT_NO = B.CUSTOMER_NO;Plan hash value: 2285049241 ----------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |----------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 36M| 28G| | 1834K (1)| 00:04:47 || 1 | MERGE JOIN | | 36M| 28G| | 1834K (1)| 00:04:47 || 2 | SORT JOIN | | 36M| 829M| | 188K (1)| 00:00:30 || 3 | VIEW | | 36M| 829M| | 188K (1)| 00:00:30 || 4 | HASH UNIQUE | | 36M| 1037M| 1384M| 188K (1)| 00:00:30 ||* 5 | HASH JOIN RIGHT OUTER | | 36M| 1037M| | 71501 (1)| 00:00:12 || 6 | INDEX FULL SCAN | JGDY_IDX3 | 1241 | 9928 | | 1 (0)| 00:00:01 || 7 | TABLE ACCESS STORAGE FULL| CB_ACCT2 | 36M| 760M| | 71431 (1)| 00:00:12 ||* 8 | SORT JOIN | | 19M| 14G| 35G| 1645K (1)| 00:04:18 || 9 | TABLE ACCESS STORAGE FULL | CUSM_T | 19M| 14G| | 306K (1)| 00:00:48 |---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id):--------------------------------------------------- 5 - access("CB_ACCT"."BRANCH_NO"="JGM"(+)) 8 - access("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO") filter("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO")改写之后标量消失.SQL运行了7分钟出结果。但是这个SQL里面没有不等值连接,走MERGE JOIN显然毫无意义。明显走HASH是最好的选择
一直不理解SQL上面的/*+ NO_USE_HASH(C,B)*/ 的意义,最后开发回应说这个 HINT是为了让SQL走嵌套循环,因为走NL比较快。听到这个理由我也是呵呵了!
这里我简单的说一下NL、HASH、SMJ在实际工作中该如何选择:
嵌套循环:
看SQL语句的返回条数 太大的话一般都是错误的
看驱动表返回的行数 一般不能超过1w 最好在1k 以内
看被驱动表的链接列 是否包含在索引里面 (必须包含在索引里面)
看到distinct ,group by ,sum()一般不走嵌套循环(数据量超级多才去group by)当然数据量少的话也可以走NL
哈希连接只能用于等值连接
排序合并连接唯一的作用:非等值连接
去掉/*+ NO_USE_HASH(C,B)*/ 之后.SQL运行了30秒便出结果
SELECT C.CUST_ACCT_NO, C.PRIM_ACCT, ACCOUNT_SYSTEM, CUSTOMER_TYPE, CUSTOMER_STATUS, CREATE_DT, HOME_BRANCH_NO, COMPANY_SIZE, NOTICE_IND, NOTICE_CUST_NO, STMT_FREQUENCY, STMT_CYCLE, STMT_DAY, ID_NO, ID_TYPE, SHORT_NAME, EMAIL_ADD1, EMAIL_ADD2, CREDIT_RANKING, TITLE_CODE, NAME1, ADD1, POSTCODE, PHONE_NO_RES, PHONE_RES_EXT, PHONE_NO_BUS, PHONE_BUS_EXT, FAX_NO, TELEX_NO, PCODE_RGSTER, REGSTR_ADD1, REGSTR_ADD2, PHONE_RGSTR_NO, PHONE_RGSTR_EXT, BIRTH_DATE_1, SEX_CODE, EMPLOYER_NAME, EMPLOYED_FROM, EMPLOYER_ADDR, OCCUP_DESCRIP, OCCUPATION_CODE, INCOME, INCOME_WMY, COMPANY_NO, BUSINESS_NO, LICENCE_NO, BOSS_NAME, BOSS_BDAY, BUS_RGSTR_DATE, CAPITAL_AMT, CONTACT_REL_1, PHONE_NO_1, ADD2, ADD3, ADD4, MOBILE_NO, FXSP_TYPE, INDUSTRY_CODE, BUS_SECTOR_CODE, CUST_SUB_TYPE, DEP_STMT_TYPE, ID_ISSUE_DATE, ID_EXP_DATE, REGISTRY_ADD, ID_ISSUE_PLAC, LST_MNT_DATE, B.BRANCH_NO FROM CUSM_T C INNER JOIN (SELECT DISTINCT CUSTOMER_NO, sjjgm BRANCH_NO FROM CB_ACCT LEFT JOIN jgdy ON cb_acct.branch_no=jgm ) B ON C.CUST_ACCT_NO = B.CUSTOMER_NOPlan hash value: 967350049 ---------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |---------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 36M| 28G| | 1059K (1)| 00:02:46 ||* 1 | HASH JOIN | | 36M| 28G| 1244M| 1059K (1)| 00:02:46 || 2 | VIEW | | 36M| 829M| | 188K (1)| 00:00:30 || 3 | HASH UNIQUE | | 36M| 1037M| 1384M| 188K (1)| 00:00:30 ||* 4 | HASH JOIN RIGHT OUTER | | 36M| 1037M| | 71501 (1)| 00:00:12 || 5 | INDEX FULL SCAN | JGDY_IDX3 | 1241 | 9928 | | 1 (0)| 00:00:01 || 6 | TABLE ACCESS STORAGE FULL| CB_ACCT2 | 36M| 760M| | 71431 (1)| 00:00:12 || 7 | TABLE ACCESS STORAGE FULL | CUSM_T | 19M| 14G| | 306K (1)| 00:00:48 |--------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id):--------------------------------------------------- 1 - access("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO") 4 - access("CB_ACCT"."BRANCH_NO"="JGM"(+))其实这个SQL还可以继续优化,ID=5这一步INDEX FULL SCAN是单块读改成全表扫描可以提升100+倍,加上一体机本身的全表扫描优化TABLE ACCESS STORAGE FULL。提升会更多!!!
如上可知标量子查询是一个非常恐怖的用法。当外部表返回的数据量不大时。完全不会引起性能问题。但是此时隐患已经埋下
随着外部表数据量的增加。标量的性能会慢慢受到影响,一旦过了这个临界值。性能下降的非常明显和可怕。所以在数据仓库
中应该用外连接代替标量,避免给程序埋下隐患。
- 数据仓库的隐患-标量子查询
- 经典的标量子查询
- SQL优化-标量子查询的改写
- 标量子查询
- 标量子查询
- 标量子查询
- 标量子查询
- sql 标量子查询
- 标量子查询
- oracle标量子查询
- 标量子查询改写
- 优化mysql标量子查询
- 标量子查询SQL改写
- [Oracle]高效的PL/SQL程序设计(二)--标量子查询
- [Oracle]高效的PL/SQL程序设计(二)--标量子查询
- [Oracle]高效的PL/SQL程序设计(二)--标量子查询
- 彻底搞懂oracle的标量子查询
- 内联视图、标量子查询、WITH子查询分解的示例
- C++ 类 静态成员变量和静态成员函数
- 2. Add Two Numbers LeetCode题解
- swagger2集成springMvc生成在线API
- 题目1024:畅通工程 九度OJ
- react+redux+router+webpack+immutable.js框架
- 数据仓库的隐患-标量子查询
- Struts与Hibernate整合完成一个小案例
- AngularJS和jQuery的区别分析
- TPS不稳定,队列产生大量的TIME_WAIT?
- runtime原理和使用场景
- 分布式系统唯一ID生成方案汇总
- 【自留地】linux使用记录
- CodeForces 733 E.Sleep in Class(队列)
- java 最新面试题 (腾讯 百度 阿里巴巴 华为)精选