Oracle “CONNECT BY” 使用

来源:互联网 发布:linux jdk rpm 编辑:程序博客网 时间:2024/05/21 10:42

 

最基本的connect by的用法:

需求1: 我需要下钻所有的树得到level和parent_name
create table test_lvl1 (id number, parent_id number, name varchar2(10));
insert into test_lvl1 values (1,null,'SLI1');
insert into test_lvl1 values (2,1,'SLI2');
insert into test_lvl1 values (3,1,'SLI3');
insert into test_lvl1 values (4,null,'SLI4');
insert into test_lvl1 values (5,2,'SLI5');
insert into test_lvl1 values (6,3,'SLI6');
insert into test_lvl1 values (7,5,'SLI7');
insert into test_lvl1 values (8,7,'SLI8');
insert into test_lvl1 values (9,4,'SLI9');
insert into test_lvl1 values (10,3,'SLI10');
insert into test_lvl1 values (11,1,'SLI11');

kl@k01> select * from test_lvl1;

        ID  PARENT_ID NAME
---------- ---------- ----------
         1            SLI1
         2          1 SLI2
         3          1 SLI3
         4            SLI4
         5          2 SLI5
         6          3 SLI6
         7          5 SLI7
         8          7 SLI8
         9          4 SLI9
        10          3 SLI10
        11          1 SLI11

11 rows selected.

select a.*, b.name parent_name
from
(select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with parent_id is null
connect by prior id=PARENT_ID) a,
test_lvl1 b
where a.parent_id=b.id(+)

NAME               ID  PARENT_ID      LEVEL PARENT_NAM
---------- ---------- ---------- ---------- ----------
SLI11              11          1          2 SLI1
SLI3                3          1          2 SLI1
SLI2                2          1          2 SLI1
SLI5                5          2          3 SLI2
SLI10              10          3          3 SLI3
SLI6                6          3          3 SLI3
SLI9                9          4          2 SLI4
SLI7                7          5          4 SLI5
SLI8                8          7          5 SLI7
SLI4                4                     1
SLI1                1                     1

11 rows selected.

* 解释一下:
* 1. Start with表示从那一层开始的,后面跟表达式,如: pid=0,
寻找继承关系时,指定的顶点,如果需要对整个表进行整理,比较常用
的作法是:  PID is null. 例如这种情况顶点的Parent_ID显然是NULL,
所以从PID is null开始无疑是最完整的。验证如下:
* 2. prior 表示返回所以符合这种条件(如id=pid)的connect by操作结果.
即以循环的显示所有的记录和继承关系。可以试一下如果把prior去掉,那么
只有两条记录,验证如下:

select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with parent_id is null
connect by prior id=PARENT_ID;

--- 如果用Chile ID或者其他列作为开始列,会丢掉分支, 如下查询,可以自己验证
select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with id =1
connect by prior id=PARENT_ID(+);

select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with NAME='SLI1'
connect by prior id(+)=PARENT_ID;

* 另外一个很有趣的现象,对上述start with条件不同,而且结果记录数也不同的查询查看执行计划
竟是完全相同的。测试版本是Oracle 10.2.0.1
测试过程如下,节约篇幅,结果不述:
1. Set autotrace on;
2. 执行上面的query,查看返回值和执行计划;


需求2,我需要每个从根节点到当前孩子节点的路径:
create table test_parent1 (id number, name varchar2(20), pid number, parent_name varchar2(20));
insert into test_parent1 values (1,'JEFF',null,null);
insert into test_parent1 values (2,'LARRY',1,'JEFF');
insert into test_parent1 values (3,'BILL',1,'JEFF');
insert into test_parent1 values (4,'DELL',1,'JEFF');
insert into test_parent1 values (5,'MARK',3,'LARRY');
insert into test_parent1 values (6,'KART',5,'MARK');
insert into test_parent1 values (7,'ANDY',6,'KART');
insert into test_parent1 values (8,'CANDY',null,null);
insert into test_parent1 values (9,'LOUIS',8,'CANDY');
commit;

--- 得到如下表:
kl@k01> select * from test_parent1;

        ID NAME                        PID PARENT_NAME
---------- -------------------- ---------- --------------------
         1 JEFF
         2 LARRY                         1 JEFF
         3 BILL                          1 JEFF
         8 CANDY
         4 DELL                          1 JEFF
         5 MARK                          3 LARRY
         6 KART                          5 MARK
         7 ANDY                          6 KART
         9 LOUIS                         8 CANDY
        
8 rows selected.


可以看出来一个简单的层次关系,如果我希望得到每个人的直接主管,应该就足够了,
但是要得到每个人的顶级BOSS,即变成如下:
         1 JEFF
         2 LARRY                         1
         3 BILL                          1
         8 CANDY
         4 DELL                          1
         5 MARK                          1
         6 KART                          1
         7 ANDY                          1
         9 LOUIS                         8

我们还需要做如下处理,使用SYS_CONNECT_BY_PATH,这个函数只使用于9i和9i以上的版本,
她可以帮我返回一个字符串,这个字符串是一个from-root-to-node的字串。如下所示:
col PID_PATH for a20
col parent_name for a30
select ID, NAME, SYS_CONNECT_BY_PATH(pid,'/') as pid_path, SYS_CONNECT_BY_PATH(parent_name,'/') as parent_name
from test_parent1
start with pid=1 connect by prior id=pid;

        ID NAME                 PID_PATH             PARENT_NAME
---------- -------------------- -------------------- ------------------------------
         2 LARRY                /1                   /JEFF
         3 BILL                 /1                   /JEFF
         5 MARK                 /1/3                 /JEFF/LARRY
         6 KART                 /1/3/5               /JEFF/LARRY/MARK
         7 ANDY                 /1/3/5/6             /JEFF/LARRY/MARK/KART
         4 DELL                 /1                   /JEFF

6 rows selected.

需求3, 字符串分组方法(这里参考了zhouwf0726在ITPUB里面的研究结果 http://www.itpub.net/thread-614563-1-1.html):
++++++++++++ Input ++++++++++++++++++++
ID         MC     
---------- ---------
1          11111  
1          22222  
2          11111  
2          22222  
3          11111  
3          22222  
3          33333   

++++++++++++ output ++++++++++++++++++++
ID         ADD_MC           
---------- -------------------
1          11111;22222
2          11111;22222
3          11111;22222;33333
++++++++++++++++++++++++++++++++++++++++
实现方法和试验步骤如下:
drop table test purge;
create table test(id varchar2(10),mc varchar2(50));
insert into test values('1','11111');
insert into test values('1','22222');
insert into test values('2','11111');
insert into test values('2','22222');
insert into test values('3','11111');
insert into test values('3','22222');
insert into test values('3','33333');

commit;
* 先看看怎么利用connect by, 如果要利用connect by,必须考虑使用sys_connect_by_path(上面有详细介绍)
关键是如何选择connect by的条件,为什么要制定rn_id,我们需要对ID进行分组,从而把同组的MC组合起来,
而对于RN,
col MC for a10
select id, mc, row_number() over(partition by id order by id) rn_id,
               row_number() over(order by id)+id rn
from test;

ID         MC              RN_ID         RN
---------- ---------- ---------- ----------
1          11111               1          2
1          22222               2          3
2          11111               1          5
2          22222               2          6
3          11111               1          8
3          22222               2          9
3          33333               3         10

7 rows selected.

col add_mc for a50
select id,ltrim(max(sys_connect_by_path(mc,';')),';') add_mc from (
select id, mc, row_number() over(partition by id order by id) rn_id,
row_number() over(order by id)+id rn from test)
start with rn_id=1 connect by rn-1=prior rn
group by id
order by id
/

ID         ADD_MC
---------- --------------------------------------------------
1          11111;22222
2          11111;22222
3          11111;22222;33333

3 rows selected.
## 解释:  首先要熟悉分析函数 row_number() over(partition by id order by id)
表示以id进行分组,显示每条记录中id的数量,order by id是对id进行排序。
rn是顺序的,如果直接connect by一组顺序的数,将一直连续下去,所以因为一直满足rn - 1= rn,
所以要使rn为 row_number() over(order by id)+id,因为每组的id都不一样,这样才能避免连续的数字出现。





需求4,我需要在输出的时候就对结果进行格式化:

lpad和level配合决定错行缩进显示:

select level||'layer' layer,lpad(' ',level*5)||id id
from test_parent1
start with pid=1 connect by prior id=pid;

LAYER                ID
-------------------- -----------------------------------
1layer                    2
1layer                    3
2layer                         5
3layer                              6
4layer                                   7
1layer                    4

6 rows selected.


需求4, 如果需要构建一种循环处理机制,比如把一个字符的每个字母输出,不用通过
PLSQL来实现,直接用connect by即可,当着这里已经有点跑题了,权当兴趣吧:

select substr('JEFFREY',rownum, 1) Letter from dual
con
LETT
----
J
E
F
F
R
E

6 rows selected.

介绍这个例子也是为了引申出两个简单常用的技巧:
1) 构建大数据量的测试数据:
   select rownum a from dual connect by rownum<=10000;   --- 10000条顺序的纪录就轻松产生了。
   或者:  select level a from dual connect by level<=10000;
2) 结合case ... when优化query, 通过这种方法在列转行的时候,减少全表扫描的次数:
比如,我要把ID1,ID2,ID3合并起来,他们对应的CHARGE不变:
      TYPE ID1         ID2              ID3              CHARGE
---------- ------------------------------ ------------------------------ ------------------------------ ----------
         1 0000001     0000002          0000003          100
         1 0000006     0                0000008          900
         1 0000009     0                0                400
合并结果如下:
               TYPE ID                                 CHARGE
---------- ------------------------------ ----------
         1 0000001                               100
         1 0000006                               900
         1 0000009                               400
         1 0000002                               100
         1 0000003                               100
         1 0000008                               900

++++++++++++  test table ++++++++++++++++++
drop table test;
create table test
( type number,
  id1 varchar2(10) default 0 not null,
  id2 varchar2(10) default 0 not null,
  id3 varchar2(10) default 0 not null,
  charge number
);
insert into test values (1,'0000001','0000002','0000003',100);
insert into test values (1,'0000006',0,'0000008',900);
insert into test values (1,'0000009',0,0,400);

commit;
+++++++++++++++++++++++++++++++++++++++++++
常规方法可能是:
select type, id1 id, charge from test where id1<>0
union all
select type, id2 id, charge from test where id2<>0
union all
select type, id3 id, charge from test where id3<>0
;

但如果test纪录size很大,三次FTS肯定cost很高,所以我们必须考虑通过case .. when的方法减少FTS的次数:
select type,
       case a when 1 then id1
              when 2 then id2
              when 3 then id3
       end
       id,charge
from test, (select level a from dual connect by level<=3) lvl
where case a when 1 then id1
             when 2 then id2
             when 3 then id3
             end <>0;
从下面的执行计划看,只有一次FTS,而不是之前的三次。          
-------------------- Execution Plan ----------------------------------------------
----------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      |     1 |    60 |     5   (0)|
|   1 |  NESTED LOOPS                  |      |     1 |    60 |     5   (0)|
|   2 |   VIEW                         |      |     1 |    13 |     2   (0)|
|   3 |    CONNECT BY WITHOUT FILTERING|      |       |       |            |
|   4 |     FAST DUAL                  |      |     1 |       |     2   (0)|
|*  5 |   TABLE ACCESS FULL            | TEST |     1 |    47 |     3   (0)|
0 0
原创粉丝点击