Oracle “CONNECT BY” 使用

来源：互联网发布：linux jdk rpm 编辑：程序博客网时间：2024/05/21 10:42

最基本的connect by的用法:

需求1: 我需要下钻所有的树得到level和parent_name
create table test_lvl1 (id number, parent_id number, name varchar2(10));
insert into test_lvl1 values (1,null,'SLI1');
insert into test_lvl1 values (2,1,'SLI2');
insert into test_lvl1 values (3,1,'SLI3');
insert into test_lvl1 values (4,null,'SLI4');
insert into test_lvl1 values (5,2,'SLI5');
insert into test_lvl1 values (6,3,'SLI6');
insert into test_lvl1 values (7,5,'SLI7');
insert into test_lvl1 values (8,7,'SLI8');
insert into test_lvl1 values (9,4,'SLI9');
insert into test_lvl1 values (10,3,'SLI10');
insert into test_lvl1 values (11,1,'SLI11');

kl@k01> select * from test_lvl1;

        ID PARENT_ID NAME
---------- ---------- ----------
         1            SLI1
         2          1 SLI2
         3          1 SLI3
         4            SLI4
         5          2 SLI5
         6          3 SLI6
         7          5 SLI7
         8          7 SLI8
         9          4 SLI9
        10          3 SLI10
        11          1 SLI11

11 rows selected.

select a.*, b.name parent_name
from
(select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with parent_id is null
connect by prior id=PARENT_ID) a,
test_lvl1 b
where a.parent_id=b.id(+)

NAME               ID PARENT_ID      LEVEL PARENT_NAM
---------- ---------- ---------- ---------- ----------
SLI11              11          1          2 SLI1
SLI3                3          1          2 SLI1
SLI2                2          1          2 SLI1
SLI5                5          2          3 SLI2
SLI10              10          3          3 SLI3
SLI6                6          3          3 SLI3
SLI9                9          4          2 SLI4
SLI7                7          5          4 SLI5
SLI8                8          7          5 SLI7
SLI4                4                     1
SLI1                1                     1

11 rows selected.

* 解释一下:
* 1. Start with表示从那一层开始的，后面跟表达式,如: pid=0,
寻找继承关系时，指定的顶点，如果需要对整个表进行整理，比较常用
的作法是: PID is null. 例如这种情况顶点的Parent_ID显然是NULL,
所以从PID is null开始无疑是最完整的。验证如下:
* 2. prior 表示返回所以符合这种条件(如id=pid)的connect by操作结果.
即以循环的显示所有的记录和继承关系。可以试一下如果把prior去掉，那么
只有两条记录,验证如下:

select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with parent_id is null
connect by prior id=PARENT_ID;

--- 如果用Chile ID或者其他列作为开始列，会丢掉分支, 如下查询，可以自己验证
select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with id =1
connect by prior id=PARENT_ID(+);

select name, ID, PARENT_ID, LEVEL
from test_lvl1
start with NAME='SLI1'
connect by prior id(+)=PARENT_ID;

* 另外一个很有趣的现象，对上述start with条件不同，而且结果记录数也不同的查询查看执行计划
竟是完全相同的。测试版本是Oracle 10.2.0.1
测试过程如下，节约篇幅，结果不述:
1. Set autotrace on;
2. 执行上面的query,查看返回值和执行计划;

需求2，我需要每个从根节点到当前孩子节点的路径:
create table test_parent1 (id number, name varchar2(20), pid number, parent_name varchar2(20));
insert into test_parent1 values (1,'JEFF',null,null);
insert into test_parent1 values (2,'LARRY',1,'JEFF');
insert into test_parent1 values (3,'BILL',1,'JEFF');
insert into test_parent1 values (4,'DELL',1,'JEFF');
insert into test_parent1 values (5,'MARK',3,'LARRY');
insert into test_parent1 values (6,'KART',5,'MARK');
insert into test_parent1 values (7,'ANDY',6,'KART');
insert into test_parent1 values (8,'CANDY',null,null);
insert into test_parent1 values (9,'LOUIS',8,'CANDY');
commit;

--- 得到如下表:
kl@k01> select * from test_parent1;

        ID NAME                        PID PARENT_NAME
---------- -------------------- ---------- --------------------
         1 JEFF
         2 LARRY                         1 JEFF
         3 BILL                          1 JEFF
         8 CANDY
         4 DELL                          1 JEFF
         5 MARK                          3 LARRY
         6 KART                          5 MARK
         7 ANDY                          6 KART
         9 LOUIS                         8 CANDY

8 rows selected.

可以看出来一个简单的层次关系，如果我希望得到每个人的直接主管，应该就足够了，
但是要得到每个人的顶级BOSS，即变成如下:
         1 JEFF
         2 LARRY                         1
         3 BILL                          1
         8 CANDY
         4 DELL                          1
         5 MARK                          1
         6 KART                          1
         7 ANDY                          1
         9 LOUIS                         8

我们还需要做如下处理,使用SYS_CONNECT_BY_PATH,这个函数只使用于9i和9i以上的版本，
她可以帮我返回一个字符串，这个字符串是一个from-root-to-node的字串。如下所示:
col PID_PATH for a20
col parent_name for a30
select ID, NAME, SYS_CONNECT_BY_PATH(pid,'/') as pid_path, SYS_CONNECT_BY_PATH(parent_name,'/') as parent_name
from test_parent1
start with pid=1 connect by prior id=pid;

        ID NAME                 PID_PATH             PARENT_NAME
---------- -------------------- -------------------- ------------------------------
         2 LARRY                /1                   /JEFF
         3 BILL                 /1                   /JEFF
         5 MARK                 /1/3                 /JEFF/LARRY
         6 KART                 /1/3/5               /JEFF/LARRY/MARK
         7 ANDY                 /1/3/5/6             /JEFF/LARRY/MARK/KART
         4 DELL                 /1                   /JEFF

6 rows selected.

需求3, 字符串分组方法(这里参考了zhouwf0726在ITPUB里面的研究结果 http://www.itpub.net/thread-614563-1-1.html):
++++++++++++ Input ++++++++++++++++++++
ID         MC
---------- ---------
1          11111
1          22222
2          11111
2          22222
3          11111
3          22222
3          33333

++++++++++++ output ++++++++++++++++++++
ID         ADD_MC
---------- -------------------
1          11111;22222
2          11111;22222
3          11111;22222;33333
++++++++++++++++++++++++++++++++++++++++
实现方法和试验步骤如下:
drop table test purge;
create table test(id varchar2(10),mc varchar2(50));
insert into test values('1','11111');
insert into test values('1','22222');
insert into test values('2','11111');
insert into test values('2','22222');
insert into test values('3','11111');
insert into test values('3','22222');
insert into test values('3','33333');

commit;
* 先看看怎么利用connect by, 如果要利用connect by，必须考虑使用sys_connect_by_path(上面有详细介绍)
关键是如何选择connect by的条件,为什么要制定rn_id，我们需要对ID进行分组，从而把同组的MC组合起来，
而对于RN,
col MC for a10
select id, mc, row_number() over(partition by id order by id) rn_id,
               row_number() over(order by id)+id rn
from test;

ID         MC              RN_ID         RN
---------- ---------- ---------- ----------
1          11111               1          2
1          22222               2          3
2          11111               1          5
2          22222               2          6
3          11111               1          8
3          22222               2          9
3          33333               3         10

7 rows selected.

col add_mc for a50
select id,ltrim(max(sys_connect_by_path(mc,';')),';') add_mc from (
select id, mc, row_number() over(partition by id order by id) rn_id,
row_number() over(order by id)+id rn from test)
start with rn_id=1 connect by rn-1=prior rn
group by id
order by id
/

ID         ADD_MC
---------- --------------------------------------------------
1          11111;22222
2          11111;22222
3          11111;22222;33333

3 rows selected.
## 解释: 首先要熟悉分析函数 row_number() over(partition by id order by id)
表示以id进行分组，显示每条记录中id的数量，order by id是对id进行排序。
rn是顺序的，如果直接connect by一组顺序的数，将一直连续下去，所以因为一直满足rn - 1= rn,
所以要使rn为 row_number() over(order by id)+id，因为每组的id都不一样，这样才能避免连续的数字出现。

需求4，我需要在输出的时候就对结果进行格式化:

lpad和level配合决定错行缩进显示:

select level||'layer' layer,lpad(' ',level*5)||id id
from test_parent1
start with pid=1 connect by prior id=pid;

LAYER                ID
-------------------- -----------------------------------
1layer                    2
1layer                    3
2layer                         5
3layer                              6
4layer                                   7
1layer                    4

6 rows selected.

需求4, 如果需要构建一种循环处理机制，比如把一个字符的每个字母输出，不用通过
PLSQL来实现，直接用connect by即可，当着这里已经有点跑题了，权当兴趣吧:

select substr('JEFFREY',rownum, 1) Letter from dual
con
LETT
----
J
E
F
F
R
E

6 rows selected.

介绍这个例子也是为了引申出两个简单常用的技巧:
1) 构建大数据量的测试数据:
   select rownum a from dual connect by rownum<=10000;   --- 10000条顺序的纪录就轻松产生了。
   或者: select level a from dual connect by level<=10000;
2) 结合case ... when优化query, 通过这种方法在列转行的时候，减少全表扫描的次数:
比如，我要把ID1,ID2,ID3合并起来，他们对应的CHARGE不变:
      TYPE ID1         ID2              ID3              CHARGE
---------- ------------------------------ ------------------------------ ------------------------------ ----------
         1 0000001     0000002          0000003          100
         1 0000006     0                0000008          900
         1 0000009     0                0                400
合并结果如下:
               TYPE ID                                 CHARGE
---------- ------------------------------ ----------
         1 0000001                               100
         1 0000006                               900
         1 0000009                               400
         1 0000002                               100
         1 0000003                               100
         1 0000008                               900

++++++++++++ test table ++++++++++++++++++
drop table test;
create table test
( type number,
id1 varchar2(10) default 0 not null,
id2 varchar2(10) default 0 not null,
id3 varchar2(10) default 0 not null,
charge number
);
insert into test values (1,'0000001','0000002','0000003',100);
insert into test values (1,'0000006',0,'0000008',900);
insert into test values (1,'0000009',0,0,400);

commit;
+++++++++++++++++++++++++++++++++++++++++++
常规方法可能是:
select type, id1 id, charge from test where id1<>0
union all
select type, id2 id, charge from test where id2<>0
union all
select type, id3 id, charge from test where id3<>0
;

但如果test纪录size很大，三次FTS肯定cost很高，所以我们必须考虑通过case .. when的方法减少FTS的次数:
select type,
       case a when 1 then id1
              when 2 then id2
              when 3 then id3
       end
       id,charge
from test, (select level a from dual connect by level<=3) lvl
where case a when 1 then id1
             when 2 then id2
             when 3 then id3
             end <>0;
从下面的执行计划看，只有一次FTS，而不是之前的三次。
-------------------- Execution Plan ----------------------------------------------
----------------------------------------------------------------------------
| Id | Operation                      | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      |     1 |    60 |     5   (0)|
|   1 | NESTED LOOPS                  |      |     1 |    60 |     5   (0)|
|   2 |   VIEW                         |      |     1 |    13 |     2   (0)|
|   3 |    CONNECT BY WITHOUT FILTERING|      |       |       |            |
|   4 |     FAST DUAL                  |      |     1 |       |     2   (0)|
|* 5 |   TABLE ACCESS FULL            | TEST |     1 |    47 |     3   (0)|

0 0