聚簇索引对数据插入的影响

来源:互联网 发布:精灵虚拟光驱激活mac 编辑:程序博客网 时间:2024/06/07 11:44

背景

开发人员反馈系统执行某存储过程特别慢,经排查是由于存储过程执行过程中需要向新建的任务表插入大量数据,该任务表的主键是聚簇索引造成的。聚簇索引为什么会导致插入慢呢?聚簇索引会对数据插入造成多大影响呢?

原理

  • 在非聚簇索引中,物理数据的存储顺序与索引不同,索引的最低级别包含指向数据页上的行的指针。

这里写图片描述

  • 在聚簇索引中,物理数据的存储顺序与索引相同,索引的最低级别包含实际的数据页。
    聚簇索引导致数据记录必须按照键大小顺序存储,插入和删除须进行移动数据记录,导致额外的磁盘IO。

这里写图片描述

测试

一、基本环境信息

  • 查看操作系统版本
[root@npfydev01 home]# lsb_release -aLSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarchDistributor ID: CentOSDescription:    CentOS release 6.4 (Final)Release:    6.4Codename:   Final
  • 查看磁盘信息
[root@npfydev01 home]# cat /proc/scsi/scsiAttached devices:Host: scsi0 Channel: 02 Id: 00 Lun: 00  Vendor: IBM      Model: ServeRAID M5110  Rev: 3.24  Type:   Direct-Access                    ANSI  SCSI revision: 05Host: scsi1 Channel: 00 Id: 00 Lun: 00  Vendor: IBM SATA Model: DEVICE 81Y3674   Rev: IB01  Type:   CD-ROM                           ANSI  SCSI revision: 05
  • 查看磁盘读写速度
[root@npfydev01 home]# time dd if=/dev/zero of=/home/4kb.1GBFILE bs=4k count=262144 262144+0 records in262144+0 records out1073741824 bytes (1.1 GB) copied, 1.58541 s, 677 MB/sreal    0m1.589suser    0m0.050ssys 0m1.533s
  • 查看数据库版本
1> select @@version2> go --------------------------------------------------------------------------------------  Adaptive Server Enterprise/15.7/EBF 21708 SMP SP110 /P/x86_64/Enterprise Linux/ase157sp11x/3546/64-bit/FBO/Fri Nov  8 05:39:38 2013                                                                                                                             (1 row affected)

二、数据准备

  • 建立聚簇索引表(sybase主键默认为聚簇索引)
USE DB_TASKGOCREATE TABLE T_TASKITEM_CI (    C_BH char(32) primary key,    C_BH_TASK char(32) null,    C_BH_AJ varchar(32) null,    N_AJBS numeric(15,0) null,    C_AJLB varchar(6) null,    N_JBFY int null,    N_ZT int null,    C_AH varchar(75) null) go
  • 建立非聚簇索引表
USE DB_TASKgoCREATE TABLE T_TASKITEM_NCI (    C_BH char(32) NOT NULL,    C_BH_TASK char(32) null,    C_BH_AJ varchar(32) null,    N_AJBS numeric(15,0) null,    C_AJLB varchar(6) null,    N_JBFY int null,    N_ZT int null,    C_AH varchar(75) null) goCREATE UNIQUE INDEX PK_TASKITEM ON DB_TASK.dbo.T_TASKITEM_NCI (C_BH) go
  • 构造数据
    构造一张同构的数据表T_TASKITEM_CC,使用如下SQL向该张表构造50W左右数据。
SELECT newid ()    , a.C_BH    , 1 AS N_ZT    , a.N_AJBS    , a.N_JBFY    , '5813b6d7ce8847d68b34daa956776659' AS C_BH_TASK    , (CASE WHEN (a.N_YWLX = 20100) THEN '0201' WHEN (a.N_YWLX = 20200) THEN '0202' WHEN (a.N_YWLX = 20304) THEN '0207' WHEN (a.N_YWLX = 20501) THEN '0210' WHEN (a.N_YWLX = 20801) THEN '0224' WHEN (a.N_YWLX = 20601) THEN '0214' WHEN (a.N_YWLX = 20603) THEN '0216' WHEN (a.N_YWLX = 20602) THEN '0215' END) AS C_AJLB    , a.C_AHFROM YWST..T_XS_AJ a

数据量为 501132

三、插入对比

  • 非聚簇索引表
1> insert into T_TASKITEM_NCI SELECT newid(),C_BH_TASK,C_BH_AJ,N_AJBS,C_AJLB,N_JBFY,N_ZT,C_AH FROM T_TASKITEM_CC2> GOParse and Compile Time 0.Adaptive Server cpu time: 0 ms.Parse and Compile Time 0.Adaptive Server cpu time: 0 ms.Table: T_TASKITEM_NCI scan count 0, logical reads: (regular=2025588 apf=0 total=2025588), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Table: T_TASKITEM_CC scan count 1, logical reads: (regular=10957 apf=27 total=10984), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Total writes for this command: 3538Execution Time 97.Adaptive Server cpu time: 9688 ms.  Adaptive Server elapsed time: 13381 ms.(501132 rows affected)
  • 聚簇索引表
1> insert into T_TASKITEM_CI SELECT newid(),C_BH_TASK,C_BH_AJ,N_AJBS,C_AJLB,N_JBFY,N_ZT,C_AH FROM T_TASKITEM_CC2> GOParse and Compile Time 0.Adaptive Server cpu time: 0 ms.Parse and Compile Time 0.Adaptive Server cpu time: 0 ms.Table: T_TASKITEM_CI scan count 0, logical reads: (regular=6422447 apf=0 total=6422447), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Table: T_TASKITEM_CC scan count 1, logical reads: (regular=10957 apf=27 total=10984), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Total writes for this command: 11945Execution Time 176.Adaptive Server cpu time: 17350 ms.  Adaptive Server elapsed time: 28206 ms.(501132 rows affected)
类别 聚簇索引 非聚簇索引 写入 11945 3538 读入 6422447 2025588 执行时间 28206 ms 13381 ms

结论:插入同样的数据量,非聚簇索引表比聚簇索引表时间上快一倍,IO减小2/3。

四、删除对比

  • 构造删除数据

按照索引字段C_BH排序,获取物理位置为于100行、200行…5000行的C_BH,将要删除的编号分别存储在T_DELETE_CI_BH 和 T_DELETE_NCI_BH表中。

select C_BH,N_ORDER = identity(10) INTO T_ALL_CI_BH FROM T_TASKITEM_CI ORDER BY C_BH ascSELECT C_BH,N_ORDER INTO T_DELETE_CI_BH FROM T_ALL_CI_BH WHERE N_ORDER%100 = 0select C_BH,N_ORDER = identity(10) INTO T_ALL_NCI_BH FROM T_TASKITEM_NCI ORDER BY C_BH ascSELECT C_BH,N_ORDER INTO T_DELETE_NCI_BH FROM T_ALL_NCI_BH WHERE N_ORDER%100 = 0
  • 聚簇索引表执行删除
1> DELETE FROM T_TASKITEM_CI where C_BH IN (SELECT C_BH FROM T_DELETE_CI_BH)2> goParse and Compile Time 0.Adaptive Server cpu time: 0 ms.Parse and Compile Time 0.Adaptive Server cpu time: 0 ms.Parse and Compile Time 0.Adaptive Server cpu time: 0 ms.Table: T_TASKITEM_CI scan count 0, logical reads: (regular=20004 apf=0 total=20004), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Table: T_DELETE_CI_BH scan count 1, logical reads: (regular=31 apf=0 total=31), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Table: T_TASKITEM_CI scan count 5001, logical reads: (regular=15070 apf=0 total=15070), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Total writes for this command: 241Execution Time 1.Adaptive Server cpu time: 128 ms.  Adaptive Server elapsed time: 379 ms.(5001 rows affected)
  • 非聚簇索引表执行删除
1> DELETE FROM T_TASKITEM_NCI where C_BH IN (SELECT C_BH FROM T_DELETE_NCI_BH)2> goParse and Compile Time 0.Adaptive Server cpu time: 0 ms.Parse and Compile Time 0.Adaptive Server cpu time: 0 ms.Parse and Compile Time 0.Adaptive Server cpu time: 0 ms.Table: T_TASKITEM_NCI scan count 0, logical reads: (regular=20004 apf=0 total=20004), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Table: T_DELETE_NCI_BH scan count 1, logical reads: (regular=31 apf=0 total=31), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Table: T_TASKITEM_NCI scan count 5001, logical reads: (regular=15070 apf=0 total=15070), physical reads: (regular=0 apf=0 total=0), apf IOs used=0Total writes for this command: 242Execution Time 1.Adaptive Server cpu time: 128 ms.  Adaptive Server elapsed time: 403 ms.(5001 rows affected)

结论:按照索引字段删除,聚簇索引和非聚簇索引IO和效率一样。

排查聚簇索引

聚簇索引表插入无序主键(GUID/UUID)数据时会造成额外的磁盘IO和时间消耗,采用无序主键(GUID/UUID)的项目设计上是禁止使用聚簇索引,那么如何排查项目中非法使用聚簇索引的表呢?使用dba团队出品的sp_dba_citable存储过程检索
核心代码:

use sybsystemprocsGOif object_id('sp_dba_citable') is not null     drop procedure sp_dba_citableGOcreate procedure sp_dba_citableAS--查看聚簇索引表--add by wangzhen 2017-07-17begin     declare @temp_sql varchar(500)    declare @sql varchar(1000)    declare @dbname varchar(100)    declare dbname_cursor cursor for select name from master..sysdatabases    create table #objectinfo (    dbname varchar(100),    objid int,    tablename varchar(300),    indexid int,    indexname varchar(300),    keycnt int,    indextype varchar(100)    )    set @temp_sql = 'insert into #objectinfo '                         + 'select ''@dbname#'' , '                         + 'obj.id , '                         + 'obj.name , '                         + 'ind.indid , '                         + 'ind.name , '                         + 'ind.keycnt , '                         + '''culster index''  '                         +' from @dbname#..sysindexes ind left join  @dbname#..sysobjects obj on ind.id = obj.id '                         +' where (ind.status2 & 512 = 512 or ind.indid = 1) and obj.type = ''U'' '         open dbname_cursor    while @@sqlstatus =0     BEGIN        FETCH  dbname_cursor into @dbname        set @sql =  str_replace(@temp_sql,'@dbname#',@dbname)         EXECUTE(@sql)    END     close dbname_cursor    select         t.dbname as "库名",        t.objid as "对象ID",        t.tablename as "表名",         t.indexname as "索引名"     from #objectinfo t  where t.dbname not in ('master','tempdb','sybsecurity','sybsystemdb','sybsystemprocs') group by t.dbname,t.objid,t.tablename,t.indexname,t.keycnt,t.indextype  order by t.dbname asc,t.tablename ascend go  

总结

在聚簇索引中,物理数据的存储顺序与索引相同,索引的最低级别包含实际的数据页,在无序字段上(GUID/UUID)上使用聚簇索引插入大量数据会比非聚簇索引时间慢一倍,IO高三倍。其实,NP在设计之初已经规定业务表中不能定义物理主键(含聚簇索引),而应该定义逻辑主键(唯一约束+索引+不为空)。对于采用无序主键(GUID/UUID)的项目可以使用sp_dba_citable排查聚簇索引表!

原创粉丝点击