In & Exists & Join 分析测试

来源:互联网 发布:mysql statistics 编辑:程序博客网 时间:2024/05/16 14:21
测试In & Exists & Join 的性能
 
Step 1. 测试环境搭建
Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)
Enterprise Edition (64-bit) on Windows NT 6.1
 
   1: -- IN & EXISTS & INNER JOIN 
   2: CREATE TABLE BIG_TB
   3: (
   4: ID        INT IDENTITY PRIMARY KEY,
   5: Col       CHAR(4) NOT NULL
   6: )
   7:  
   8: CREATE TABLE SMALL_TB
   9: (
  10: ID        INT IDENTITY PRIMARY KEY,
  11: Col       CHAR(4) NOT NULL,
  12: IDate     DATE DEFAULT(GETDATE())
  13: )
  14:  
  15:  
  16: INSERT INTO BIG_TB (Col)
  17: SELECT top 250000
  18: char(65+FLOOR(RAND(a.column_id *1111 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *2222 + b.object_id)*12)) +
  19: char(65+FLOOR(RAND(b.column_id *3333 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *4444 + b.object_id)*8))
  20: from master.sys.columns a cross join master.sys.columns b;
  21: GO
  22: --250000
  23:  
  24:  
  25: INSERT INTO SMALL_TB (col)
  26: SELECT DISTINCT col
  27: FROM BIG_TB TABLESAMPLE (20 PERCENT);
  28: GO
  29: --2807

Step 2.测试代码

   1: --===================== No Index =====================
   2: SELECT ID, col FROM BIG_TB
   3: WHERE col IN (SELECT col FROM SMALL_TB)
   4:  
   5: SELECT ID, col FROM BIG_TB
   6: WHERE EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
   7:  
   8: SELECT A.ID ,A.col FROM BIG_TB A
   9: INNER JOIN SMALL_TB  B ON A.col = B.col
  10:  
  11:  
  12: --Not In & Not Exists
  13: SELECT ID, col FROM BIG_TB
  14: WHERE col NOT IN (SELECT col FROM SMALL_TB)
  15:  
  16: SELECT ID, col FROM BIG_TB
  17: WHERE NOT EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
  18:  
  19: SELECT A.ID,A.col FROM BIG_TB A
  20: LEFT OUTER JOIN SMALL_TB  B ON A.col = B.col
  21: WHERE B.col IS NULL
  22:  
  23:  
  24: --===================== Index =====================
  25: CREATE INDEX idx_BIG_TB_col
  26: ON BIG_TB (col)
  27:  
  28: CREATE INDEX idx_SMALL_TB_col
  29: ON SMALL_TB (col)
  30:  
  31:  
  32: SELECT ID, col FROM BIG_TB
  33: WHERE col IN (SELECT col FROM SMALL_TB)
  34:  
  35: SELECT ID, col FROM BIG_TB
  36: WHERE EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
  37:  
  38: SELECT A.ID ,A.col FROM BIG_TB A
  39: INNER JOIN SMALL_TB  B ON A.col = B.col
  40:  
  41:  
  42: --Not In & Not Exists
  43: SELECT ID, col FROM BIG_TB
  44: WHERE col NOT IN (SELECT col FROM SMALL_TB)
  45:  
  46: SELECT ID, col FROM BIG_TB
  47: WHERE NOT EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
  48:  
  49: SELECT A.ID,A.col FROM BIG_TB A
  50: LEFT OUTER JOIN SMALL_TB  B ON A.col = B.col
  51: WHERE B.col IS NULL

Step 3.测试数据汇总

AnalyseList


总结:
     在有无Index的前提下,In/Exists及Not In/Not Exists的性能表现相差无几。
而Inner Join/Left Outer Join则相较而言,性能表现上稍微较慢。但注意到Col列
是Not Null属性的。当该列为Nullable状态时,Not Exists与Not In的处理方式将
有所不同,性能表现见下表:

   1: ALTER TABLE BIG_TB
   2:  ALTER COLUMN col char(4)  NULL
   3:  
   4: ALTER TABLE SMALL_TB
   5:  ALTER COLUMN col char(4)  NULL
   6:  
   7:  
   8:  
   9: SELECT ID, col FROM BIG_TB
  10: WHERE col NOT IN (SELECT col FROM SMALL_TB)
  11:  
  12: /*
  13: 表 'SMALL_TB'。扫描计数 13,逻辑读取 500058 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
  14: 表 'BIG_TB'。扫描计数 5,逻辑读取 3992 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
  15: 
  16:  SQL Server 执行时间:
  17:    CPU 时间 = 1095 毫秒,占用时间 = 306 毫秒。
  18: */
  19:  
  20:  
  21: SELECT ID, col FROM BIG_TB
  22: WHERE NOT EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
  23: /*
  24: 表 'BIG_TB'。扫描计数 1,逻辑读取 3639 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
  25: 表 'SMALL_TB'。扫描计数 1,逻辑读取 9 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
  26: 
  27: 
  28:  SQL Server 执行时间:
  29:    CPU 时间 = 265 毫秒,占用时间 = 263 毫秒。
  30: */


又上述数据可以看出。在处理含有null的数据时,Not exists 的性能将比Not In的性能有很大提升。提升效率大约为5倍左右。
原创粉丝点击