SQLServer 通过DMV实现低影响的自动监控和历史场景追溯

来源:互联网 发布:显示自己是阿里云的ip 编辑:程序博客网 时间:2024/05/01 07:41

一. 背景:

     我们管理和维护的数据库系统基本都是7*24小时运转的,运转时会出现什么情况谁都无法估计,往往一个平时运行很正常的系统,某天晚上突然就抽风了,而此刻并没有系统负责人在旁边,当值班同事的电话打到正在酣睡的DBA手机上时,DBA不得不朦朦胧胧的,条件反射要上数据库看看;要在家里连到公司的内外一般都需要DBA通知运维人员(也可能是个在做梦的哥们)先给自己开个VPN(某些管控严格的公司还需要DBA打车去公司才能处理),然后通过一大堆的验证才通过慢如蜗牛(往往也和此刻的心态有关)的网络,远程连接到了公司内网出问题的数据库上,此刻DBA才开始真正开始排查起问题来;如果说等待DBA登录到问题数据库上,问题依然在出现,那还算好,可以很快找到问题并处理,但是往往这种抽风的现象是短暂的,可能持续几分钟后,系统又恢复了正常,等到DBA"跑山涉水,翻山越岭" 的好不容易登录到数据库上检查时,数据库里的进程、锁、日志一起都正常,系统运行的很Hai;于是第二天上班时老大们问起昨天晚上事故的原因时,DBA只能凭空猜测,可能是网络、数据库阻塞、抑或是程序方面的问题吧......,此刻不同部门的人都猜是其他部门管理的东西出了问题,于是大家都把手头上的数据、监控图拿出来,证明自己这里是没有问题的;如果监控网络和系统的图和APP的Log都没用出现啥问题(说实话监控也不一定准确的),那就基本要把问题推到数据库身上了(DBA往往成为炮灰)。

      数据库有没有出问题,如果出了问题又是啥问题呢?windows Log和DB Log都没异常,如果我们没有把Profile持续的开启(基于性能的考虑,一般都不会持续开启),就很难说清楚在系统出问题时数据库究竟有没有出问题,出了问题又是因为什么原因引起的;如果说此类问题出现了一次,就消失了,那还算好,成为一个无头案,悲剧的是这类问题无规律,反复的出现,如果DBA不能找出问题的原因,也不能证明数据库当时是正常的话,估计在公司里面就没用立足之地了。

 

二 . 对策:

    其实对这种灵异的系统抽风事件,有两种比较好的解决方案:

    1. 开启Trace 跟踪,就是数据库的Profile功能 ,这个还可以结合windows的性能计数器一同使用,能够直观的了解到特殊时间点上运行了什么语句,资源消耗情况是怎样的;不过这个方法比较消耗系统的资源,对访问压力比较大的数据库需要慎重;

    2. 收集数据库DMV当时的情况,使得DBA在故障时间过后,还能通过这些数据了解到事故发生时,数据库里面运行的语句,以及锁和资源的情况;这种方法只是访问系统性能视图,对数据库其他业务影响比较小,也是我接下来要介绍的方案。

 

三. 实施方案:

   思路:根据数据库中某些动态指标,触发收集DMV的过程,必要时发送报警邮件

1. 创建存储系统性能视图的表:

--创建四个记录表
USE [master]
GO

/****** Object:  Table [dbo].[dc_block_info]    Script Date: 05/23/2011 11:35:48 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[dc_block_info](
    [spid] [smallint] NULL,
    [status] [nchar](30) NULL,
    [SQLBuffer] [nvarchar](max) NULL,
    [hostname] [nchar](128) NULL,
    [BlkBy] [varchar](10) NULL,
    [BlockedSQLBuffer] [nvarchar](max) NULL,
    [LoginName] [varchar](100) NULL,
    [DBName] [varchar](50) NULL,
    [CPUTime] [int] NULL,
    [DiskIO] [int] NULL,
    [LastBatch] [datetime] NULL,
    [program_name] [nchar](128) NULL,
    [Command] [varchar](100) NULL,
    [batch_id] [int] NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

USE [master]
GO

/****** Object:  Table [dbo].[dc_Blocked_Resource_Info]    Script Date: 05/23/2011 11:35:48 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[dc_Blocked_Resource_Info](
    [spid] [smallint] NULL,
    [dbid] [smallint] NOT NULL,
    [ObjId] [int] NOT NULL,
    [IndId] [smallint] NOT NULL,
    [Type] [nvarchar](4) NULL,
    [Resource] [nvarchar](32) NULL,
    [Mode] [nvarchar](8) NULL,
    [Status] [nvarchar](5) NULL,
    [batch_id] [int] NULL
) ON [PRIMARY]

GO

USE [master]
GO

/****** Object:  Table [dbo].[dc_info_BlockedInfo]    Script Date: 05/23/2011 11:35:48 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[dc_info_BlockedInfo](
    [batch_id] [int] NULL,
    [lock_type] [varchar](100) NULL,
    [database_name] [varchar](20) NULL,
    [blk_object] [varchar](100) NULL,
    [lock_req] [varchar](100) NULL,
    [waiter_sid] [int] NULL,
    [wait_time] [int] NULL,
    [waiter_batch] [varchar](max) NULL,
    [waiter_stmt] [varchar](max) NULL,
    [blocker_sid] [int] NULL,
    [blocker_stmt] [varchar](max) NULL,
    [create_date] [datetime] NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

USE [master]
GO

/****** Object:  Table [dbo].[dc_info_SessionConn]    Script Date: 05/23/2011 11:35:48 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[dc_info_SessionConn](
    [batch_id] [int] NULL,
    [session_id] [int] NULL,
    [blocking_session_id] [int] NULL,
    [textdata] [varchar](max) NULL,
    [login_name] [varchar](30) NULL,
    [host_name] [varchar](100) NULL,
    [database_name] [varchar](30) NULL,
    [program_name] [varchar](200) NULL,
    [command] [varchar](100) NULL,
    [status] [varchar](20) NULL,
    [cpu_time] [int] NULL,
    [memory_usage_kb] [int] NULL,
    [reads] [int] NULL,
    [writes] [int] NULL,
    [transaction_isolation_level] [int] NULL,
    [connect_time] [datetime] NULL,
    [last_read] [datetime] NULL,
    [last_write] [datetime] NULL,
    [net_transport] [varchar](20) NULL,
    [client_net_address] [varchar](30) NULL,
    [client_tcp_port] [int] NULL,
    [local_tcp_port] [int] NULL,
    [start_time] [datetime] NULL,
    [wait_time] [int] NULL,
    [wait_type] [varchar](100) NULL,
    [last_wait_type] [varchar](100) NULL,
    [wait_resource] [varchar](1000) NULL,
    [open_transaction_count] [int] NULL,
    [create_date] [datetime] NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

2. 定义性能收集的存储过程:

USE [master]
GO

--数据库的阻塞和锁信息,该SP可以记录session中显示不出的信息  
create procedure [dbo].[usp_blocker_info](@batch_id int)   
                                                                                                                                                                                                                    
as   
                                                                                                                                                                                                                                                               
begin   
                                                                                                                                                                                                                                                            
 set nocount on   
                                                                                                                                                                                                                                                  
 declare @spid smallint,@blocked smallint   
                                                                                                                                                                                                                                          
 declare @c_SQL nvarchar(4000),@b_SQL nvarchar(4000)   
                                                                                                                                                                                                                                  
 declare @id int   
                                                                                                                                                                                                                                                 
 --保存Sysprocesses 的内容 
                                                                                                                                                                                                                                           
 create table #Temp(spid smallint,status nchar(30),hostname nchar(128),program_name nchar(128)   
                                                                                                                                                                   
   ,cmd nchar(16),cpu int,physical_io int,blocked smallint,dbid smallint   
                                                                                                                                                                                         
   ,loginame nchar(128),last_batch datetime   
                                                                                                                                                                                                                      
   ,SQLBuffer nvarchar(4000),BlockedSQLBuffer nvarchar(4000))   

      
 --保存DBCC InputBuffer 的结果 
                                                                                                                               
 create table #Temp1(id int identity(1,1),eventtype varchar(20),parameters int,eventinfo nvarchar(4000))   
  
 create table #Temp_b(id int identity(1,1),eventtype varchar(20),parameters int,eventinfo nvarchar(4000))  


 select * into #Temp2   
                                                                                                                                                                                                                                           
  from master..sysprocesses (nolock)   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
 
 --保存被阻塞的进程信息 
                                                                                                                                                                                                                                                    
 insert into #Temp(spid ,status ,hostname ,program_name ,cmd ,cpu ,physical_io ,blocked ,dbid    
                                                                                                                                                                  
   ,loginame ,last_batch )   
                                                                                                                                                                                                                                      
  SELECT spid ,status ,hostname ,program_name ,cmd ,cpu ,physical_io ,blocked ,dbid   
                                                                                                                                                                              
     ,convert(sysname, rtrim(loginame)) ,last_batch   
                                                                                                                                                                                                              
   from #Temp2   
                                                                                                                                                                                                                                                   
   where blocked > 0   

 --保存阻塞的源头 
                                                                                                                                                                                                                                                       
 insert into #Temp(spid ,status ,hostname ,program_name ,cmd ,cpu ,physical_io ,blocked ,dbid    
                                                                                                                                                                   
   ,loginame ,last_batch )   
                                                                                                                                                                                                                                       
  SELECT spid ,status ,hostname ,program_name ,cmd ,cpu ,physical_io ,blocked ,dbid   
                                                                                                                                                                              
     ,convert(sysname, rtrim(loginame)) ,last_batch   
                                                                                                                                                                                                              
   from #Temp2   
                                                                                                                                                                                                                                                   
   where spid in (select blocked from #Temp)   
                                                                                                                                                                                                                     
    and spid not in (select spid from #Temp)   
                                                                                                                                                                                                                                                     
 
 select @spid = min(spid) from #Temp   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
 
 while @spid is not null   
                                                                                              
 begin   
                                                                                                                                                                                                                                                           
  set @c_SQL ='dbcc inputbuffer(' + convert(varchar(5), @spid) + ')'   
   
  select @blocked= isnull(blocked,0) from #Temp where spid=@spid 
   
  if(@blocked<>0
   begin 
    set @b_SQL ='dbcc inputbuffer(' + convert(varchar(5), @blocked) + ')'   
    insert into #Temp_b 
        exec (@b_SQL)   
         
   select @id = @@identity   
    
   update #Temp   
                                                                                                                                                                                                                                                  
   set BlockedSQLBuffer = #Temp_b.eventinfo   
                                                                                                                                                                                                                             
   from #Temp,#Temp_b   
                                                                                           
   where #Temp_b.id = @id   
                                                                                                                                                                                                                                         
    and #Temp.blocked = @blocked   
   end 
                                                                                                                                                                                                                                                       
 
  insert into #Temp1   
                                                                                                                                                                                                                                             
   exec (@c_SQL)   
                                                                                                                                                                                                                                                 

  select @id = @@identity   
                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                       
   update #Temp   
                                                                                                                                                                                                                                                  
    set SQLBuffer = #Temp1.eventinfo   
                                                                                                                                                                                                                             
   from #Temp,#Temp1   
                                                                                                                                                                                                                                             
   where #Temp1.id = @id   
                                                                                                                                                                                                                                         
    and #Temp.spid = @spid   
                                                                                                                                                                                                                                                     
 
  select @spid = min(spid) from #Temp where spid > @spid   
                                                                                                                                                                                                         
 end   
                                                                                                                                                                                      
    
 insert into dc_block_info(batch_id,spid,status,SQLBuffer,hostname,BlkBy,BlockedSQLBuffer,LoginName,DBName, 
 CPUTime,DiskIO,LastBatch,program_name,Command)                                                                                                                                                                                                                
                                              
 SELECT @batch_id,convert(char(5),spid) SPID,CASE lower(status) When 'sleeping' Then lower(status) Else upper(status) END Status   
                                                                                                                                          
   ,SQLBuffer   
                                                                                                                                                                                                                                                   
   , CASE hostname When Null  Then '  .' When ' ' Then '  .' Else hostname END HostName   
                                                                                                                                                                          
   ,CASE isnull(convert(char(5),blocked),'0') When '0' Then '  .'   
                                                                                                                                                                                                
                       Else isnull(convert(char(5),blocked),'0') END BlkBy,BlockedSQLBuffer   
                                                                                                                                                                                       
   ,loginame Login   
                                                                                                                                                                                                                                               
   ,db_name(dbid) DBName,convert(varchar,cpu) CPUTime   
                                                                                                                                                                                                            
   ,convert(varchar,physical_io) DiskIO,Last_Batch LastBatch   
                                                                                                                                                                                                     
   ,program_name ProgramName, cmd Command   
                                                                                                                                                                                                                        
  from #Temp   
                                                                                                                                                                                                                                                     
  order by BlkBy, spid   
                                                                                                                                                                                                                                                        
 
 set nocount off   
                                                                                                                                                                                                                                                 
end   

Go

USE [master]
GO   
 
--预警SP
CREATE proc [dbo].[RecodeAndAlertInfo]   
as   
begin   
declare @banch_id int   
select @banch_id=isnull(MAX(batch_id),0)+1 from dc_info_SessionConn  

--记录当前所有会话信息
insert into dc_info_SessionConn   
SELECT @banch_id,S.session_id, R.blocking_session_id,   
 current_execute_sql = SUBSTRING(T.text,   
    R.statement_start_offset / 2 + 1,   
    CASE   
     WHEN statement_end_offset = -1 THEN LEN(T.text)   
     ELSE (R.statement_end_offset - statement_start_offset) / 2+1   
    END),   
 S.login_name,S.host_name,databaseName=DB_NAME(R.database_id),S.program_name,R.command,   
 S.status,S.cpu_time, memory_usage_kb = S.memory_usage * 8, S.reads, S.writes,   
 S.transaction_isolation_level,C.connect_time, C.last_read, C.last_write,   
 C.net_transport, C.client_net_address, C.client_tcp_port, C.local_tcp_port,   
 R.start_time, R.wait_time, R.wait_type, R.last_wait_type, R.wait_resource,   
 R.open_transaction_count,GETDATE()    
FROM sys.dm_exec_sessions S   
 LEFT JOIN sys.dm_exec_connections C   
  ON S.session_id = C.session_id   
 LEFT JOIN sys.dm_exec_requests R   
  ON S.session_id = R.session_id   
   AND C.connection_id = R.connection_id   
 OUTER APPLY sys.dm_exec_sql_text(R.sql_handle) T   
WHERE  S.is_user_process = 1  -- 如果不限制此条件,则查询所有进程(系统和用户进程) 
AND command is not null   
       
   
--记录当前阻塞信息 
insert into dc_info_BlockedInfo   
select @banch_id,t1.resource_type as [lock type] ,db_name(resource_database_id) as [database]   
,t1.resource_associated_entity_id as [blk object]   
,t1.request_mode as [lock req]                          -- lock requested    
,t1.request_session_id as [waiter sid]                      -- spid of waiter   
,t2.wait_duration_ms as [wait time]         
,(select text from sys.dm_exec_requests as r                  --- get sql for waiter   
cross apply sys.dm_exec_sql_text(r.sql_handle)    
where r.session_id = t1.request_session_id) as waiter_batch   
,(select substring(qt.text,r.statement_start_offset/2,    
(case when r.statement_end_offset = -1 then len(convert(nvarchar(max), qt.text)) *2    
else r.statement_end_offset end - r.statement_start_offset)/2)    
from sys.dm_exec_requests as r   
cross apply sys.dm_exec_sql_text(r.sql_handle) as qt   
where r.session_id = t1.request_session_id) as waiter_stmt    --- statement executing now   
,t2.blocking_session_id as [blocker sid]                --- spid of blocker   
,(select text from sys.sysprocesses as p                       --- get sql for blocker   
cross apply sys.dm_exec_sql_text(p.sql_handle)    
where p.spid = t2.blocking_session_id) as blocker_stmt,getdate()   
from sys.dm_tran_locks as t1, sys.dm_os_waiting_tasks as t2   
where t1.lock_owner_address = t2.resource_address   
     

--记录资源信息
insert into dc_Blocked_Resource_Info 
select  convert (smallint, req_spid) As spid,   
  rsc_dbid As dbid,   
  rsc_objid As ObjId,   
  rsc_indid As IndId,   
  substring (v.name, 1, 4) As Type,   
  substring (rsc_text, 1, 32) as Resource,   
  substring (u.name, 1, 8) As Mode,   
  substring (x.name, 1, 5) As Status ,@banch_id  
 from  master.dbo.syslockinfo,   
  master.dbo.spt_values v,   
  master.dbo.spt_values x,   
  master.dbo.spt_values u     
 where   master.dbo.syslockinfo.rsc_type = v.number   
   and v.type = 'LR'   
   and master.dbo.syslockinfo.req_status = x.number   
   and x.type = 'LS'   
   and master.dbo.syslockinfo.req_mode + 1 = u.number   
   and u.type = 'L'   
 and substring (x.name, 1, 5) = 'WAIT' 
 order by spid   
 
 exec usp_blocker_info @banch_id
   
end

GO

3. 创建信息收集和邮件报警的SP:

    说明:

    SP定义了一个邮件发送的过程,需要先配置好数据库的邮件发送(google一下很多的);

    SP需要做到JOB里面,一分钟运行一次收集信息;

    SP收集信息时的条件(不同的系统触发条件不一样):

       a. 用户链接数大于550,活动链接数大于40;

       b. 阻塞进程比率大于10%;

USE [master]
GO   
--创建触发监控条件,并发邮件
CREATE proc [dbo].[ConAlert]         
as         
begin          
 DECLARE @conn float ,@activeconn float ,@blockedcount float,@spidcount float,@spidblocked float ,@count int      
 DECLARE @body NVARCHAR(MAX),@subject  nvarchar(200)     
 
 -- User Connections        
  Select @conn=ISNULL(cntr_value,0) from sys.dm_os_performance_counters with(nolock)
  where counter_name='User Connections'
     
  if @conn>=550 --根据用户链接数来收集信息和触发报警   
   begin     
     --active requests   
     Select @activeconn=isnull(SUM(cntr_value),0) from sys.dm_os_performance_counters with(nolock)
       where counter_name = 'Active requests'
       group by counter_name

    set @count=1

    if @activeconn>=40  --根据活动链接数来判断     
     begin       
       while 1=1  --循环收集信息
        begin
            exec RecodeAndAlertInfo   

           if(@count%5=0) --连续次就发邮件
             begin
               set @subject='Server:'+@@SERVERNAME+' Connections Alert'
               SET @body ='Server:'+@@SERVERNAME+CHAR(13)+'; UserConnections:'+cast(@conn as varchar)+CHAR(13)+'; ActiveRequests:'+ cast(@activeconn as varchar)              
               EXEC msdb.dbo.sp_send_dbmail           
                @recipients =N'Ken@xxx.com;itmon@xxx.com',                      
                @body = @body,           
                @body_format ='TEXT',           
                @subject =@subject,           
                @profile_name ='dba_profile' --需要配置该模板 
             end

             Select @activeconn=isnull(SUM(cntr_value),0) from sys.dm_os_performance_counters with(nolock)
               where counter_name = 'Active requests'
               group by counter_name

             if @activeconn>=40
                 waitfor delay '00:00:05'
             else
                  break;

             if @count>50
                   break;

           set @count=@count+1;
       end   
   end        
   end     
 else --根据Blocked 百分比来收集信息和触发报警  
   begin     
     select @spidcount=count(0) from sys.sysprocesses with(nolock) where spid>50       
     select @blockedcount=count(0) from sys.sysprocesses with(nolock) where spid>50 and blocked<>0 
     --计算百分比
     set @spidblocked=@blockedcount/@spidcount
     set @count=1

     if(@spidblocked>=0.1) --10%时,记录信息 
      begin       
       while 1=1  --循环收集信息
        begin
            exec RecodeAndAlertInfo  

           if(@count%5=0 or @spidblocked>=0.15) --连续5次阻塞比率大于10%或者阻塞比率大于等于15%时,记录信息并发生报警邮件   
             begin
                set @subject='Server:'+@@SERVERNAME+' Blocked Alert'             
                SET @body ='Server:'+@@SERVERNAME+CHAR(13)+'; BlockedCount:'+cast(@blockedcount as varchar)+CHAR(13)+'; SPIDCount:'+ cast(@spidcount as varchar)              
                EXEC msdb.dbo.sp_send_dbmail           
                @recipients =N'Ken@xxx.com',                      
                @body = @body,           
                @body_format ='TEXT',           
                @subject =@subject,           
                @profile_name ='dba_profile'   
             end

             select @spidcount=count(0) from sys.sysprocesses with(nolock) where spid>50       
             select @blockedcount=count(0) from sys.sysprocesses with(nolock) where spid>50 and blocked<>0 
             set @spidblocked=@blockedcount/@spidcount

             if @spidblocked>=0.1
                 waitfor delay '00:00:05'
             else
                  break;

             if @count>50
                   break;

           set @count=@count+1;
       end
      end 
   end 
end

4. 查看信息:

USE [master]
GO
--通过以下查询来分析信息
declare @batch_id int
select @batch_id=isnull(MAX(@batch_id),1) from dc_info_SessionConn

select  * from  dc_info_SessionConn where batch_id=@batch_id
select  * from  dc_info_BlockedInfo where batch_id=@batch_id
select  * from  dc_block_info       where batch_id=@batch_id
select  * from  dc_Blocked_Resource_Info where batch_id=@batch_id

 

    有了这些阻塞、进程和资源的信息,我们就可以了解系统出问题时,数据库是否出现异常,以及数据库异常时都有些什么语句在系统中运行,它们是否造成了大量的阻塞或消耗了大量的资源;这样DBA就方便定位问题了。

 

原创粉丝点击