VSSADMIN命令无法列出SQL writer的问题

来源:互联网 发布:知乎 双立人 电视购物 编辑:程序博客网 时间:2024/05/16 14:59

我们通常有两种方式来备份SQL Server的数据库。一种是通过在SQL Server中运行BACKUP DATABASE的T-SQL语句来备份数据库。另一种方法是使用SQL writer service来备份数据库。SQL writer service提供了一种通过Volume Shadow Copy Service (VSS)框架来备份SQL Server数据库的途径。第三方厂商可以开发自己的备份应用程序,通过调用SQL writer service来实现SQL数据库的备份。微软自己的一些产品,比如Data Protection Manager (DPM)也是使用的SQL writer service来做备份的。

关于SQL Writer service和VSS的更多内容,可以参考:

http://technet.microsoft.com/en-us/library/cc966520.aspx

http://msdn.microsoft.com/en-us/library/ms175536.aspx

当我们遇到SQL writer service无法正常工作的问题时,我们通过会通过以下命令来对SQL writer service做一个快速而简单的测试:

vssadmin list writers

该命令会列出服务器上所有使用VSS的writers,如果SQL writer service无法被列出,那么自然证明SQL writer service和VSS之间的协同工作有问题。那么SQL writer service无法正常工作也在情理之中了。

要进一步研究为什么SQL writer service无法被列出,一个很有效的方法是使用SQL writer的trace。开启trace步骤如下:

(1). 以下步骤都在SQL Server上执行
(2).备份注册表键值“HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\SQLWriter\”.
(3). 向注册表中添加如下键值:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\SQLWriter\Debug\Tracing]

"TraceEnterExit"=dword:00000001

"TraceFile"="c:\\trace.txt"

"TraceFileLineInfo"=dword:00000001

"TraceForceFlush"=dword:00000001

"TraceLevel"=dword:ffffffff

"TraceTimestamp"=dword:00000001

"TraceToDebugger"=dword:00000001

"TraceToFile"=dword:00000001


(4). 重启SQL Writer Service。此时我们配置的trace就会开始运行。

(5). 打开命令行,运行:

vssadmin list writers

(6). 运行完成后,打开C:\trace.txt文件来查看是否有任何错误。如果需要关闭trace,只要还原之前备份的键值,并且重启SQL writer service即可。

根据经验,大多数SQL writer service无法被列出的问题是由以下两种问题造成的:

<!--[if !supportLists]-->(1)     < !--[endif]-->当VSSADMIN尝试列出SQL writer service时,他们尝试去连接服务器上“所有”的SQL Server实例,一旦有一个实例无法连接上,则SQL writer service就无法被列出。

<!--[if !supportLists]-->(2)     < !--[endif]-->当VSSADMIN连接上SQL Server后,他会运行一句查询语句,如果这句查询语句失败,则SQL writer service就无法被列出。

这两个问题都可以在trace文件里发现。以下是一个trace文件的示例。

---------------------------------------------------------------------------------

[0082394203,0x0008b0:0x1c10:0xfc9d47b3] sqlwriter.yukon\sqllib\sqlconnect.cpp(0589): SqlConnection::Connect: Connecting to server A000S-ITSQL05...

[0082394234,0x0008b0:0x1c10:0xfc9d47b3] sqlwriter.yukon\sqllib\sqlconnect.cpp(0630):SqlConnection::Connect: Connected

[0082394250,0x0008b0:0x1c10:0xfc9d47b3] sqlwriter.yukon\sqllib\sqlconnect.cpp(0652): SqlConnection::Connect: Version: 09.00.3042

[0082394265,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::Connect}: hr: 0x00000000

[0082394281,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlConnection::SetCommand}:

[0082394296,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlConnection::ReleaseRowset}:

[0082394312,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::ReleaseRowset}: hr: 0x00000000

[0082394328,0x0008b0:0x1c10:0xfc9d47b3] sqlwriter.yukon\sqllib\sqlconnect.cpp(0916): SqlConnection::SetCommand: SetCommand (select rtrim(physical_name),rtrim(type_desc),rtrim(state_desc), is_name_reserved from sys.master_files where DB_ID(N'RFTHSOWMS ') = database_id)

[0082394359,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::SetCommand}: hr: 0x00000000

[0082394375,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlConnection::ExecCommand}:

[0082394390,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlConnection::ReleaseRowset}:

[0082394406,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::ReleaseRowset}: hr: 0x00000000

[0082394453,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::ExecCommand}: hr: 0x00000000

[0082394468,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlConnection::FetchFirst}:

[0082394484,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::FetchFirst}: hr: 0x00000000

[0082394500,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlEnumerator::FirstFile}: hr: 0x00000000

[0082394531,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlEnumerator::NextFile}:

[0082394546,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlConnection::FetchNext}:

[0082394578,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::FetchNext}: hr: 0x00000000

[0082394593,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlEnumerator::NextFile}: hr: 0x00000000

[0082394640,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlEnumerator::NextFile}:

[0082394656,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlConnection::FetchNext}:

[0082394687,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::FetchNext}: hr: 0x00040ec6

[0082394718,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlEnumerator::NextFile}: hr: 0x00040ec6

[0082394750,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {CSqlWriter::CheckFilesBeforeAddComponent}: hr: 0x00040ec6

[0082394765,0x0008b0:0x1c10:0xfc9d47b3] ENTER {CSqlWriter::AddComponentAndFiles}:

[0082397921,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlEnumerator::FirstFile}:

[0082397937,0x0008b0:0x1c10:0xfc9d47b3] ENTER {SqlConnection::FetchFirst}:

[0082397968,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlConnection::FetchFirst}: hr: 0x00000000

[0082397984,0x0008b0:0x1c10:0xfc9d47b3] EXIT  {SqlEnumerator::FirstFile}: hr: 0x00000000

[0082401375,0x0008b0:0x1c10:0xfc9d47b3] sqlwriter.yukon\sqlwriter\sqlwriter.cpp(2233): CSqlWriter::AddComponentAndFiles: IVssCreateWriterMetadata::AddFilesToFileGroup failed.  hr = 0x80042308

---------------------------------------------------------------------------------

首先我们可以看到SQL Server实例已经被成功连接到了(SqlConnection::Connect: Connected)。如果这里显示连接失败的话,我们可以到SQL errorlog里查看是否在对应时间里有login failed的错误(error 18456)被记录。如果有的话,我们就可以做进一步工作来检查为什么会有这样的连接失败产生。

连接成功后,VSSADMIN就开始执行语句:

SqlConnection::SetCommand: SetCommand (select rtrim(physical_name),rtrim(type_desc),rtrim(state_desc), is_name_reserved from sys.master_files where DB_ID(N'RFTHSOWMS ') = database_id

在上面我们给出的实例中,这句语句的运行失败了。于是导致了VSSADMIN无法列出SQL writer service。

为什么查询会失败呢?该查询语句中,数据库名“RFTHSOWMS ”在末尾处有一个空格。就是这个空格导致了查询的时候。事实上,只要数据库名的开头或者末尾有空格,又或是数据库名包含打印不出的字符的话,就会导致查询失败。关于这个问题,在微软的知识库文章中已经有讲到过,大家有兴趣的话可以去看看这篇文章:

http://support.microsoft.com/default.aspx?scid=kb;en-US;2014054

SQL writer service的trace是非常有用的。它不但可以发现vssadmin list writers时产生的问题,对于一些在使用SQL writer或者MSDE writer备份数据库的过程中发生的问题也能提供非常有用的信息。(MSDE writer是SQL Server 2000的时候SQL使用VSS框架的writer。)

我以前遇到过一个使用MSDE writer备份数据库报错的问题。我使用了trace来排查这个问题。打开MSDE writer的trace的方法和SQL writer service类似,只是注册表的键值不同。对于MSDE writer,我们要添加的键值是:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VSS\Debug\Tracing]
"TraceFile"="c:\\trace.txt"
"TraceLevel"=dword:ffffffff
"TraceEnterExit"=dword:00000001
"TraceToFile"=dword:00000001
"TraceToDebugger"=dword:00000000
"TraceFileLineInfo"=dword:00000001
"TraceForceFlush"=dword:00000000

添加完后,重启MSDE writer service使trace生效。然后运行backup软件来重现问题。然后查看Trace文件,发现:

---------------------------------------------------------------------------------

[3012184656,0x0011f8:0x1290:0x5d93de09] modules\sqllib\sqlconnect.cpp(0198): DumpErrorInfo: Error state: 2

Severity:              14

[3012184656,0x0011f8:0x1568:0x5d93de09] modules\sqllib\sqlconnect.cpp(0239): DumpErrorInfo: Error message:               Database 'ISALOG_20081021_FWS_000' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.

[3012184671,0x0011f8:0x131c:0x5d93de09] modules\sqllib\sqlconnect.cpp(0605): SqlConnection::SetCommand: SetCommand (BACKUP DATABASE [ISALOG_20100512_FWS_000] TO VIRTUAL_DEVICE='{D431F72B-C2B1-49C0-9B80-01D7B2095BF9}9' WITH SNAPSHOT,BUFFERCOUNT=1,BLOCKSIZE=1024)

[3012184671,0x0011f8:0x15a8:0x5d93de09] ENTER {SqlConnection::ExecCommand}:

[3012184671,0x0011f8:0x0cc0:0x5d93de09] EXIT  {SqlConnection::ReleaseRowset}: hr: 0x00000000

[3012184671,0x0011f8:0x0a9c:0x5d93de09] modules\sqllib\sqlconnect.cpp(0151): DumpErrorInfo: HRESULT:     0X80040E14

[3012184687,0x0011f8:0x0b7c:0x5d93de09] modules\sqllib\sqlconnect.cpp(0198): DumpErrorInfo: Error state: 1

Severity:              16

[3012184687,0x0011f8:0x164c:0x5d93de09] modules\sqllib\sqlconnect.cpp(0239): DumpErrorInfo: Error message:               BACKUP DATABASE is terminating abnormally.

[3012184687,0x0011f8:0x0b98:0x5d93de09] modules\sqllib\sqlconnect.cpp(0158): DumpErrorInfo: SQLSTATE:  42000

---------------------------------------------------------------------------------

从错误上看问题应该是发生在SQL Server内部。于是看了SQL Server的errorlog。Errorlog里在相应的时间里记载了如下信息:

2010-08-05 18:30:33.51 spid10    This SQL Server has been optimized for 8 concurrent queries. This limit has been exceeded by 10 queries and performance may be adversely affected.

2010-08-05 18:30:34.65 spid63    BackupMedium::ReportIoError: write failure on backup device '{D431F72B-C2B1-49C0-9B80-01D7B2095BF9}9'. Operating system error 995(error not found).

2010-08-05 18:30:34.65 spid63    Internal I/O request 0x43333C28: Op: Write, pBuffer: 0x04430400, Size: 1024, Position: 0, UMS: Internal: 0x0, InternalHigh: 0x0, Offset: 0x10001, OffsetHigh: 0x0, m_buf: 0x00030001, m_len: 0, m_actualBytes: 0, m_errcode: 995, BackupFile: {D431F72B-C2B1-49C0-9B80-01D7B2095BF9}9

2010-08-05 18:30:34.65 backup    BACKUP failed to complete the command BACKUP DATABASE [ISALOG_20100512_FWS_000] TO VIRTUAL_DEVICE='{D431F72B-C2B1-49C0-9B80-01D7B2095BF9}9' WITH SNAPSHOT,BUFFERCOUNT=1,BLOCKSIZE=1024

原来客户使用的SQL Server是MSDE的版本。MSDE版本的SQL Server限制最大工作线程数是8个。但是客户使用了备份工具同时备份了多个数据库。每个数据库的备份都占用了工作线程,最后导致有部分数据库备份的请求由于得不到工作线程而失败。

原创粉丝点击