SQL Server Data Storage

Source: Internet. Date: 2024/05/21 17:38

    SQL Server stores data using a page-and-extent scheme. This article describes how SQL Server stores and manages that data.

I. Understanding Pages and Extents

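     The page is the fundamental unit of data storage in SQL Server. The disk space allocated to a data file (.mdf or .ndf) is logically divided into pages numbered contiguously from 0 to n. The page is also the smallest unit of disk I/O: every read or write transfers whole pages.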

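     Each page is 8 KB. The first 96 bytes of every page form the page header, which stores system information about the page, including the page number, the page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page. The page types are: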

         Data: Data rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when text in row is set to ON.

         Index: Index entries.

         Text/Image: Large object data types: text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data. Also variable-length columns when the data row exceeds 8 KB: varchar, nvarchar, varbinary, and sql_variant.

         Global Allocation Map, Shared Global Allocation Map: Information about whether extents are allocated.

         Page Free Space: Information about page allocation and free space available on pages.

         Index Allocation Map: Information about extents used by a table or index per allocation unit.

         Bulk Changed Map: Information about extents modified by bulk operations since the last BACKUP LOG statement per allocation unit.

         Differential Changed Map: Information about extents that have changed since the last BACKUP DATABASE statement per allocation unit.
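
      Because pages are 8 KB each and numbered from 0, the position of a page inside a data file follows directly from its page number. The sketch below illustrates that arithmetic; the function names are made up for this example and are not part of any SQL Server API.

```python
PAGE_SIZE = 8 * 1024      # 8 KB per page, as described above
PAGE_HEADER_SIZE = 96     # the first 96 bytes of each page are the header


def page_byte_range(page_number: int) -> tuple[int, int]:
    """Return the (start, end) byte offsets of a page within its data file.

    `end` is exclusive. Hypothetical helper, for illustration only.
    """
    if page_number < 0:
        raise ValueError("page numbers start at 0")
    start = page_number * PAGE_SIZE
    return start, start + PAGE_SIZE


def bytes_after_header() -> int:
    """Bytes left on a page once the 96-byte header is subtracted."""
    return PAGE_SIZE - PAGE_HEADER_SIZE
```

      For example, page 3 would occupy bytes 24576 up to (but not including) 32768 of the file, and each page has 8096 bytes left after its header for rows and the row offset table.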

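      Data rows are stored on the page one after another, starting immediately after the page header. A row offset table at the end of the page holds one entry per row, recording how far the first byte of that row is from the start of the page. The entries are in reverse order relative to the rows: the offset of the first row is stored at the very end of the page, the offset of the second row just before it, and so on.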
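
      The layout can be pictured with a small in-memory model. This is only a sketch: it assumes each offset entry is a 2-byte little-endian integer, and the header is left as zero bytes rather than filled with real header fields.

```python
import struct

PAGE_SIZE = 8192
HEADER_SIZE = 96
SLOT_SIZE = 2  # assumption: each row offset entry is a 2-byte integer


def build_page(rows: list[bytes]) -> bytearray:
    """Lay rows out after the header and fill the offset table from the page end."""
    page = bytearray(PAGE_SIZE)
    offset = HEADER_SIZE
    for i, row in enumerate(rows):
        # Entry for row i sits (i + 1) slots back from the end of the page.
        slot_pos = PAGE_SIZE - SLOT_SIZE * (i + 1)
        if offset + len(row) > slot_pos:
            raise ValueError("rows do not fit on one page")
        page[offset:offset + len(row)] = row
        struct.pack_into("<H", page, slot_pos, offset)
        offset += len(row)
    return page


def row_offset(page: bytearray, row_index: int) -> int:
    """Read the start offset of a row from the offset table."""
    slot_pos = PAGE_SIZE - SLOT_SIZE * (row_index + 1)
    return struct.unpack_from("<H", page, slot_pos)[0]
```

      With rows of 4, 2, and 6 bytes, the rows start at offsets 96, 100, and 102, and the last two bytes of the page hold the offset of the first row.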

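      Large Row Support: A row cannot span pages, but parts of a row can be moved off the row's page, so a row can actually be very large. The data and overhead of a single row on a page cannot exceed 8,060 bytes, not counting data stored in Text/Image pages. When the combined size of the varchar, nvarchar, varbinary, and sql_variant columns pushes a row past 8,060 bytes, SQL Server moves one or more of these variable-length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the widest column, and keeps a 24-byte pointer at the column's original location on the data page that points to where the data now lives. If a later operation shrinks the row back under 8,060 bytes, the columns are moved back to the original data page.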
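
      A rough way to reason about which columns end up off-row is sketched below. It is a simplified model, not the engine's actual logic: it assumes the widest variable-length column is moved first, ignores per-row overhead beyond what the caller passes in, and only moves columns wider than the 24-byte pointer that replaces them. The function and parameter names are hypothetical.

```python
MAX_IN_ROW_BYTES = 8060       # in-row size limit described above
OVERFLOW_POINTER_BYTES = 24   # pointer left behind for a moved column


def plan_row_overflow(fixed_bytes: int, var_columns: dict[str, int]) -> tuple[list[str], int]:
    """Return (columns moved off-row, resulting in-row size in bytes).

    `fixed_bytes` is the size of everything that must stay in the row;
    `var_columns` maps variable-length column names to their current sizes.
    """
    in_row = dict(var_columns)
    moved: list[str] = []
    size = fixed_bytes + sum(in_row.values())
    while size > MAX_IN_ROW_BYTES:
        # Only columns wider than the pointer are worth moving.
        candidates = [c for c, n in in_row.items() if n > OVERFLOW_POINTER_BYTES]
        if not candidates:
            raise ValueError("row cannot be made to fit in 8,060 bytes")
        widest = max(candidates, key=lambda c: in_row[c])
        size -= in_row[widest] - OVERFLOW_POINTER_BYTES
        in_row[widest] = OVERFLOW_POINTER_BYTES
        moved.append(widest)
    return moved, size
```

      In this model a row with 100 fixed bytes and variable columns of 5,000, 4,000, and 50 bytes moves only the 5,000-byte column, leaving 4,174 bytes in the row.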

     An extent is eight physically contiguous pages (64 KB), and every page belongs to an extent. The extent is the basic unit of space management.

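     To keep space allocation efficient, SQL Server does not allocate a whole extent to a table that holds only a small amount of data. There are therefore two types of extents: uniform extents and mixed extents. All eight pages of a uniform extent belong to a single owning object, while the eight pages of a mixed extent can be shared by multiple objects.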

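     A new table or index is initially allocated pages from mixed extents. Once it grows to eight pages, enough to fill one extent, it switches to uniform extents for subsequent allocations. If an index is created on an existing table that has enough data to fill eight index pages, the database allocates uniform extents to the index from the start.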
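
     The relationship between pages and extents, and the switch from mixed to uniform extents, can be summarized in a few lines. This is a sketch of the rules described above, with helper names invented for the example.

```python
PAGES_PER_EXTENT = 8
PAGE_SIZE_KB = 8
EXTENT_SIZE_KB = PAGES_PER_EXTENT * PAGE_SIZE_KB  # 64 KB


def extent_of(page_number: int) -> int:
    """Extent number that contains the given page."""
    return page_number // PAGES_PER_EXTENT


def pages_in_extent(extent_number: int) -> range:
    """The eight contiguous page numbers that make up an extent."""
    first = extent_number * PAGES_PER_EXTENT
    return range(first, first + PAGES_PER_EXTENT)


def extent_kind_for_next_page(pages_already_owned: int) -> str:
    """Which kind of extent the next page of an object would come from.

    Models the rule above: mixed extents until the object owns eight pages,
    uniform extents afterwards.
    """
    return "mixed" if pages_already_owned < PAGES_PER_EXTENT else "uniform"
```

     So pages 16 through 23 form extent 2, and an object that already owns eight pages gets its ninth page from a uniform extent.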

II. Managing Extent Allocation and Free Space

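     The data structures SQL Server uses to manage extent allocations and track free space are relatively simple. This has two benefits. First, the free space information is densely packed, so relatively few pages contain it; this speeds up disk access, and these pages have a good chance of staying in memory. Second, most of the allocation information is not chained together, which simplifies its maintenance: each page allocation or deallocation can be performed quickly, which reduces contention between concurrent tasks.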

     SQL Server uses two types of allocation maps to record the allocation of extents: the Global Allocation Map (GAM) and the Shared Global Allocation Map (SGAM).

     GAM pages record which extents have been allocated. Each GAM covers 64,000 extents, or almost 4 GB of data, and has one bit for each extent in the interval it covers. If the bit is 1, the extent is free; if the bit is 0, the extent is allocated.

     SGAM pages record which extents are currently being used as mixed extents and also have at least one unused page. Each SGAM covers 64,000 extents, or almost 4 GB of data, and has one bit for each extent in the interval it covers. If the bit is 1, the extent is being used as a mixed extent and has a free page. If the bit is 0, the extent is not used as a mixed extent, or it is a mixed extent and all its pages are being used.

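     The algorithms SQL Server uses to allocate and free extents are simple:

     To allocate a uniform extent, the Database Engine searches the GAM for a 1 bit and sets it to 0.

     To find a mixed extent with free pages, the Database Engine searches the SGAM for a 1 bit.

     To allocate a mixed extent, the Database Engine searches the GAM for a 1 bit, sets it to 0, and then sets the corresponding bit in the SGAM to 1.

     To deallocate an extent, the Database Engine sets its bit in the GAM to 1 and its bit in the SGAM to 0.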
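
     The four rules above translate almost directly into bitmap operations. The class below is a toy model written for this article: it uses Python lists instead of packed bits, covers one small interval, and takes the first free extent it finds, which is an assumption about search order rather than documented behavior.

```python
class AllocationMaps:
    """Toy GAM/SGAM pair for a single interval of extents."""

    def __init__(self, extent_count: int) -> None:
        self.gam = [1] * extent_count   # 1 = extent is free
        self.sgam = [0] * extent_count  # 1 = mixed extent with a free page

    def _take_free_extent(self) -> int:
        """Find a 1 bit in the GAM and set it to 0."""
        try:
            extent = self.gam.index(1)
        except ValueError:
            raise RuntimeError("no free extent in this interval") from None
        self.gam[extent] = 0
        return extent

    def allocate_uniform(self) -> int:
        return self._take_free_extent()

    def allocate_mixed(self) -> int:
        extent = self._take_free_extent()
        self.sgam[extent] = 1  # new mixed extent still has free pages
        return extent

    def find_mixed_with_free_page(self):
        """Return a mixed extent with a free page, or None if there is none."""
        return self.sgam.index(1) if 1 in self.sgam else None

    def deallocate(self, extent: int) -> None:
        self.gam[extent] = 1
        self.sgam[extent] = 0
```

     Read together, the two bits give three meaningful states for an extent: GAM 1 and SGAM 0 means free; GAM 0 and SGAM 1 means a mixed extent with free pages; GAM 0 and SGAM 0 means a uniform extent or a full mixed extent.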

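     PFS (Page Free Space) pages record the allocation status of each page and the amount of free space on it. The PFS has one byte for each page, recording whether the page is allocated and, if so, whether it is empty, 1 to 50 percent full, 51 to 80 percent full, 81 to 95 percent full, or 96 to 100 percent full.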
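
     The fullness categories can be expressed as a simple bucketing function. This sketch only models the five ranges listed above; it takes a whole-number percentage and does not reproduce the real bit layout of a PFS byte.

```python
def pfs_fullness_bucket(used_percent: int) -> str:
    """Map a page's used-space percentage to the PFS category described above."""
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    if used_percent == 0:
        return "empty"
    if used_percent <= 50:
        return "1-50%"
    if used_percent <= 80:
        return "51-80%"
    if used_percent <= 95:
        return "81-95%"
    return "96-100%"
```

     Coarse buckets like these are enough to answer the question the engine cares about, namely whether a page is likely to have room for a new row, without storing an exact byte count per page.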

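     After an extent has been allocated to an object, the Database Engine uses the PFS pages to record which pages in the extent are allocated and which are free. This information is used when the Database Engine has to allocate a new page. The amount of free space in a page is maintained only for heap and Text/Image pages, and is used when the Database Engine has to find a page with free space to hold a newly inserted row. Indexes do not need page free space tracking, because the point at which a new row is inserted is determined by the index key values.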

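     A PFS page is the first page after the file header page in a data file. It is followed by a GAM page and then an SGAM page. Another PFS page appears roughly every 8,000 pages, and another GAM page and SGAM page appear every 64,000 extents.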
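
     The "almost 4 GB" figure quoted for one GAM or SGAM follows from the numbers already given: 64,000 extents of eight 8 KB pages each. The sketch below works through that arithmetic using the rounded figure of 64,000 extents from the text; the exact interval in a real data file may differ slightly, and the function names are invented for the example.

```python
EXTENTS_PER_GAM = 64_000   # rounded figure quoted in the text
PAGES_PER_EXTENT = 8
PAGE_SIZE_KB = 8


def gam_coverage_kb() -> int:
    """Amount of data covered by one GAM or SGAM page, in KB."""
    return EXTENTS_PER_GAM * PAGES_PER_EXTENT * PAGE_SIZE_KB


def gam_coverage_gib() -> float:
    """The same coverage expressed in units of 1024 * 1024 KB."""
    return gam_coverage_kb() / (1024 * 1024)


def gam_interval_of(page_number: int) -> int:
    """Index of the GAM interval that a page falls into under this model."""
    return page_number // (EXTENTS_PER_GAM * PAGES_PER_EXTENT)
```

     One interval comes to 4,096,000 KB, which is about 3.9 GB, and under this model page 512,000 is the first page of the second interval.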

III. Tracking Modified Extents

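    SQL Server uses two data structures to track extents that have changed: the Differential Changed Map (DCM) and the Bulk Changed Map (BCM). Like the GAM and SGAM, both are bitmaps with one bit per extent: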

    DCM (Differential Changed Map): Tracks the extents that have changed since the last BACKUP DATABASE statement. If the bit for an extent is 1, the extent has been modified since the last BACKUP DATABASE statement. If the bit is 0, the extent has not been modified. Differential backups read just the DCM pages to determine which extents have been modified, so they do not have to scan every extent. This greatly reduces the number of pages that a differential backup must scan. The length of time that a differential backup runs is proportional to the number of extents modified since the last BACKUP DATABASE statement and not the overall size of the database.
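
    The effect on backup cost is easy to see in a small model. The functions below are a sketch with hypothetical names: the DCM is a plain Python list with one entry per extent, and clearing it after a full backup is inferred from the "since the last BACKUP DATABASE" wording rather than taken from a documented procedure.

```python
PAGES_PER_EXTENT = 8


def mark_modified(dcm_bits: list[int], page_number: int) -> None:
    """Flag the extent containing a modified page."""
    dcm_bits[page_number // PAGES_PER_EXTENT] = 1


def extents_for_differential_backup(dcm_bits: list[int]) -> list[int]:
    """Extents a differential backup would need to copy: those with a 1 bit."""
    return [extent for extent, bit in enumerate(dcm_bits) if bit == 1]


def reset_after_full_backup(dcm_bits: list[int]) -> None:
    """Clear all bits, modelling the state right after a full database backup."""
    for extent in range(len(dcm_bits)):
        dcm_bits[extent] = 0
```

    If pages 9 and 31 change in a four-extent file, only extents 1 and 3 are flagged, so the differential backup copies two extents regardless of how large the file is.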

    BCM (Bulk Changed Map): Tracks the extents that have been modified by bulk-logged operations since the last BACKUP LOG statement. If the bit for an extent is 1, the extent has been modified by a bulk-logged operation after the last BACKUP LOG statement. If the bit is 0, the extent has not been modified by bulk-logged operations. Although BCM pages appear in all databases, they are only relevant when the database is using the bulk-logged recovery model. In this recovery model, when a BACKUP LOG is performed, the backup process scans the BCMs for extents that have been modified and includes those extents in the log backup. This lets the bulk-logged operations be recovered if the database is restored from a database backup and a sequence of transaction log backups. BCM pages are not relevant in a database that is using the simple recovery model, because no bulk-logged operations are logged. They are not relevant in a database that is using the full recovery model, because that recovery model treats bulk-logged operations as fully logged operations.
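
    The dependence on the recovery model can be captured in one function. This is an illustrative sketch only: the recovery model is passed as a plain string and the BCM as a list with one entry per extent, neither of which reflects how SQL Server exposes these internally.

```python
def extents_for_log_backup(recovery_model: str, bcm_bits: list[int]) -> list[int]:
    """Extents a log backup would copy because of bulk-logged changes.

    The BCM is consulted only under the bulk-logged recovery model; under
    the simple and full models this sketch returns an empty list.
    """
    if recovery_model.lower() != "bulk-logged":
        return []
    return [extent for extent, bit in enumerate(bcm_bits) if bit == 1]
```

    With the same bitmap, a bulk-logged database would add the flagged extents to its next log backup, while a database in the full or simple model would ignore the BCM entirely.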

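    Each DCM page and BCM page covers the same interval as a GAM or SGAM page, roughly 4 GB of extents. The DCM and BCM pages are located right after the GAM and SGAM pages in the data file.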

 
