[CSCI3170 Study Notes] Storage and Index

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 16:40

/* This is a study note written by Edward HUANG
 * in order to study CSCI3170 Introduction to Database better.
 * Much of the content is from Professor Wong's lecture notes.
 */

1. What is the time for accessing a disk block?
  • time = seek time + rotational delay + transfer time
  • Seek time and rotational delay account for most of the total.
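
A small worked example of the formula above may help. The drive parameters (9 ms average seek, 7200 RPM, 4 KB block, 100 MB/s transfer rate) are illustrative assumptions, not figures from the lecture notes, and the function name is hypothetical. The average rotational delay is taken as half of one rotation.

```python
def access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Estimate the time to access one disk block, in milliseconds."""
    # Average rotational delay: half of one full rotation.
    rotational_ms = 0.5 * 60_000 / rpm
    # Transfer time: block size divided by the sustained transfer rate.
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    total_ms = seek_ms + rotational_ms + transfer_ms
    return total_ms, rotational_ms, transfer_ms


# Assumed, illustrative drive parameters.
total, rotational, transfer = access_time_ms(
    seek_ms=9.0, rpm=7200, block_bytes=4096, transfer_mb_per_s=100
)
print(f"rotational delay: {rotational:.2f} ms")
print(f"transfer time:    {transfer:.3f} ms")
print(f"total:            {total:.2f} ms")
```

With these assumed numbers the transfer time is a tiny fraction of the total, which is why reducing the number of seeks matters more than reducing the bytes transferred.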
2. What are a 'disk block' and a 'disk sector'?
  • Data is stored and retrieved in units called disk blocks or pages.
  • The block size is a multiple of the sector size.
  • A sector is a fixed-size sequence of bytes.
3. What is a 'Page'?

Each page has a fixed size. It contains a sequence of records.

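4. Concept of 'Buffer Management'
  • A page is brought from disk into the buffer when it is requested for an operation and is not yet in the buffer.
  • Replacement Policy: decides which buffered page to replace when the buffer is full (LRU is a common choice).
  • Dirty Bit: marks a buffered page that has been modified; such a page must be written back to disk before it is replaced.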
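
The three ideas in item 4 can be sketched together. This is a minimal sketch under stated assumptions, not a real DBMS buffer manager: the class `BufferPool` and its methods are hypothetical names, a Python dict stands in for the disk, LRU is assumed as the replacement policy, and pin counts are left out.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: LRU replacement, dirty pages written back on eviction."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict standing in for the disk: page id -> contents
        self.frames = OrderedDict()   # page id -> [contents, dirty bit], least recently used first
        self.disk_reads = 0
        self.disk_writes = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            # Already buffered: just mark it as most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id][0]
        # Page requested but not in the buffer: make room, then read it from disk.
        if len(self.frames) >= self.capacity:
            self._evict()
        self.disk_reads += 1
        self.frames[page_id] = [self.disk[page_id], False]
        return self.frames[page_id][0]

    def write_page(self, page_id, contents):
        self.get_page(page_id)
        frame = self.frames[page_id]
        frame[0] = contents
        frame[1] = True               # set the dirty bit; the disk copy is now stale

    def _evict(self):
        # Replacement policy: evict the least recently used page.
        victim_id, (contents, dirty) = self.frames.popitem(last=False)
        if dirty:
            # Only modified pages need to be written back.
            self.disk[victim_id] = contents
            self.disk_writes += 1


disk = {1: "A", 2: "B", 3: "C"}
pool = BufferPool(capacity=2, disk=disk)
pool.write_page(1, "A2")   # loads page 1 and marks it dirty
pool.get_page(2)           # loads page 2
pool.get_page(3)           # buffer full: page 1 is evicted and written back
print(disk[1], pool.disk_reads, pool.disk_writes)
```

A clean page can be dropped without any I/O; only a page whose dirty bit is set costs a disk write when it is replaced.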
------------------------------------------------------------------------------------
5. What is the format of a record?
There are two kinds of record formats.
  • Fixed-length record
    • Information about field types is the same for all records in a file.
    • Information about field types is stored in the system catalogs.
  • Variable-length record (the number of fields is fixed)
    • Either fields are delimited by special symbols
      • Number of fields|field 1$ field 2$ field 3$ field 4
    • Or the record begins with an array of field (byte) offsets
      • |byte1|byte2|byte3|byte4|byte5|byte6|byte7|byte8|byte9|byte10 |byte11 |
      • |0x5  |0x7  |0x9  |0xA  |0xB  |0x1  |0x2  |0x3  |0x3  |0x1    |0x2    |
      • |   offset array (5 bytes)    |  field 1  |  field 2  |field 3|field 4|
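
The offset-array layout can be sketched in a few lines. The sketch assumes one-byte, zero-based offsets and stores one offset per field plus a final offset for the end of the record, which is how the 11-byte example in item 5 reads; `encode_record` and `get_field` are hypothetical helper names.

```python
def encode_record(fields):
    """Pack byte-string fields behind an array of one-byte offsets."""
    header_len = len(fields) + 1        # one offset per field, plus the end of the record
    offsets = [header_len]              # the first field starts right after the header
    for field in fields:
        offsets.append(offsets[-1] + len(field))
    if offsets[-1] > 255:
        raise ValueError("record too long for one-byte offsets")
    return bytes(offsets) + b"".join(fields)


def get_field(record, i):
    """Return field i (0-based) by reading two adjacent offsets."""
    return record[record[i]:record[i + 1]]


# The same four fields as in the byte diagram of item 5.
record = encode_record([b"\x01\x02", b"\x03\x03", b"\x01", b"\x02"])
print(record.hex(" "))
print(get_field(record, 2))
```

Any field can be reached directly from the header without scanning the earlier fields, which is the main advantage over delimiter-separated fields.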

6. What is the format of a Page?

  • Fixed length: packed (left) or unpacked (right)

|Slot    1|                |Slot    1|
|Slot    2|                |Free Slot|
|Slot    3|                |Slot    3|
|Slot    4|                |Slot    4|
|Free Slot|                |Free Slot|

Record id = <page id, slot#>

  • Variable length


* Pages or blocks are fine as the unit of I/O, but higher levels of the DBMS operate on records and on files of records.
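
The unpacked fixed-length layout and the record id from item 6 can be sketched as follows. `UnpackedPage` is a hypothetical class; slots are numbered from 0 here (the diagram numbers them from 1), and a list of booleans plays the role of the free-slot bitmap.

```python
class UnpackedPage:
    """Fixed-length page: an array of slots plus a used/free flag per slot."""

    def __init__(self, page_id, num_slots):
        self.page_id = page_id
        self.slots = [None] * num_slots
        self.used = [False] * num_slots

    def insert(self, record):
        # Reuse the first free slot; no other record has to move.
        for slot_no, taken in enumerate(self.used):
            if not taken:
                self.slots[slot_no] = record
                self.used[slot_no] = True
                return (self.page_id, slot_no)   # record id = <page id, slot#>
        raise RuntimeError("page is full")

    def delete(self, rid):
        _, slot_no = rid
        self.slots[slot_no] = None
        self.used[slot_no] = False

    def get(self, rid):
        _, slot_no = rid
        if not self.used[slot_no]:
            raise KeyError(rid)
        return self.slots[slot_no]


page = UnpackedPage(page_id=7, num_slots=5)
rids = [page.insert(r) for r in ("r1", "r2", "r3", "r4")]
page.delete(rids[1])            # leaves a free slot in the middle of the page
rid_new = page.insert("r5")     # the freed slot is reused
print(rid_new, page.get(rids[2]))
```

Because deleting a record only clears a flag, the record ids of the remaining records stay valid. A packed page would have to move a record into the hole, which changes that record's slot number.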

------------------------------------------------------------------------------------

7. What is a File?
  • It's a collection of pages, each containing a collection of records.
  • A file should support insertion, deletion, and modification of records.
  • A file should be able to read a particular record (specified by its record id).
  • A file should be able to scan all records (in every page).
8. Three kinds of file format
Heap File
  • Using a linked list
    • Header page and data pages
  • Using a page directory
    • The directory is much smaller than a linked list of all heap file pages.
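
A minimal sketch of the page-directory variant, assuming the directory stores only the number of free slots per page. `HeapFile` and its fields are hypothetical names, and deletion is left out to keep the sketch short.

```python
class HeapFile:
    """Heap file with a page directory: page id -> number of free slots."""

    def __init__(self, slots_per_page):
        self.slots_per_page = slots_per_page
        self.pages = {}        # page id -> list of records (the data pages)
        self.directory = {}    # page id -> free slots left (the small directory)

    def insert(self, record):
        # Only the directory is consulted to find a page with room.
        for page_id, free in self.directory.items():
            if free > 0:
                break
        else:
            # No page has room: allocate a new data page.
            page_id = len(self.pages)
            self.pages[page_id] = []
            self.directory[page_id] = self.slots_per_page
        self.pages[page_id].append(record)
        self.directory[page_id] -= 1
        return (page_id, len(self.pages[page_id]) - 1)

    def scan(self):
        # Scan all records in every page.
        for page_id in sorted(self.pages):
            yield from self.pages[page_id]


heap = HeapFile(slots_per_page=2)
heap_rids = [heap.insert(r) for r in "abcde"]
print(heap_rids)
print(heap.directory)
```

The directory holds one small entry per page, so finding a page with free space does not require reading the data pages themselves.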
Sorted Files

Best if records must be retrieved in some order, or only a 'range' of records is needed.
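
A range query on a sorted file can be sketched with binary search. The sketch assumes the search keys fit in an in-memory sorted list, which stands in for the sorted pages of the file; `range_search` is a hypothetical helper built on Python's standard `bisect` module.

```python
from bisect import bisect_left, bisect_right


def range_search(sorted_keys, low, high):
    """Return all keys k with low <= k <= high from a sorted list."""
    start = bisect_left(sorted_keys, low)    # first position with key >= low
    end = bisect_right(sorted_keys, high)    # first position with key > high
    return sorted_keys[start:end]


keys = [3, 8, 8, 15, 21, 34]
result = range_search(keys, 8, 21)
print(result)
```

The matching records sit next to each other, so after locating the first one the rest of the range is read sequentially.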

Indexes

  • Data structures to organize records via trees or hashing.
  • They speed up searches for a subset of records, based on values in certain fields.
  • Updates are much faster than in sorted files.

Functionality: An index on a file speeds up selections on the search key fields for the index.

The index must support efficient retrieval of data entries k* with a given key value k.

Structure: An index contains a collection of 'data entries'.

What is a data entry? A data entry is denoted as k*, where k is a search key value and * tells where to find the record containing k.

Structure of a data entry (three alternatives)

  • <k, data record with search key value k>
  • <k, record id of the data record with search key value k>
  • <k, list of record ids of data records with search key value k>
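
The third form of data entry, <k, list of record ids>, can be sketched with a dictionary. The sample records, field names, and the helpers `build_index` and `lookup` are hypothetical; a Python dict stands in for the tree or hash structure that a real index would use.

```python
from collections import defaultdict

# Hypothetical data file: record id (page id, slot#) -> record.
records = {
    (0, 0): {"name": "Ann", "age": 21},
    (0, 1): {"name": "Bob", "age": 19},
    (1, 0): {"name": "Cid", "age": 21},
}


def build_index(records, key_field):
    """Build data entries of the form <k, list of record ids>."""
    index = defaultdict(list)
    for rid, record in records.items():
        index[record[key_field]].append(rid)
    return dict(index)


def lookup(index, records, k):
    """Follow the record ids in the data entry k* to fetch the records."""
    return [records[rid] for rid in index.get(k, [])]


age_index = build_index(records, "age")
print(age_index)
print([r["name"] for r in lookup(age_index, records, 21)])
```

In the first form the data entry would hold the record itself, so no second step through a record id would be needed.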
Primary vs. Secondary
If the search key contains the primary key, the index is called a primary index; otherwise it is called a secondary index.
Clustered vs. Unclustered
If the order of the data records is the same as, or 'close to', the order of the data entries, the index is called a clustered index; otherwise it is unclustered.
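
A rough way to see why clustering matters is to count page fetches when the data entries are followed in key order. The sketch below assumes a buffer that holds only one page at a time, and the two lists of record ids are made-up examples; `pages_fetched` is a hypothetical helper.

```python
def pages_fetched(rids_in_key_order):
    """Count page fetches when following record ids in index order,
    assuming only the most recently fetched page stays in the buffer."""
    fetches = 0
    current_page = None
    for page_id, _slot_no in rids_in_key_order:
        if page_id != current_page:
            fetches += 1
            current_page = page_id
    return fetches


# Record ids <page id, slot#> listed in the order of the data entries.
clustered = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
unclustered = [(2, 1), (0, 0), (1, 1), (2, 0), (0, 1), (1, 0)]
print(pages_fetched(clustered), pages_fetched(unclustered))
```

Under these assumptions the clustered order reads each page once, while the unclustered order can cost one page fetch per record.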

 

Generally speaking, a file contains pages, and a page contains records.

  • file = {page 1, page 2, page 3, ..., page N}.
  • page 1 = {record 1, record 2, record 3, ..., record M}.



