跳转至

Indexing

约 1016 个字 -1 行代码 14 张图片 预计阅读时间 3 分钟

Outline:

  • Basic Concepts
  • Ordered indices
  • B+ Tree Index
  • B+ Tree File Organization
  • B Tree Index Files
  • Indices on Multiple Keys
  • Indexing on a Flash
  • Indexing in Main Memory
  • Write Optimized Indices
    • Log Structured Merge(LSM) Tree
    • Buffer Tree
  • Bitmap indices

Basic Concepts

Search Key - attribute to set of attributes used to look up records in a file.

An index file consists of records(called index entries) of the form

Search-key pointer

Two basic kinds of indices:

  • Ordered indices: search keys are stored in sorted order
  • Hash indices: search keys are distributed uniformly across "buckets" using a "hash function".

Index Evaluation Metrics

  • Access types supported efficiently.
    • Point query: records with a specified value in the attribute
    • Range query: or records with an attribute value falling in a specified range of values

支持点查以及范围查询。

  • Access time
  • Insertion time
  • Deletion time
  • Space overhead

Ordered Indices

  • Primary index(主索引): in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
    • Also called clustering index(聚集索引)
    • The search key of a primary index is usually but not necessarily the primary key.
  • Secondary index(辅助索引): an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index
  • Index-sequential file(索引顺序文件): ordered sequential file with a primary index.

主索引只能有一个,其他都是辅助索引。

Example

Dense Index Files

Dense index(稠密索引) - index record appears for every search-key value in the file.

每一个 search-key 都要出现在文件里。

图一是以 ID 为主索引来搜索,图二是以 dept_name 为主索引来搜索。

Sparse Index Files

Sparse index(稀疏索引) - contains index records for only some search-key values.

Good tradeoff: sparse index with an index entry for every block in file, corresponding to least search-key value in the block.

Multilevel Index(多级索引)

对索引再次建立索引,类似 B+ 树。

B+ Tree Index Files

  • All paths from root to the leaf are of the same length
  • Inner node(not a root or a leaf) between \(\lceil n/2 \rceil\) and \(n\) children
  • Leaf node between \(\lceil (n-1)/2 \rceil\) and \(n-1\) values
  • Special cases:
    • If the root is not a leaf: at least 2 children
    • If the root is a leaf: between \(0\) and \(n-1\) values

数据库中B+ 树一般一个叶子结点就是一个块的大小,即 4KB.

Observations about B+ Trees

  • Since the inter-node connections are done by pointers, "logically" close blocks need not be "physically" close.
  • The non-leaf levels of the B+ tree form a hierarchy of sparse indices.
  • The B+ tree contains a relatively small number of levels
    • Level below root has at least \(2 * \lceil n/2 \rceil\) values
    • Next level has at least \(2 * \lceil n/2 \rceil * \lceil n/2 \rceil\) values
    • ..etc.

如果文件中有 \(K\) 个索引项,那么树的高度不会超过 \(\lceil \log_{\ceil n/2 \rceil}(K/2) \rceil + 1\)

B+ 树的插入,删除等操作这里就不展示了,可以在这个网站查看

B+ tree: height and size estimation

B+ tree File Organization

  • B+ tree File Organization:
    • Leaf nodes in a B+ tree file organization store records, instead of pointers.叶子节点不放指针,而是放记录本身。
    • Helps keep data records clustered even when there are insertions/deletions/updates.
  • Leaf nodes are still required to be half full
  • Insertion and deletion are handled in the same way as insertion and deletion of entries in a B+ tree index.

为了提高空间利用率,我们可以改变“半满”的条件,比如将“半满”定义为 \(\lfloor 2n/3 \rfloor\)

Other Issues in Indexing

  • Record relocation and secondary indices
    • If a record moves, all secondary indices that store record pointers have to be updated.
    • Node splits in B+ tree file organizations become very expensive.
    • Solution: use primary-index search key instead of record pointer in secondary index
  • Variable length strings as keys
    • Variable fanout
  • Prefix compression
    • Key values at internal nodes can be perfixes of full key
    • Keys in leaf node can be compressed by sharing common prefixes.

Bulk Loading and Bottom-Up Build

  • Efficient alternative 1: Insert in sorted order
    • Sort entries first(using efficient external-memory sort algorithms)
    • insert in sorted order
  • Efficient alternative 2: Bottom-up B+ tree construction
    • As before sort entries
    • And then create tree layer-by-layer, starting with leaf level
    • Implemented as part of bulk-utility by most database systems

Bulk insert index entries

Indices on Multiple Keys

  • Composite search keys are search keys containing more that one attribute.
  • Lexicographic ordering: \((a_1, a_2) < (b_1, b_2)\) if either \(a_1 < b_1\) or \(a_1 = b_1 \mathrm{and} a_2 < b_2\)

Indexing in Main Memory

  • Random access in memory
    • Much cheaper than on disk/flash, but still expensive compared to cache read
    • Binary search for a key value within a large B+ tree node results in many cache misses
    • Data structures that make best use of cache perferable - cache conscious
  • B+ trees with small nodes that fit in cache line are preferable to reduce cache misses
  • Key idea:
    • use large node size to optimize disk access
    • but structure data within a node using a tree with small node size, instead of using an array, to optimize cache access

Indexing on Flash

  • Random I/O cost much lower on flash
  • Writes are not in-place, and (eventually) require a more expensive erase
  • Optimum page size therefore much smaller
  • Bulk-loading still useful since it minimizes page erases
  • Write-optimized tree structures (i.e., LSM-tree) have been adapted to minimize page writes for flash-optimized search trees

Write Optimized Indices

  • Two approaches to reducing cost of writes
    • Log-structured merge tree (LSM-tree)
    • Buffer tree

Log Structured Merge Tree(LSM-tree)

  • Records inserted first into in-memory tree(\(L_0\) tree)
  • When in-memory tree is full, records moved to disk(\(L_1\) tree).如果内存中的树满了,就将内存中的树移动到磁盘上。
    • B+ tree constructed using bottom-up build by merging existing \(L_1\) tree with records from \(L_0\) tree
  • When \(L_1\) tree exceeds some threshold, merge into \(L_2\) tree
    • And so on for more levels
    • Sizethreshold for \(L_{i+1}\) tree is \(K\) times size threshold for \(L_i\) tree

Stepped Merge Index

  • Stepped-merge index: variant of LSM tree with k trees at each level on disk
    • When all \(k\) indices exist at a level, merge them into one index of next level.
    • Reduces write cost compared to LSM tree
  • But queries are even more expensive since many trees need to be queries
  • Optimization for point lookups
    • Compute Bloom filter for each tree and store in-memory
    • Query a tree only if Bloom filter returns a positive result

评论