Graduation Semester and Year

2022

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Song Jiang

Abstract

With the evolution of new technologies, such as edge computing, full self-driving, virtual reality, and multi-media streaming, the volume of data is growing at an accelerated speed. The global data volume could achieve 175~zettabytes by 2025. With this huge amount of data, the focus of data management has been shifted from traditional SQL databases to NoSQL databases, which provide higher performance and better scalability. Key-value (KV) stores are a common type of NoSQL database and are becoming a major storage infrastructure in various application domains. With the development of high-speed storage devices, such as NVMe SSD, Open-channel SSD, and non-volatile memory, new challenges and opportunities have appeared for KV stores. Traditional KV stores suffer from large write amplification and non-trivial indexing overhead on the high-speed storage devices. And the performance bottleneck gradually shifts from the storage side to the software side in modern database designs. In this dissertation, we propose solutions to overcome the obstacles in the KV store design. To support efficient indexing and range search, key-value items must be sorted. However, the sorting process can be excessively expensive. In the KV systems adopting the Log-Structured Merge Tree (LSM) structure, the write amplification can be very large due to its repeated internal merge-sorting operation. We propose WipDB, which leverages relatively stable key distributions to bound the write amplification at a small number. Meanwhile, to improve the efficiency for large-scale KV stores, we need persistent indexes on the Non-volatile memory (PMEM) to provide instant-recovery ability for the database. However, existing designs of the persistent index, especially the persistent hash table, always make trade-offs between performance, consistency, and persistency. To meet all three requirements, we propose TurboHash for building a high-performance KV store. Though PMEM provides persistency of data, compared to the DRAM, the performance of PMEM is still much worse than DRAM in terms of either latency and throughput (by around 3X or more). While using DRAM as a read cache and/or write-back buffer seems to be a plausible remedy to close the performance gap, traditional cache designs are not effective or even not functional for data of either weak temporal or spatial access locality and requiring persistency and crash consistency. To address these issue, we propose a framework, that turns an index structure designed for persistent memory into a much faster one with application-managed caching and buffering.

Keywords

Key-value store, Persistent memory, Storage system

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

Share

COinS