ORCID Identifier(s)

0009-0006-6155-4527

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Song Jiang

Second Advisor

Jia Rao

Third Advisor

Mohammad Atiqul Islam

Fourth Advisor

Dianqi Han

Abstract

The exponential growth of data in modern computing environments has made the efficient extraction of information from massive datasets a critical system requirement. Key-value (KV) storage systems serve as the backbone for these operations; however, their performance is consistently bottlenecked by two costs: locating the data and accessing the storage device. Data locations are typically identified via an index, while disk I/O is reduced through caching. This dissertation presents LearnedStore, TurboIndex, and ReadBooster, which address these bottlenecks through architectural modifications to the index and the cache. LearnedStore accelerates operations by adapting the learned index to jump directly to the leaf node: machine learning models predict the physical location of the leaf node, which significantly increases search throughput while retaining a block-device-friendly, tree-based structure. TurboIndex and ReadBooster target the inefficiencies inherent in the disk-to-memory transition. Since disks operate as block devices, existing KV stores typically use page-based caches to bridge the gap between block-addressable storage and byte-addressable memory. However, we demonstrate that page-level granularity often results in suboptimal memory utilization, because "cold" data adjacent to "hot" records is cached along with them, and in additional disk I/O when writing to a cold page. TurboIndex and ReadBooster address this memory-efficiency problem with a dual-granularity caching architecture. TurboIndex accumulates insertions destined for cold pages to reduce disk I/O, while ReadBooster reduces I/O by caching specific hot keys from evicted pages. Experimental results indicate that this unified approach substantially increases system throughput and reduces I/O, providing a scalable framework for next-generation, high-performance database systems.
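
The dual-granularity idea in the abstract, keeping hot keys from a page after the page itself is evicted, can be illustrated with a small sketch. This is not the dissertation's implementation: the class name DualGranularityCache, the hit-count threshold, the LRU policies, and the in-memory key-to-page map standing in for the index are all assumptions made for illustration.

```python
from collections import Counter, OrderedDict


class DualGranularityCache:
    """Toy read cache: an LRU page cache plus a small LRU record cache.

    Hypothetical illustration only; names and policies are assumptions.
    """

    def __init__(self, disk_pages, page_capacity, record_capacity, hot_threshold=2):
        # disk_pages (page_id -> {key: value}) stands in for the block device.
        self.disk_pages = disk_pages
        # A plain dict stands in for the index that maps a key to its page.
        self.key_to_page = {k: pid for pid, page in disk_pages.items() for k in page}
        self.page_capacity = page_capacity      # assumed to be at least 1
        self.record_capacity = record_capacity
        self.hot_threshold = hot_threshold
        self.pages = OrderedDict()    # page-granularity cache, oldest first
        self.records = OrderedDict()  # record-granularity cache, oldest first
        self.hits = Counter()         # per-key accesses served via the page cache
        self.disk_reads = 0

    def get(self, key):
        # 1. A record-cache hit needs neither a cached page nor disk I/O.
        if key in self.records:
            self.records.move_to_end(key)
            return self.records[key]
        # 2. Otherwise go through the page cache, loading the page on a miss.
        page_id = self.key_to_page[key]
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
        else:
            self._load_page(page_id)
        self.hits[key] += 1
        return self.pages[page_id][key]

    def _load_page(self, page_id):
        self.disk_reads += 1
        self.pages[page_id] = self.disk_pages[page_id]
        if len(self.pages) > self.page_capacity:
            # Evict the least recently used page, but keep its hot keys.
            _, victim = self.pages.popitem(last=False)
            self._promote_hot_records(victim)

    def _promote_hot_records(self, page):
        for key, value in page.items():
            if self.hits.pop(key, 0) >= self.hot_threshold:
                self.records[key] = value
                self.records.move_to_end(key)
                if len(self.records) > self.record_capacity:
                    self.records.popitem(last=False)
```

In this sketch, a key read at least hot_threshold times while its page is cached survives the page's eviction in the record cache, so a later read of that key costs no disk read, while the cold keys that shared its page stop occupying memory.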

Keywords

Index, learned index, cache, database, storage system, key value storage system, page cache, record cache, hybrid cache

Disciplines

Data Storage Systems

License

Creative Commons Attribution 4.0 International License
