Graduation Semester and Year
Spring 2026
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Song Jiang
Second Advisor
Jia Rao
Third Advisor
Mohammad Atiqul Islam
Fourth Advisor
Dianqi Han
Abstract
The exponential growth of data in modern computing environments has made efficient extraction of information from massive datasets a critical system requirement. Key-value (KV) storage systems serve as the backbone for these operations; however, their performance is consistently bottlenecked by two tasks: locating the requested data and paying the physical cost of accessing the storage device. Data locations are typically identified via an index, while disk I/O is reduced through caching. This dissertation presents LearnedStore, TurboIndex, and ReadBooster, which address these bottlenecks through architectural modifications to the index and the cache. LearnedStore accelerates lookups by adapting a learned index to jump directly to the leaf node. Using machine learning models to predict the physical location of the leaf node significantly increases search throughput while retaining a block-device-friendly, tree-based structure. TurboIndex and ReadBooster target the inefficiencies inherent in the disk-to-memory transition. Since disks operate as block devices, existing KV stores typically use page-based caches to bridge the gap between block-addressable storage and byte-addressable memory. However, we demonstrate that page-level granularity often results in sub-optimal memory utilization, because "cold" data is cached alongside adjacent "hot" records, and that it increases disk I/O when writing to a cold page. TurboIndex and ReadBooster address this memory-efficiency problem with a dual-granularity caching architecture. TurboIndex accumulates insertions destined for cold pages to reduce disk I/O, while ReadBooster reduces I/O by caching specific hot keys from evicted pages. Experimental results indicate that this unified approach substantially increases system throughput and reduces I/O, providing a scalable framework for next-generation, high-performance database systems.
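The idea of keeping hot keys from evicted pages can be illustrated with a minimal, read-only sketch. It is not the dissertation's implementation: the class name `HybridCache`, the `hot_hits` threshold, the LRU policies, and the dictionary standing in for the disk are all assumptions made for illustration, and writes and cache invalidation are ignored.

```python
from collections import OrderedDict


class HybridCache:
    """Toy dual-granularity cache: an LRU page cache plus an LRU record cache.

    Hypothetical sketch. `disk` maps page_id -> {key: value} and stands in
    for a block device; every page miss counts as one block read.
    """

    def __init__(self, disk, page_slots, record_slots, hot_hits=2):
        self.disk = disk
        self.page_slots = page_slots      # max number of cached pages
        self.record_slots = record_slots  # max number of cached records
        self.hot_hits = hot_hits          # hits needed for a key to count as hot (assumed policy)
        self.pages = OrderedDict()        # page_id -> (records, per-key hit counts)
        self.records = OrderedDict()      # key -> value
        self.disk_reads = 0

    def get(self, page_id, key):
        # Record-level hit: served from memory without touching the page cache.
        if key in self.records:
            self.records.move_to_end(key)
            return self.records[key]
        # Page miss: read the whole page, then evict the oldest page if over capacity.
        if page_id not in self.pages:
            self.disk_reads += 1
            self.pages[page_id] = (dict(self.disk[page_id]), {})
            if len(self.pages) > self.page_slots:
                self._evict_page()
        self.pages.move_to_end(page_id)
        data, hits = self.pages[page_id]
        hits[key] = hits.get(key, 0) + 1
        return data.get(key)

    def _evict_page(self):
        # Drop the least recently used page, but keep its hot keys as records.
        _, (data, hits) = self.pages.popitem(last=False)
        for key, count in hits.items():
            if count >= self.hot_hits and key in data:
                self.records[key] = data[key]
                self.records.move_to_end(key)
                while len(self.records) > self.record_slots:
                    self.records.popitem(last=False)


disk = {0: {"a": 1, "b": 2}, 1: {"c": 3, "d": 4}}
cache = HybridCache(disk, page_slots=1, record_slots=1)
cache.get(0, "a")
cache.get(0, "a")   # "a" becomes hot on page 0
cache.get(1, "c")   # evicts page 0, promoting "a" to the record cache
cache.get(0, "a")   # served from the record cache, no extra block read
```

In this toy run the final lookup of "a" is answered from the record cache, so the cold neighbour "b" no longer occupies memory and page 0 does not have to be re-read.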
Keywords
Index, learned index, cache, database, storage system, key value storage system, page cache, record cache, hybrid cache
Disciplines
Data Storage Systems
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Maharjan, Sujit, "Enhancing the performance of disk-based key-value stores: From learned index acceleration to I/O-efficient hybrid caching" (2026). Computer Science and Engineering Dissertations - Archive. 435.
https://mavmatrix.uta.edu/cse_dissertations/435