Graduation Semester and Year
2019
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Song Jiang
Abstract
The amount of data produced and consumed grows every day, and as a result storage systems can hold a large amount of redundant data. Storing and accessing this duplicate data unnecessarily consumes disk space and I/O bandwidth. Deduplication techniques are widely deployed to remove the redundancy, and solutions that work at the block level in particular have proven effective. These solutions aim to use disk space and write bandwidth efficiently by avoiding duplicate writes to storage. However, such a design does not necessarily improve read performance, which is critical for many modern applications. The Linux kernel implements an in-memory cache of pages, called the page cache, to improve I/O performance by minimizing disk accesses. The page cache holds pages originating from regular file systems and is indexed by file and offset within the file. Because of this indexing, deduplication information is currently not available to the page cache. The kernel therefore cannot keep a read request from going to disk when the requested offset is not in the page cache, even if the requested data duplicates the data at another offset that is already cached. Consequently, the overall I/O performance of applications running on these systems can be compromised. To address this issue, we propose a lightweight scheme called Dual-Dedup that efficiently coordinates deduplication information with the page cache. It discloses the redundancy knowledge detected by the block-level deduplication layer to the page cache, which can then prevent unnecessary read requests. Results from extensive experiments show that Dual-Dedup significantly improves read performance. On FIO tests with 25% duplicate data, our system improves read throughput by 34% compared with Linux EXT4.
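The idea in the abstract can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the Dual-Dedup kernel implementation: the class `DedupAwarePageCache`, its method names, and the dictionary standing in for the disk are all hypothetical, and eviction is ignored. It shows a cache indexed by (file, offset) that also receives the block mapping from a deduplication layer, so a miss on one offset can be served from an already cached duplicate.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (file name, page offset): how the page cache is indexed


class DedupAwarePageCache:
    """Toy model only; all names are hypothetical, not taken from the thesis."""

    def __init__(self, disk: Dict[int, bytes]) -> None:
        self.disk = disk                          # physical block number -> data
        self.pages: Dict[Key, bytes] = {}         # the page cache itself
        self.block_of: Dict[Key, int] = {}        # hint from the dedup layer
        self.cached_by_block: Dict[int, Key] = {} # block -> a key already cached
        self.disk_reads = 0

    def note_mapping(self, key: Key, block: int) -> None:
        # Assumed interface: the dedup layer discloses which physical block
        # backs a given (file, offset). Duplicates share one block number.
        self.block_of[key] = block

    def read(self, key: Key) -> bytes:
        if key in self.pages:                     # ordinary page cache hit
            return self.pages[key]
        block = self.block_of[key]
        twin = self.cached_by_block.get(block)
        if twin is not None:
            data = self.pages[twin]               # duplicate is cached: no disk read
        else:
            data = self.disk[block]               # genuine miss: go to disk
            self.disk_reads += 1
            self.cached_by_block[block] = key
        self.pages[key] = data
        return data
```

In this model, reading two offsets that map to the same block costs one disk read instead of two, which is the effect the abstract attributes to sharing deduplication knowledge with the page cache.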
Keywords
Page cache, Storage system, Operating system, Performance, Linux kernel
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Boggavarapu, Venkata Satya Ravi Kiran, "DEDUPLICATION-AWARE PAGE CACHE IN LINUX KERNEL FOR IMPROVED READ PERFORMANCE" (2019). Computer Science and Engineering Theses. 436.
https://mavmatrix.uta.edu/cse_theses/436
Comments
Degree granted by The University of Texas at Arlington