Graduation Semester and Year
Spring 2024
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Hong Jiang
Abstract
The increasing performance gap between computation and I/O creates huge data management challenges for simulation-based scientific discovery. Data reduction, among other techniques, is deemed a promising way to bridge the gap by reducing the amount of data migrated to persistent storage. However, the reduction performance is still far from what production applications demand. To this end, we propose a new methodology that aggressively reduces data despite the substantial loss of information, and re-computes data at the original accuracy on demand. As a result, our scheme creates the illusion of a fast and large storage medium on which high-accuracy data remains available. We further design an adaptive load-aware data reduction strategy that monitors the I/O overhead at runtime and dynamically adjusts the reduction ratio.
We verify the efficacy of data reduction and re-computation through adaptive mesh refinement (AMR), a popular numerical technique for solving partial differential equations. We evaluate data reduction and selective data re-computation on Titan, using a real application in FLASH and mini applications in Chombo. To clearly demonstrate the benefits of re-computation, we compare it with other state-of-the-art data reduction methods, including SZ, ZFP, FPC and deduplication. It is shown to be superior in reduction ratio and in both write and read speeds, particularly when only a small amount of data (e.g., 1%) needs to be retrieved. Data reduction and re-computation can reach a compression ratio up to 6X higher than that of lossy compressors and up to 16X higher than that of lossless compressors. In addition, data reduction and re-computation can reduce the total time of the evaluation applications by around 6X. Our results confirm that data reduction and selective data re-computation can 1) reduce the performance gap between I/O and compute by aggressively reducing AMR levels, and more importantly 2) recover the target accuracy efficiently for AMR through re-computation.
We evaluate load-aware data reduction and adaptive data reduction, which are designed to optimize the performance of applications and HPC systems and the utilization of resources on HPC platforms. We apply a machine learning method, long short-term memory (LSTM), to verify load-aware data reduction: because I/O activities on the HPC platform are repeatable, I/O congestion can be predicted from a two-month I/O trace. In addition, we evaluate adaptive data reduction by sending probe packets and measuring their feedback write rate, which lets us monitor the complex I/O environment and adjust data reduction in real time to meet the performance goal with sufficient utilization of resources. Our results demonstrate that adaptive load-aware data reduction can successfully mitigate the performance degradation of applications caused by I/O congestion and optimize resource utilization on HPC platforms via runtime reduction adjustment.
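The runtime adjustment described above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the class name `ReductionController`, the per-step I/O time budget, the exponential smoothing of probe rates, and the cap on the reduction ratio are all assumptions made for illustration. The sketch only shows how a measured probe write rate could be turned into a reduction ratio.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReductionController:
    """Hypothetical controller: picks a reduction ratio from probe write rates."""
    io_budget_s: float        # assumed time allowed for writing one output step
    max_ratio: float = 64.0   # assumed upper bound on the reduction ratio
    smoothing: float = 0.5    # weight given to the newest probe measurement
    _rate: Optional[float] = field(default=None, init=False, repr=False)

    def observe_probe(self, probe_bytes: float, elapsed_s: float) -> float:
        """Record one probe write and return the smoothed write rate (bytes/s)."""
        rate = probe_bytes / elapsed_s
        if self._rate is None:
            self._rate = rate
        else:
            # Exponential moving average damps short-lived spikes in I/O load.
            self._rate = self.smoothing * rate + (1.0 - self.smoothing) * self._rate
        return self._rate

    def choose_ratio(self, output_bytes: float) -> float:
        """Smallest ratio that fits the write into the budget, clamped to [1, max_ratio]."""
        if self._rate is None:
            return 1.0  # no measurement yet: write unreduced data
        needed = output_bytes / (self._rate * self.io_budget_s)
        return min(max(needed, 1.0), self.max_ratio)


controller = ReductionController(io_budget_s=10.0)
controller.observe_probe(probe_bytes=100.0, elapsed_s=1.0)   # congested: 100 B/s
ratio_congested = controller.choose_ratio(output_bytes=4000.0)
controller.observe_probe(probe_bytes=300.0, elapsed_s=1.0)   # load eases
ratio_relaxed = controller.choose_ratio(output_bytes=4000.0)
```

In this sketch a slower probe write rate yields a larger reduction ratio, and the ratio falls back toward 1 (no reduction) as the storage system becomes less congested.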
Keywords
High performance computing, Data reduction, AMR
Disciplines
Computer and Systems Architecture | Data Storage Systems
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Wang, Mengxiao, "ADAPTIVE LOAD-AWARE ELASTIC DATA REDUCTION AND RE-COMPUTATION FOR ADAPTIVE MESH REFINEMENT" (2024). Computer Science and Engineering Theses. 7.
https://mavmatrix.uta.edu/cse_theses/7