Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Hao Che


The workflows of the predominant user-facing datacenter services, including web searching and social networking, are underlaid by various Fork-Join structures. Due to the lack of understanding the performance of Fork-Join structures in general, today’s datacenters often resort to resource overprovisioning, operating under low resource utilization, to meet stringent tail-latency service level objectives (SLOs) for such services. Hence, to achieve high resource utilization, while meeting stringent tail-latency SLOs, it is of paramount importance to be able to accurately predict the tail latency for a broad range of Fork-Join structures of practical interests. In this dissertation, we propose and conduct a comprehensive study of a model for tail latency prediction for a wide range of fork-join structures of practical interests. The idea is to introduce a detailed examination of two main approaches that can be applied to our prediction model: 1) a black-box approach that covers a wide range of fork-join structures. And 2) a white-box approach for a wide range of fork-join queuing models. Our extensive testing results based on model-based and trace-driven simulations demonstrate that the model can consistently predict the tail latency within 20% and 15% prediction errors at 80% and 90% load levels, respectively. The experimental results confirmed the effectiveness of the prediction model in predicting tail latency at high load regions, making the model a valuable tool for resource provisioning and supporting scheduling decisions in datacenter clusters for guaranteeing users satisfaction. Finally, the model is applied to the prediction of achievable resource utilization under any given tail-latency SLO. We derive a cluster load bound for a particular class of OLDI services. And for a simple Fork-Join queuing network model of M/M/1 Fork nodes, the derivation helps characterize the performance upper bound of such services, assuming that all the jobs have the same fanout degree. In general, due to the queuing effect, the tail latency is difficult to contain without throwing a significant amount of cluster resources to the problem, which would result in low cluster resource utilization.


Tail latency, Prediction


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington