Author

Ranjan Dash

Graduation Semester and Year

2012

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Leonidas Fegaras

Abstract

In recent years, we have witnessed the emergence of new types of systems that deal with large volumes of streaming data. Examples include financial data analysis on feeds of stock tickers, sensor-based environmental monitoring, network traffic monitoring, and click stream analysis to push customized advertisements or intrusion detection. Traditional database management systems (DBMS), which are very good at managing large volumes of stored data, fall short of serving this new class of applications, which require low-latency processing on live data from push-based sources. Data Stream Management Systems (DSMS) are fast emerging to address this new type of data and processing requirements. A common but challenging issue in DSMS is dealing with unpredictable data arrival rates. Data arrival may at times be fast and bursty enough to surpass the available system capacity. When input rates exceed system capacity, the Quality of Service (QoS) of system outputs falls below acceptable levels. The problem of system overloading is more acute in XML data streams than in relational streams, as XML streams have to spend extra resources on input processing and result construction. The main focus of this thesis is to find suitable ways to process this high volume of data streams, dealing gracefully with spikes in data arrival under limited or fixed system resources in the XML stream context. One established method is to shed load by selectively dropping tuples under these conditions. This method helps to improve the observed latency of the results but degrades the answer quality. In this dissertation, we first define QoS in the context of XML stream processing and then present various mechanisms to improve it, especially the method of load shedding. We provide a general solution framework for implementing Load Shedding using Synopses, while minimizing the loss in result accuracy.
Then, we present specific situations where the issue of QoS is very critical, such as aggregation and join queries. Finally, we provide techniques to handle load shedding in these cases that provide high QoS in XML data stream systems. In the final part of this thesis, we investigate the issue of processing aggregation (group-by) join queries on data streams that provide exact results, and we extend our solutions to address some of the OLAP issues in data streams.

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
