ORCID Identifier(s)

0000-0003-1256-0454

Graduation Semester and Year

2017

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Leonidas Fegaras

Abstract

With explosive growth of data in past few years, discovering previously unknown, frequent patterns within the huge transactional data sets has been one of the most challenging and ventured fields in data mining. Apriori algorithm is widely used and one of the most researched field for frequent pattern mining. The exponential increase in the size of the input data has adverse effect on the efficiency of the traditional or centralized implementation of this algorithm. Thus, various distributed Frequent Itemset Mining(FIM) algorithms have been developed. MapReduce is a programming framework that allows the processing of large datasets with a distributed algorithm over a distributed cluster. During this research, We have implemented a parallel Apriori algorithm in Hadoop MapReduce framework with large volumes of input data and generate frequent patterns based on user defined parameters. We have implemented hash tree data structure to represent the candidate itemsets which aids in faster search for those candidates within a transaction. These experiments were conducted in real-life datasets and varying parameters. Based on various evaluations, the proposed algorithm turns out to be scalable and efficient method to generate frequent item-sets from a large dataset over a distributed network.

Keywords

mapreduce, apriori, parallel apriori

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

Share

COinS