Graduation Semester and Year
2017
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Leonidas Fegaras
Abstract
With the explosive growth of data in the past few years, discovering previously unknown, frequent patterns within huge transactional datasets has become one of the most challenging and widely explored areas of data mining. The Apriori algorithm is widely used and is one of the most researched algorithms for frequent pattern mining. However, the exponential increase in the size of the input data adversely affects the efficiency of traditional, centralized implementations of this algorithm. Thus, various distributed Frequent Itemset Mining (FIM) algorithms have been developed. MapReduce is a programming framework that allows large datasets to be processed with a distributed algorithm over a cluster. In this research, we have implemented a parallel Apriori algorithm in the Hadoop MapReduce framework that processes large volumes of input data and generates frequent patterns based on user-defined parameters. We have also implemented a hash tree data structure to represent the candidate itemsets, which enables faster searching for those candidates within a transaction. Experiments were conducted on real-life datasets with varying parameters. Based on various evaluations, the proposed algorithm turns out to be a scalable and efficient method for generating frequent itemsets from a large dataset over a distributed network.
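The level-wise idea behind a parallel Apriori can be sketched in a single process. This is a minimal illustration under stated assumptions, not the thesis implementation or the Hadoop API: the hypothetical `mapper`, `reducer`, and `run_pass` functions only simulate the map, shuffle, and reduce phases over in-memory "splits"; `min_support` is assumed to be an absolute transaction count; and a plain Python set of candidates stands in for the hash tree described above.

```python
from collections import defaultdict
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Apriori join + prune: build size-k candidates from frequent (k-1)-itemsets."""
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join step: merge two itemsets that share their first k-2 items.
            if a[:k - 2] == b[:k - 2]:
                cand = tuple(sorted(set(a) | set(b)))
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if all(sub in prev_set for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
    return candidates


def mapper(split, candidates, k):
    """Simulated map phase: emit (candidate, 1) for each candidate in a transaction.
    A set lookup replaces the hash-tree search used in the thesis."""
    for transaction in split:
        items = sorted(set(transaction))
        for subset in combinations(items, k):
            if subset in candidates:
                yield subset, 1


def reducer(key, values, min_support):
    """Simulated reduce phase: sum partial counts and keep frequent itemsets."""
    total = sum(values)
    if total >= min_support:
        yield key, total


def run_pass(splits, candidates, k, min_support):
    # Shuffle: group intermediate (key, value) pairs by key, as the framework would.
    grouped = defaultdict(list)
    for split in splits:
        for key, value in mapper(split, candidates, k):
            grouped[key].append(value)
    result = {}
    for key, values in grouped.items():
        for itemset, count in reducer(key, values, min_support):
            result[itemset] = count
    return result


def parallel_apriori(transactions, min_support, num_splits=2):
    # Hypothetical input splitting; Hadoop would assign splits to mappers.
    splits = [transactions[i::num_splits] for i in range(num_splits)]
    # Pass 1: every single item is a candidate.
    items = {(item,) for t in transactions for item in t}
    frequent = run_pass(splits, items, 1, min_support)
    all_frequent = dict(frequent)
    k = 2
    # One MapReduce pass per itemset size, until no frequent itemsets remain.
    while frequent:
        candidates = generate_candidates(frequent.keys(), k)
        if not candidates:
            break
        frequent = run_pass(splits, candidates, k, min_support)
        all_frequent.update(frequent)
        k += 1
    return all_frequent
```

For example, with the transactions `[["bread", "milk"], ["bread", "diaper", "beer", "eggs"], ["milk", "diaper", "beer", "cola"], ["bread", "milk", "diaper", "beer"], ["bread", "milk", "diaper", "cola"]]` and `min_support=3`, the sketch finds four frequent single items and four frequent pairs such as `("beer", "diaper")`; the only size-3 candidate, `("bread", "diaper", "milk")`, appears in just two transactions and is dropped. Because each split only emits partial counts, the reducer sees the global count for every candidate, which is what lets the counting work be distributed.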
Keywords
mapreduce, apriori, parallel apriori
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Neupane, Gokarna, "A PARALLEL IMPLEMENTATION OF APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS IN HADOOP MAPREDUCE FRAMEWORK" (2017). Computer Science and Engineering Theses. 422.
https://mavmatrix.uta.edu/cse_theses/422
Comments
Degree granted by The University of Texas at Arlington