Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Computer Science and Engineering

First Advisor

Leonidas Fegaras


Non-negative matrix factorization is a well-known, computationally demanding machine learning algorithm that is also used in collaborative filtering. Collaborative filtering is a technique used in recommendation systems that aims to predict the missing values in a user-item association matrix. In such a matrix, the rows correspond to users, the columns correspond to movies, and each value is the rating a user gave to a movie. These matrices have large dimensions and missing values, and they need parallel processing. MapReduce Query Language (MRQL) is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Spark, Hama, and Flink. Large-scale matrix operations require proper scaling and optimization in distributed systems. Therefore, in this work we analyze the performance of MRQL on complex matrix operations, and the ease of scaling these operations, using different sparse matrix datasets in Spark mode. We have performed simple matrix operations such as multiplication, division, addition, and subtraction, as well as complex operations such as factorization. Gaussian non-negative matrix factorization and stochastic gradient descent based matrix factorization are the two algorithms tested in the Spark and Flink modes of MRQL with a dataset of movie ratings. The experimental results are intended to help readers understand and evaluate the performance of MRQL.
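
As an illustration of the factorization described above, the following is a minimal single-machine sketch in Python with NumPy. It is not the MRQL implementation evaluated in the thesis. It assumes the standard multiplicative update rules for the squared-error (Gaussian) objective, and the function name `gaussian_nmf`, the parameters `rank`, `iterations`, and `eps`, and the small `ratings` matrix are all hypothetical. For simplicity, missing ratings are stored as zeros and treated as observed values, which a real recommender would need to handle differently.

```python
import numpy as np

def gaussian_nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (users x items) as W @ H.

    Uses multiplicative updates that do not increase the squared
    (Frobenius) reconstruction error. All names are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = V.shape
    # Start from strictly positive random factors.
    W = rng.random((n_users, rank)) + eps
    H = rng.random((rank, n_items)) + eps
    for _ in range(iterations):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical 4 users x 4 movies rating matrix (0 = no rating, an assumption).
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])

W, H = gaussian_nmf(ratings, rank=2)
predicted = W @ H  # dense matrix of reconstructed ratings
```

The product `W @ H` gives a value for every user-movie pair, including pairs with no rating in the input, which is how a factorization of this kind can be used to predict missing entries.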


Matrix factorization, Map reduce query language, MRQL


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington