Author

Yingsen Mao

ORCID Identifier(s)

0000-0002-1962-0598

Graduation Semester and Year

2015

Language

English

Document Type

Thesis

Degree Name

Master of Science in Information Systems

Department

Information Systems and Operations Management

First Advisor

Riyaz Sikora

Abstract

Exploratory data analysis (EDA) is an iterative process in which analysts repeatedly 'ask questions' of the data and extract knowledge from it. EDA is becoming increasingly important in modern data analysis, such as business analytics and business intelligence, because it greatly relaxes the statistical assumptions required by its counterpart, confirmatory data analysis (CDA), and involves analysts directly in the data mining process. However, exploratory visual analysis, the central part of EDA, requires heavy data manipulation and tedious visual specification, which can impede the EDA process if the analyst has no guidelines to follow. In this thesis, we present a framework for visual data exploration organized by the types of the variables involved. The framework uses the effectiveness and expressiveness rules of visual encoding design developed by Munzner [1] as guidelines to facilitate the EDA process. A classification problem on the Titanic data is also provided to demonstrate how visual exploratory analysis facilitates the data mining process by increasing prediction accuracy. In addition, we classify prevailing data visualization technologies, including the layered grammar of ggplot2 [2], the VizQL of Tableau [3], d3 [4], and Shiny [5], as grammar-based or web-based. We then review how well each suits EDA, since EDA is discovery-oriented and analysts must be able to quickly change both what they are viewing and how they are viewing the data.

Keywords

Exploratory data analysis, Data visualization, Data mining

Disciplines

Business | Management Information Systems
