Graduation Semester and Year

2010

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Abstract

In Wikipedia, each article represents an entity. An entity can have one of many types, such as person, country, school, or science. Although Wikipedia records category information for each page, the categories alone are sometimes not sufficient to deduce the type of a page. Yet clear type information on a Wikipedia page is very important for users, as it helps them explore pages in a more organized way. Hence, in this thesis, we explore standard classification techniques, mainly Naïve Bayes and Support Vector Machines, and experiment with how different feature selection methods can make these techniques more effective for typifying Wikipedia articles. We propose a method in which Wikipedia categories are used as features. Moreover, we combine features to build a meta classifier that outperforms the other standard methods. To compare the methods, we measure the accuracy of each using the well-known data mining tool WEKA.

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
