Graduation Semester and Year
2010
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Chengkai Li
Abstract
In Wikipedia, each article represents an entity. An entity can be of different types, such as person, country, school, or science. Although Wikipedia attaches category information to each page, the categories alone are sometimes not sufficient to deduce the type of a page. Yet clear type information on a Wikipedia page is very important for users, as it helps them explore pages in a more organized way. Hence, in this thesis, we explore standard classification techniques, mainly Naïve Bayes and Support Vector Machines, and experiment with how these techniques can be made more effective for typifying Wikipedia articles by using different feature selection methods. We propose a method in which Wikipedia categories are used as features. Moreover, we combine features to build a meta classifier that outperforms the other standard methods. To compare our methods, we calculate the accuracy of each method using the well-known data mining tool WEKA.
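The abstract's core idea, treating a page's Wikipedia categories as classification features, can be illustrated with a minimal multinomial Naïve Bayes sketch. This is not the thesis's implementation (the experiments used WEKA). Splitting category names into lowercase words, the Laplace smoothing parameter `alpha`, the helper names `category_features`, `train_nb` and `predict_nb`, and the toy training pages are all hypothetical assumptions chosen for illustration, not data or results from the thesis.

```python
import math
from collections import Counter, defaultdict


def category_features(categories):
    """Split category names into lowercase word tokens (an assumption; whole category strings could be used instead)."""
    return [word.lower() for name in categories for word in name.split()]


def train_nb(examples, alpha=1.0):
    """Fit a multinomial Naive Bayes model from (categories, type_label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)  # type label -> token frequencies
    vocab = set()
    for categories, label in examples:
        tokens = category_features(categories)
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {"classes": class_counts, "tokens": token_counts, "vocab": vocab, "alpha": alpha}


def predict_nb(model, categories):
    """Return the type label with the highest log posterior for a page's categories."""
    total_pages = sum(model["classes"].values())
    alpha, vocab_size = model["alpha"], len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_pages in model["classes"].items():
        score = math.log(n_pages / total_pages)  # log prior P(type)
        counts = model["tokens"][label]
        denom = sum(counts.values()) + alpha * vocab_size  # Laplace-smoothed denominator
        for token in category_features(categories):
            if token in model["vocab"]:  # skip tokens never seen in training
                score += math.log((counts[token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training pages: (category names, entity type).
train = [
    (["Living people", "American physicists"], "person"),
    (["1879 births", "German physicists", "Nobel laureates in Physics"], "person"),
    (["Countries in Europe", "Member states of the United Nations"], "country"),
    (["Countries in Asia", "Island countries"], "country"),
    (["Universities and colleges in Texas", "Educational institutions established in 1895"], "school"),
]

model = train_nb(train)
print(predict_nb(model, ["Living people", "British physicists"]))  # person
print(predict_nb(model, ["Countries in Africa", "Landlocked countries"]))  # country
print(predict_nb(model, ["Universities and colleges in Ohio"]))  # school
```

Scoring in log space keeps the product of many small category likelihoods from underflowing when a page carries many categories. The meta classifier mentioned in the abstract would combine several base classifiers of this kind; the sketch does not attempt to reproduce it.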
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Hasan, Quazi Mainul, "Typifying Wikipedia Articles" (2010). Computer Science and Engineering Theses. 353.
https://mavmatrix.uta.edu/cse_theses/353
Comments
Degree granted by The University of Texas at Arlington