Author

Fadiah Qudah

ORCID Identifier(s)

0000-0002-3346-3795

Graduation Semester and Year

2019

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Vassilis Athitsos

Abstract

When extracting meaning from language, a common first step is to break the language down into constituents, or words that work together as a unit. This task, known as parsing, typically follows a specific grammar in order to decompose the language into its underlying constituent structure. Grammar-based parsing runs into difficulties, however, with real-world natural language because of its unstructured nature. Code-switching, the phenomenon of alternating between languages while communicating, further complicates this task by requiring us to parse based on two (or more) languages instead of one. This thesis presents a data-driven method for parsing code-switched language into its constituents. The code-switched language used in this thesis is Taglish, a mix of English and Tagalog, and the data is collected from the social media site Twitter.

Keywords

Language

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

28861-2.zip (841 kB)

