Graduation Semester and Year

2016

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Matthew Wright

Abstract

The ability to speak freely has always been a source of conflict between rulers and the people over whom they exert power. This conflict usually takes the form of state-sponsored censorship, with occasional commercial efforts, typically to silence criticism or squelch dissent, and of people's efforts to evade such censorship. This is even more evident in the current environment, with its ever-growing number of communication technologies and platforms available to individuals around the world. In the face of efforts to control communication before it is posted, or to prevent the discovery of information that exists outside the control of the authorities, users attempt to slip their messages past the censor's gaze by using keyword replacement. These methods are effective, but only as long as the replacement terms are not identified. Once the new usage is discovered, it is a simple matter to add the new term to the list of blacklisted words. While various methods can be used to create mappings between blocked words and their replacements, the difficulty lies in doing so in a way that makes it clear to a human reader how to perform the mapping in reverse, while maintaining readability and without attracting undue attention from systems enforcing the censor's rules and policies. One technique, presented in a related article, uses HTML tags as a way to provide such a replacement method. By using HTML tags related to how text is displayed on the page, it can both indicate that a replacement is happening and provide a legend for mapping the term on the page to the one intended by the author. It is a given that a human reader will easily detect this scheme. If a malicious reader is shown a page generated using this method, the attempt at evading the censor's rules will be obvious.
A potential weakness in this approach arises if the tool that generates the replacement uses a small set of HTML tags to effect the censorship evasion, but in doing so changes the frequency with which those tags appear on the page, so that the page stands out and can be flagged by software for human examination. In this paper we examine the feasibility of using tag frequency as a way to distinguish blog posts needing more attention, examining the means of data collection, the scale of processing required, and the quality of the resulting analysis for detecting deviation from average tag-usage patterns of pages.
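
As a rough illustration of the tag-frequency idea described in the abstract, the sketch below counts opening tags on a page and scores how far each tag's share deviates from a reference set of pages. It is not the thesis's implementation: the names `TagCounter`, `tag_frequencies`, and `deviation_scores` are hypothetical, and the use of a z-score against the corpus mean is an assumed stand-in for whatever deviation measure the thesis actually applies.

```python
import math
from collections import Counter
from html.parser import HTMLParser
from statistics import mean, pstdev


class TagCounter(HTMLParser):
    """Counts the opening tags seen while parsing one page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        self.counts[tag] += 1


def tag_frequencies(html):
    """Return each tag's share of all opening tags on the page."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    total = sum(parser.counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in parser.counts.items()}


def deviation_scores(page_html, corpus_html, tags):
    """Z-score of the page's share of each tag against the corpus (assumed measure)."""
    corpus = [tag_frequencies(doc) for doc in corpus_html]
    page = tag_frequencies(page_html)
    scores = {}
    for tag in tags:
        values = [freqs.get(tag, 0.0) for freqs in corpus]
        mu, sigma = mean(values), pstdev(values)
        diff = page.get(tag, 0.0) - mu
        if sigma:
            scores[tag] = diff / sigma
        elif diff == 0:
            scores[tag] = 0.0
        else:
            # No spread in the corpus: any difference is reported as infinite.
            scores[tag] = math.copysign(math.inf, diff)
    return scores


if __name__ == "__main__":
    corpus = [
        "<p>a</p><p>b</p><b>c</b><p>d</p>",
        "<p>a</p><p>b</p><p>c</p><i>d</i>",
    ]
    page = "<p>x</p><b>y</b><b>z</b><b>w</b>"
    print(deviation_scores(page, corpus, ["b", "i", "p"]))
```

A page whose score for a display-related tag such as `b` is far from zero would be a candidate for closer inspection; where to set the cutoff is left open here, since the abstract does not state one.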

Keywords

Censorship, Anti-censorship, Censorship evasion, Statistics, Patterns

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington

26396-2.zip (304 kB)
