Sandesh Koirala

Document Type

Honors Thesis


Commercial document processing is expensive and prone to human error. Although many machine learning algorithms exist for object classification, their performance and feasibility vary widely with implementation and use case. This paper evaluates several classification algorithms for use in an automated electronic document classification system. The algorithms were used to classify about 1000 vectorized documents in an iterative environment, and their performance was measured with precision, recall, and F-measure, among other metrics. Most algorithms achieved more than 95% accuracy. Logistic Regression was chosen as the final model because its overall performance was the most consistent, with precision above 95%.
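The evaluation measures named above can be illustrated with a short sketch. It is not taken from the thesis. It computes per-class precision, recall, and F-measure (F1) from true and predicted labels, along with overall accuracy. Macro-averaging across classes is an assumption here, since the abstract does not say how per-class scores were combined. The document class names ("invoice", "receipt", "contract") are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly assigned
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly assigned
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision, recall, and F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall_f1(y_true, y_pred, c) for c in labels]
    return tuple(sum(s[i] for s in scores) / len(labels) for i in range(3))


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical labels for six documents; the fourth prediction type is a mix-up.
    y_true = ["invoice", "invoice", "receipt", "contract", "receipt", "invoice"]
    y_pred = ["invoice", "receipt", "receipt", "contract", "receipt", "invoice"]
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro precision/recall/F1:", macro_average(y_true, y_pred))
```

In this toy run, one invoice is misclassified as a receipt. That misclassification lowers recall for "invoice" and precision for "receipt", while accuracy alone only registers a single error. This asymmetry is why precision and recall are reported alongside accuracy.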

Publication Date





