Graduation Semester and Year




Document Type


Degree Name

Master of Science in Computer Science


Department

Computer Science and Engineering

First Advisor

Manfred Huber


Abstract

Recognizing activities in video is important for a wide range of applications, from basic scene understanding to predicting behavior in autonomous vehicle applications. At present, humans far outperform computer systems at this task, which suggests that mimicking human perception is a promising approach. This thesis proposes an architecture that processes videos to extract the important action instances that describe the essential behaviors a video contains, and that helps map the information in the video to a machine-understandable form. This is an important research area: it could help interpret the surrounding environment for the visually impaired, detect and characterize human behavior for autonomous vehicles, and enhance security at some of the most vulnerable places by identifying suspicious behavior. These examples illustrate the vast range of possibilities for this technology. The proposed architecture is divided into three major sub-modules: i) Localization; ii) Action Detection; iii) Description Mapping. The thesis introduces all three sub-modules and describes their interaction and operation, then implements the action detection module and demonstrates its performance. In addition, the thesis describes how transfer learning could be used to combine all the proposed specialized components to mimic human perception.


Keywords

Human perception, Activity recognition


Disciplines

Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington