Graduation Semester and Year

2020

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Manfred Huber

Abstract

The recognition of activities from video is an important capability for a wide range of applications, from basic scene understanding to the prediction of behavior in autonomous vehicle applications. At present, humans far outperform computer systems at this task, which makes mimicking human perception a promising approach. In this thesis we propose an architecture that processes videos to extract the important action instances that describe the essential behaviors contained in a video, and that helps map the information from the video to a machine-understandable form. This is an important research area, as it could help interpret the surrounding environment for the visually impaired, detect and characterize human behavior for autonomous vehicles, and enhance security at some of the most vulnerable places by identifying suspicious behavior. These examples illustrate the wide range of possible uses for this technology. The proposed architecture is divided into three major submodules: i) Localization; ii) Action Detection; iii) Description Mapping. All three submodules are introduced and their interaction and operation are described; the action detection module is then implemented and its performance demonstrated. In addition, the thesis describes how transfer learning could be used to combine the proposed specialized components to mimic human perception.

Keywords

Human perception, Activity recognition

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
