Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Department

Computer Science and Engineering

First Advisor

Fillia Makedon


Broadly speaking, traversability estimation refers to the ability to navigate or move through a particular environment (indoors or outdoors). Indoor environments are governed by uncertainty and stochasticity arising from their complex structures, which encompass both static elements, such as furniture and walls, and dynamic entities, such as moving humans. In our research, we underline the importance of blending semantic and spatial information to ensure safe navigation for a mobile robot. We show that RGB sensors suffer from constrained situational awareness of the surroundings, highlighting the need to incorporate spatial and geometric data that can work synergistically with visual cues to enhance overall perception and safety. Toward this direction, we examine indoor traversability estimation both at a higher level (GO/NO-GO decision) and at a lower level, by identifying free-space zones that the robot can safely traverse. We combine visual (RGB) data and Laser Range Finder (LRF) information both to annotate our dataset and to improve prediction compared to exclusive reliance on RGB information. At the core of our experiments, we use Transformer-based architectures~\cite{b86} due to: 1) their efficacy in capturing spatial dependencies and sequences of varying lengths, which are common in indoor environments where objects are positioned in relation to each other; 2) their notable transfer-learning potential, since we fine-tune on our custom collected dataset and need rich pre-trained features from a large-scale dataset; and 3) their ability to handle multi-modal input sequences, since we use different modalities. We investigate the efficiency of employing a Multi-Head Self-Attention module as a fusion mechanism, leveraging its capability to assign varying weights across the input sequence.
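To illustrate the fusion idea, the following is a minimal NumPy sketch of single-layer multi-head self-attention applied to a concatenated sequence of RGB and LRF feature tokens. It is not the dissertation's actual implementation: the token counts, embedding size, head count, and random (untrained) projection weights are all illustrative assumptions.

```python
import numpy as np

def multi_head_self_attention(tokens, num_heads, rng):
    """One multi-head self-attention layer over a token sequence (n, d)."""
    n, d = tokens.shape
    assert d % num_heads == 0
    dh = d // num_heads
    # Hypothetical random projections; a trained Transformer learns these.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    out = np.zeros_like(tokens)
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        # Scaled dot-product attention: each token weights every other token,
        # so LRF tokens can attend to RGB tokens and vice versa.
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(dh)
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        out[:, sl] = attn @ v[:, sl]
    return out @ Wo

rng = np.random.default_rng(0)
rgb_tokens = rng.standard_normal((16, 64))  # e.g. 16 RGB patch embeddings
lrf_tokens = rng.standard_normal((8, 64))   # e.g. 8 LRF range embeddings
fused = multi_head_self_attention(
    np.vstack([rgb_tokens, lrf_tokens]), num_heads=4, rng=rng)
```

Because attention weights are computed across the whole concatenated sequence, the module can emphasize whichever modality is more informative for a given token, which is the property exploited here for fusion.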
Ultimately, to estimate free space, we employ a methodology predicated on the assumption that larger depth values correspond to regions the robot can safely traverse. Specifically, we implement an efficient automated masking technique that leverages textural homogeneity, depth uniformity, and positive scenes to create meaningful segments before fine-tuning on our dataset. Applications of this work include: 1) navigation of autonomous agents or mobility-impaired subjects; 2) safety in confined spaces, such as warehouse or vineyard patrol robots; and 3) search-and-rescue operations.
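The depth-based assumption above can be sketched as a simple heuristic mask: keep pixels whose depth is large (open space ahead) and locally uniform (a flat, traversable surface). This is a minimal NumPy illustration, not the automated masking pipeline itself; the thresholds and window size are assumptions chosen for the example.

```python
import numpy as np

def free_space_mask(depth, min_depth=2.0, max_local_range=0.3, win=3):
    """Heuristic free-space mask from a depth image (meters).

    A pixel is free space if its depth exceeds min_depth (far = traversable)
    and the depth range within a win x win window is small (uniform surface).
    """
    h, w = depth.shape
    far = depth >= min_depth
    uniform = np.zeros_like(far)
    r = win // 2
    for i in range(r, h - r):
        for j in range(r, w - r):
            patch = depth[i - r:i + r + 1, j - r:j + r + 1]
            uniform[i, j] = (patch.max() - patch.min()) <= max_local_range
    return far & uniform

# Synthetic scene: a far flat floor at 3 m with a near obstacle at 0.5 m.
depth = np.full((10, 10), 3.0)
depth[2:5, 2:5] = 0.5
mask = free_space_mask(depth)
```

In practice such a coarse mask would only serve to seed segments that are then refined by the texture- and depth-homogeneity cues mentioned above, before fine-tuning on the collected dataset.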


Computer vision, Mobile robots


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington