An RGB-D Fusion System for Indoor Wheelchair Navigation

Christos Sevastopoulos
Sneh Acharya
Fillia Makedon

Abstract

We present a method for extracting high-level semantic information through landmark detection, using feature fusion between RGB and depth data. We focus on classifying specific labels (open path, humans, staircases, doorways, obstacles) in the encountered scene; these labels are a fundamental source of information that enhances scene understanding and supports the safe navigation of the mobile unit. Experiments are conducted with a manual wheelchair equipped with a stereo RGB-D camera; the captured image instances, each containing multiple labels, are used to fine-tune a pre-trained Vision Transformer (ViT).
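The abstract does not specify how the RGB and depth streams are combined or how the ViT is adapted; the sketch below is only an illustrative assumption, pairing two pre-trained timm ViT backbones in a late-fusion arrangement with a multi-label head over the five landmark classes. The model name, fusion strategy, and loss choice are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn
import timm

# Landmark labels named in the abstract.
LABELS = ["open path", "humans", "staircases", "doorways", "obstacles"]

class RGBDFusionClassifier(nn.Module):
    """Late-fusion sketch: separate pre-trained ViT encoders for RGB and
    depth, with their pooled features concatenated into a multi-label head."""

    def __init__(self, backbone="vit_base_patch16_224", num_labels=len(LABELS)):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of logits.
        self.rgb_encoder = timm.create_model(backbone, pretrained=True, num_classes=0)
        self.depth_encoder = timm.create_model(backbone, pretrained=True, num_classes=0)
        feat_dim = self.rgb_encoder.num_features
        self.head = nn.Linear(2 * feat_dim, num_labels)

    def forward(self, rgb, depth):
        # Depth arrives as a single channel; tile it to three channels so the
        # pre-trained patch embedding can be reused without modification.
        if depth.shape[1] == 1:
            depth = depth.repeat(1, 3, 1, 1)
        fused = torch.cat([self.rgb_encoder(rgb), self.depth_encoder(depth)], dim=1)
        return self.head(fused)  # raw logits, one per landmark label


model = RGBDFusionClassifier()
rgb = torch.randn(2, 3, 224, 224)    # RGB frames from the stereo camera
depth = torch.randn(2, 1, 224, 224)  # aligned depth maps
logits = model(rgb, depth)
# Multi-label objective, since a single scene may contain several landmarks.
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))
```

A late-fusion design keeps both pre-trained backbones intact, which is a common choice when the depth modality differs too much from ImageNet statistics for naive channel concatenation at the input.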