An RGB-D Fusion System for Indoor Wheelchair Navigation
Abstract
We present a method for extracting high-level semantic information through landmark detection that fuses features from RGB and depth information. We focus on classifying specific labels (open path, humans, staircases, doorways, obstacles) in the encountered scene, which can serve as a fundamental source of information for scene understanding and support the safe navigation of the mobile unit. Experiments are conducted using a manual wheelchair equipped with a stereo RGB-D camera. The captured images, each of which may carry multiple labels, are used to fine-tune a pre-trained Vision Transformer (ViT).