Document Type

Article

Abstract

We present a method for extracting high-level semantic information through landmark detection, using feature fusion between RGB and depth information. We focus on classifying specific labels (open path, humans, staircases, doorways, obstacles) in the encountered scene. These labels can serve as a fundamental source of information for scene understanding and support the safe navigation of the mobile unit. Experiments are conducted using a manual wheelchair equipped with a stereo RGB-D camera, which captures image instances containing multiple labels; these instances are then used to fine-tune a pre-trained Vision Transformer (ViT).
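Because a single image instance can contain several of these labels at once (for example, an open path and an obstacle), the task is multi-label rather than single-label classification. The sketch below illustrates only that output stage. It assumes RGB and depth feature vectors have already been extracted (the ViT backbone is not shown) and that they are fused by simple concatenation, which is an assumption rather than the authors' stated fusion scheme. Each label then gets an independent sigmoid score instead of a softmax. The names `fuse_features`, `predict_labels`, and `multilabel_bce`, along with the toy weights, are hypothetical and chosen for illustration.

```python
import math
from typing import Dict, List, Sequence

# Label set from the abstract; the ordering is an arbitrary choice for this sketch.
LABELS = ["open_path", "human", "staircase", "doorway", "obstacle"]


def fuse_features(rgb: Sequence[float], depth: Sequence[float]) -> List[float]:
    """Hypothetical fusion step: concatenate RGB and depth feature vectors."""
    return list(rgb) + list(depth)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_labels(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Score each label independently (multi-label, not softmax) and keep
    every label whose probability reaches the threshold."""
    present = {}
    for label, w, b in zip(LABELS, weights, biases):
        logit = sum(wi * xi for wi, xi in zip(w, fused)) + b
        p = sigmoid(logit)
        if p >= threshold:
            present[label] = p
    return present


def multilabel_bce(probs: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over labels, a common multi-label loss."""
    eps = 1e-12
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


if __name__ == "__main__":
    # Toy 2-dim RGB and 2-dim depth features (made-up values, not real data).
    fused = fuse_features([1.0, 0.0], [0.0, 1.0])
    weights = [
        [2.0, 0.0, 0.0, 0.0],   # open_path
        [0.0, 0.0, 0.0, -2.0],  # human
        [-1.0, 0.0, 0.0, 0.0],  # staircase
        [0.0, 0.0, 0.0, 0.0],   # doorway
        [0.0, 0.0, 0.0, 3.0],   # obstacle
    ]
    biases = [0.0, 0.0, 0.0, -1.0, 0.0]
    # Keeps open_path and obstacle; the other three score below 0.5.
    print(predict_labels(fused, weights, biases))
```

Scoring each label with its own sigmoid lets the classifier report an open path and an obstacle in the same frame, which a softmax over mutually exclusive classes could not do.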

Publication Date

7-1-2023

Language

English

License

Creative Commons Attribution 4.0 International License
