Graduation Semester and Year

Summer 2025

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Manfred Huber

Second Advisor

Farhad A. Kamangar

Third Advisor

David Levine

Abstract

This thesis explores the development of an answering agent that generates natural language instructions for unmanned aerial vehicles (UAVs), grounded in a limited, real-world dialogue dataset. The objective is to adapt a static dataset into a training pipeline that supports instruction generation and serves as a foundation for future interactive systems involving question-asking agents and internal dialogue. A hybrid architecture is implemented using a semantic teacher model (MPNet) and a T5-base encoder-decoder trained with contrastive and supervised objectives. The adapted training process yields statistically acceptable performance across standard evaluation metrics. However, qualitative analysis reveals a mismatch between metric performance and semantic coherence in the generated outputs: the instructions are grammatically fluent and UAV-relevant but frequently lack actionable intent or spatial grounding. This research provides a diagnostic view of the limits of small-scale instruction generation, emphasizing the challenges of grounding, dataset sparsity, and evaluation misalignment. The contributions include a dataset adaptation pipeline, insights into feature-semantic alignment, and a foundation for more robust instruction-generation systems that integrate both contrastive learning and behavioral evaluation.

Keywords

Vision-and-Language Navigation (VLN), Aerial Vision-and-Dialog Navigation (AVDN) dataset, Spatial grounding and landmark-based instructions, Contrastive learning (InfoNCE) and hard negative mining, Knowledge distillation (MPNet teacher), T5 encoder–decoder for instruction generation, Cross-modal fusion and temporal attention, YOLOv3 Darknet visual features, Evaluation metrics misalignment, Navigation-integrated training and reinforcement learning

Disciplines

Cognitive Science | Other Computer Engineering | Robotics
