Graduation Semester and Year
Summer 2025
Language
English
Document Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science and Engineering
First Advisor
Manfred Huber
Second Advisor
Farhad A. Kamangar
Third Advisor
David Levine
Abstract
This thesis explores the development of an answering agent that generates natural language instructions for unmanned aerial vehicles (UAVs), grounded in a limited, real-world dialogue dataset. The objective is to adapt a static dataset into a training pipeline that supports instruction generation and serves as a foundation for future interactive systems involving question-asking agents and internal dialogue. A hybrid architecture is implemented using a semantic teacher model (MPNet) and a T5-base encoder-decoder trained with contrastive and supervised objectives. The adapted training process yields statistically acceptable performance across standard evaluation metrics. However, qualitative analysis reveals a mismatch between metric performance and the semantic coherence of the generated outputs: the instructions are grammatically fluent and UAV-relevant, but frequently lack actionable intent or spatial grounding. This research offers a diagnostic view of the limits of small-scale instruction generation, emphasizing the challenges of grounding, dataset sparsity, and evaluation misalignment. The contributions include a dataset adaptation pipeline, insights into feature-semantic alignment, and a foundation for more robust instruction-generation systems that integrate both contrastive learning and behavioral evaluation.
Keywords
Vision-and-Language Navigation (VLN), Aerial Vision-and-Dialog Navigation (AVDN) dataset, Spatial grounding and landmark-based instructions, Contrastive learning (InfoNCE) and hard negative mining, Knowledge distillation (MPNet teacher), T5 encoder–decoder for instruction generation, Cross-modal fusion and temporal attention, YOLOv3 Darknet visual features, Evaluation metrics misalignment, Navigation-integrated training and reinforcement learning
Disciplines
Cognitive Science | Other Computer Engineering | Robotics
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Recommended Citation
Vaziri Bozorg, Seyedarman, "EXPLORING INSTRUCTION GENERATION FOR UAVS: DATASET ADAPTATION, MODEL BEHAVIOR, AND DIAGNOSTIC INSIGHTS" (2025). Computer Science and Engineering Theses. 532.
https://mavmatrix.uta.edu/cse_theses/532