Computer Science and Engineering Theses

EXPLORING INSTRUCTION GENERATION FOR UAVS: DATASET ADAPTATION, MODEL BEHAVIOR, AND DIAGNOSTIC INSIGHTS

Seyedarman Vaziri Bozorg, University of Texas at Arlington

Graduation Semester and Year

Summer 2025

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Manfred Huber

Second Advisor

Farhad A. Kamangar

Third Advisor

David Levine

Abstract

This thesis explores the development of an answering agent capable of generating natural language instructions for unmanned aerial vehicles (UAVs), grounded in a limited, real-world dialogue dataset. The objective is to adapt a static dataset into a training pipeline that can support instruction generation and serve as a foundation for future interactive systems involving question-asking agents and internal dialogue. A hybrid architecture is implemented using a semantic teacher model (MPNet) and a T5-base encoder-decoder trained with contrastive and supervised objectives. The adapted training process yields statistically acceptable performance across standard evaluation metrics. However, qualitative analysis reveals a mismatch between metric performance and semantic coherence in generated outputs: the instructions are grammatically fluent and UAV-relevant, but frequently lack actionable intent or spatial grounding. This research provides a critical diagnostic view into the limits of small-scale instruction generation, emphasizing the challenges of grounding, dataset sparsity, and evaluation misalignment. The contributions include a dataset adaptation pipeline, insights into feature-semantic alignment, and a foundation for more robust instruction-generation systems that integrate both contrastive learning and behavioral evaluation.
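The contrastive objective mentioned above is identified in the keywords as InfoNCE with an MPNet teacher. This record does not give the exact formulation used in the thesis, so the following is only a minimal sketch of a generic InfoNCE loss with in-batch negatives, written in pure Python. The function names (`cosine`, `info_nce`), the `temperature` value, and the toy two-dimensional embeddings are illustrative assumptions, not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(student, teacher, temperature=0.1):
    """Mean InfoNCE loss with in-batch negatives (hypothetical helper).

    student[i] and teacher[i] form the positive pair; every teacher[j]
    with j != i serves as a negative for student[i].
    """
    losses = []
    for i, s in enumerate(student):
        logits = [cosine(s, t) / temperature for t in teacher]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability of the positive pair.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumed values, for illustration only).
teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each student vector near its own teacher
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each student vector near the wrong teacher

loss_aligned = info_nce(aligned, teacher)
loss_swapped = info_nce(swapped, teacher)
```

In this toy setup the aligned batch yields a much smaller loss than the swapped batch, which is the behavior a contrastive objective of this kind is meant to reward. In a distillation setting, such a term would typically be combined with the supervised sequence loss of the decoder.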

Keywords

Vision-and-Language Navigation (VLN), Aerial Vision-and-Dialog Navigation (AVDN) dataset, Spatial grounding and landmark-based instructions, Contrastive learning (InfoNCE) and hard negative mining, Knowledge distillation (MPNet teacher), T5 encoder–decoder for instruction generation, Cross-modal fusion and temporal attention, YOLOv3 Darknet visual features, Evaluation metrics misalignment, Navigation-integrated training and reinforcement learning

Disciplines

Cognitive Science | Other Computer Engineering | Robotics

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Recommended Citation

Vaziri Bozorg, Seyedarman, "EXPLORING INSTRUCTION GENERATION FOR UAVS: DATASET ADAPTATION, MODEL BEHAVIOR, AND DIAGNOSTIC INSIGHTS" (2025). Computer Science and Engineering Theses. 532.
https://mavmatrix.uta.edu/cse_theses/532

Included in

Cognitive Science Commons, Other Computer Engineering Commons, Robotics Commons
