ORCID Identifier(s)

0000-0002-0837-8388

Graduation Semester and Year

2023

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Christoph Csallner

Abstract

In several safety-critical industries, such as automotive, aerospace, healthcare, and industrial automation, MATLAB/Simulink has emerged as the de facto standard tool for system modeling and analysis, model compilation into executable code, and code deployment onto embedded hardware. In cyber-physical system (CPS) development, it is therefore imperative both to rigorously test development tools such as MathWorks’ Simulink and to understand modeling practices and model evolution. Existing work is limited primarily by two factors: (1) contemporary testing methodologies are often inefficient at identifying critical toolchain bugs because explicit toolchain specifications are scarce, and (2) reusable, publicly available corpora of Simulink models for research are scarce. In response to these challenges, we first pioneered the use of language models for random Simulink model generation by both training and fine-tuning (large) language models such as LSTM and GPT-2 on sample Simulink models. Second, we curated SLNET, the largest collection of Simulink models, which is redistributable and contains detailed metadata. In addition, to encourage research on Simulink model evolution, we curated EvoSL, a dataset of 900+ Simulink projects comprising over 140k commits. Leveraging these datasets, we systematically replicated previous studies, corroborating or refuting prior findings. As a further aid to the research community, we developed ScoutSL, an open-source search engine for Simulink models. This tool simplifies sampling Simulink projects from open-source domains, addressing a limitation of popular code hosting platforms, which lack Simulink-specific filtering attributes. ScoutSL has already indexed over 100k Simulink models sourced from 18k projects.

Keywords

Cyber-physical system development, Simulink, Tool chain bugs, Deep learning, Programming language modeling, GPT-2, Mining software repositories, Open-source, Model evolution

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
