ORCID Identifier(s)

0009-0008-4019-8441

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Dr Ming Li

Second Advisor

Dr Faysal Hossain Shezan

Third Advisor

Prof Jimmie Bud Davis

Abstract

Video conferencing has become pervasive in daily life, and users often type sensitive information while their webcam is active. Even when the keyboard and hands are not visible, subtle vibration-induced pixel displacements may be present in the captured video and may exhibit patterns correlated with typing activity.

These signals are typically imperceptible to human observers, yet they may provide a basis for automated analysis.

This thesis focuses on the role of machine learning models in analyzing such vibration-induced visual signals. We employ a signal processing pipeline to extract compact vibration features from webcam video, represented as GFCC features, which serve as inputs for learning-based modeling. The feature extraction process is designed to provide consistent and structured representations suitable for model evaluation, rather than to optimize signal recovery.
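As a rough illustration of what a GFCC-style representation involves, the sketch below covers two generic steps: spacing band center frequencies on the ERB-rate scale, and turning per-band energies into cepstral coefficients with cubic-root compression and a DCT-II. This is not the thesis's pipeline. The function names, the frequency range, the band count, and the choice of cubic-root compression are all assumptions made for illustration, and the gammatone filtering that would produce the band energies is omitted.

```python
import math

def erb_center_frequencies(low_hz, high_hz, n_bands):
    """Center frequencies equally spaced on the ERB-rate scale (n_bands >= 2)."""
    def hz_to_erb_rate(f):
        return 21.4 * math.log10(0.00437 * f + 1.0)

    def erb_rate_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437

    lo, hi = hz_to_erb_rate(low_hz), hz_to_erb_rate(high_hz)
    step = (hi - lo) / (n_bands - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands)]

def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]

def cepstral_coefficients(band_energies, n_coeffs):
    """Compress per-band energies (cubic root, an assumed choice), then decorrelate."""
    compressed = [e ** (1.0 / 3.0) for e in band_energies]
    return dct_ii(compressed, n_coeffs)

# Hypothetical settings: 8 bands between 50 Hz and 4 kHz, 4 coefficients per frame.
centers = erb_center_frequencies(50.0, 4000.0, 8)
frame_features = cepstral_coefficients([8.0] * 8, 4)
```

A flat set of band energies yields a single nonzero coefficient (the first), which is the expected behavior of a DCT on a constant input; one such coefficient vector per video frame window would form the sequence passed to a model.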

Building on this representation, we conduct a systematic study of sequence learning architectures for modeling temporal dependencies in the extracted signals.

Specifically, we evaluate five models—Vanilla RNNs, GRUs, LSTMs, Seq2Seq with Attention, and Transformer Encoders—and compare their performance across multiple dimensions, including predictive accuracy, data efficiency under limited training conditions, and computational cost.
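One simple way to see why computational cost differs among the recurrent models is to count trainable parameters in a single recurrent layer. The sketch below uses the textbook formulation with one bias vector per gate; the function name and the layer sizes are hypothetical and are not taken from the thesis. Frameworks that keep two bias vectors per gate will report slightly larger counts.

```python
def recurrent_param_count(kind, input_size, hidden_size):
    """Parameters in one recurrent layer (textbook form, one bias per gate)."""
    # A vanilla RNN has one weight block; a GRU has three gates; an LSTM has four.
    gates = {"rnn": 1, "gru": 3, "lstm": 4}[kind]
    # Each block maps [input, hidden] to hidden and adds one bias vector.
    return gates * hidden_size * (input_size + hidden_size + 1)

# Hypothetical sizes: 13 features per time step, 64 hidden units.
for kind in ("rnn", "gru", "lstm"):
    print(kind, recurrent_param_count(kind, 13, 64))
```

With these assumed sizes, the GRU and LSTM layers carry three and four times the parameters of the vanilla RNN layer, respectively. Parameter count is only one proxy for cost; it says nothing about attention or Transformer layers, whose cost also grows with sequence length.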

In addition, we investigate the robustness of these models under diverse real-world conditions, including variations in hardware configurations, environmental settings, and user typing behaviors. This analysis provides insight into how different architectures generalize across heterogeneous scenarios and how external factors influence learning performance.

Through comprehensive experiments, this thesis characterizes the strengths and limitations of different sequence learning approaches in this context and provides practical guidance for selecting and designing models for learning-based analysis tasks in contexts that involve sensitive data.

Keywords

Optical Vibration Side Channel, Keystroke Inference, Sequence-to-sequence Learning, Attention Mechanism, Transformer Encoder, Recurrent Neural Network, LSTM, GRU, Rolling Shutter Temporal Sampling, Video Conference Security

Disciplines

Artificial Intelligence and Robotics | Cybersecurity | Graphics and Human Computer Interfaces | Information Security | Signal Processing

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Badgujar, Sanket Suresh, "ML-Based Keystroke Recovery via Optical-Vibration Side Channels in Video Conferencing" (2026). Computer Science and Engineering Theses. 4.
https://mavmatrix.uta.edu/cse_theses2/4

Download

Available for download on Friday, May 19, 2028

Included in

Artificial Intelligence and Robotics Commons, Cybersecurity Commons, Graphics and Human Computer Interfaces Commons, Information Security Commons, Signal Processing Commons

Computer Science and Engineering Theses

ML-Based Keystroke Recovery via Optical-Vibration Side Channels in Video Conferencing
