ORCID Identifier(s)

0009-0008-4019-8441

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Dr Ming Li

Second Advisor

Dr Faysal Hossain Shezan

Third Advisor

Prof Jimmie Bud Davis

Abstract

Video conferencing has become pervasive in daily life, with users often typing sensitive information while their webcam is active. Even when the keyboard and hands are not visible, subtle vibration-induced pixel displacements may be present in the captured video and may exhibit patterns correlated with typing activity.

These signals are typically imperceptible to human observers, yet they may provide a basis for automated analysis.

This thesis focuses on the role of machine learning models in analyzing such vibration-induced visual signals. We employ a signal processing pipeline to extract compact vibration features from webcam video, represented as GFCC features, which serve as inputs for learning-based modeling. The feature extraction process is designed to provide consistent and structured representations suitable for model evaluation, rather than to optimize signal recovery.

Building on this representation, we conduct a systematic study of sequence learning architectures for modeling temporal dependencies in the extracted signals.

Specifically, we evaluate five models—Vanilla RNNs, GRUs, LSTMs, Seq2Seq with Attention, and Transformer Encoders—and compare their performance across multiple dimensions, including predictive accuracy, data efficiency under limited training conditions, and computational cost.

In addition, we investigate the robustness of these models under diverse real-world conditions, including variations in hardware configurations, environmental settings, and user typing behaviors. This analysis provides insight into how different architectures generalize across heterogeneous scenarios and how external factors influence learning performance.

Through comprehensive experiments, this thesis characterizes the strengths and limitations of different sequence learning approaches in this context and provides practical guidance for selecting and designing models for learning-based analysis tasks that involve sensitive data.

Keywords

Optical Vibration Side Channel, Keystroke Inference, Sequence-to-sequence Learning, Attention Mechanism, Transformer Encoder, Recurrent Neural Network, LSTM, GRU, Rolling Shutter Temporal Sampling, Video Conference Security

Disciplines

Artificial Intelligence and Robotics | Cybersecurity | Graphics and Human Computer Interfaces | Information Security | Signal Processing

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Available for download on Friday, May 19, 2028
