ORCID Identifier(s)

ORCID 0009-0002-9284-8377

Graduation Semester and Year

Spring 2026

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Kenny Q. Zhu

Second Advisor

Vassilis Athitsos

Third Advisor

Shirin Nilizadeh

Abstract

Universal Sound Separation (USS) -- the task of disentangling arbitrary sound sources from a single-channel acoustic mixture -- remains an open challenge due to the ill-posed nature of the problem and the distributional gap between synthetic training data and real-world recordings. This thesis addresses three distinct bottlenecks in the USS pipeline: training data realism, inference strategy, and conditioning richness. We first present two knowledge-guided approaches to sound source separation. The first is a distance-aware mixing strategy that leverages Large Language Models (LLMs) to assign plausible loudness relationships between audio sources during training data synthesis. By querying an LLM about the natural acoustic distance between sound events, we generate mixtures of mixtures (MoMs) that better approximate real-world acoustic scenes. Human evaluation shows that models trained with this strategy are preferred over baselines trained with random mixing in up to 75% of comparisons across three real-world benchmark categories. The second is a co-occurrence conditioning framework that injects information about non-target sounds present in a mixture into the encoder of AudioSep via FiLM modulation, complementing the standard target conditioning. We propose a CLAP-based estimation procedure that approximates co-occurrence embeddings at inference time from only the mixture and the target text, matching the practical setting of USS; an exploratory evaluation shows improved separation on five of six USS benchmarks. We then introduce Chain-of-Inference (CoI), a training-free multi-step inference framework motivated by the human auditory system's sensitivity to sudden changes in the acoustic scene and structurally analogous to Chain-of-Thought prompting in language models.
CoI iteratively re-introduces a proportion of the original mixture -- governed by cosine similarity between the current output and the input -- progressively decomposing the separation problem into easier sub-problems. Without any additional training, CoI consistently improves AudioSep across all five evaluated tasks and SAM-Audio on four of five. An interactive online demonstration system is released alongside this work, allowing users to experience the perceptual improvements on arbitrary audio. Taken together, these contributions show that USS performance can be improved from two distinct angles: incorporating external knowledge -- LLM commonsense priors and contrastive audio-text embeddings -- to improve training data and conditioning, and exploiting underutilised capacity already present in frozen models through principled inference-time refinement.
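The CoI loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's exact formulation: the `separate` callable stands in for a frozen separation model such as AudioSep, and the mapping from cosine similarity to the re-injection proportion `alpha`, as well as the step count, are illustrative assumptions.

```python
import numpy as np

def chain_of_inference(mixture, separate, steps=3):
    """Hypothetical sketch of Chain-of-Inference (CoI):
    iteratively re-inject a proportion of the original mixture,
    governed by cosine similarity between the current output
    and the input, then separate again (no extra training)."""
    estimate = separate(mixture)
    for _ in range(steps - 1):
        # Cosine similarity between the current estimate and the input mixture.
        sim = np.dot(estimate, mixture) / (
            np.linalg.norm(estimate) * np.linalg.norm(mixture) + 1e-8)
        # Assumed mapping: clip similarity to [0, 1] and use it directly
        # as the proportion of the original mixture to blend back in.
        alpha = float(np.clip(sim, 0.0, 1.0))
        estimate = separate(alpha * mixture + (1.0 - alpha) * estimate)
    return estimate
```

In this sketch, a high similarity between output and input means little was separated yet, so more of the mixture is fed back for another, easier sub-problem; each pass refines the previous estimate without touching the model's weights.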

Keywords

Universal sound separation, Data synthesis, Chain-of-inference

Disciplines

Artificial Intelligence and Robotics | Signal Processing

License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.
