Document Type



Quality of Experience (QoE) assessment is a long-standing yet unresolved problem. Existing approaches, especially for conversational voice services, rely solely on network-centric parameters. Their performance is rarely satisfactory because they fail to account for the full range of QoE-related factors. Moreover, they build a one-size-fits-all model that is identical for all individuals and therefore cannot handle user diversity in QoE perception. This paper proposes a personalized QoE assessment model, SpeechQoE, which exploits a speaker’s speech signals to infer that individual’s perceived quality in voice services. SpeechQoE fundamentally addresses the drawbacks of conventional models. Instead of enumerating and incorporating an unbounded set of QoE-related factors, SpeechQoE takes as input speech signals, which inherently carry the rich information needed to assess the speaker’s QoE. SpeechQoE employs an efficient few-shot learning framework to adapt the model to a new user quickly. We additionally design a lightweight data synthesis scheme to minimize the overhead of collecting the data needed for model adaptation. A modular integration with a conventional parametric model is further implemented to avoid issues caused by a clean-slate data-driven approach. Our experiments show that SpeechQoE achieves an accuracy of 91.4% in QoE assessment, outperforming state-of-the-art solutions by a clear margin. As another contribution of this work, we build a dataset that would be the first source of annotated audio tracks for QoE assessment of conversational calls.

Publication Date





Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.