Document Type



Molecular biology prediction tasks suffer the limited labeled data problem since it normally demands a series of professional experiments to label the target molecule. Self-training is one of the semi-supervised learning paradigms that utilizes both labeled and unlabeled data. It trains a teacher model on labeled data, and uses it to generate pseudo labels for unlabeled data. The labeled and pseudo-labeled data are then combined to train a student model. However, the pseudo labels generated from the teacher model are not sufficiently accurate. Thus, we propose a robust self-training strategy by exploring robust loss function to handle such noisy labels, which is model and task agnostic, and can be easily embedded with any prediction tasks. We have conducted molecular biology prediction tasks to gradually evaluate the performance of proposed robust self-training strategy. The results demonstrate that the proposed method consistently boosts the prediction performance, especially for molecular regression tasks, which have gained a 41.5% average improvement.

Publication Date





Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.