Document Type

Article

Source Publication Title

BCB 2022

Abstract

Obtaining informative representations of gene expression is crucial for downstream regulatory prediction tasks such as promoter prediction and transcription factor binding site prediction. However, supervised learning is limited by the scarcity of labeled genomes, which restricts how well a trained predictive model generalizes. Recently, researchers have modeled DNA sequences with self-supervised training and transferred the pre-trained genome representations to various downstream tasks. Rather than directly applying masked language modeling to DNA sequences, we incorporate prior knowledge into genome language modeling representations. We propose a novel Motif-oriented DNA (MoDNA) pre-training framework, which is self-supervised and can be fine-tuned for different downstream tasks. MoDNA effectively learns semantic-level genome representations from enormous unlabelled genome data and is more computationally efficient than previous methods. We pre-train MoDNA on human genome data and fine-tune it on downstream tasks. Extensive experimental results on promoter prediction and transcription factor binding site prediction demonstrate the state-of-the-art performance of MoDNA.

Publication Date

8-10-2022

Language

English

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

An, Weizhi; Guo, Yuzhi; Bian, Yatao; Ma, Hehuan; Yang, Jinyu; Li, Chunyuan; and Huang, Junzhou, "MoDNA: Motif-Oriented Pre-training For DNA Language Model" (2022). Association of Computing Machinery Open Access Agreement Publications. 7.
https://mavmatrix.uta.edu/utalibraries_acmoapubs/7

Association of Computing Machinery Open Access Agreement Publications

MoDNA: Motif-Oriented Pre-training For DNA Language Model
