Graduation Semester and Year
Summer 2024
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Chengkai Li
Abstract
This dissertation studies natural language generation (NLG) from large-scale open-domain knowledge graphs. It aims to bridge the gap between existing methods and the demands of real-world knowledge graphs, which are large and structurally diverse. Prior work on NLG was mostly tested on small-scale or restricted datasets that do not capture the complexity of broader knowledge graphs. To address this, we introduce a new dataset, GraphNarrative, designed to cover a wide range of graph structures and make the NLG task more realistic.
The core contribution of this research is a novel approach to mitigating information hallucination, a common problem in NLG in which the generated text contains inaccurate or fabricated details not present in the input graph. Our method fine-tunes Transformer-based pre-trained language models on GraphNarrative. In particular, we use dependency parse trees to trim training sentences so that each sentence retains only information present in its corresponding graph.
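The abstract does not spell out the trimming procedure, so the following is only a rough illustration of the general idea, not the dissertation's algorithm. It assumes a simplified rule: a token survives if it is the root, if its dependency subtree mentions something from the input graph, or if it is a function word (determiner, auxiliary, case marker) attached to a surviving token; every other subtree is dropped. The `Token` class, `FUNCTION_DEPS`, `trim_sentence`, and the hand-written parse are all hypothetical; in practice a dependency parser would supply the tree.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head; the root points to itself
    dep: str   # dependency label


# Hypothetical: function-word dependents kept whenever their head survives,
# so the trimmed sentence stays grammatical (e.g. "the Nobel Prize").
FUNCTION_DEPS = {"det", "aux", "case"}


def trim_sentence(tokens, grounded_words):
    """Drop every dependency subtree that mentions nothing from the graph."""
    children = {i: [] for i in range(len(tokens))}
    for i, tok in enumerate(tokens):
        if tok.head != i:
            children[tok.head].append(i)

    memo = {}

    def grounded(i):
        # True if token i or any of its descendants appears in the graph.
        if i not in memo:
            memo[i] = tokens[i].text in grounded_words or any(
                grounded(c) for c in children[i]
            )
        return memo[i]

    # Keep the root and every token whose subtree is grounded in the graph.
    keep = {i for i, tok in enumerate(tokens) if tok.head == i or grounded(i)}
    # Then re-attach function words whose head survived.
    keep |= {
        i for i, tok in enumerate(tokens)
        if tok.dep in FUNCTION_DEPS and tok.head in keep
    }
    return " ".join(tokens[i].text for i in sorted(keep))


# Hand-written parse of "Marie Curie won the Nobel Prize in 1903".
sentence = [
    Token("Marie", 1, "compound"),
    Token("Curie", 2, "nsubj"),
    Token("won", 2, "ROOT"),
    Token("the", 5, "det"),
    Token("Nobel", 5, "compound"),
    Token("Prize", 2, "dobj"),
    Token("in", 2, "prep"),
    Token("1903", 6, "pobj"),
]
# Words of the entities in a hypothetical input graph; the year is absent.
graph_words = {"Marie", "Curie", "Nobel", "Prize"}
print(trim_sentence(sentence, graph_words))  # Marie Curie won the Nobel Prize
```

Because the year never appears in the assumed graph, the whole "in 1903" subtree is removed, which is the kind of unsupported detail a model trained on the untrimmed sentence could learn to hallucinate.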
Through extensive experiments and evaluation, we demonstrate that our approach is effective at eliminating information hallucination while maintaining high-quality NLG output. Our results show significant improvements over existing methods, particularly on diverse, large-scale knowledge graphs.
Furthermore, we contribute to the research community by releasing the GraphNarrative dataset, along with our source code and trained models, publicly at https://github.com/idirlab/graphnarrator.
In conclusion, this dissertation not only advances the field of NLG by addressing challenges posed by large-scale open-domain knowledge graphs but also provides valuable resources and methodologies for future research in this domain.
Keywords
graph-to-text generation, knowledge graphs, hallucination mitigation, natural language generation
Disciplines
Artificial Intelligence and Robotics | Databases and Information Systems | Data Science
License
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.
Recommended Citation
Shi, Xiao, "NATURAL LANGUAGE GENERATION FROM LARGE-SCALE OPEN-DOMAIN KNOWLEDGE GRAPHS" (2024). Computer Science and Engineering Dissertations. 257.
https://mavmatrix.uta.edu/cse_dissertations/257
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Data Science Commons