ORCID Identifier(s)

https://orcid.org/0009-0005-8884-376X

Graduation Semester and Year

Summer 2024

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Chengkai Li

Abstract

This dissertation studies natural language generation (NLG) from large open-domain knowledge graphs, aiming to bridge the gap between existing methods, which have been tested primarily on limited datasets, and the demands of real-world graphs that are large and structurally diverse. Prior work in NLG often relied on small-scale or restricted datasets, neglecting the complexities of broader knowledge graphs. To address this, we introduce a new dataset, GraphNarrative, designed to cover a wide range of graph structures and make NLG tasks more realistic.

The core contribution of this research lies in devising a novel approach to mitigating information hallucination, a common issue in NLG where generated text may include inaccuracies or fabricated details not present in the input graph. Our method leverages Transformer-based pre-trained language models fine-tuned on GraphNarrative. Notably, we employ dependency parse trees to trim training sentences, ensuring they strictly adhere to the information present in their corresponding graphs.
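As an illustration of the trimming idea, the following is a minimal sketch and not the dissertation's implementation. It assumes the dependency parse is already available as a list of head indices, that each graph entity maps to a single token index, and that the sentence is trimmed to the span covered by the shortest dependency paths between the entities of each triple. The function names, the example sentence, and its hand-written parse are all hypothetical.

```python
from collections import deque


def shortest_dependency_path(heads, start, end):
    """Return token indices on the path from `end` back to `start`.

    `heads[i]` is the index of token i's head; the root has head -1.
    The parse is treated as an undirected tree (assumed connected).
    """
    adjacency = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].add(head)
            adjacency[head].add(child)

    # Breadth-first search from `start`, remembering each node's predecessor.
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for neighbor in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)

    # Walk predecessors back from `end` to `start`.
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def trim_sentence(tokens, heads, entity_pairs):
    """Keep only the token span covered by the paths between entity pairs."""
    keep = set()
    for subject_idx, object_idx in entity_pairs:
        keep.update(shortest_dependency_path(heads, subject_idx, object_idx))
    return tokens[min(keep):max(keep) + 1]


# Hypothetical sentence with a hand-written parse ("founded" is the root).
tokens = ["In", "1999", ",", "Alice", "founded", "Acme", "in", "Austin", "."]
heads = [4, 0, 4, 4, -1, 4, 4, 6, 4]

# The input graph holds a single triple linking Alice (3) and Acme (5).
print(" ".join(trim_sentence(tokens, heads, [(3, 5)])))
```

In this example the date and location phrases fall outside the span between the two entities, so they are dropped from the training sentence because the graph does not contain them. A real pipeline would obtain the head indices from a dependency parser and would need to handle multi-token entities.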

Through experiments and evaluation, we demonstrate the effectiveness of our approach in mitigating information hallucination while maintaining high-quality NLG output. Our findings show significant improvements over existing methods, particularly when applied to diverse and large-scale knowledge graphs.

Furthermore, we contribute to the research community by releasing the GraphNarrative dataset, along with our source code and trained models, available for public access at https://github.com/idirlab/graphnarrator.

In conclusion, this dissertation not only advances the field of NLG by addressing challenges posed by large-scale open-domain knowledge graphs but also provides valuable resources and methodologies for future research in this domain.

Keywords

graph-to-text generation, knowledge graphs, hallucination mitigation, natural language generation

Disciplines

Artificial Intelligence and Robotics | Databases and Information Systems | Data Science

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Recommended Citation

Shi, Xiao, "NATURAL LANGUAGE GENERATION FROM LARGE-SCALE OPEN-DOMAIN KNOWLEDGE GRAPHS" (2024). Computer Science and Engineering Dissertations. 257.
https://mavmatrix.uta.edu/cse_dissertations/257

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Data Science Commons

NATURAL LANGUAGE GENERATION FROM LARGE-SCALE OPEN-DOMAIN KNOWLEDGE GRAPHS