Authors

Sophie Soueid

Document Type

Honors Thesis

Abstract

This study explored the importance of, and challenges facing, the application of machine translation (MT) to Québécois, a low-resource variety of French native to the Canadian province of Québec. The history of MT in Canada is discussed, and a Québécois-English MT engine was trained to investigate practical questions around applying automated translation to this low-resource French variant. Québécois has only rarely been the focus of published MT research, even within the Canadian governmental setting, which relies primarily on human translation. Marian, an open-source neural machine translation (NMT) toolkit, was used to train an MT engine on the 36th Canadian Parliament’s aligned Québécois-English Hansards (debate transcripts). Parliamentary debates are a common source of training data for MT, but the Canadian Parliament data has not been widely used. The engine’s BLEU score, an automated metric of translation quality, was 32.0, indicating moderately good translation quality. Further qualitative analyses were performed by translating authentic Québécois texts taken from a range of linguistic domains (interviews, health/medical, technology, and politics) and a hand-crafted set of sentences containing “challenge” words that were expected to be difficult for the engine to translate. The resulting engine trained on Hansards data struggled in basic Québécois French-English translation across multiple domains. In-domain, the engine received an overall automated BLEU score of 32, which is within the normal range for a new engine. It did not perform as well on a test set of sentence probes based on Québécois terminology, and when translating modern-day news magazine stories it produced only output that would still need post-editing. With the addition of a large, aligned, bilingual Canadian dataset, a satisfactory specialized MT engine for this French variant could be built.
In the meantime, it is advisable for MT researchers and administrators in that environment to continue to pair MT output with human translators and post-editors, an arrangement with which the Canadian Translation Bureau has demonstrated greater comfort for some years.

Publication Date

5-1-2020

Language

English
