Uncovering machine translationese—On syntactic properties of neural machine-translated texts

SHEN Mengfei; HUANG Wei

Articles | Updated: 2024-08-13

Foreign Language Teaching and Research · August 13, 2024 · 2024, Vol. 56, No. 3

AI Summary

1. Research Background

This section reviews the development of machine translation technology and how its output differs from human translation. Neural machine translation has markedly improved translation quality, yet its output still carries perceptible "machine translationese", visible mainly as differences in language style. Previous research has shown that machine-translated text differs significantly from human translation in the frequency of linguistic units and in lexical and morphological richness, which reduces the coherence, readability, and authenticity of the translated text. The syntactic features of machine-translated texts have also received attention, and studies have found that their syntactic complexity differs from that of human translation. This study builds a dependency treebank of human and machine translations and uses syntactic metrics to compare the two kinds of translation, in order to reveal how machine translationese manifests at the syntactic level. It takes Baidu Translate and Google Translate as representative systems and uses mean dependency distance (MDD) and the dependency direction ratio to measure syntactic complexity and word order distribution, exploring how and why neural machine-translated texts differ from human-translated texts on both measures. The findings bear on understanding the syntactic characteristics of neural machine translation output and on improving translation quality assessment and post-editing.

2. Research Design

This section describes the research design: corpus sources, syntactic annotation, metrics, and analysis methods. The study selected three Chinese versions of Harari's "Sapiens: A Brief History of Humankind" (published in Chinese under a title that translates as "A Brief History of Humankind: From Animals to God"): a human translation, a Baidu translation, and a Google translation. The versions were aligned at the paragraph level and split into samples of about 2,000 words. Syntactic annotation follows the framework of dependency grammar, and three dependency treebanks were built with the Language Technology Platform (LTP) of Harbin Institute of Technology. On the basis of these treebanks, the study uses dependency distance (DD) and the dependency direction ratio to examine the syntactic complexity and word order distribution of the texts. Dependency distance is the linear distance between a governor and its dependent, and the mean dependency distance (MDD) of a sentence reflects its syntactic complexity and cognitive difficulty. The distribution of dependency directions serves as an indicator for distinguishing word order types. The study calculated MDD for each text as a whole, for different sentence lengths, and for different dependency types, together with the proportion of dependencies by direction (whether the governor precedes or follows its dependent), in order to compare the human and machine translations. The metrics were computed in Python, and analysis of variance (ANOVA) was run in SPSS to test whether the two kinds of translation differ significantly.
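
The two metrics can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each sentence is already parsed into (position, head position) pairs with 1-based positions and head position 0 for the root, and it assumes the sign convention in which a positive distance means the governor follows its dependent. The names `signed_distances`, `mean_dependency_distance`, `head_final_ratio`, and `toy` are hypothetical, and the sketch does nothing special with punctuation.

```python
from typing import List, Tuple

# Each token is (position, head_position), both 1-based.
# head_position == 0 marks the root, which has no governor.
Token = Tuple[int, int]
Sentence = List[Token]


def signed_distances(sentence: Sentence) -> List[int]:
    """Governor position minus dependent position for every non-root token.

    Assumed sign convention: a positive value means the governor follows
    its dependent (head-final); a negative value means it precedes it.
    """
    return [head - pos for pos, head in sentence if head != 0]


def mean_dependency_distance(sentences: List[Sentence]) -> float:
    """Mean of absolute dependency distances over all non-root tokens."""
    dists = [abs(d) for s in sentences for d in signed_distances(s)]
    return sum(dists) / len(dists) if dists else 0.0


def head_final_ratio(sentences: List[Sentence]) -> float:
    """Share of dependencies whose governor follows the dependent."""
    dists = [d for s in sentences for d in signed_distances(s)]
    return sum(d > 0 for d in dists) / len(dists) if dists else 0.0


# Hypothetical four-word sentence: word 2 is the root, words 1 and 4
# depend on word 2, and word 3 depends on word 4.
toy = [(1, 2), (2, 0), (3, 4), (4, 2)]
print(mean_dependency_distance([toy]))
print(head_final_ratio([toy]))
```

In the toy sentence the three dependencies have absolute distances 1, 1, and 2, so the MDD is 4/3, and two of the three dependencies have the governor after the dependent. The head-initial share is one minus the head-final share.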

3. Results and Discussion

This section examines the syntactic complexity of the human and machine translations by computing mean dependency distance (MDD). The MDD of the human translation is slightly lower than that of the two machine-translated versions, and MDD rises with sentence length in both kinds of translation. However, the rate of increase gradually slows for the human translation, whereas it is faster for the machine translations. Machine translation systems thus handle short sentences well but control the syntactic complexity of long sentences less effectively. Even so, the MDD of the machine-translated texts generally stays within the range observed for natural language and within human short-term memory capacity, which suggests that machine translation systems may have learned the tendency of natural language use to minimize memory burden.
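
A comparison of MDD across sentence lengths could be set up along the following lines. This is a sketch under assumptions, not the procedure reported in the paper: sentences are lists of (position, head position) pairs with head position 0 for the root, sentence length is the token count, and the bin width of 10 is an arbitrary illustrative choice. The function name `mdd_by_sentence_length` and the sample sentences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sentence = List[Tuple[int, int]]  # (position, head_position), root head = 0


def mdd_by_sentence_length(
    sentences: List[Sentence], bin_width: int = 10
) -> Dict[int, float]:
    """Return MDD per sentence-length bin, keyed by the bin's lower bound.

    With bin_width=10, lengths 1-10 fall in bin 1, lengths 11-20 in bin 11,
    and so on. Root tokens are skipped because they have no governor.
    """
    sums: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for sentence in sentences:
        if not sentence:
            continue
        bin_start = (len(sentence) - 1) // bin_width * bin_width + 1
        for pos, head in sentence:
            if head != 0:
                sums[bin_start] += abs(head - pos)
                counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}


# Hypothetical input: one 4-word sentence and one 12-word sentence in which
# every word depends on the next word and the last word is the root.
short = [(1, 2), (2, 0), (3, 4), (4, 2)]
long_chain = [(i, i + 1) for i in range(1, 12)] + [(12, 0)]
print(mdd_by_sentence_length([short, long_chain]))
```

Running the same function separately on the human, Baidu, and Google treebanks would give one length-to-MDD mapping per version, which is the kind of table needed to compare how quickly MDD grows with sentence length.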

4. Conclusion

This section summarizes the study, which used dependency distance and dependency direction to analyze the syntactic features of human and neural machine translations from English into Chinese. The analysis reveals shortcomings of machine translation in controlling the syntactic complexity of long sentences and in word order distribution, as well as differences between machine translation, human translation, and native Chinese in the use of word classes and syntactic devices. The authors also point out that machine translation technology is developing rapidly, so these characteristics may change as the technology iterates. They suggest that future research consider more influencing factors in order to characterize machine-translated text more comprehensively, in support of better translation algorithms and higher translation quality.

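* The content above was generated automatically by AI and is for reference only. This website accepts no commercial or legal liability for any consequences arising from use of the content above.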

Recommended papers

Connecting the language faculty with the external world—An initial exploration of a generative-linguistics-based language model

Lexical diversity and syntactic complexity in ChatGPT translation

“In-between Hypothesis”—A novel proposition on the feature of translational language—A quantitative study based on dependency grammar