Analysis of the detection of typographical errors in the source text by Google Translate and DeepL Translate in the economic and legal fields
ISSN: 2013-1453
Year of publication: 2024
Issue Title: Mobilities as a challenge for the sustainability of minoritised languages
Issue: 81
Type: Article
Journal: Revista de Llengua i Dret
Abstract
The aim of this study is to test how well Google Translate and DeepL Translate detect and correct typographical errors. To this end, 1,820 typographical errors identified in previous work on specialised Spanish-English dictionaries (Rodríguez-Rubio & Fernández-Quesada, 2020a, 2020b; Rodríguez-Rubio Mediavilla, 2021) were entered into the two systems, both in isolation and in co-text. The results showed that Google Translate clearly outperformed DeepL Translate. In addition, repetition of the typographical error was the most frequent phenomenon in the machine translation output of both systems. By showing how well these systems detect typographical errors in the source text, our study aims to provide a starting point for their further improvement.
Bibliographic References
- Anastasopoulos, Antonios. (2019). An analysis of source-side grammatical errors in NMT. In Tal Linzen, Grzegorz Chrupała, Yonatan Belinkov, & Dieuwke Hupkes (Eds.), Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (pp. 213-223). Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-4822
- Araghi, Sahar, Palangkaraya, Alfons, & Webster, Elizabeth. (2023). The impact of language translation quality on commerce: The example of patents. Journal of International Business Policy.
- Arpit, Devansh, Jastrzębski, Stanisław, Ballas, Nicolas, Krueger, David, Bengio, Emmanuel, Kanwal, Maxinder S., Maharaj, Tegan, Fischer, Asja, Courville, Aaron, Bengio, Yoshua, & Lacoste-Julien, Simon. (2017). A closer look at memorization in deep networks. In Doina Precup, & Yee Whye Teh (Eds.), Proceedings of the 34th International Conference on Machine Learning, PMLR, 70 (pp. 233-242). Proceedings of Machine Learning Research.
- Belinkov, Yonatan, & Bisk, Yonatan. (2018, April 30-May 3). Synthetic and natural noise both break neural machine translation [Conference presentation]. 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada. https://doi.org/10.48550/arXiv.1711.02173
- Bengio, Samy, Vinyals, Oriol, Jaitly, Navdeep, & Shazeer, Noam. (2015). Scheduled sampling for sequence prediction with recurrent neural networks. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, & Roman Garnett (Eds.), NIPS’15: Proceedings of the 28th International Conference on Neural Information Processing Systems – Vol. 1 (pp. 1171-1179).
- Bergmanis, Toms, Stafanovičs, Artūrs, & Pinnis, Mārcis. (2020). Robust neural machine translation: Modeling orthographic and interpunctual variation. In Andrius Utka, Jurgita Vaičenonienė, Jolanta Kovalevskaitė, & Danguolė Kalinauskaitė (Eds.), Human Language Technologies – The Baltic Perspective: Proceedings of the Ninth International Conference Baltic HLT (pp. 80-86). IOS Press.
- Briva-Iglesias, Vicent. (2021). Traducción humana vs. traducción automática: análisis contrastivo e implicaciones para la aplicación de la traducción automática en traducción jurídica. Mutatis Mutandis, 14(2), 571-600. https://doi.org/10.17533/udea.mut.v14n2a14
- Conijn, Rianne. (2020). The keys to writing: A writing analytics approach to studying writing processes using keystroke logging [Unpublished doctoral dissertation]. Tilburg University and University of Antwerp.
- Conijn, Rianne, van Zaanen, Menno, Leijten, Mariëlle, & Van Waes, Luuk. (2019). How to typo? Building a process-based model of typographic error revisions. The Journal of Writing Analytics, 3, 69-95. https://doi.org/10.37514/JWA-J.2019.3.1.05
- Costa, Ângela, Ling, Wang, Luís, Tiago, Correia, Rui, & Coheur, Luísa. (2015). A linguistically motivated taxonomy for machine translation error analysis. Machine Translation, 29(2), 127-161.
- Chakravarthi, Bharathi Raja, Rani, Priya, Arcan, Mihael, & McCrae, John P. (2021). A survey of orthographic information in machine translation. SN Computer Science, 2(4), Article 330. https://doi.org/10.1007/s42979-021-00723-4
- Chen, Tsong Yueh, Cheung, Shing Chi, & Yiu, Siu Ming. (1998). Metamorphic testing: A new approach for generating next test cases (Technical report HKUST-CS98-01). Department of Computer Science, The Hong Kong University of Science and Technology. https://doi.org/10.48550/arXiv.2002.12543
- Cheng, Yong, Tu, Zhaopeng, Meng, Fandong, Zhai, Junjie, & Liu, Yang. (2018). Towards robust neural machine translation. In Iryna Gurevych, & Yusuke Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers) (pp. 1756-1766). Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-1163
- Daems, Joke. (2016). A translation robot for each translator? A comparative study of manual translation and post-editing of machine translations: process, quality and translator attitude [Unpublished doctoral dissertation]. Ghent University.
- De Iriarte, Juan. (1774). Juan de Iriarte, Obras sueltas (Tomo II). Imprenta de Don Francisco Manuel de Mena.
- Elbayad, Maha, Ustaszewski, Michael, Esperança-Rodier, Emmanuelle, Brunet-Manquat, Francis, Verbeek, Jakob, & Besacier, Laurent. (2020). Online versus offline NMT quality: An in-depth analysis on English-German and German-English. In Donia Scott, Nuria Bel, & Chengqing Zong (Eds.), Proceedings of the 28th International Conference on Computational Linguistics (pp. 5047-5058). International Committee on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.443
- González Vallejo, Rubén. (2022). De errores y erratas en el lenguaje jurídico: una reflexión acerca de la traducción automática. In Nuria Fernández-Quesada, & Santiago Rodríguez-Rubio (Eds.), Detección y tratamiento de errores y erratas: Un diagnóstico para el siglo XXI (pp. 185-201). Dykinson.
- He, Pinjia, Meister, Clara, & Su, Zhendong. (2020). Structure-invariant testing for machine translation. In Gregg Rothermel, & Doo-Hwan Bae (Eds.), ICSE ’20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings (pp. 961-973). Association for Computing Machinery.
- Heigold, Georg, Varanasi, Stalin, Neumann, Günter, & van Genabith, Joseph. (2018). How robust are character-based word embeddings in tagging and MT against wrod scramlbing or randdm nouse? In Colin Cherry, & Graham Neubig (Eds.), Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Vol. 1: Research Track) (pp. 68-80). Association for Machine Translation in the Americas.
- Hirschmann, Fabian, Nam, Jinseok, & Fürnkranz, Johannes. (2016). What makes word-level neural machine translation hard: A case study on English-German translation. In Yuji Matsumoto, & Rashmi Prasad (Eds.), Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3199-3208). COLING 2016 Organizing Committee.
- Jurafsky, Daniel, & Martin, James H. (2023). Speech and language processing: Spelling correction and the noisy channel. Stanford University.
- Killman, Jeffrey, & Rodríguez-Castro, Mónica. (2022). Post-editing vs. translating in the legal context: Quality and time effects from English to Spanish. Revista de Llengua i Dret, Journal of Language and Law, 78, 56-72. https://doi.org/10.2436/rld.i78.2022.3831
- Larriba Flor, Antonio Manuel. (2017). Character-based neural machine translation [Unpublished master’s dissertation]. Universitat Politècnica de València.
- Lee, Dickson T. S., Zhou, Zhi Quan, & Tse, T. H. (2020). Metamorphic robustness testing of Google Translate (Technical report HKUCS-TR20-03). Department of Computer Science, The University of Hong Kong.
- Lew, Robert, & Mitton, Roger. (2012). Online English learners’ dictionaries and misspellings: One year on. International Journal of Lexicography, 26(2), 219-233. https://doi.org/10.1093/ijl/ecs016
- Li, Yitong, Cohn, Trevor, & Baldwin, Timothy. (2017). Robust training under linguistic adversity. In Mirella Lapata, Phil Blunsom, & Alexander Koller (Eds.), Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Vol. 2, Short Papers (pp. 21-27). Association for Computational Linguistics.
- Li, Zilong, Xiao, Lei, Lin, Ruijin, & Xiao, Zhengxinchao. (2023). Metamorphic robustness testing for DeepL translation. Journal of Physics: Conference Series, 2456, Article 012018.
- Luong, Minh-Thang, Sutskever, Ilya, Le, Quoc V., Vinyals, Oriol, & Zaremba, Wojciech. (2015). Addressing the rare word problem in neural machine translation. In Chengqing Zong, & Michael Strube (Eds.), Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol. 1: Long Papers) (pp. 11-19). Association for Computational Linguistics.
- Maldonado González, María Concepción, & Liébana González, María. (2023). La traducción automática y su uso en la redacción de textos digitales: Análisis de algunos documentos reales de trabajo. Círculo de Lingüística Aplicada a la Comunicación, 95, 133-161. https://doi.org/10.5209/clac.79337
- Mateo Martínez, José. (2009). Diccionario de términos de la banca. Ariel.
- Munkova, Dasa, Munk, Michal, Welnitzova, K., & Jakabovicova, Johanna. (2021). Product and process analysis of machine translation into the inflectional language. SAGE Open, 11(4), 1-13. https://doi.org/10.1177/21582440211054501
- Parra-Galiano, Silvia. (2022). ¿Correcciones o mejoras textuales? Cuándo intervenir en la revisión y corrección de traducciones. In Nuria Fernández-Quesada, & Santiago Rodríguez-Rubio (Eds.), Detección y tratamiento de errores y erratas: Un diagnóstico para el siglo XXI (pp. 103-121). Dykinson.
- Raunak, Vikas, Menezes, Arul, & Junczys-Dowmunt, Marcin. (2021). The curious case of hallucinations in neural machine translation. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, & Yichao Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1172-1183). Association for Computational Linguistics.
- Rico Pérez, Celia, Díez Orzas, Pedro L., Deriard Nolasco, Giuseppe, Cámara, Lidia, Regidor, Igone, Ariano, Martin, Fernández, Félix, Blasco, Johanna, Nieto Caride, Pablo, & Benítez Monje, Roberto. (2011). D4.1.4 Annex I. EDI-TA. Post-editing methodology for machine translation. MultilingualWeb-LT, Language Technology in the Web FP7-ICT-2011-7 (Project Number 287815).
- Rodríguez-Rubio, Santiago, & Fernández-Quesada, Nuria. (2020a). Towards accuracy: A model for the analysis of typographical errors in specialised bilingual dictionaries. Two case studies. Lexikos, 30, 386-415. https://doi.org/10.5788/30-1-1606
- Rodríguez-Rubio, Santiago, & Fernández-Quesada, Nuria. (2020b). The dynamics of typographical error reproduction: Optimising formal correctness in three specialised bilingual dictionaries. ELIA, 20, 147-190. https://doi.org/10.12795/elia.2020.i20.06
- Rodríguez-Rubio Mediavilla, Santiago. (2021). Estudio comparativo de erratas en diccionarios especializados inglés-español en papel de los ámbitos económico-financiero, jurídico e industrial. Una propuesta metodológica [Unpublished doctoral dissertation]. Universidad Pablo de Olavide.
- Rodríguez-Rubio, Santiago. (2022). La imperfección de los diccionarios: estudio teórico sobre la transmisión de las erratas en los últimos siglos. Revista de Lexicografía, 28, 207-222. https://doi.org/10.17979/rlex.2022.28.1.9125
- Rodríguez-Rubio, Santiago. (In press). Did you mean…? Rendimiento de Google Translate y DeepL en la detección y corrección de erratas de un diccionario económico, financiero y comercial (inglés-español). In Ana Medina Reguera (Ed.), Traducción, localización, discurso; textos de la economía y la empresa. Peter Lang.
- Rumelhart, David E., Hinton, Geoffrey E., & Williams, Ronald J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536. https://doi.org/10.1038/323533a0
- Sennrich, Rico, Haddow, Barry, & Birch, Alexandra. (2016). Neural machine translation of rare words with subword units. In Katrin Erk, & Noah A. Smith (Eds.), Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (pp. 1715-1725). Association for Computational Linguistics.
- Sosoni, Vilelmini, O’Shea, John, & Stasimioti, Maria. (2022). Translating law: A comparison of human and post-edited translations from Greek to English. Revista de Llengua i Dret, Journal of Language and Law, 78, 92-120. https://doi.org/10.2436/rld.i78.2022.3704
- Sun, Zeyu, Zhang, Jie M., Harman, Mark, Papadakis, Mike, & Zhang, Lu. (2020). Automatic testing and improvement of machine translation. In Gregg Rothermel, & Doo-Hwan Bae (Eds.), ICSE ’20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (pp. 974-985). Association for Computing Machinery.
- Tan, Zhixing, Wang, Shuo, Yang, Zonghan, Chen, Gang, Huang, Xuancheng, Sun, Maosong, & Liu, Yang. (2020). Neural machine translation: A review of methods, resources, and tools. AI Open, 1, 5-21. https://doi.org/10.1016/j.aiopen.2020.11.001
- Varela-Salinas, María José, & Burbat, Ruth. (2023). Google Translate and DeepL: Breaking taboos in translator training. Ibérica, 45, 243-266. https://doi.org/10.17398/2340-2784.45.243
- Wahler, Madison E. (2018, January 3). A word is worth a thousand words: Legal implications of relying on machine translation technology. Carlton Fields.
- Weber, Steven, & Mehandru, Nikita. (2022). The 2020s political economy of machine translation. Business and Politics, 24(1), 96-112. https://doi.org/10.1017/bap.2021.17
- Wiesmann, Eva. (2019). Machine translation in the field of law: A study of the translation of Italian legal texts into German. Comparative Legilinguistics, 37, 117-153. https://doi.org/10.14746/cl.2019.37.4
- Wu, Lijun, Tan, Xu, He, Di, Tian, Fei, Qin, Tao, Lai, Jianhuang, & Liu, Tie-Yan. (2018). Beyond error propagation in neural machine translation: Characteristics of language also matter. In Ellen Riloff, David Chiang, Julia Hockenmaier, & Jun’ichi Tsujii (Eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 3602-3611). Association for Computational Linguistics.
- Yan, Jianhao, Meng, Fandong, & Zhou, Jie. (2022). Probing causes of hallucinations in neural machine translations. Cornell University. https://doi.org/10.48550/arXiv.2206.12529
- Xue, Haiyang, Feng, Yang, Gu, Shuhao, & Chen, Wei. (2020). Robust neural machine translation with ASR errors. In Hua Wu, Colin Cherry, Liang Huang, Zhongjun He, Mark Liberman, James Cross, & Yang Liu (Eds.), Proceedings of the 1st Workshop on Automatic Simultaneous Translation (pp. 15-23). Association for Computational Linguistics.
- Yang, Ziyan, Pinto-Alva, Leticia, Dernoncourt, Franck, & Ordoñez, Vicente. (2022). Backpropagation-based decoding for multimodal machine translation. Frontiers in Artificial Intelligence, 4, Article 736722. https://doi.org/10.3389/frai.2021.736722
- Zhang, Tong, Zhang, Long, Ye, Wei, Li, Bo, Sun, Jinan, Zhu, Xiaoyu, Zhao, Wen, & Zhang, Shikun. (2021). Point, disambiguate and copy: Incorporating bilingual dictionaries for neural machine translation. In Chengqing Zong, Fei Xia, Wenjie Li, & Roberto Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 3970-3979). Association for Computational Linguistics.
- Zhao, Wei, Wang, Liang, Shen, Kewei, Jia, Ruoyu, & Liu, Jingming. (2019). Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data. In Jill Burstein, Christy Doran, & Thamar Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 156-165). Association for Computational Linguistics.