Papers and publications

About Systran

With more than 50 years of experience in translation technologies, SYSTRAN has pioneered the greatest innovations in the field, including the first web-based translation portals and the first neural translation engines combining artificial intelligence and neural networks for businesses and public organizations.

SYSTRAN provides business users with advanced and secure automated translation solutions in areas such as global collaboration, multilingual content production, customer support, electronic investigation, Big Data analysis, and e-commerce. SYSTRAN offers a tailor-made solution with an open and scalable architecture that enables seamless integration into existing third-party applications and IT infrastructures.

Towards Example-Based NMT with Multi-Levenshtein Transformers

Maxime Bouthors, Josep Crego, François Yvon.

2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Dec 2023, Singapore.

BiSync: A Bilingual Editor for Synchronized Monolingual Texts

In our globalized world, a growing number of situations arise where people are required to communicate in one or several foreign languages. In the case of written communication, users with a good command of a foreign language may find assistance from computer-aided translation (CAT) technologies. These technologies often allow users to access external resources, such …

Josep Crego, Jitao Xu, François Yvon.

61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Jul 2023, Toronto, Canada.

Example-Based Machine Translation from Text to a Hierarchical Representation of Sign Language

This paper presents an experiment in automatic translation from text to sign language (SL). As we do not have a large aligned corpus, we have explored an example-based approach, using AZee, an intermediate representation of the discourse in SL in the form of hierarchical expressions.

Élise Bertin-Lemée, Annelies Braffort, Camille Challant, Claire Danet, Michael Filhol

18e Conférence en Recherche d'Information et Applications -- 16e Rencontres Jeunes Chercheurs en RI -- 30e Conférence sur le Traitement Automatique des Langues Naturelles -- 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (TALN 2023), Jun 2023, Paris, France.

Integrating Translation Memories into Non-Autoregressive Machine Translation

Jitao Xu, Josep Crego, François Yvon.

7th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Association for Computational Linguistics, May 2023, Dubrovnik, Croatia.

Bilingual Synchronization: Restoring Translational Relationships with Editing Operations

Machine Translation (MT) is usually viewed as a one-shot process that generates the target language equivalent of some source text from scratch. We consider here a more general setting which assumes an initial target sequence that must be transformed into a valid translation of the source, thereby restoring parallelism between source and target. For this bilingual synchronization task, we consider several architectures (both autoregressive and non-autoregressive) and training regimes, and experiment with multiple practical settings such as simulated interactive MT, translating with Translation Memory (TM) and TM cleaning. Our results suggest that one single generic edit-based system, once fine-tuned, can compare with, or even outperform, dedicated systems specifically trained for these tasks.

Jitao Xu, Josep Crego, François Yvon

The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Dec 2022, Abu Dhabi, United Arab Emirates

Non-Autoregressive Machine Translation with Translation Memories

Non-autoregressive machine translation (NAT) has recently made great progress. However, most works to date have focused on standard translation tasks, even though some edit-based NAT models, such as the Levenshtein Transformer (LevT), seem well suited to translate with a Translation Memory (TM). This is the scenario considered here. We first analyze the vanilla LevT model …

Jitao Xu, Josep Crego, François Yvon

The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Dec 2022, Abu Dhabi, United Arab Emirates.

Robust Translation of French Live Speech Transcripts

Despite a narrowed performance gap with direct approaches, cascade solutions involving automatic speech recognition (ASR) and machine translation (MT) are still largely employed in speech translation (ST). Direct approaches employing a single model to translate the input speech signal suffer from the critical bottleneck of data scarcity. In addition, multiple industry applications display speech transcripts …

Elise Bertin-Lemée, Guillaume Klein, Josep Crego, Jean Senellart

Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track), Sep 2022, Orlando, USA

Latent Group Dropout for Multilingual and Multidomain Machine Translation

Multidomain and multilingual machine translation often rely on parameter sharing strategies, where large portions of the network are meant to capture the commonalities of the tasks at hand, while smaller parts are reserved to model the peculiarities of a language or a domain. In adapter-based approaches, these strategies are hardcoded in the network architecture, independent …

Minh-Quang Pham, François Yvon, Josep Crego

Findings of the Association for Computational Linguistics: NAACL 2022, Jul 2022, Seattle, United States

Example-based Multilinear Sign Language Generation from a Hierarchical Representation

Boris Dauriac, Annelies Braffort, Elise Bertin-Lemée.

Jun 2022, Marseille, France.

Multi-Domain Adaptation in Neural Machine Translation with Dynamic Sampling Strategies

Building effective Neural Machine Translation models often implies accommodating diverse sets of heterogeneous data so as to optimize performance for the domain(s) of interest. Such multi-source / multi-domain adaptation problems are typically approached through instance selection or reweighting strategies, based on a static assessment of the relevance of training instances with respect to the task …

Minh-Quang Pham, Antoine Senellart, Dan Berrebbi, Josep Crego, Jean Senellart

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, Jun 2022, Ghent, Belgium