OpenNMT is an open-source ecosystem for neural machine translation started in 2016 by SYSTRAN and the Harvard NLP group. The project has been used in numerous research and industry applications, including SYSTRAN Translate and SYSTRAN Model Studio.
OpenNMT’s main goal is to make neural machine translation accessible to everyone. However, neural machine translation is notoriously expensive to run, as MT models often require a lot of memory and compute power. Early in this project, SYSTRAN engineers focused on improving the efficiency of OpenNMT inference to reduce cost and improve productivity.
The computational challenge of neural machine translation
Neural machine translation models are usually based on the Transformer architecture, which powers many recent advances in natural language processing. A common variant known as “big Transformer” contains about 300 million parameters that are tuned during a training phase. Since each parameter is stored as a 32-bit (4-byte) floating-point number, the model alone takes at least 1.2 GB on disk and in memory.
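The 1.2 GB figure follows directly from multiplying the parameter count by the storage size of each parameter. The short sketch below reproduces that arithmetic; the function name and the 16-bit and 8-bit rows are illustrative assumptions added for comparison, not figures from the original post.

```python
def model_size_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Return the raw storage size of the parameters in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Approximate parameter count of a "big Transformer", as stated in the text.
BIG_TRANSFORMER_PARAMS = 300_000_000

# 32-bit floats are the case described in the text; the narrower types are
# shown only to illustrate how the footprint scales with precision.
for type_name, num_bytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    size = model_size_gb(BIG_TRANSFORMER_PARAMS, num_bytes)
    print(f"{type_name}: {size:.1f} GB")
```

This counts only the parameters themselves; the memory needed for activations and intermediate buffers during translation comes on top of it.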
An increasing number of live events such as conferences, meetings, lectures, debates, radio and TV shows, etc. are nowadays being live streamed on video channels and social networks. These events are transmitted in real time to a large audience, on all types of devices and anywhere in the world.
Captioning and live translation are seen as essential to ensure that these events reach a growing international audience. Making such events comfortable to follow and easy to understand for so large an audience raises the issue of multilingualism, which we discuss in this post.
In the context of the upcoming French Presidency of the European Union in January 2022, SYSTRAN has developed a tool called Speech Translator for real-time captioning and translation of single-speaker speeches or multi-speaker meetings. Starting with French or English as the source spoken language, Speech Translator:
transcribes the original speech, partnering for this task with Vocapia Automatic Speech Recognition,
punctuates and segments the automatic speech recognition (ASR) output, making this automatically formatted and corrected transcription available to human reviewers and to the audience (speech transcription/captioning),
simultaneously runs machine translation (MT) powered by our best quality translation models towards European Union languages (speech translation/subtitling),
all of this with the lowest latency and in a dedicated and user-friendly interface. The task closely resembles simultaneous interpreting, which performs real-time multilingual translations. The next figure shows a screenshot of our live ST system interface where captions (left) as well as the corresponding English translations (right) are displayed.
For decades, the translation industry has proposed the use of “similar” translations in CAT tools, allowing human translators to visualize one or several matches retrieved from a translation memory (TM) when translating new documents. A translation memory is a database that stores segments of text and their corresponding translations. Segments can be sentences, paragraphs or sentence-like units (headings, titles, elements in a list, etc.). While the ideal situation is to find perfect matches, these are not always available. In such cases, translators resort to matches that have sufficient content in common with the document to be translated. These partial matches are then slightly “repaired” to achieve correct translations.
The use of TM matches relies on the idea that repairing a given TM match requires less effort than producing a translation from scratch, thus leading to higher productivity and consistency rates. The following figure illustrates human translation via repairing a TM match. The English sentence “How long does the flight last?” is translated into French starting from the TM match “How long does a flu last?”, whose stored translation is “Quelle est la durée d’une grippe?”
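The retrieval of a partial match can be illustrated with a minimal sketch. It assumes a word-level similarity ratio computed with Python's standard difflib module and a hypothetical in-memory translation memory; CAT tools use their own fuzzy-match scores, so this is one possible illustration rather than the method of any particular product.

```python
from difflib import SequenceMatcher


def fuzzy_score(segment_a: str, segment_b: str) -> float:
    """Word-level similarity in [0, 1] between two segments."""
    tokens_a = segment_a.lower().split()
    tokens_b = segment_b.lower().split()
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


def best_match(query: str, memory: list, threshold: float = 0.5):
    """Return (score, source, target) for the closest TM entry, or None
    if no entry reaches the (assumed) similarity threshold."""
    scored = [(fuzzy_score(query, source), source, target)
              for source, target in memory]
    score, source, target = max(scored, key=lambda item: item[0])
    return (score, source, target) if score >= threshold else None


# Hypothetical translation memory: (source segment, stored translation).
translation_memory = [
    ("How long does a flu last?", "Quelle est la durée d’une grippe?"),
    ("Where is the airport?", "Où est l’aéroport ?"),
]

match = best_match("How long does the flight last?", translation_memory)
print(match)
```

For the example sentence, four of the six words line up with the flu entry, so that entry is returned for the translator to repair, while the airport entry falls well below the threshold.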
As Globalization 4.0 rears its head and the convergence of Industry 4.0 and remote work becomes commonplace in the business ecosystem, translation is an increasingly important component of productivity, engagement, and communication.
But how do you iron out the knots? You need to effectively communicate with team members, colleagues, and customers across physical and linguistic borders. Unfortunately, there’s a tiny bump in the road: language.
Translation engines allow you to seamlessly communicate across language barriers. But creating a well-oiled, hyper-engaging translation solution isn’t always easy. Obviously, the source of your engine is important. Modern Neural Machine Translation (NMT) uses intelligent neural networks to instantly contextualize, digest, and output translations in microseconds.
SYSTRAN, the pioneer of neural machine translation solutions and technology, recently launched SYSTRAN Model Studio to help language experts build powerful and robust domain-specific translation models. By converging SYSTRAN’s world-class neural machine translation technologies with a global network of talented language and translation experts, SYSTRAN Model Studio unlocks higher translation quality and in-domain specialization for niche industries and businesses and allows LSPs to profit further from their data.
Fifty-seven percent of executives list risk and compliance as their two largest barriers to success, and a mere six percent of board members feel their company is adequately prepared to manage risk. In today’s hyper-complex risk landscape, compliance is the single greatest threat to productivity and liquidity.
Even though noncompliance costs twice as much as building compliance frameworks, most organizations have difficulty integrating compliance into their day-to-day business model.
Diversity & Inclusion (D&I) have quickly become staples of HR playbooks, yet organizations still struggle with how to fully integrate the practice across both people culture and enablement tools (technology).
The data supporting D&I practice is clear. Sales teams with the highest levels of racial diversity see 15x greater sales. Companies with diverse employee pools (across age, race, gender, culture, religion, etc.) see 2.3x higher bottom-line revenue.
Studies have shown that diverse teams are 87 percent better at decision-making, a largely intangible asset. In addition to being a moral imperative, D&I is a legitimate competitive advantage with echoing consequences and rewards across the business hierarchy. Although we’ve seen D&I HR strategies focus on hiring and leadership growth, businesses often fail to invest in technology frameworks that perpetuate D&I strategy internally and as a customer engagement proposition.
Glossaries usually prove helpful when welcoming a new colleague to your team. What if they were also one of the best entry points to your domain for our models?
In many workplaces, a lot of knowledge is accumulated in lexicons, which cover a wide variety of usages, from specifying specialized terms to introducing brand names and business concepts.
Drawing on more than 50 years of dedicated experience, our research team presented at COLING 2020 the technique behind the User Dictionary feature, designed to polish machine translation and give it an appropriate flavor through words. This presentation has been recorded and is available here.
For many businesses, translation is a time-consuming, labor-intensive process. Human translators can take days to fully process, translate, and proof a few thousand words. Between sending jobs, communicating time frames and service prices, and receiving the actual translated document, businesses can spend several days waiting for a complete translation.
SYSTRAN’s Neural Machine Translation (NMT) solution cuts that process down to seconds. Our OpenNMT-powered neural engine and hyper-scalable architecture can almost instantly process translation requests. For example, it can translate a double-spaced, one-page Word document in around one second. NMT frees up human translators from grunt work and allows them to tackle more impactful, growth-oriented business problems.
Today, let’s discuss some of the features that allow us to provide those one-second, industry-leading turnaround times that facilitate nearly instant translations.
Over the past few years, Neural Machine Translation (NMT) has worked through many of the core constraints that held it back from mainstream research and adoption across the translation space. In 2016, Google’s Neural Machine Translation system (GNMT) promised to bypass issues with computational requirements. By using low-precision arithmetic during inference and by subdividing words into common sub-word units, GNMT worked to increase throughput and to improve accuracy on rare words. Today, GNMT is a core component of Google Translate.
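To give a sense of what low-precision arithmetic means in practice, here is a minimal sketch of symmetric 8-bit quantization of a list of weights. The scheme and the function names are illustrative assumptions; this is not GNMT's actual quantization method.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight then occupies one byte instead of four, at the cost of a rounding error no larger than half the scale factor, which is the trade-off that makes lower-precision inference attractive.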