OpenNMT is an open-source ecosystem for neural machine translation started in 2016 by SYSTRAN and the Harvard NLP group. The project has been used in numerous research and industry applications, including SYSTRAN Translate and SYSTRAN Model Studio.
OpenNMT’s main goal is to make neural machine translation accessible to everyone. However, neural machine translation is notoriously expensive to run, as MT models typically require large amounts of memory and compute. Early in the project, SYSTRAN engineers focused on improving the efficiency of OpenNMT inference to reduce cost and improve productivity.
The computational challenge of neural machine translation
Neural machine translation models are usually based on the Transformer architecture, which powers many recent advances in natural language processing. A common variant known as the “big Transformer” contains about 300 million parameters that are learned during a training phase. Since each parameter is stored as a 32-bit floating-point number (4 bytes), the model alone takes at least 1.2 GB on disk and in memory.
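As a quick sanity check on that figure, here is a minimal back-of-the-envelope sketch in Python; the 300 million parameter count is taken from the text above, and the 4-byte size follows from the FP32 storage format:

```python
# Estimate the memory footprint of a "big Transformer" model stored in FP32.
NUM_PARAMETERS = 300_000_000  # approximate parameter count from the text
BYTES_PER_FP32 = 4            # a 32-bit float occupies 4 bytes

size_bytes = NUM_PARAMETERS * BYTES_PER_FP32
size_gb = size_bytes / 1e9    # convert bytes to gigabytes (decimal GB)

print(f"Model size: {size_gb:.1f} GB")  # -> Model size: 1.2 GB
```

This also hints at why reduced-precision formats matter for inference: halving the bytes per parameter (e.g. FP16 or INT8 quantization) proportionally shrinks the model’s footprint.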