Deploying Meta’s Transcoder on E2E’s Cloud GPU Server to Translate C++ and Java Code into Python

January 17, 2024

Introduction - Language Migration

Language migration, the business of porting a codebase from one programming language to another, is a significant and multifaceted endeavor in software development. Teams typically move away from a familiar language to address technical debt, exploit better tools, or tap into a more active developer community. The motivations vary, from gaining improved features and functionality to attracting talent, but the process is generally complex, risky, and costly. A company may migrate gradually if the two languages are interoperable, or may need to rewrite the codebase entirely if they belong to different language families. The decision involves weighing the technical and operational costs against the potential long-term benefits and is influenced by the company's size, the scope of the codebase, and specific business needs. Ultimately, a migration should be carefully planned and executed, considering both immediate needs and future sustainability.

Some notable examples of companies migrating their code to different languages include:

1. Twitter: Moved from Ruby to Scala for better performance and scalability.

2. Netflix: Transitioned from Java to Node.js for its user interface layer to improve load times and user experience.

3. Google: Developed Go language and migrated some components for better performance and efficiency.

4. Facebook: Introduced and moved significant parts of its code to Hack, a language specifically designed to maintain the speed of PHP while offering additional type safety.

The cost of migrating a codebase to a different language can be substantial, depending on various factors like the size of the codebase, the complexity of the migration, and the languages involved. For example, the Commonwealth Bank of Australia reportedly spent approximately $750 million over five years to switch its platform from COBOL to Java. This example illustrates that while the costs can vary widely, migrating significant codebases, especially for large organizations, is generally a major financial undertaking.

Transcoder by Meta: An Overview

Transcoder, developed by Meta (formerly Facebook), is an ambitious effort in automated code translation. Its aim is to streamline the migration of codebases across programming languages. Introduced in Meta's research paper ‘Unsupervised Translation of Programming Languages,’ Transcoder uses machine learning to convert code from one language to another: it learns patterns, functions, and structures in the source language and translates them into the target language while trying to preserve the logical flow and functionality.

The development of tools like Transcoder indicates a growing capability and interest in automating language migration. For businesses, the implications are profound, offering potential reductions in costs and timeframes associated with codebase migration. The process, which traditionally requires extensive manual effort and deep expertise in both the source and destination languages, can be assisted or even fully automated by AI. However, quality assurance and a thorough understanding of the output remain critical components of the process.

Despite these advancements, the task of automatic code translation is rife with complexity. Each programming language comes with its own set of paradigms, libraries, and environment specifics, making a one-size-fits-all solution challenging. Tools like Transcoder significantly aid the translation process but might not always produce deployment-ready results without further tuning and testing. Therefore, while Transcoder can potentially revolutionize how companies maintain and evolve their technology stacks, it's important for organizations to consider the nuances of integrating such translations into their production environments. Factors such as cost, risk, and long-term maintainability are crucial in determining whether to adopt such automated translation solutions for business operations. The rise of tools like Transcoder by Meta could provide more flexibility in dealing with technical debt and legacy systems, yet they necessitate a careful balance of benefits against the complexities and intricacies involved.

E2E’s Cloud GPU Server

E2E's Cloud GPU servers are tailored for high-performance computations, offering a cost-effective solution for deep machine learning, architectural visualization, video processing, and scientific computing. These servers are equipped with modern NVIDIA GPU chipsets, including the latest Tesla V100 & T4 cards, known for their high operation speed and power. They also utilize CUDA technology, a parallel computing architecture from NVIDIA, which enhances GPU computing performance significantly. CUDA's advantages include the use of common programming languages, application of advanced technologies for efficient computing, multi-GPU support, and excellent scalability for various projects.

In this article, we’ll deploy Transcoder onto V100 GPUs available on E2E’s cloud server.

Deploying Transcoder

First, clone the TransCoder repository:


# Clone the git repository for transcoder
# in local environment
git clone https://github.com/facebookresearch/TransCoder transcoder/

Download model files:


# for C++ -> Java, Java -> C++ and Java -> Python
wget https://dl.fbaipublicfiles.com/transcoder/model_1.pth
# for C++ -> Python, Python -> C++ and Python -> Java
wget https://dl.fbaipublicfiles.com/transcoder/model_2.pth
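Because each checkpoint covers only three translation directions, it is easy to pass the wrong file later on. The following is a minimal sketch of a lookup table built from the comments on the two download commands above; the helper name `model_for_pair` is hypothetical and not part of the TransCoder repository.

```python
# Mapping taken from the comments on the two wget commands above.
MODEL_FOR_PAIR = {
    ("cpp", "java"): "model_1.pth",
    ("java", "cpp"): "model_1.pth",
    ("java", "python"): "model_1.pth",
    ("cpp", "python"): "model_2.pth",
    ("python", "cpp"): "model_2.pth",
    ("python", "java"): "model_2.pth",
}


def model_for_pair(src_lang: str, tgt_lang: str) -> str:
    """Return the checkpoint file name for a translation direction.

    Hypothetical helper for illustration; it only encodes the table above.
    """
    try:
        return MODEL_FOR_PAIR[(src_lang, tgt_lang)]
    except KeyError:
        raise ValueError(
            f"no checkpoint listed for {src_lang} -> {tgt_lang}"
        ) from None


print(model_for_pair("cpp", "python"))   # model_2.pth
print(model_for_pair("java", "python"))  # model_1.pth
```

Keeping the table in one place means a mistyped direction fails with a clear error instead of silently loading the wrong model.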

Installing dependencies:


# Since transcoder is implemented in pytorch,
# we need to install pytorch first
pip install torch torchvision


# Now install the other required dependencies
pip install numpy Moses libclang submitit six sacrebleu==1.2.11
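Before going further, it may be worth confirming that PyTorch can actually see the GPU on the server. The sketch below uses only `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; the function name `describe_gpu` is an assumption made for this example.

```python
def describe_gpu() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA device."""
    try:
        import torch  # imported lazily so the script also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    # Device 0 is the first GPU exposed to this process.
    return f"CUDA device 0: {torch.cuda.get_device_name(0)}"


print(describe_gpu())
```

If the second message appears on a GPU instance, the usual suspects are a CPU-only PyTorch build or a missing NVIDIA driver.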

Now move into transcoder/XLM/tools/fastBPE (relative to the directory where you cloned the repository) and build fastBPE:


cd transcoder/XLM/tools/fastBPE
git clone https://github.com/glample/fastBPE
cd fastBPE
g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast

Now, to get the Python API for fastBPE, run the following commands:

# installs header files
sudo apt-get install python3-dev
pip install cython
python setup.py install

Now, to install Apex, run the following command. Make sure you have navigated out of the Transcoder directory.

git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1), which supports multiple --config-settings with the same key:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Make sure to install libclang:


sudo apt-get install libclang-dev

And the following dependency for Python:


pip install clang-5

Clang is a compiler front end for the C, C++, and Objective-C programming languages. It was designed as a replacement for the front end of the GNU Compiler Collection (GCC). Clang is part of the LLVM project and serves as its primary front end; libclang exposes Clang's parser as a library, which Transcoder uses to tokenize source code.
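With this many packages installed by hand, a quick check of what Python can actually import may save a failed run later. The sketch below uses the standard library's `importlib.util.find_spec`. The module names in `MODULES` are assumptions about how the packages above are imported (for example, `clang` for the Clang bindings and `fastBPE` for the fastBPE Python API); adjust the list if your installation differs.

```python
import importlib.util

# Assumed import names for the packages installed above.
MODULES = ["torch", "numpy", "submitit", "six", "sacrebleu", "clang", "fastBPE", "apex"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules(MODULES))
```

An empty list only shows that the modules can be found, not that they were built correctly (Apex in particular can install without its CUDA extensions).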


Important: If your libclang.so is not in /usr/lib/llvm-7/lib/, edit code_tokenizer.py and change the argument of clang.cindex.Config.set_library_path('path_to_libclang') to the directory that contains libclang.so on your machine.
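If you are not sure where libclang.so lives, a small search script can help. This is a sketch under assumptions: the two search patterns are common locations on Debian/Ubuntu systems, not an exhaustive list, and the helper name `find_libclang_dirs` is made up for this example.

```python
import glob
import os

# Assumed common install locations on Debian/Ubuntu; extend as needed.
DEFAULT_PATTERNS = ("/usr/lib/llvm-*/lib", "/usr/lib/x86_64-linux-gnu")


def find_libclang_dirs(patterns=DEFAULT_PATTERNS):
    """Return directories matching the patterns that contain a libclang shared library."""
    found = []
    for pattern in patterns:
        for directory in sorted(glob.glob(pattern)):
            libs = [
                path
                for path in glob.glob(os.path.join(directory, "libclang*.so*"))
                # libclang-cpp is a different library; skip it.
                if not os.path.basename(path).startswith("libclang-cpp")
            ]
            if libs and directory not in found:
                found.append(directory)
    return found


candidates = find_libclang_dirs()
print(candidates if candidates else "libclang.so not found in the default locations")
# The chosen directory is what goes into code_tokenizer.py:
#   clang.cindex.Config.set_library_path(candidates[0])
```

Note that `set_library_path` expects the directory, not the path to the .so file itself.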

Translating the Code


python translate.py --src_lang cpp --tgt_lang java --model_path trained_model.pth < input_code.cpp

Use the above command to translate the code. The source code is read from standard input, which is why the file is redirected into the script with <. The arguments are:

--src_lang: the source language

--tgt_lang: the target language

--model_path: the path to the model checkpoint used for translation

input_code.cpp: the file that contains the source code
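To translate several files, the same command can be driven from Python. The sketch below wraps the command line shown above with `subprocess.run`, feeding the source file on standard input just as the shell redirection does. The function names are hypothetical, and the script is assumed to be run from the directory that contains translate.py.

```python
import subprocess
import sys


def build_translate_command(src_lang, tgt_lang, model_path, script="translate.py"):
    """Build the argument list for the translate.py command shown above."""
    return [
        sys.executable, script,
        "--src_lang", src_lang,
        "--tgt_lang", tgt_lang,
        "--model_path", model_path,
    ]


def translate_file(source_file, src_lang, tgt_lang, model_path):
    """Run translate.py with source_file on stdin and return its stdout as text."""
    command = build_translate_command(src_lang, tgt_lang, model_path)
    with open(source_file, "rb") as handle:
        result = subprocess.run(
            command, stdin=handle, capture_output=True, check=True
        )
    return result.stdout.decode("utf-8")


# Show the command without running it (the interpreter path is omitted).
print(" ".join(build_translate_command("cpp", "java", "trained_model.pth")[1:]))
# translate.py --src_lang cpp --tgt_lang java --model_path trained_model.pth
```

A call such as `translate_file("input_code.cpp", "cpp", "python", "model_2.pth")` would then return the translated code as a string; `check=True` makes a failed run raise an exception instead of returning empty output.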


Let’s take C++ and Java programs that compute the sum of the first n natural numbers. We’ll attempt to convert both into Python using Transcoder.

Input_code.cpp code:


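#include <iostream>
using namespace std;
int main() {
    int n, sum = 0;
    cout << "Enter a positive integer: ";
    cin >> n;
    // The listing is cut off after "i" in the source; the loop below is
    // reconstructed from the stated goal (sum of the first n natural numbers).
    for (int i = 1; i <= n; ++i) {
        sum += i;
    }
    cout << "Sum = " << sum;
    return 0;
}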

First, let’s convert the C++ code into Python:

python translate.py --src_lang cpp --tgt_lang python --model_path model_2.pth < Input_code.cpp

Output: [image of the generated Python code, not available]

Now let’s convert the Java code into Python:

python translate.py --src_lang java --tgt_lang python --model_path model_1.pth < Input_code.java

Output: [image of the generated Python code, not available]

In this run, the translation from Java to Python makes some sense, but the one from C++ to Python is garbage. This suggests that Transcoder still needs considerable improvement before it can translate code reliably, although a single example is too little to judge by; more examples need to be tested to understand its capabilities. Please go ahead and experiment, now that you know how to deploy it.
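When experimenting with more examples, it helps to check translations automatically rather than by eye. The sketch below shows one possible approach: first verify that the output parses as Python, then run it against a reference result. The two code strings are hand-written stand-ins, not actual Transcoder output, and both helper names are hypothetical. Since `exec` runs arbitrary code, this should only be done in a sandboxed environment.

```python
import ast


def python_syntax_error(code: str):
    """Return None if the code parses as Python, else a short error description."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def passes_reference(code: str, func_name: str, cases) -> bool:
    """Execute the code and compare func_name(arg) with the expected value for each case."""
    namespace = {}
    exec(compile(code, "<translation>", "exec"), namespace)
    func = namespace[func_name]
    return all(func(arg) == expected for arg, expected in cases)


# Hand-written stand-ins for a good and a broken translation.
good = "def total(n):\n    return sum(range(1, n + 1))\n"
bad = "def total(n)\n    return n\n"

# Reference values from the closed form n * (n + 1) / 2.
cases = [(n, n * (n + 1) // 2) for n in (0, 1, 10, 100)]

print(python_syntax_error(good))             # None
print(python_syntax_error(bad) is not None)  # True
print(passes_reference(good, "total", cases))  # True
```

A syntax check catches garbage output cheaply; the reference comparison catches translations that run but compute the wrong thing.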

Conclusion

In conclusion, this article provides a comprehensive overview of language migration in software development, emphasizing the complexities and costs associated with porting code from one programming language to another. It introduces Transcoder by Meta, an automated code translation tool that leverages artificial intelligence for converting codebases across diverse languages. The deployment process of Transcoder onto E2E's cloud GPU servers is explained, showcasing the potential benefits and challenges of automating language migration.

The article highlights the growing interest in AI-driven tools like Transcoder, which can potentially reduce costs and timeframes associated with codebase migration. However, it acknowledges the complexities involved in automatic code translation, showcasing examples of both successful and less accurate translations.