Now there are various ways we could approach solving the problem, depending on the resources we have available to us. But for the sake of this talk, let's assume that we don't have much budget to dedicate to this feature, nor do we have a team that can translate content for us. So our solution must be low cost and require human interaction only to validate the results.
This is a fantastic use case for AI and automation. When my team started tackling this problem, for obvious reasons we looked at Azure first. Through that research, I found the Azure AI Translator service. This service not only allows for ad hoc text translation, where you send text directly to the translator and it returns the translated text as JSON, but it also offers document translation, which takes as input one or more files from a blob storage container, translates them, and outputs the translated files into a different blob storage container.
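To make the ad hoc flow concrete, here's a minimal sketch of building a request against the Translator v3 REST text-translation endpoint. The key, region, and target language values are placeholders you'd supply yourself; the request-building is pulled into a helper just to make the shape easy to see.

```typescript
// Sketch: ad hoc text translation against the Azure Translator v3 REST API.
const ENDPOINT = "https://api.cognitive.microsofttranslator.com";

interface TranslateRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP request for a batch of strings to translate.
function buildTranslateRequest(
  texts: string[],
  to: string,
  key: string,
  region: string
): TranslateRequest {
  return {
    url: `${ENDPOINT}/translate?api-version=3.0&to=${encodeURIComponent(to)}`,
    headers: {
      "Ocp-Apim-Subscription-Key": key,
      "Ocp-Apim-Subscription-Region": region,
      "Content-Type": "application/json",
    },
    // The API expects a JSON array of { Text } objects.
    body: JSON.stringify(texts.map((t) => ({ Text: t }))),
  };
}

// Usage (network call left commented for the sketch):
// const req = buildTranslateRequest(["Hello, world"], "fr", myKey, myRegion);
// const res = await fetch(req.url, { method: "POST", headers: req.headers, body: req.body });
// const data = await res.json(); // array of { translations: [{ text, to }] }
```

The response comes back as JSON, one result object per input string, which is what makes this so easy to wire into a build step or server component.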
Azure AI Translator also has a few key features that ultimately convinced us it was the right fit for our needs. The first was language support. My team at Microsoft runs an upskilling and reskilling program that operates all around the world, so it's important to us that any multilingual solution be able to accommodate any language or dialect we want to support. With over 100 languages and dialects supported, even including Klingon, Azure AI Translator was definitely the right choice for us.
The next key feature for us was cost. Azure AI Translator has a generous free tier of up to 2 million characters translated per month, so depending on the amount of content on your site and how often you translate it, you may not incur any cost at all. Another important feature is accuracy. Azure AI Translator uses what they call Neural Machine Translation, or NMT, which in their own words is an improvement on previous statistical machine translation (SMT) approaches because it uses far more dimensions to represent tokens, such as words, morphemes, and punctuation, of the source and target text. They go on to explain that the NMT approach takes the full sentence into context, versus the sliding window of only a few words that SMT uses, and so produces more fluid, human-sounding translations. For us, this means more contextually accurate translations, which means less tweaking, if any, is needed. This is what makes it possible for us to integrate translations into our CI/CD workflow.
The final key feature for us is the ability to use custom glossaries to tweak the translation process, which is useful for skipping translation of industry-specific terminology and/or brand-related text, as well as for ensuring that certain words or phrases are translated in a way that retains the original meaning. It's not all sunshine and rainbows, though, so I also want to take a moment to call out some of the limitations of Azure AI Translator. Although the service can handle many different file types, it currently can't handle MDX or JSX/TSX, so your content will need to be stored separately from those files in order to be translated. Another limitation is that all translation responses are returned as horizontal, left-to-right or right-to-left text, so you may need to add rendering logic if you want to display content vertically for applicable languages. Finally, as with any AI implementation, you'll still want to validate the results before deploying them to prod, and you may find that additional tweaking is needed, which would also require a redeployment and revalidation.
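On the glossary point above: for document translation, a custom glossary can be supplied as a simple tab-separated file of source/target pairs. The entries below are made-up examples, not from our actual glossary; repeating the source text in the target column is how you'd pin a term so it's left untranslated.

```tsv
Microsoft Learn	Microsoft Learn
pull request	pull request
upskilling	perfeccionamiento
```

The first two rows protect brand and industry terms from being translated; the last forces a specific Spanish rendering regardless of what the model would otherwise choose.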
Alright, enough talk about Azure AI Translator; let's get into how we're going to design this workflow. Depending on how the content is stored, there are essentially three options for how to translate it. Option 1: your content is stored as JSON. As you can see from this JSON snippet, we have an array of posts, which are objects containing the various data our frontend needs in order to render each post. If your content is stored like this, you would most likely want to translate it at runtime using the Azure Translator API, probably with a React server component or a separate custom API that caches the translated content in order to reduce how often the translation service runs, which is ultimately going to keep your costs down.