Language Mapping

Description

This chapter gives an overview of language tags, which are used behind the scenes in language-aware workflows.

In the default case, the user or configurator does not have to deal with these internals, as they only matter in special use cases (e.g. the user wants to translate something to American English instead of British English). For these special use cases, however, it is crucial to understand the following concepts.

Languages in OMN

Language-aware AI workflows usually have parameters for selecting, for example, the target and/or source language.
For the user and configurator, these languages are displayed with a simplified display name such as "English" or "German".
Internally, however, they are represented as a Language Tag consisting of an ISO language code and a country code, such as "en-EN" or "de-DE".

In general, Language Tags have the following format:

<language>-<script>-<region>-<variant> (all parts except <language> are optional).

Examples are:

  • "de-DE", "de-AT", "en-GB"

  • "sr-Latn-RS" (Serbian written using the Latin script as used in Serbia)

  • "de-CH-1901" (German as used in Switzerland using the 1901 variant)

As OMN only supports an ISO language code together with a country code, the <script> and <variant> parts are usually not used.
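The tag format described above can be illustrated with a small parser. This is a minimal sketch, not OMN's implementation: the names `LanguageTag` and `parse_language_tag` are hypothetical, and the pattern is a simplification that accepts at most one part of each kind.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    """Hypothetical container for the four parts of a Language Tag."""
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variant: Optional[str] = None


# Simplified pattern (assumption): 2-3 letter language, optional 4-letter
# script, optional 2-letter or 3-digit region, optional single variant.
_TAG_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-(?P<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))?$"
)


def parse_language_tag(tag: str) -> LanguageTag:
    """Split a tag such as "sr-Latn-RS" into its parts."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Not a supported language tag: {tag!r}")
    # Parts that are absent from the tag come back as None.
    return LanguageTag(**match.groupdict())
```

With this sketch, "sr-Latn-RS" yields the language "sr", the script "Latn" and the region "RS", while "de-CH-1901" yields the language "de", the region "CH" and the variant "1901".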

Languages in API Providers

API Providers that support languages usually have their own format for representing them.
Here are some examples of formats in use:

  • A language code equivalent like "de" or "en".

    • Some use it in uppercase like "DE" or "EN".

  • Region-specific codes like "en-GB" or "en-US"; some providers also support different scripts, like "sr_cyrl" (Serbian Cyrillic) or "sr_latn" (Serbian Latin).

  • An ID like "538b1efc6f88ad88feebf7acd8c618facb54fe82".

Mapping

Example: There are three API providers which each accept language codes in a different format.

providerOne = DE,EN
providerTwo = de,en
providerThree = de-DE,en-GB,en-US

In this example, only providerThree can explicitly distinguish between the British (GB) and American (US) variants of English.

Let's say that in our OMN the language "German" is configured with "de" as language code and "DE" as country code ("de-DE"). Furthermore, the language "English" is configured with "en" as language code and "EN" as country code ("en-EN"). Now we can map the Language Tags to the corresponding code of each provider:

providerOne.languageTagMappings = de-DE:DE,en-EN:EN
providerTwo.languageTagMappings = de-DE:de,en-EN:en
providerThree.languageTagMappings = de-DE:de-DE,en-EN:en-GB

In most workflows, these languageTagMappings parameters are set to a reasonable default value that should fit most cases. Adjusting them is usually only required if you have special needs regarding the dialect used, or if your OMN has a "non-standard" language/country code configuration.
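To make the mapping format concrete, the following minimal sketch reads a languageTagMappings value into a dictionary and looks up a provider code. The helper name `parse_mappings` is hypothetical, and the parsing rules (comma-separated `tag:code` pairs, surrounding whitespace ignored) are assumptions derived from the examples above, not the actual OMN parser.

```python
from typing import Dict


def parse_mappings(raw: str) -> Dict[str, str]:
    """Turn "de-DE:de,en-EN:en" into {"de-DE": "de", "en-EN": "en"}."""
    mappings: Dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        tag, separator, code = entry.partition(":")
        if not separator:
            raise ValueError(f"Expected '<tag>:<code>', got {entry!r}")
        mappings[tag.strip()] = code.strip()
    return mappings


# Values taken from the providerThree example above.
provider_three = parse_mappings("de-DE:de-DE,en-EN:en-GB")
english_code = provider_three["en-EN"]  # the OMN tag "en-EN" is sent as "en-GB"
```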

Configuration

Language Tag Mappings

Language Tag Mappings are usually configured via a global namespace, on which users can apply put and remove operations to the map.
For example, in AI Image Tagger, users can find the Language Tag Mappings for every Tagger in the corresponding configuration namespace 'ext.ai-imagetagging.<tag-provider-name>.languageTagMappings'.
To illustrate, "Imagga" has the following default value (abbreviated for simplicity):

ext.ai-imagetagging.imagga.languageTagMappings = ar:ar,bg:bg,bs:bs,en:en,ca:ca,cs:cs,cy:cy,da:da,de:de,el:el,es:es,et:et,...

To change the mapping for one or more keys without providing the whole string again, use "putAll" and "removeAllKeys" on the namespace:

ext.ai-imagetagging.imagga.languageTagMappings.putAll = de:de-AT,de-AT:de-AT
ext.ai-imagetagging.imagga.languageTagMappings.removeAllKeys = ar,fr,fr-CA
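The effect of these two settings can be sketched as plain dictionary operations. This is only an illustration: `apply_overrides` is a hypothetical helper, the defaults are a short excerpt of the abbreviated Imagga value, and the order of application (puts first, then removals) is an assumption that makes no difference here because the keys do not overlap.

```python
from typing import Dict, Iterable


def apply_overrides(
    defaults: Dict[str, str],
    put_all: Dict[str, str],
    remove_all_keys: Iterable[str],
) -> Dict[str, str]:
    """Return the effective mappings without modifying the defaults."""
    effective = dict(defaults)
    effective.update(put_all)      # putAll: add new keys or overwrite existing ones
    for key in remove_all_keys:    # removeAllKeys: drop keys, ignoring absent ones
        effective.pop(key, None)
    return effective


# Excerpt of the default value shown above.
defaults = {"ar": "ar", "de": "de", "en": "en"}
effective = apply_overrides(
    defaults,
    put_all={"de": "de-AT", "de-AT": "de-AT"},
    remove_all_keys=["ar", "fr", "fr-CA"],
)
```

For this excerpt, the effective map keeps "en:en", replaces "de:de" with "de:de-AT", adds "de-AT:de-AT" and no longer contains "ar".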

Language Tag Mapping Strategy

In addition to the mappings, two strategies for Language Tag Mapping are usually available:

  • lenient: If a more complex Language Tag like "fr-FR" cannot be found in the mappings, this strategy falls back to the minimal Language Tag (in this case "fr") and tries to map that one.

  • strict: This strategy only tries to map the exact Language Tag, without falling back to the minimal one.
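The difference between the two strategies can be sketched as follows. The function name `map_language_tag` is hypothetical, and returning `None` when nothing matches is an assumption, since the behavior for unmapped tags is not specified here.

```python
from typing import Dict, Optional


def map_language_tag(
    tag: str, mappings: Dict[str, str], strategy: str = "lenient"
) -> Optional[str]:
    """Resolve an OMN Language Tag to a provider code."""
    if strategy not in ("lenient", "strict"):
        raise ValueError(f"Unknown strategy: {strategy!r}")
    # Both strategies try the exact tag first.
    if tag in mappings:
        return mappings[tag]
    if strategy == "lenient":
        # Fall back to the minimal tag, e.g. "fr-FR" -> "fr".
        minimal = tag.split("-", 1)[0]
        return mappings.get(minimal)
    return None  # assumption: strict yields no result without an exact match


mappings = {"fr": "fr", "de-DE": "de"}
lenient_result = map_language_tag("fr-FR", mappings, "lenient")  # falls back to "fr"
strict_result = map_language_tag("fr-FR", mappings, "strict")    # no exact entry
```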

Example Use Cases

Customer wants to use American English

The customer has an existing OMN system with English configured as "en-EN", which is usually mapped to "en-GB" by default (except for DeepL in AI Translation), but wants to use the workflow with American English. As several other existing processes might already rely on the language code "en" and the country code "EN", the customer does not want to change these codes system-wide.

As described above, "en-EN" is usually mapped to a British dialect like "en-GB" by default (the actual code depends on the API Provider). This is the default mapping in AI Workflows whenever the API Provider supports different dialects; the only exception is DeepL in the AI Translation Workflow, where "en-EN" is mapped to the American dialect for backward compatibility reasons. To use American English, configure the corresponding languageTagMappings parameter to use "en-US":

*.languageTagMappings.putAll = en-EN:en-US
