This document lists all transforms available in unicode_transform, including the :from and :to option atoms, direct transform IDs for the :transform option, and the 394 CLDR XML rule sets that ship with unicode_transform.
There are three ways to invoke a transform:
# 1. :from / :to atoms (most convenient)
iex> Unicode.Transform.transform("Москва", from: :cyrillic, to: :latin)
{:ok, "Moskva"}
# 2. :transform string (for transforms without atom mappings)
iex> Unicode.Transform.transform("Հayaստan", transform: "Armenian-Latin-BGN")
{:ok, "Hayastan"}
# 3. :transform with :direction (for reverse transforms)
iex> Unicode.Transform.transform("アイウエオ", transform: "Hiragana-Katakana", direction: :reverse)
{:ok, "あいうえお"}The :from option defaults to :any when omitted:
# These are equivalent:
iex> Unicode.Transform.transform("hello", to: :upper)
{:ok, "HELLO"}
iex> Unicode.Transform.transform("hello", from: :any, to: :upper)
{:ok, "HELLO"}Both atoms and strings are accepted for :from and :to. The
BCP47 script code or the Unicode script name are both acceptable.
iex> Unicode.Transform.transform("αβγδ", from: :greek, to: :latin)
{:ok, "abgd"}
iex> Unicode.Transform.transform("αβγδ", from: "Greek", to: "Latin")
{:ok, "abgd"}
iex> Unicode.Transform.transform("αβγδ", from: :grek, to: :latn)
{:ok, "abgd"}Use from: :detect to automatically detect scripts in the input:
iex> Unicode.Transform.transform("αβγδ мир", from: :detect, to: :latin)
{:ok, "abgd mir"}When from: :detect is set, the string is analzed to identify the script
or scripts used within it. A transform is applied for each script
identified in the string.
To list all available transforms at runtime:
iex> Unicode.Transform.available_transforms() |> Enum.take(5)
["Amharic-Latin-BGN", "Any-Accents", "Any-Publishing", "Arabic-Latin-BGN", "Arabic-Latin"]This returns a sorted list of all transform IDs unicode_transform can resolve.
These transforms are implemented directly in Elixir without CLDR XML rule sets.
:to atom |
Transform ID | Description |
|---|---|---|
:nfc |
Any-NFC |
Unicode NFC normalization |
:nfd |
Any-NFD |
Unicode NFD normalization |
:nfkc |
Any-NFKC |
Unicode NFKC normalization |
:nfkd |
Any-NFKD |
Unicode NFKD normalization |
:upper |
Any-Upper |
Uppercase mapping |
:lower |
Any-Lower |
Lowercase mapping |
:title |
Any-Title |
Titlecase mapping |
:null |
Any-Null |
Identity (no change) |
:remove |
Any-Remove |
Remove all characters |
These atoms can be used with the :from and :to options. Each script has a full name atom and a BCP47 (ISO 15924) code atom that resolve to the same value.
| Full name | BCP47 code | Resolves to |
|---|---|---|
:arabic |
:arab |
"Arabic" |
:armenian |
:armn |
"Armenian" |
:bengali |
:beng |
"Bengali" |
:bopomofo |
:bopo |
"Bopomofo" |
:canadian_aboriginal |
:cans |
"CanadianAboriginal" |
:cyrillic |
:cyrl |
"Cyrillic" |
:devanagari |
:deva |
"Devanagari" |
:ethiopic |
:ethi |
"Ethiopic" |
:georgian |
:geor |
"Georgian" |
:greek |
:grek |
"Greek" |
:gujarati |
:gujr |
"Gujarati" |
:gurmukhi |
:guru |
"Gurmukhi" |
:han |
:hani |
"Han" |
:hangul |
:hang |
"Hangul" |
:hebrew |
:hebr |
"Hebrew" |
:hiragana |
:hira |
"Hiragana" |
:kannada |
:knda |
"Kannada" |
:katakana |
— | "Katakana" |
:khmer |
:khmr |
"Khmer" |
:lao |
:laoo |
"Lao" |
:latin |
:latn |
"Latin" |
:malayalam |
:mlym |
"Malayalam" |
:myanmar |
:mymr |
"Myanmar" |
:oriya |
:orya |
"Oriya" |
:sinhala |
:sinh |
"Sinhala" |
:syriac |
:syrc |
"Syriac" |
:tamil |
:taml |
"Tamil" |
:telugu |
:telu |
"Telugu" |
:thaana |
:thaa |
"Thaana" |
:thai |
— | "Thai" |
Additional atoms without BCP47 equivalents:
| Atom | Resolves to |
|---|---|
:hant |
"Hant" |
:interindic |
"InterIndic" |
:jamo |
"Jamo" |
| Atom | Resolves to |
|---|---|
:ascii |
"ASCII" |
:fullwidth |
"Fullwidth" |
:halfwidth |
"Halfwidth" |
| Atom | Resolves to |
|---|---|
:publishing |
"Publishing" |
:accents |
"Accents" |
:any |
"Any" |
:detect |
Automatic script detection |
These are the most commonly used transforms. Use from: <script>, to: :latin or the equivalent :transform string.
:from |
:to |
Transform ID | Bidirectional |
|---|---|---|---|
:arabic |
:latin |
Arabic-Latin |
Yes |
:armenian |
:latin |
Armenian-Latin |
No (forward only) |
:bengali |
:latin |
Bengali-Latin |
Yes |
:canadian_aboriginal |
:latin |
CanadianAboriginal-Latin |
Yes |
:cyrillic |
:latin |
Cyrillic-Latin |
Yes |
:devanagari |
:latin |
Devanagari-Latin |
Yes |
:georgian |
:latin |
Georgian-Latin |
Yes |
:greek |
:latin |
Greek-Latin |
Yes |
:gujarati |
:latin |
Gujarati-Latin |
Yes |
:gurmukhi |
:latin |
Gurmukhi-Latin |
Yes |
:han |
:latin |
Han-Latin |
No (forward only) |
:hangul |
:latin |
Hangul-Latin |
Yes |
:hant |
:latin |
Hant-Latin |
No (forward only) |
:hebrew |
:latin |
Hebrew-Latin |
Yes |
:hiragana |
:latin |
Hiragana-Latin |
Yes |
:kannada |
:latin |
Kannada-Latin |
Yes |
:katakana |
:latin |
Katakana-Latin |
Yes |
:khmer |
:latin |
Khmer-Latin |
Yes |
:lao |
:latin |
Lao-Latin |
Yes |
:malayalam |
:latin |
Malayalam-Latin |
Yes |
:myanmar |
:latin |
Myanmar-Latin |
Yes |
:oriya |
:latin |
Oriya-Latin |
Yes |
:sinhala |
:latin |
Sinhala-Latin |
Yes |
:syriac |
:latin |
Syriac-Latin |
Yes |
:tamil |
:latin |
Tamil-Latin |
Yes |
:telugu |
:latin |
Telugu-Latin |
Yes |
:thaana |
:latin |
Thaana-Latin |
Yes |
:thai |
:latin |
Thai-Latin |
Yes |
Bidirectional transforms can be used in reverse with from: :latin, to: <script>.
:from |
:to |
Transform ID |
|---|---|---|
:latin |
:ascii |
Latin-ASCII |
iex> Unicode.Transform.transform("café résumé", from: :latin, to: :ascii)
{:ok, "cafe resume"}This transform has a dedicated fast-path module for maximum performance.
| Transform ID | Description |
|---|---|
Fullwidth-Halfwidth |
Convert fullwidth characters to halfwidth |
Halfwidth-Fullwidth |
Convert halfwidth characters to fullwidth (reverse) |
iex> Unicode.Transform.transform("Hello", transform: "Fullwidth-Halfwidth")
{:ok, "Hello"}:from |
:to |
Transform ID | Bidirectional |
|---|---|---|---|
:hiragana |
:katakana |
Hiragana-Katakana |
Yes |
iex> Unicode.Transform.transform("あいうえお", from: :hiragana, to: :katakana)
{:ok, "アイウエオ"}| Transform ID | Description |
|---|---|
Han-Latin |
Chinese characters to Latin (pinyin) |
Han-Latin-Names |
Chinese names to Latin (surname-first order) |
Han-Spacedhan |
Insert spaces between Chinese characters |
Simplified-Traditional |
Simplified Chinese to Traditional Chinese |
iex> Unicode.Transform.transform("中国", transform: "Han-Latin")
{:ok, "zhōng guó"}Every pair of Indic scripts has a direct transform via the InterIndic pivot. These are all bidirectional.
Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, Telugu
Each Indic script can be transformed to every other Indic script. For example:
iex> Unicode.Transform.transform("हिन्दी", from: :devanagari, to: :bengali)
{:ok, "হিন্দী"}
iex> Unicode.Transform.transform("বাংলা", from: :bengali, to: :gujarati)
{:ok, "બાંલા"}The full set of Indic cross-script files:
| Source → | Bengali | Devanagari | Gujarati | Gurmukhi | Kannada | Malayalam | Oriya | Tamil | Telugu |
|---|---|---|---|---|---|---|---|---|---|
| Bengali | — | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Devanagari | ✓ | — | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Gujarati | ✓ | ✓ | — | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Gurmukhi | ✓ | ✓ | ✓ | — | ✓ | ✓ | ✓ | ✓ | ✓ |
| Kannada | ✓ | ✓ | ✓ | ✓ | — | ✓ | ✓ | ✓ | ✓ |
| Malayalam | ✓ | ✓ | ✓ | ✓ | ✓ | — | ✓ | ✓ | ✓ |
| Oriya | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — | ✓ | ✓ |
| Tamil | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — | ✓ |
| Telugu | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
Each Indic script also has transforms to Arabic script and Urdu:
| Transform ID | Description |
|---|---|
Bengali-Arabic |
Bengali to Arabic script |
Bengali-ur |
Bengali to Urdu |
Devanagari-Arabic |
Devanagari to Arabic script |
Devanagari-ur |
Devanagari to Urdu |
Gujarati-Arabic |
Gujarati to Arabic script |
Gujarati-ur |
Gujarati to Urdu |
Gurmukhi-Arabic |
Gurmukhi to Arabic script |
Gurmukhi-ur |
Gurmukhi to Urdu |
Kannada-Arabic |
Kannada to Arabic script |
Kannada-ur |
Kannada to Urdu |
Malayalam-Arabic |
Malayalam to Arabic script |
Malayalam-ur |
Malayalam to Urdu |
Oriya-Arabic |
Oriya to Arabic script |
Oriya-ur |
Oriya to Urdu |
Tamil-Arabic |
Tamil to Arabic script |
Tamil-ur |
Tamil to Urdu |
Telugu-Arabic |
Telugu to Arabic script |
Telugu-ur |
Telugu to Urdu |
These are internal transforms used by the cross-script Indic pipeline. They can also be used directly:
| Transform ID | Description |
|---|---|
Bengali-InterIndic |
Bengali to InterIndic |
Devanagari-InterIndic |
Devanagari to InterIndic |
Gujarati-InterIndic |
Gujarati to InterIndic |
Gurmukhi-InterIndic |
Gurmukhi to InterIndic |
Kannada-InterIndic |
Kannada to InterIndic |
Malayalam-InterIndic |
Malayalam to InterIndic |
Oriya-InterIndic |
Oriya to InterIndic |
Tamil-InterIndic |
Tamil to InterIndic |
Telugu-InterIndic |
Telugu to InterIndic |
InterIndic-Bengali |
InterIndic to Bengali |
InterIndic-Devanagari |
InterIndic to Devanagari |
InterIndic-Gujarati |
InterIndic to Gujarati |
InterIndic-Gurmukhi |
InterIndic to Gurmukhi |
InterIndic-Kannada |
InterIndic to Kannada |
InterIndic-Latin |
InterIndic to Latin |
InterIndic-Malayalam |
InterIndic to Malayalam |
InterIndic-Oriya |
InterIndic to Oriya |
InterIndic-Tamil |
InterIndic to Tamil |
InterIndic-Telugu |
InterIndic to Telugu |
InterIndic-Arabic |
InterIndic to Arabic |
InterIndic-ur |
InterIndic to Urdu |
These follow the BGN/PCGN romanization standards used by the U.S. Board on Geographic Names and the Permanent Committee on Geographical Names.
| Transform ID | Description |
|---|---|
Amharic-Latin-BGN |
Amharic (Ethiopic) to Latin |
Arabic-Latin-BGN |
Arabic to Latin |
Armenian-Latin-BGN |
Armenian to Latin |
Azerbaijani-Latin-BGN |
Azerbaijani (Cyrillic) to Latin |
Belarusian-Latin-BGN |
Belarusian to Latin |
Bulgarian-Latin-BGN |
Bulgarian to Latin |
Georgian-Latin-BGN |
Georgian to Latin |
Georgian-Latin-BGN_1981 |
Georgian to Latin (1981 system) |
Greek-Latin-BGN |
Greek to Latin |
Hebrew-Latin-BGN |
Hebrew to Latin |
Katakana-Latin-BGN |
Katakana to Latin |
Kazakh-Latin-BGN |
Kazakh to Latin |
Kirghiz-Latin-BGN |
Kirghiz to Latin |
Korean-Latin-BGN |
Korean to Latin |
Macedonian-Latin-BGN |
Macedonian to Latin |
Maldivian-Latin-BGN |
Maldivian (Thaana) to Latin |
Mongolian-Latin-BGN |
Mongolian to Latin |
Pashto-Latin-BGN |
Pashto to Latin |
Persian-Latin-BGN |
Persian to Latin |
Russian-Latin-BGN |
Russian to Latin |
Serbian-Latin-BGN |
Serbian to Latin |
Turkmen-Latin-BGN |
Turkmen to Latin |
Ukrainian-Latin-BGN |
Ukrainian to Latin |
Uzbek-Latin-BGN |
Uzbek to Latin |
iex> Unicode.Transform.transform("Москва", transform: "Russian-Latin-BGN")
{:ok, "Moskva"}Uses the United Nations Group of Experts on Geographical Names(United Nations Group of Experts on Geographical Names)[https://en.wikipedia.org/wiki/United_Nations_Group_of_Experts_on_Geographical_Names] Romanization rules.
| Transform ID | Description |
|---|---|
Greek_Latin_UNGEGN |
Greek to Latin (UNGEGN standard) |
:from |
:to |
Transform ID | Description |
|---|---|---|---|
:de |
:ASCII |
de-ASCII |
Context-sensitive German umlaut conversion |
iex> Unicode.Transform.transform("Ä ö ü", transform: "de-ASCII")
{:ok, "AE oe ue"}
# Also available via :from/:to
iex> Unicode.Transform.transform("Ä ö ü", from: "de", to: "ASCII")
{:ok, "AE oe ue"}These apply language-specific casing rules (e.g., Turkish dotted/dotless I, Lithuanian accent handling).
| Transform ID | Description |
|---|---|
az-Lower |
Azerbaijani lowercase |
az-Title |
Azerbaijani titlecase |
az-Upper |
Azerbaijani uppercase |
el-Lower |
Greek lowercase |
el-Title |
Greek titlecase |
el-Upper |
Greek uppercase |
lt-Lower |
Lithuanian lowercase |
lt-Title |
Lithuanian titlecase |
lt-Upper |
Lithuanian uppercase |
nl-Title |
Dutch titlecase (IJ digraph handling) |
tr-Lower |
Turkish lowercase |
tr-Title |
Turkish titlecase |
tr-Upper |
Turkish uppercase |
iex> Unicode.Transform.transform("İSTANBUL", transform: "tr-Lower")
{:ok, "istanbul"}| Transform ID | Description |
|---|---|
Ethiopic-Latin-ALALOC |
Ethiopic to Latin (ALA-LOC standard) |
Ethiopic-Latin-Aethiopica |
Ethiopic to Latin (Aethiopica) |
Ethiopic-Latin-BetaMetsehaf |
Ethiopic to Latin (Beta Maṣāḥǝft) |
Ethiopic-Latin-ES3842 |
Ethiopic to Latin (ES 3842) |
Ethiopic-Latin-IES_JES_1964 |
Ethiopic to Latin (IES/JES 1964) |
Ethiopic-Latin-Lambdin |
Ethiopic to Latin (Lambdin) |
Ethiopic-Latin-SERA |
Ethiopic to Latin (SERA) |
Ethiopic-Latin-TekieAlibekit |
Ethiopic to Latin (Tekie Alibekit) |
Ethiopic-Latin-Xaleget |
Ethiopic to Latin (Xaleget) |
Ethiopic-Braille-Amharic |
Ethiopic to Braille (Amharic) |
Ethiopic-Cyrillic-Gutgarts |
Ethiopic to Cyrillic (Gutgarts) |
Ethiopic-Ethiopic-Gurage |
Ethiopic variant (Gurage) |
Ethiopic-Musnad |
Ethiopic to Musnad |
Transforms to and from the [International Phonetic Alphabet](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet#:~:text=The%20International%20Phonetic%20Alphabet%20(IPA,for%20the%20sounds%20of%20speech.).
| Transform ID | Description |
|---|---|
IPA-XSampa |
IPA to X-SAMPA notation |
am-am_FONIPA |
Amharic to IPA |
ar-ar_FONIPA |
Arabic to IPA (via und_FONIPA-ar) |
blt-fonipa-t-blt |
Tai Dam to IPA |
ch-ch_FONIPA |
Chamorro to IPA |
chr-chr_FONIPA |
Cherokee to IPA |
cs-cs_FONIPA |
Czech to IPA |
cy-fonipa-t-cy |
Welsh to IPA |
dsb-dsb_FONIPA |
Lower Sorbian to IPA |
eo-eo_FONIPA |
Esperanto to IPA |
es-es_FONIPA |
Spanish to IPA |
fa-fa_FONIPA |
Persian to IPA |
hy-hy_FONIPA |
Armenian to IPA |
hy_AREVMDA-hy_AREVMDA_FONIPA |
Western Armenian to IPA |
ia-ia_FONIPA |
Interlingua to IPA |
kk-kk_FONIPA |
Kazakh to IPA |
ky-ky_FONIPA |
Kyrgyz to IPA |
la-la_FONIPA |
Latin to IPA |
my-my_FONIPA |
Myanmar to IPA |
nv-nv_FONIPA |
Navajo to IPA |
pl-pl_FONIPA |
Polish to IPA |
ro-ro_FONIPA |
Romanian to IPA |
sat_Olck-sat_FONIPA |
Santali (Ol Chiki) to IPA |
si-si_FONIPA |
Sinhala to IPA |
sk-sk_FONIPA |
Slovak to IPA |
ta-ta_FONIPA |
Tamil to IPA |
tlh-tlh_FONIPA |
Klingon to IPA |
ug-ug_FONIPA |
Uyghur to IPA |
vec-vec_FONIPA |
Venetian to IPA |
xh-xh_FONIPA |
Xhosa to IPA |
zu-zu_FONIPA |
Zulu to IPA |
These use IPA as a pivot to approximate transliteration between languages.
| Transform ID | Description |
|---|---|
cs_FONIPA-ja |
Czech IPA to Japanese |
cs_FONIPA-ko |
Czech IPA to Korean |
es_FONIPA-am |
Spanish IPA to Amharic |
es_FONIPA-es_419_FONIPA |
Spanish IPA to Latin American Spanish IPA |
es_FONIPA-ja |
Spanish IPA to Japanese |
es_FONIPA-zh |
Spanish IPA to Chinese |
pl_FONIPA-ja |
Polish IPA to Japanese |
ro_FONIPA-ja |
Romanian IPA to Japanese |
sk_FONIPA-ja |
Slovak IPA to Japanese |
und_FONIPA-ar |
IPA to Arabic |
und_FONIPA-chr |
IPA to Cherokee |
und_FONIPA-fa |
IPA to Persian |
These transform between specific language pairs, often using phonetic approximation.
| Transform ID | Description |
|---|---|
am-ar |
Amharic to Arabic |
am-chr |
Amharic to Cherokee |
am-fa |
Amharic to Persian |
ch-am |
Chamorro to Amharic |
ch-ar |
Chamorro to Arabic |
ch-chr |
Chamorro to Cherokee |
ch-fa |
Chamorro to Persian |
cs-am |
Czech to Amharic |
cs-ar |
Czech to Arabic |
cs-chr |
Czech to Cherokee |
cs-fa |
Czech to Persian |
cs-ja |
Czech to Japanese |
cs-ko |
Czech to Korean |
eo-am |
Esperanto to Amharic |
eo-ar |
Esperanto to Arabic |
eo-chr |
Esperanto to Cherokee |
eo-fa |
Esperanto to Persian |
es-am |
Spanish to Amharic |
es-ar |
Spanish to Arabic |
es-chr |
Spanish to Cherokee |
es-fa |
Spanish to Persian |
es-ja |
Spanish to Japanese |
es-zh |
Spanish to Chinese |
es_419-am |
Latin American Spanish to Amharic |
es_419-ar |
Latin American Spanish to Arabic |
es_419-chr |
Latin American Spanish to Cherokee |
es_419-fa |
Latin American Spanish to Persian |
es_419-ja |
Latin American Spanish to Japanese |
es_419-zh |
Latin American Spanish to Chinese |
hy-am |
Armenian to Amharic |
hy-ar |
Armenian to Arabic |
hy-chr |
Armenian to Cherokee |
hy-fa |
Armenian to Persian |
hy_AREVMDA-am |
Western Armenian to Amharic |
hy_AREVMDA-ar |
Western Armenian to Arabic |
hy_AREVMDA-chr |
Western Armenian to Cherokee |
hy_AREVMDA-fa |
Western Armenian to Persian |
ia-am |
Interlingua to Amharic |
ia-ar |
Interlingua to Arabic |
ia-chr |
Interlingua to Cherokee |
ia-fa |
Interlingua to Persian |
it-am |
Italian to Amharic |
it-ja |
Italian to Japanese |
ja_Latn-ko |
Romanized Japanese to Korean |
ja_Latn-ru |
Romanized Japanese to Russian |
kk-am |
Kazakh to Amharic |
kk-ar |
Kazakh to Arabic |
kk-chr |
Kazakh to Cherokee |
kk-fa |
Kazakh to Persian |
ky-am |
Kyrgyz to Amharic |
ky-ar |
Kyrgyz to Arabic |
ky-chr |
Kyrgyz to Cherokee |
ky-fa |
Kyrgyz to Persian |
my-am |
Myanmar to Amharic |
my-ar |
Myanmar to Arabic |
my-chr |
Myanmar to Cherokee |
my-fa |
Myanmar to Persian |
pl-am |
Polish to Amharic |
pl-ar |
Polish to Arabic |
pl-chr |
Polish to Cherokee |
pl-fa |
Polish to Persian |
pl-ja |
Polish to Japanese |
rm_SURSILV-am |
Romansh (Sursilvan) to Amharic |
rm_SURSILV-ar |
Romansh (Sursilvan) to Arabic |
rm_SURSILV-chr |
Romansh (Sursilvan) to Cherokee |
rm_SURSILV-fa |
Romansh (Sursilvan) to Persian |
rm_SURSILV-rm_FONIPA_SURSILV |
Romansh (Sursilvan) to IPA |
ro-am |
Romanian to Amharic |
ro-ar |
Romanian to Arabic |
ro-chr |
Romanian to Cherokee |
ro-fa |
Romanian to Persian |
ro-ja |
Romanian to Japanese |
ru-ja |
Russian to Japanese |
ru-zh |
Russian to Chinese |
sat-am |
Santali to Amharic |
sat-ar |
Santali to Arabic |
sat-chr |
Santali to Cherokee |
sat-fa |
Santali to Persian |
si-am |
Sinhala to Amharic |
si-ar |
Sinhala to Arabic |
si-chr |
Sinhala to Cherokee |
si-fa |
Sinhala to Persian |
sk-am |
Slovak to Amharic |
sk-ar |
Slovak to Arabic |
sk-chr |
Slovak to Cherokee |
sk-fa |
Slovak to Persian |
sk-ja |
Slovak to Japanese |
tlh-am |
Klingon to Amharic |
tlh-ar |
Klingon to Arabic |
tlh-chr |
Klingon to Cherokee |
tlh-fa |
Klingon to Persian |
xh-am |
Xhosa to Amharic |
xh-ar |
Xhosa to Arabic |
xh-chr |
Xhosa to Cherokee |
xh-fa |
Xhosa to Persian |
zu-am |
Zulu to Amharic |
zu-ar |
Zulu to Arabic |
zu-chr |
Zulu to Cherokee |
zu-fa |
Zulu to Persian |
| Transform ID | Description |
|---|---|
Any-Accents |
Add accents to Latin text |
Any-Publishing |
Typographic improvements (smart quotes, dashes) |
Jamo-Latin |
Korean Jamo to Latin |
Latin-ConjoiningJamo |
Latin to Conjoining Jamo (bidirectional) |
Latin-NumericPinyin |
Latin pinyin to numeric tone pinyin |
Pinyin-NumericPinyin |
Pinyin with tone marks to numeric tones |
Jpan-Latn |
Japanese (mixed scripts) to Latin |
ug-Latin |
Uyghur to Latin |
si-si_Latn |
Sinhala to Sinhala Latin |
| Transform ID | Description |
|---|---|
my-t-my-d0-zawgyi |
Myanmar Unicode to Zawgyi encoding |
my-t-my-s0-zawgyi |
Zawgyi encoding to Myanmar Unicode |
| Transform ID | Description |
|---|---|
Simplified-Traditional |
Simplified Chinese to Traditional Chinese |
uz_Cyrl-uz_Latn |
Uzbek Cyrillic to Uzbek Latin |
ru_Latn-ru-BGN |
Romanized Russian back to Russian Cyrillic |
ha-ha_NE |
Hausa to Hausa (Niger variant) |
yo-yo_BJ |
Yoruba to Yoruba (Benin variant) |
mn-mn_Latn-MNS |
Mongolian to Mongolian Latin (MNS) |
zh_Latn_PINYIN-ru |
Pinyin to Russian |