How to write translation files

TnFOX provides what is probably the most powerful and intuitive automatic human language translation support of any C++ library currently available. While it is mostly source compatible with Qt, it also provides a good deal more power.

A tool called TranslateViaGoogle.py is provided which screen scrapes the output of Google Translate in order to automate translation into all supported languages. Simply call it with the output and input files and much of the work will be done for you, though you WILL need to refine the output manually afterwards. Google Translate will only perform so many translations before it realises that it is being driven by an automated script and refuses to provide any more. The tool can therefore restart a partially completed automatic translation, and it can also fill in the blanks of a manually translated file. It requires the BeautifulSoup Python package.

Much of TnFOX's translation power derives from its intuitive method of specifying translations. The file format is basically as follows:

"<program original text literal>":
    <langid>: "<generic translation for this language>"
    <langid>_<regionid>: "<localised translation for this language>"
    ...
    class="<class name>":
        <langid>: up|"<specific translation for this class in this language>"
        ...
    srcfile="<source file name>":
        ...
    hint="<hint>":
        ...

"<next literal>":
    ...

As of v0.86 of TnFOX (v1.6 FOX based builds only), you can write this file in UTF-8, UTF-16 or UTF-32 and TnFOX will automatically detect which format it is in. Therefore, you can use any text editor you like (including the special TnFOX edition of Adie, which can save in any of these formats).
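
If you generate or post-process translation files with your own scripts, those scripts have to cope with the same three encodings. How TnFOX itself detects the encoding is not described here. The Python sketch below simply assumes the file begins with a byte order mark and falls back to UTF-8 when there is none; the function name is made up for this illustration.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_translation_bytes(raw):
    """Decode raw file contents, choosing the codec from the leading BOM."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # These codecs consume the BOM and take the byte order from it.
            return raw.decode(encoding)
    # Assumption of this sketch: no BOM means plain UTF-8.
    return raw.decode("utf-8")
```

A script would read the file in binary mode and pass the bytes to decode_translation_bytes() before doing any line-based processing.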

More powerful still are the %1, %2, %3 ... operators, which let you specialise a translation based on the value of an inserted parameter, e.g.:

"%1 pretty round teapots":
    ES: "%1 teteras redondas bonitas"
    %1=1:
        EN: "%1 pretty round teapot"
        ES: "%1 tetera redonda bonita"
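
To make the selection rule concrete, here is a minimal Python sketch of how a lookup might choose between the generic translation and the %1=1 specialisation. The dictionary layout and the function name are invented for this illustration and are not TnFOX's internal representation. The sketch also assumes that the original program literal is used when no translation exists for the requested language.

```python
# Hypothetical in-memory form of the teapot entry above.
TEAPOTS = {
    "literal": "%1 pretty round teapots",
    "generic": {"ES": "%1 teteras redondas bonitas"},
    # Translations that apply only when %1 has a particular value.
    "when_arg1": {
        1: {"EN": "%1 pretty round teapot", "ES": "%1 tetera redonda bonita"},
    },
}


def translate(entry, lang, arg1):
    """Pick the most specific text for lang, then insert the parameter."""
    special = entry["when_arg1"].get(arg1, {})
    text = special.get(lang) or entry["generic"].get(lang) or entry["literal"]
    return text.replace("%1", str(arg1))
```

With this model, translate(TEAPOTS, "ES", 1) picks the singular Spanish text, while translate(TEAPOTS, "EN", 3) falls all the way back to the original literal.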

You can also arbitrarily combine modifiers, e.g.:

    srcfile="<source file name>":class="<class name>":hint="<hint>":
        <langid>: "<translation>"

Note also that the indentation is important: it must consist of tab (0x09) characters, not spaces. You can also substitute the word "up" for the translation in a particular language, which makes it use the next highest available translation for that language id. If you look at the output of CppMunge.py, it will look something like this:

"Device reopen has different mode":
    ES: "El modo de operacion no es iqual que antes"
    srcfile="QBuffer.cxx":class="QBuffer":
        ES: up
    srcfile="QFile.cxx":class="QFile":
        ES: up
    srcfile="QGZipDevice.cxx":class="QGZipDevice":
        ES: up

By convention, every source file and class that uses a literal is listed in the translation file, even though its entries are usually just "up". Why? Because with every user of a particular literal listed, you cannot accidentally overlook a class or source file that needs a special translation.
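
The effect of "up" can be sketched in a few lines of Python. The data layout, the UP sentinel and the resolve() function are all hypothetical and exist only to illustrate the fallback; they say nothing about how TnFOX stores translations internally. The sketch handles a single level of nesting (a srcfile/class pair below the generic translation).

```python
UP = object()  # stands for the bare word "up" in the translation file

# Hypothetical in-memory form of the "Device reopen has different mode" entry.
ENTRY = {
    "generic": {"ES": "El modo de operacion no es iqual que antes"},
    "scoped": {
        ("QBuffer.cxx", "QBuffer"): {"ES": UP},
        ("QFile.cxx", "QFile"): {"ES": UP},
        ("QGZipDevice.cxx", "QGZipDevice"): {"ES": UP},
    },
}


def resolve(entry, lang, srcfile, classname):
    """Return the scoped translation, or the generic one when it says "up"."""
    scoped = entry["scoped"].get((srcfile, classname), {})
    text = scoped.get(lang, UP)
    if text is UP:
        # Defer to the next highest level; None means "no translation".
        text = entry["generic"].get(lang)
    return text
```

Replacing one of the UP values with a string gives that class its own wording while the other classes keep using the generic translation.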

Using this format, you have great flexibility in specifying special translations for some languages and not for others, in a very large variety of circumstances. It is for this reason that I believe TnFOX has the best translation facilities available today!
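
Because the indentation must be made of tab characters, and many editors silently expand tabs to spaces, it can be worth checking a file before handing it to TnFOX. The following Python sketch is one way to do that; the function name is an invention of this example and no such checker is claimed to ship with TnFOX.

```python
def find_bad_indentation(text):
    """Return (line_number, line) pairs whose indentation contains a space."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace only; spaces inside the quoted text are fine.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent:
            bad.append((number, line))
    return bad
```

An empty result means every indented line uses tabs only.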

Language codes

Ideally, a language code is of the form language[_territory][@modifier], where language is a two-letter ISO 639 language code and territory is a two-letter ISO 3166 country code. However, due to incompatibilities between implementations of the standard C library's setlocale() function, it isn't as easy as that.

Generally speaking you will not specialise past language and at most territory (currency differences, e.g. between pre- and post-euro Europe, tend to be held in the modifier). If you are on a POSIX system, run "locale -a" from a command line to list the ids available. Theoretically, Microsoft Windows reports the same ids as GNU/Linux, but I have found during testing that your mileage may vary. If any really annoying incompatibilities arise, let me know and I'll implement id mapping to make them portable.
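
As an illustration of how such an id breaks down, here is a small Python sketch that splits a POSIX-style locale id and lists the translation ids one would try, most specific first. It assumes the common form language[_territory][.codeset][@modifier] as printed by "locale -a"; the codeset part, the function names and the lookup order are assumptions of this sketch, not a description of TnFOX's own code.

```python
import re

# Assumed shape: language[_territory][.codeset][@modifier]
_LOCALE_ID = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def split_locale_id(locale_id):
    """Split a locale id into its parts; absent parts are None."""
    match = _LOCALE_ID.match(locale_id)
    if match is None:
        raise ValueError("unrecognised locale id: %r" % locale_id)
    return match.groupdict()


def lookup_order(locale_id):
    """Ids to try in a translation file: <langid>_<regionid>, then <langid>."""
    parts = split_locale_id(locale_id)
    order = []
    if parts["territory"]:
        order.append(parts["language"] + "_" + parts["territory"])
    order.append(parts["language"])
    return order
```

Names such as "C" or "POSIX", which "locale -a" also prints, do not fit this pattern and raise ValueError here.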

Below is a list of language ids common to Microsoft Windows and GNU/Linux:

cs (Czech),      da (Danish),   nl (Dutch),  en (English),   fi (Finnish),
fr (French),     de (German),   el (Greek),  hu (Hungarian), is (Icelandic),
it (Italian),    ja (Japanese), ko (Korean), no (Norwegian), pl (Polish),
pt (Portuguese), ru (Russian),  sk (Slovak), es (Spanish),   sv (Swedish),
tr (Turkish).

This is a pitiful subset, missing out important ones such as Arabic, Chinese and Indian languages. I also find it depressing that the above nations are more or less the major geopolitical players around the time of the first world war!

Note:
Windows also supports: chs (Chinese simplified) & cht (Chinese traditional).

GNU/Linux also supports: ar (Arabic)

As for country codes, I have found the greatest variance here. Below is a list as defined on the ISO 3166 website:

AF  AFGHANISTAN                       LY  LIBYAN ARAB JAMAHIRIYA
AL  ALBANIA                           LI  LIECHTENSTEIN
DZ  ALGERIA                           LT  LITHUANIA
AS  AMERICAN SAMOA                    LU  LUXEMBOURG
AD  ANDORRA                           MO  MACAO
AO  ANGOLA                            MK  MACEDONIA
AI  ANGUILLA                          MG  MADAGASCAR
AQ  ANTARCTICA                        MW  MALAWI
AG  ANTIGUA AND BARBUDA               MY  MALAYSIA
AR  ARGENTINA                         MV  MALDIVES
AM  ARMENIA                           ML  MALI
AW  ARUBA                             MT  MALTA
AU  AUSTRALIA                         MH  MARSHALL ISLANDS
AT  AUSTRIA                           MQ  MARTINIQUE
AZ  AZERBAIJAN                        MR  MAURITANIA
BS  BAHAMAS                           MU  MAURITIUS
BH  BAHRAIN                           YT  MAYOTTE
BD  BANGLADESH                        MX  MEXICO
BB  BARBADOS                          FM  MICRONESIA
BY  BELARUS                           MD  MOLDOVA
BE  BELGIUM                           MC  MONACO
BZ  BELIZE                            MN  MONGOLIA
BJ  BENIN                             MS  MONTSERRAT
BM  BERMUDA                           MA  MOROCCO
BT  BHUTAN                            MZ  MOZAMBIQUE
BO  BOLIVIA                           MM  MYANMAR
BA  BOSNIA AND HERZEGOVINA            NA  NAMIBIA
BW  BOTSWANA                          NR  NAURU
BV  BOUVET ISLAND                     NP  NEPAL
BR  BRAZIL                            NL  NETHERLANDS
IO  BRITISH INDIAN OCEAN TERRITORY    AN  NETHERLANDS ANTILLES
BN  BRUNEI DARUSSALAM                 NC  NEW CALEDONIA
BG  BULGARIA                          NZ  NEW ZEALAND
BF  BURKINA FASO                      NI  NICARAGUA
BI  BURUNDI                           NE  NIGER
KH  CAMBODIA                          NG  NIGERIA
CM  CAMEROON                          NU  NIUE
CA  CANADA                            NF  NORFOLK ISLAND
CV  CAPE VERDE                        MP  NORTHERN MARIANA ISLANDS
KY  CAYMAN ISLANDS                    NO  NORWAY
CF  CENTRAL AFRICAN REPUBLIC          OM  OMAN
TD  CHAD                              PK  PAKISTAN
CL  CHILE                             PW  PALAU
CN  CHINA                             PS  PALESTINIAN TERRITORY
CX  CHRISTMAS ISLAND                  PA  PANAMA
CC  COCOS (KEELING) ISLANDS           PG  PAPUA NEW GUINEA
CO  COLOMBIA                          PY  PARAGUAY
KM  COMOROS                           PE  PERU
CG  CONGO                             PH  PHILIPPINES
CK  COOK ISLANDS                      PN  PITCAIRN
CR  COSTA RICA                        PL  POLAND
CI  COTE D'IVOIRE                     PT  PORTUGAL
HR  CROATIA                           PR  PUERTO RICO
CU  CUBA                              QA  QATAR
CY  CYPRUS                            RE  REUNION
CZ  CZECH REPUBLIC                    RO  ROMANIA
DK  DENMARK                           RU  RUSSIAN FEDERATION
DJ  DJIBOUTI                          RW  RWANDA
DM  DOMINICA                          SH  SAINT HELENA
DO  DOMINICAN REPUBLIC                KN  SAINT KITTS AND NEVIS
EC  ECUADOR                           LC  SAINT LUCIA
EG  EGYPT                             PM  SAINT PIERRE AND MIQUELON
SV  EL SALVADOR                       VC  SAINT VINCENT AND THE GRENADINES
GQ  EQUATORIAL GUINEA                 WS  SAMOA
ER  ERITREA                           SM  SAN MARINO
EE  ESTONIA                           ST  SAO TOME AND PRINCIPE
ET  ETHIOPIA                          SA  SAUDI ARABIA
FK  FALKLAND ISLANDS (MALVINAS)       SN  SENEGAL
FO  FAROE ISLANDS                     SC  SEYCHELLES
FJ  FIJI                              SL  SIERRA LEONE
FI  FINLAND                           SG  SINGAPORE
FR  FRANCE                            SK  SLOVAKIA
GF  FRENCH GUIANA                     SI  SLOVENIA
PF  FRENCH POLYNESIA                  SB  SOLOMON ISLANDS
TF  FRENCH SOUTHERN TERRITORIES       SO  SOMALIA
GA  GABON                             ZA  SOUTH AFRICA
GM  GAMBIA                            ES  SPAIN
GE  GEORGIA                           LK  SRI LANKA
DE  GERMANY                           SD  SUDAN
GH  GHANA                             SR  SURINAME
GI  GIBRALTAR                         SJ  SVALBARD AND JAN MAYEN
GR  GREECE                            SZ  SWAZILAND
GL  GREENLAND                         SE  SWEDEN
GD  GRENADA                           CH  SWITZERLAND
GP  GUADELOUPE                        SY  SYRIAN ARAB REPUBLIC
GU  GUAM                              TW  TAIWAN
GT  GUATEMALA                         TJ  TAJIKISTAN
GN  GUINEA                            TZ  TANZANIA
GW  GUINEA-BISSAU                     TH  THAILAND
GY  GUYANA                            TL  TIMOR-LESTE
HT  HAITI                             TG  TOGO
VA  HOLY SEE (VATICAN CITY STATE)     TK  TOKELAU
HN  HONDURAS                          TO  TONGA
HK  HONG KONG                         TT  TRINIDAD AND TOBAGO
HU  HUNGARY                           TN  TUNISIA
IS  ICELAND                           TR  TURKEY
IN  INDIA                             TM  TURKMENISTAN
ID  INDONESIA                         TC  TURKS AND CAICOS ISLANDS
IR  IRAN                              TV  TUVALU
IQ  IRAQ                              UG  UGANDA
IE  IRELAND                           UA  UKRAINE
IL  ISRAEL                            AE  UNITED ARAB EMIRATES
IT  ITALY                             GB  UNITED KINGDOM
JM  JAMAICA                           US  UNITED STATES
JP  JAPAN                             UY  URUGUAY
JO  JORDAN                            UZ  UZBEKISTAN
KZ  KAZAKHSTAN                        VU  VANUATU
KE  KENYA                             VE  VENEZUELA
KI  KIRIBATI                          VN  VIET NAM
KR  KOREA                             VG  VIRGIN ISLANDS
KW  KUWAIT                            WF  WALLIS AND FUTUNA
KG  KYRGYZSTAN                        EH  WESTERN SAHARA
LA  LAO PEOPLE'S DEMOCRATIC REPUBLIC  YE  YEMEN
LV  LATVIA                            YU  YUGOSLAVIA
LB  LEBANON                           ZM  ZAMBIA
LS  LESOTHO                           ZW  ZIMBABWE
LR  LIBERIA


(C) 2002-2008 Niall Douglas. Some parts (C) to assorted authors.
Generated on Fri Jun 13 21:55:17 2008 for TnFOX by doxygen v1.5.6