2006-03-13

Sinhala Unicode on GNU/Linux

Update: In almost all GNU/Linux distributions released in the last two years, most if not all of the following settings are already in place, and you only need to install a font and an input method. Please check this page for more details.

Here are the steps to get Sinhala working on GNU/Linux. If you are running Debian or Ubuntu, there is an easier way. On modern distributions most of these steps can be skipped, as Sinhala is mostly `enabled' in them already.

Also, this guide assumes reasonable experience in using the GNU/Linux environment. If you think you are a newbie, please get a Guru involved... ;-)

Sinhala/Sri Lanka Locale for Glibc

The locale definition is the file `si_LK' in /usr/share/i18n/locales/. If it's not there, download it here.

If there is a /usr/share/i18n/SUPPORTED file on your system, make sure that it contains an entry `si_LK UTF-8' in its alphabetical position.

If you are using a recent version of the glibc locales (e.g. the locales package in Debian Etch / Sid), si_LK is included and there is no need to download it. Hopefully, other distros will begin to ship it, too.
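The two checks above can be scripted. Below is a minimal sketch, assuming the standard glibc layout under /usr/share/i18n; the function name check_si_lk_locale_source is hypothetical, and it treats a missing SUPPORTED file as fine, since that file is optional.

```shell
# Hypothetical helper: report whether the si_LK locale source file and the
# `si_LK UTF-8' entry in SUPPORTED are present.
# The optional argument is the i18n directory (default /usr/share/i18n).
check_si_lk_locale_source() {
    i18n_dir=${1:-/usr/share/i18n}

    if [ -f "$i18n_dir/locales/si_LK" ]; then
        echo "found: $i18n_dir/locales/si_LK"
    else
        echo "missing: $i18n_dir/locales/si_LK (download it first)"
        return 1
    fi

    # SUPPORTED is optional; only check it when the file exists.
    if [ -f "$i18n_dir/SUPPORTED" ]; then
        if grep -qxF 'si_LK UTF-8' "$i18n_dir/SUPPORTED"; then
            echo "found: 'si_LK UTF-8' in $i18n_dir/SUPPORTED"
        else
            echo "missing: 'si_LK UTF-8' in $i18n_dir/SUPPORTED"
            return 1
        fi
    fi
    return 0
}
```

Running check_si_lk_locale_source with no argument checks the system paths; a non-zero exit status means something still needs to be added.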

Aliases for Glibc Locale (Optional)

Add these lines to /etc/locale.alias so that you can refer to the si_LK.UTF-8 locale as si, si_LK or sinhala. If this file does not exist, it is safe to skip this step.

sinhala  si_LK.UTF-8
si       si_LK.UTF-8
si_LK    si_LK.UTF-8
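If you prefer not to edit the file by hand, a small shell function can append the three aliases only when they are not defined yet, so running it twice does no harm. This is a sketch: add_sinhala_aliases is a hypothetical name, and it has to run as root when pointed at /etc/locale.alias.

```shell
# Hypothetical helper: append the Sinhala aliases to a locale.alias file,
# skipping any alias name that is already defined.
# The optional argument is the file to edit (default /etc/locale.alias).
add_sinhala_aliases() {
    alias_file=${1:-/etc/locale.alias}

    # The file is optional; do nothing if it does not exist.
    [ -f "$alias_file" ] || return 0

    for name in sinhala si si_LK; do
        # An alias line starts with the alias name followed by whitespace.
        if ! grep -q "^$name[[:space:]]" "$alias_file"; then
            # Pad the name to 8 columns, matching the layout shown above.
            printf '%-8s %s\n' "$name" si_LK.UTF-8 >> "$alias_file"
        fi
    done
}
```

For example, in a root shell where the function is defined: add_sinhala_aliases /etc/locale.alias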

Generating the Glibc Locale (Debian based systems)

Non-Debian users should skip this step.

Run `dpkg-reconfigure locales'. Select the si_LK.UTF-8 locale along with any other UTF-8 locales you need (e.g. en_US.UTF-8, en_GB.UTF-8). Make sure to choose a UTF-8 locale (not necessarily si_LK) as the default locale.
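If you would rather skip the interactive dialog, Debian's locales package also keeps its list of enabled locales in /etc/locale.gen and builds them with locale-gen. The sketch below assumes that layout; enable_si_lk_in_locale_gen is a hypothetical helper that uncomments an existing si_LK.UTF-8 entry or appends one, and it relies on GNU sed's -i option.

```shell
# Hypothetical helper for Debian-based systems: enable si_LK.UTF-8 in a
# locale.gen file without the interactive dialog.
# The optional argument is the file to edit (default /etc/locale.gen).
enable_si_lk_in_locale_gen() {
    gen_file=${1:-/etc/locale.gen}
    entry='si_LK.UTF-8 UTF-8'

    if grep -qxF "$entry" "$gen_file" 2>/dev/null; then
        return 0                                  # already enabled
    fi
    if grep -qx "# *si_LK\.UTF-8 UTF-8" "$gen_file" 2>/dev/null; then
        # Uncomment the existing entry in place (GNU sed -i).
        sed -i 's/^# *si_LK\.UTF-8 UTF-8$/si_LK.UTF-8 UTF-8/' "$gen_file"
    else
        echo "$entry" >> "$gen_file"              # add a new entry
    fi
}
```

Then, as root: enable_si_lk_in_locale_gen && locale-gen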

Generating the Glibc Locale (non-Debian systems)

Debian users should skip this step.

Generate the si_LK.UTF-8 locale by running:

localedef -i si_LK -f UTF-8 -A /etc/locale.alias si_LK
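Whichever way you generated the locale, `locale -a' should now list it. glibc prints codeset names in a normalized form (lower case, punctuation removed), so si_LK.UTF-8 usually appears as si_LK.utf8. The hypothetical helper below reads that list on standard input and accepts either spelling; it is a sketch and does not handle @modifier suffixes.

```shell
# Hypothetical helper: read `locale -a' output on stdin and succeed if the
# given locale name appears, either as written or in glibc's normalized
# form (codeset lower-cased with non-alphanumerics removed).
locale_is_listed() {
    wanted=$1
    lang_part=${wanted%%.*}
    case $wanted in
        *.*) codeset=$(printf '%s' "${wanted#*.}" | tr 'A-Z' 'a-z' | tr -d -c 'a-z0-9') ;;
        *)   codeset= ;;
    esac
    if [ -n "$codeset" ]; then
        normalized="$lang_part.$codeset"          # si_LK.UTF-8 -> si_LK.utf8
    else
        normalized=$lang_part
    fi
    grep -qxF -e "$wanted" -e "$normalized"
}
```

For example: locale -a | locale_is_listed si_LK.UTF-8 && echo ready. If you generated the locale under the name si_LK, as in the localedef command above, check for si_LK instead.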

X-window Locale

Most X programs used on GNU/Linux (GNOME, GTK, QT and KDE applications) use the Glibc locale, so there is no need to add a full-fledged locale to X. However, if the X Window System doesn't know about si_LK, X programs will complain that it is an unknown locale. A common practice is simply to alias such locales to en_US.UTF-8 to avoid this.

If you are running X.Org 6.9.0 (or later) or a recent version of XFree86, this is already done; please skip to the next step.

Otherwise, locate the files locale.dir and compose.dir in /usr/X11R6/lib/X11/locale/ and add suitable lines. Note that you need to add two lines to each file, one without a colon and one with a colon:

en_US.UTF-8/XLC_LOCALE       si_LK.UTF-8
en_US.UTF-8/XLC_LOCALE:      si_LK.UTF-8

Lines in compose.dir are similar, except `XLC_LOCALE' is replaced with `Compose'.
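The same edits can be made with a short shell function. It is a sketch under a few assumptions: add_x_locale_alias is a hypothetical name, it simply appends the lines at the end of the file (rather than next to the other en_US.UTF-8 entries), it separates the columns with a tab, and it skips a line that is already present.

```shell
# Hypothetical helper: add the two en_US.UTF-8 alias lines for si_LK.UTF-8
# (one without and one with a colon) to an X11 locale file.
# KIND is XLC_LOCALE for locale.dir and Compose for compose.dir.
add_x_locale_alias() {
    xfile=$1
    kind=$2
    for sep in '' ':'; do
        # Match an existing entry regardless of how much whitespace it uses.
        pattern='^en_US\.UTF-8/'"$kind$sep"'[[:space:]][[:space:]]*si_LK\.UTF-8$'
        if ! grep -q "$pattern" "$xfile" 2>/dev/null; then
            printf 'en_US.UTF-8/%s%s\tsi_LK.UTF-8\n' "$kind" "$sep" >> "$xfile"
        fi
    done
}
```

As root, for the old XFree86/X.Org layout described above:

add_x_locale_alias /usr/X11R6/lib/X11/locale/locale.dir XLC_LOCALE
add_x_locale_alias /usr/X11R6/lib/X11/locale/compose.dir Compose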

Sinhala Unicode Fonts

It's good to see more and more new Unicode Sinhala fonts being released. Unfortunately, the FreeFont project includes Sinhala characters that don't have correct rendering tables, and sometimes this font takes precedence over other, correct Unicode fonts, causing kombuwa and other specially handled glyphs to be rendered incorrectly. A quick workaround is to remove the freefont package (sometimes called ttf-freefont) if it's installed.
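To see whether FreeFont is installed, you can ask fontconfig for the family names it knows and look for FreeSerif, FreeSans and FreeMono, the family names FreeFont uses. The helper below is a hypothetical sketch that filters the output of `fc-list : family' read on standard input.

```shell
# Hypothetical helper: read `fc-list : family' output on stdin and print
# the FreeFont families (FreeSerif, FreeSans, FreeMono) fontconfig can see.
find_freefont_families() {
    # fc-list may print several comma-separated names on one line; split them.
    tr ',' '\n' | sed 's/^ *//' | grep -E '^Free(Serif|Sans|Mono)$' | sort -u
}
```

For example: fc-list : family | find_freefont_families. If it prints anything, remove the package with your package manager, e.g. apt-get remove ttf-freefont on Debian.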

Downloading the LK-LUG Unicode font and copying it to the .fonts/ directory in your home directory is sufficient in most cases. To make it available to all users, copy it to /usr/local/share/fonts/ instead.

I have written a more detailed description of fonts in the X Window System here.

Sinhala Rendering in KDE/QT

If you are using a version of QT later than 3.3.4, Sinhala should work fine. There was a bug in older versions of QT, which is now fixed in both the QT 3 and QT 4 series.

Sinhala Rendering in GNOME/GTK

If your Pango version is 1.8.1 or later, Sinhala should work fine. Pango 1.8.0 also supports Sinhala but has a bug; Harshula's fix went into 1.8.1.
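To see which side of 1.8.1 you are on, you can compare dotted version numbers in the shell. The sketch below assumes GNU sort with the -V (version sort) option; version_at_least is a hypothetical name, and pkg-config only reports the Pango version if the Pango development files are installed.

```shell
# Hypothetical helper: succeed if version HAVE is at least version NEED.
# Relies on GNU sort's -V option, which orders 1.10.0 after 1.8.1.
version_at_least() {
    have=$1
    need=$2
    lowest=$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$need" ]
}
```

For example: version_at_least "$(pkg-config --modversion pango)" 1.8.1 && echo "Pango is new enough"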

Touching letters are also now supported.

Firefox

Firefox renders Sinhala properly only if it is compiled with Pango support. Firefox 1.0.x needs a patch, but Pango support comes standard in the 1.5.x series. If you are using Firefox on Red Hat / Fedora, it already includes the Pango patch, and there is nothing extra to do.

The easiest way is to upgrade Firefox to 1.5 (assuming your build is compiled with Pango support) and set the environment variable MOZ_ENABLE_PANGO to 1.
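A small wrapper keeps the variable limited to Firefox instead of your whole session. This is a sketch: firefox_pango and the FIREFOX_BIN override are hypothetical names, not Firefox settings, and the variable only has an effect if your build includes Pango support.

```shell
# Hypothetical wrapper: start Firefox with MOZ_ENABLE_PANGO=1 in its
# environment. FIREFOX_BIN lets you point at a particular binary;
# it defaults to `firefox' on the PATH. Extra arguments are passed through.
firefox_pango() {
    MOZ_ENABLE_PANGO=1 "${FIREFOX_BIN:-firefox}" "$@"
}
```

To set it for everything started from your shell instead, you could add `export MOZ_ENABLE_PANGO=1' to your ~/.bashrc.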

Sinhala Input

Earlier, we used separate input method modules for GTK and QT, but they have now been superseded by the SCIM and M17N input methods. Here are the steps to install them.

  • Install SCIM
  • Install SKIM if you use KDE
  • Get SCIM transliterated input method for Sinhala and install it
  • If you would like to use the Sinhala input method modules from the M17N project, install the SCIM-M17N bridge and the M17N input method modules.

Running skim in KDE or scim in GNOME should create an icon in the system tray that can be used to select the language. After that, you can use Ctrl+Space to switch between normal ASCII (English) input and SCIM input.
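SCIM is usually hooked into applications through a few environment variables that must be set before those programs start. The sketch below shows the values commonly used with SCIM; scim_input_env is a hypothetical name, QT_IM_MODULE only matters for QT builds with input method module support, and where to call it (e.g. ~/.xsession) depends on how your desktop session is started.

```shell
# Hypothetical helper: export the environment commonly used to route
# X, GTK and QT input through SCIM.
scim_input_env() {
    XMODIFIERS='@im=SCIM'   # XIM clients connect to SCIM
    GTK_IM_MODULE=scim      # GTK 2 applications load the scim input module
    QT_IM_MODULE=scim       # QT builds with input method module support
    export XMODIFIERS GTK_IM_MODULE QT_IM_MODULE
}
```

After calling it, start the SCIM daemon with `scim -d' (or run skim under KDE).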

SCIM 1.4.4 doesn't have a Sinhala category, so Sinhala input methods are listed under `Other'. This has since been fixed, and a separate menu for Sinhala should be available in the next version.

4 comments:

கா. சேது (K. Sethu) said...

Very well written
K. Sethu

SrimalJ said...

There is no /usr/share/i18n/SUPPORTED or /etc/locale.alias in RHEL4. What's the alternative?

Anuradha said...

The SUPPORTED and locale.alias files are not mandatory. I've changed the text to reflect that. Thanks Srimal!

கா. சேது (K. Sethu) said...

Am I right in guessing that what unicode.org releases as the Unicode versions (with a number of text files for the universal character database, etc.) are only standards, and that it is up to the operating system writers to make and implement the necessary code page source code and binaries? Or is it the function of i18n packages? Hope I am not confused and confusing.

But OK, let me ask another way. All the distros I have installed on my PC (they are all from before around Oct 2005) seem to have Unicode 3.x something. If I want to update Unicode to the latest 4.1.x so that I can make use of newly added symbols/characters in any application, can I download a patch or updater file(s) from unicode.org or somewhere else that would make the whole system conform to the latest Unicode, or is it a more complex procedure than that?

Also, is there any command on Linux platforms to find which Unicode version I am working with?

Thanks in advance
~Sethu