FPDF and Chinese Characters

I spent endless hours getting FPDF to display Chinese characters. If you have found this article, I hope I can save you a few of them. Please treat this as a loose collection of tips and not as a complete tutorial.

FPDF and Chinese Language Out of the Box

FPDF does NOT work with Chinese characters out of the box, because it uses byte-oriented PHP string functions like strlen() and substr(), which count and cut bytes rather than characters. With multi-byte Unicode text, this messes up everything. You basically need a modification or extension of FPDF that supports Unicode.
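To see the problem in isolation, here is a minimal sketch that compares the byte-oriented functions with their multi-byte counterparts. It assumes that the mbstring extension is loaded and that the file is saved as UTF-8; the sample string and variable names are my own and have nothing to do with FPDF's internals.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$text = "中文";

// strlen() counts bytes. Each of these two characters takes 3 bytes in UTF-8.
$byteLength = strlen($text);

// mb_strlen() counts characters when it is told the right encoding.
$charLength = mb_strlen($text, 'UTF-8');

// substr() cuts after the first byte and leaves an invalid fragment behind.
$brokenPiece   = substr($text, 0, 1);
$brokenIsValid = mb_check_encoding($brokenPiece, 'UTF-8');

// mb_substr() cuts after the first whole character.
$firstChar = mb_substr($text, 0, 1, 'UTF-8');

var_dump($byteLength, $charLength, $brokenIsValid, $firstChar);
```

Here strlen() reports 6 while mb_strlen() reports 2, and the one-byte fragment from substr() is no longer valid UTF-8. Any width or line-break calculation that is built on the byte count will be off in the same way.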

MBFPDF (Multi-Byte FPDF) can do it and that’s what I used. UFPDF (Unicode FPDF) might also work. Google it to find the sources.

These libraries extend FPDF and override the methods that use the old string functions, replacing them with multi-byte-aware functions like mb_strlen() and mb_substr(). However, these functions need to know the correct encoding of the string, usually passed as a parameter, and that is where it becomes interesting.
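A small sketch of why the encoding parameter matters, assuming a UTF-8 source file and the mbstring extension. The string is just an illustration; the point is that the same bytes give a different length depending on the encoding you name.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$text = "中文";

// Correct encoding: two characters.
$asUtf8 = mb_strlen($text, 'UTF-8');

// Wrong encoding: ISO-8859-1 is a single-byte encoding,
// so every one of the six bytes is counted as a character.
$asLatin1 = mb_strlen($text, 'ISO-8859-1');

var_dump($asUtf8, $asLatin1);
```

If you leave the parameter out, mbstring falls back to its internal encoding (see mb_internal_encoding()), so it is safer to pass the encoding explicitly everywhere.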

I fiddled around with MBFPDF and worked with another encoding that isn’t present in MBFPDF by default. You can download it at the end of this article.

Chinese Character Encodings

There are several different Unicode encodings. I don’t want to go into too much detail, nor do I have the expertise, but what did the trick for me was to use the encoding

UCS-4BE

This Stack Overflow answer says:

UCS-4BE is a Unicode encoding which stores each character as a 32-bit (4 byte) integer. This accounts for the “UCS-4”; the “BE” prefix indicates that the integers are stored in big-endian order. The reason for this encoding is that, unlike smaller encodings (like UTF-8 or UTF-16), it requires no surrogate pairs; each character is a fixed size.
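The fixed width is easy to check with a few lines of PHP. This is only a sketch under the assumption that your input is UTF-8 and mbstring is available; it is not taken from MBFPDF, and the variable names are mine.

```php
<?php
// Assumption: this file is saved as UTF-8 and the mbstring extension is loaded.
$text = "A中";

// Convert from UTF-8 to UCS-4BE: every character becomes exactly 4 bytes.
$ucs4 = mb_convert_encoding($text, 'UCS-4BE', 'UTF-8');

$byteCount = strlen($ucs4);   // 2 characters * 4 bytes
$hex       = bin2hex($ucs4);  // big-endian code points U+0041 and U+4E2D

// Because the width is fixed, cutting every 4 bytes always yields whole characters.
$chunks = str_split($ucs4, 4);
$second = mb_convert_encoding($chunks[1], 'UTF-8', 'UCS-4BE');

var_dump($byteCount, $hex, $second);
```

The Latin letter and the Chinese character both occupy four bytes, which is why byte offsets and character offsets line up again in this encoding.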

Chinese PDF Fonts

In PDFs, fonts get embedded into the file itself, so you need a font that can display Chinese characters. Not every font includes Chinese symbols, and many of the fonts that do are dedicated to Chinese only.

Your OS more or less hides this behavior from you. You see a mixed Chinese-Latin string in your browser and think: Well, that’s one nice font. I’m going to use it. See the following example?

[Image: chinese-string]

This string actually contains three different fonts, but as a user, you usually do not notice. In TextEdit, you can check the fonts of every character:

[Image: chinese-font]

So, even if it looks like one font, it might actually consist of several fonts. The parentheses ( and ) are Chinese symbols, which came as quite a surprise to me. ABC is Helvetica.

Anyway, there are hardly any fonts that include both ASCII characters AND Chinese characters, mostly because this would result in extremely big font files.

Arial Unicode MS comes close to being a font that includes Chinese characters as well as Latin characters, but the .ttf file is already around 55 MB. This font won’t work with FPDF directly, because FPDF still uses the byte-oriented string functions mentioned above.

However, you could try to make Arial Unicode MS work with MBFPDF or UFPDF. The catch is that you need to convert the TTF file into several other files, which is a difficult task in itself.

One note: Every font that you use in your PDF will be embedded. That means that if you embed Arial Unicode MS into a PDF, the PDF will be at least 55 MB, even if it contains only one sentence.

Converting TTF-files to .AFM / .Z / .PHP

FPDF and its extensions like MBFPDF/UFPDF require font files in specific formats. There are two online converters that do the job for you so that you don’t have to install the necessary tools:

Beware of Character Splitting (Substrings)

In my project, someone had rewritten FPDF’s MultiCell. It took a substring of the data and put one half into one cell and the other half into another cell. The result was that mb_substr() split the input data at the wrong byte, which messed up half of the Chinese characters.

It actually took me around 8 hours to figure this problem out. Be extremely careful if you take your original Chinese character string and process it in any way that might destroy the byte sequence and thus the entire string. In my example below, you can find my version of MultiCell that worked for me.
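To illustrate the kind of check that would have saved me those hours, here is a minimal sketch. It is not the MultiCell from the download; splitAtCharacter() is a hypothetical helper I made up for this example, and it assumes UTF-8 input, the mbstring extension, and PHP 7.1 or later.

```php
<?php
// Hypothetical helper, not part of FPDF or MBFPDF.
// Splits $text after $charCount characters instead of after a byte offset.
function splitAtCharacter(string $text, int $charCount, string $encoding = 'UTF-8'): array
{
    $head = mb_substr($text, 0, $charCount, $encoding);
    $tail = mb_substr($text, $charCount, null, $encoding);
    return [$head, $tail];
}

// Assumption: this file is saved as UTF-8.
$text = "中文字符";

// Character-based split: both halves are still valid strings.
[$head, $tail] = splitAtCharacter($text, 2);
$bothValid = mb_check_encoding($head, 'UTF-8') && mb_check_encoding($tail, 'UTF-8');

// Byte-based split: 4 bytes is one whole character plus the first byte of the next.
$badHead    = substr($text, 0, 4);
$badIsValid = mb_check_encoding($badHead, 'UTF-8');

var_dump($head, $tail, $bothValid, $badIsValid);
```

Running mb_check_encoding() on every piece right after a split is a cheap way to catch a broken byte sequence before it ends up in a cell.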

Putting it Together and Download

So, if you just want to grab my slightly modified MBFPDF and try my example, go for it:

Download MBFPDF & Working Example

To make it work, you’ll need a recent version of FPDF, which you can find on the official website.

Please understand that I can’t give you much support for it. I still hope it saves you a few hours. If you found anything in this article helpful, please leave a note. I’d like to learn whether or not I could save you a few hours.

Happy coding!