I spent endless hours making FPDF display Chinese characters. If you found this article, I hope it saves you a few of them. Please treat this as a loose collection of tips rather than a complete tutorial.
FPDF and Chinese Language Out of the Box
FPDF does NOT work with Chinese characters out of the box, because FPDF uses PHP string functions like substr() that operate on bytes and do not understand multi-byte Unicode encodings. They cut Chinese characters apart and garble the output. You basically need a modification or extension of FPDF that supports Unicode.
MBFPDF (Multi-Byte FPDF) can do it and that’s what I used. UFPDF (Unicode FPDF) might also work. Google it to find the sources.
These libraries extend FPDF and override the methods that used the old string functions, replacing them with multi-byte-aware functions like mb_substr(). However, these functions usually need the correct encoding of the string as a parameter, and that is where it becomes interesting.
I fiddled around with MBFPDF and worked with another encoding that isn’t present in MBFPDF by default. You can download it at the end of this article.
Chinese Character Encodings
There are apparently a few different Unicode encodings. I neither want to go into too much detail nor have the expertise for it, but what did the trick for me was to use the encoding UCS-4BE.
UCS-4BE is a Unicode encoding which stores each character as a 32-bit (4-byte) integer. This accounts for the “UCS-4”; the “BE” suffix indicates that the integers are stored in big-endian order. The reason for this encoding is that, unlike the variable-width encodings UTF-8 and UTF-16, it needs no multi-byte sequences or surrogate pairs: each character is a fixed size.
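To make the fixed-size property concrete, here is a small sketch that converts a UTF-8 string to UCS-4BE with mb_convert_encoding(). It assumes the mbstring extension and UTF-8 input; how MBFPDF uses the encoding internally is not shown here.

```php
<?php
// Requires the mbstring extension; "\u{...}" escapes need PHP 7+.
$utf8 = "\u{4E2D}A"; // "中A": one Chinese and one Latin character

// Convert from UTF-8 to UCS-4BE: every character becomes 4 bytes,
// most significant byte first (big-endian).
$ucs4 = mb_convert_encoding($utf8, 'UCS-4BE', 'UTF-8');

// Because the width is fixed, plain byte functions can split the
// string into characters safely: one chunk of 4 bytes per character.
$chunks = str_split($ucs4, 4);

// Converting back yields the original UTF-8 string.
$back = mb_convert_encoding($ucs4, 'UTF-8', 'UCS-4BE');

var_dump(strlen($ucs4), bin2hex($ucs4), count($chunks), $back === $utf8);
```

The hex dump shows the code points U+4E2D and U+0041 padded to four bytes each, which is why cutting at any multiple of 4 bytes never breaks a character.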
Chinese PDF Fonts
In PDFs, fonts get embedded into the file itself. So you need a font that can display Chinese characters. Not every font includes Chinese symbols and oftentimes, there are fonts that are dedicated for Chinese only.
Your OS mostly hides this behavior from you. You see a mixed Chinese-Latin string in your browser and think: Well, that’s one nice font, I’m going to use it. See the following example?
This string actually contains three different fonts, but as a user, you usually do not notice. In TextEdit, you can check the fonts of every character:
So, even if it looks like one font, it might actually consist of several fonts. The parentheses ( and ) are Chinese symbols, which came as quite a surprise to me. ABC is Helvetica.
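If you cannot inspect the fonts in an editor, one rough way to find the characters that a Latin-only font will not cover is to list everything outside ASCII. The sketch below does this for a made-up sample string; using the fullwidth parentheses U+FF08 and U+FF09 is my assumption about what such "Chinese" parentheses typically are, and the variable names are illustrative.

```php
<?php
// Hypothetical sample: fullwidth parentheses around a Chinese
// character, followed by Latin letters ("\u{...}" needs PHP 7+).
$text = "\u{FF08}\u{4E2D}\u{FF09}ABC";

// Split the UTF-8 string into characters (the /u modifier makes the
// empty pattern match between characters instead of between bytes).
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

$nonAscii = [];
foreach ($chars as $char) {
    // In UTF-8, an ASCII character is a single byte below 0x80.
    $isAscii = strlen($char) === 1 && ord($char) < 0x80;
    if (!$isAscii) {
        $nonAscii[] = $char;
    }
}

var_dump(count($chars), $nonAscii);
```

Here the two parentheses show up in $nonAscii next to the Chinese character, even though they look like ordinary punctuation.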
Anyway, there are hardly any fonts that include both ASCII and Chinese characters, mostly because this would result in extremely big font files.
Arial Unicode MS might come close to a font that includes Chinese as well as Latin characters, but the .ttf file is already around 55 MB. This font won’t work with FPDF directly, because FPDF still uses the byte-based string functions mentioned above.
However, you could try to make Arial Unicode MS work with MBFPDF or UFPDF. The catch is that you need to convert the TTF file into several other files, which is a difficult task in itself.
One note: Every font that you use in your PDF will be embedded. That means that if you embed Arial Unicode MS into a PDF, the PDF will be at least 55 MB, even if it contains only one sentence.
Converting TTF-files to .AFM / .Z / .PHP
FPDF and its extensions like MBFPDF/UFPDF require font files in specific formats. There are two online converters that do the job so that you don’t have to install the necessary tools:
- fPDF Font File Converter at fruit-lab.de. This one does not work with files above a certain size; I couldn’t figure out the exact limit. It didn’t work with Arial Unicode MS because that file was too big.
- FPDF Font File Generation at fpdf.org. This tool accepts big font files, but it seems like it doesn’t produce all files required for MBFPDF/UFPDF.
Beware of Character Splitting (Substrings)
In my project, someone rewrote FPDF’s MultiCell. It took a substring of the data and put one half into one cell and the other half into another cell. The result was that mb_substr split the input data at the wrong byte, which messed up half of the Chinese characters.
It took me around 8 hours to figure this problem out. Be extremely careful whenever you process your original Chinese string in any way that might break a byte sequence, because that can destroy the entire string. In my example below, you can find my version of MultiCell that worked for me.
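For illustration only, this is a minimal sketch of splitting a string at a character boundary rather than at a byte offset. The helper splitAtCharacter() is hypothetical: it is not part of FPDF or MBFPDF and not my MultiCell. It assumes the mbstring extension, PHP 7.1 or later, and that you know the real encoding of your string (UTF-8 here).

```php
<?php
/**
 * Hypothetical helper: split $text after $chars characters.
 * Offsets are counted in characters of $encoding, never in bytes.
 */
function splitAtCharacter(string $text, int $chars, string $encoding = 'UTF-8'): array
{
    // Refuse input that is already broken or in a different encoding,
    // because character offsets would be meaningless for it.
    if (!mb_check_encoding($text, $encoding)) {
        throw new InvalidArgumentException("Input is not valid $encoding");
    }

    $head = mb_substr($text, 0, $chars, $encoding);
    $tail = mb_substr($text, $chars, null, $encoding); // null = to the end

    return [$head, $tail];
}

$text = "\u{4E2D}\u{6587}\u{5B57}\u{7B26}"; // "中文字符"
[$head, $tail] = splitAtCharacter($text, 2);

// Both halves are still valid UTF-8 and join back to the original.
var_dump(mb_check_encoding($head, 'UTF-8'), $head . $tail === $text);
```

The up-front mb_check_encoding() call is a design choice: it catches a mismatched encoding early, instead of letting a silently wrong split surface later as garbled text in the PDF.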
Putting it Together and Download
So, if you just want to grab my slightly modified MBFPDF and try my example, go for it:
To make it work, you’ll need a recent version of FPDF, which you can find on the official website.
Please understand that I can’t give you much support for it. I still hope it saves you a few hours. If you found anything in this article helpful, please leave a note. I’d like to learn whether or not I could save you a few hours.