Unicode ribbon campaign
Welcome to the home page of the Unicode Ribbon Campaign.
The aim of this campaign is to reduce the time wasted educating people about international text and how programs should all be able to deal with it (no excuses).
If you received a link to this page, choose the category that applies best to you:
- You are a software developer and received a support request linking to this page;
- You received an e-mail you could not read or sent one the recipient could not read and your correspondent sent you here;
- You visited a web site that had unreadable characters and notified the webmaster, whose sole answer was the address of this page;
- You maintain a web site and a visitor more or less strongly suggested that you read this page;
- You saw an unusual signature in an e-mail and it made you curious.
Software developers
Software that corrupts user data because it encountered unexpected non-ASCII characters is a plague. It must be eradicated and the developers punished. Joel Spolsky’s suggestion [1] is particularly apt, although only practical in a city like San Francisco where you can visit a WWII submarine for a few bucks. In the rest of the world, a broom cupboard can replace the submarine.
Software that cannot deal with international text and displays illegible characters instead, without corrupting any data, is less troublesome. It is only evil. Unfortunately for the developers, though, Joel Spolsky’s punishment applies to them too.
Bottom line:
The Unicode Ribbon Campaign recommends against paying for software that cannot deal with international text. It is nice, however, to notify the developers of the issue. A short introductory sentence and a link to this page should be sufficient.
If you are a software developer and your application cannot deal with international text, go and fix it now or prepare the gas mask.
What are you still doing here? I thought I just told you to go and fix your software!
E-mail
E-mail clients and servers that do not support sending and receiving messages containing international characters hinder the evolution of the Internet and are therefore considered harmful. The handling of UTF-8 sent in Quoted-Printable and optionally 8-bit is required for an e-mail client to be considered decent.
Note: Outbreak [2] using an Exchange Server in MAPI mode is considered harmful because, regardless of how Outbreak is configured, Exchange encodes UTF-8 messages in Base64, which wastes bandwidth and makes them more likely to be rejected by spam filters.
Some mass mailing programs used to send newsletters do not seem to know about Content-Type headers. They complacently send their rubbish in an undeclared character set, resulting in unreadable messages.
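For readers who want to see what a properly labelled message looks like, here is a minimal Python sketch using the standard library's email package. The addresses and subject are made up for illustration, and `build_message` is a hypothetical helper name. The point is only that the body is declared as UTF-8 in the Content-Type header and transferred as Quoted-Printable rather than Base64.

```python
from email.message import EmailMessage


def build_message(body: str) -> EmailMessage:
    """Build a plain-text message declared as UTF-8 and sent as Quoted-Printable."""
    msg = EmailMessage()
    # Hypothetical addresses and subject, for illustration only.
    msg["From"] = "sender@example.org"
    msg["To"] = "recipient@example.org"
    msg["Subject"] = "Réunion"
    # charset sets the Content-Type charset parameter; cte sets the
    # Content-Transfer-Encoding header.
    msg.set_content(body, charset="utf-8", cte="quoted-printable")
    return msg


if __name__ == "__main__":
    print(build_message("C'est débile.\n").as_string())
```

Printing the message shows the two headers a decent client relies on (Content-Type with a charset, and Content-Transfer-Encoding), with the accented letter carried in the body as plain ASCII escape sequences.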
Bottom line:
If you have problems reading e-mails with Unicode content or if you cannot send a message with the proper content type headers, get a decent e-mail client. If you are using a webmail interface, show this page to the system administrator and make sure he reads the following paragraph.
If you are the administrator of a mail server or of a webmail system that does not support sending or receiving Unicode messages, update your software to the latest version. If it doesn’t fix the problem, send the URL of this page to the software’s developer.
If you are responsible for sending a newsletter, think of the bad publicity for your company if the message was not sent properly and either configure your software correctly, update it to the latest version or change it for a decent alternative.
Web browsers
Web browsers that do not support the UTF-8 character set must be wiped out. It is so comfortable to edit web pages in UTF-8 without needing to use HTML entities that nobody should be forced to do otherwise because their visitors use an obsolete browser.
Note: Fortunately, even Netscape 4 deals with UTF-8 content. It is therefore quite safe to write UTF-8 web pages.
Web sites
Some web sites manage to use a Unicode encoding but announce a non-Unicode one. I don’t know how it is possible that the webmaster doesn’t notice it, nor do I care; just fix it!
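Webmasters who cannot spot the problem by eye can check for it mechanically. The Python sketch below is a heuristic, not a validator: `looks_mislabelled` is a hypothetical name, it only covers UTF-8 (not UTF-16 or other Unicode encodings), and it assumes you already have the raw bytes of the page and the charset that the server or the page itself announced.

```python
def looks_mislabelled(body: bytes, declared_charset: str) -> bool:
    """Heuristic: True if body is non-ASCII, valid UTF-8, but declared as something else."""
    if declared_charset.strip().lower() in ("utf-8", "utf8"):
        return False
    if body.isascii():
        # Pure ASCII reads the same in the usual ASCII-compatible charsets.
        return False
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so the declared charset may well be right.
        return False
    # Non-ASCII bytes that happen to form valid UTF-8 are rarely anything else.
    return True
```

For example, `looks_mislabelled("débile".encode("utf-8"), "iso-8859-1")` returns `True`, while the same word genuinely encoded as ISO-8859-1 returns `False`.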
Unicode hall of shame
Burnout Menu
Two months ago, I tried Burnout Menu 1.1.4 and it irremediably messed up all my iCal calendars. It apparently read them as Mac Roman and wrote them back as UTF-8, several times in a row (probably each time the menu was drawn), like this:
débile
d√©bile
d‚àö¬©bile
d‚Äö√†√∂¬¨¬©bile
d‚Äö√Ñ√∂‚àö‚Ä†‚àö‚àÇ¬¨¬®¬¨¬©bile
It would have been recoverable if the different entries had not all been messed up a different number of times. Fortunately, I noticed it before my calendar files hit the 4 terabyte (2^42 bytes) file size limit, which would have happened sooner than you may think if you remember the story about the man who is said to have invented chess [3].
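The runaway growth is easy to reproduce. The following Python sketch models the apparent bug, on the assumption that the program really did read UTF-8 bytes as Mac Roman and write the result back as UTF-8 on each pass; the function name `garble` is made up, and the program's actual internals are unknown.

```python
def garble(text: str, rounds: int) -> str:
    """Encode as UTF-8, misread the bytes as Mac Roman, and repeat."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("mac_roman")
    return text


if __name__ == "__main__":
    # Print the pass number, the size in bytes, and the damaged text.
    for n in range(4):
        damaged = garble("débile", n)
        print(n, len(damaged.encode("utf-8")), damaged)
```

Every non-ASCII character becomes two or three non-ASCII characters on each pass, so the size of a single accented letter roughly doubles every time while the ASCII letters around it stay untouched.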
I immediately contacted the developers of the program. I was remarkably polite considering the nature and consequences of the bug. I have not yet received an answer and no update was released. This is, sadly, how some developers care about their international customers.
PHP
By default, PHP does not include any extension for character set conversion or quoted-printable encoding, which makes it a pain to send e-mail messages in French, German or Elvish. Since PHP must be recompiled to enable additional extensions, some negotiation with one’s hosting provider is often required. The only benefit of this deplorable limitation of PHP is that it helps rate hosting providers on their readiness to rebuild their PHP module on customer request.
Update: It looks like PHP 5 might address this issue as iconv is enabled by default. mbstring isn’t, though.
SQLiteManager
As its name suggests, SQLiteManager 1.2 manages SQLite databases. Its problem is that it cannot correctly read back data that it wrote to the database itself.
It’s very easy to test. Launch SQLiteManager, create a new database and a table with one column. Then insert the following value:
Iñtërňâtīôʼnàlįşætiøŋ
Click OK and behold:
Iñtërňâtīôʼnàlįşætiøŋ
(Some people might think that using such a complex combination of strange characters is being too harsh on the software. They’d be plain wrong. Moreover, SQLiteManager cannot handle a simple word like “Débile” either.)
Considering that SQLite can be set at compile time to use ISO-8859-1 or UTF-8 and that SQLiteManager uses the UTF-8 variant, such a flaw is a shame. What’s worse, the developer has known about the issue for months but does not consider it to be a problem.
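The same round-trip test can be automated. Here is a minimal Python sketch using the standard sqlite3 module and a throwaway in-memory database; the table and column names are arbitrary, and `round_trip` is a hypothetical helper. It says nothing about how SQLiteManager works internally; it only shows the check that any tool sitting in front of SQLite ought to pass.

```python
import sqlite3

SAMPLE = "Iñtërňâtīôʼnàlįşætiøŋ"


def round_trip(value: str) -> str:
    """Write value to a throwaway in-memory database and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (v TEXT)")
        # A bound parameter keeps the text out of the SQL string itself.
        conn.execute("INSERT INTO t (v) VALUES (?)", (value,))
        (stored,) = conn.execute("SELECT v FROM t").fetchone()
        return stored
    finally:
        conn.close()


if __name__ == "__main__":
    print(round_trip(SAMPLE) == SAMPLE)
```

If the value that comes back differs from the value that went in, something between the user and the database is guessing at the encoding.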
E-mail signature
The official Unicode Ribbon Campaign e-mail signature line is:
∞ Unicode Ribbon Campaign — No ASCII, anywhere ∞
∞ <https://ithink.ch/unicode> ∞
It can only be used in Unicode e-mails (obviously). Moreover, the Unicode Ribbon Campaign supersedes the now obsolete ASCII Ribbon Campaign. Therefore, e-mail messages sent in HTML, RTF, MS Word, or PDF are prohibited. These formats are allowed in attachments under the following conditions:
- attachments must be additional documents, not a replacement of the message body;
- the message body must therefore include all the relevant information, e.g. object, date and location of an event;
- MS Word and RTF attachments are allowed when the recipient is expected to modify the documents;
- HTML attachments should be limited to web site updates and HTML coding examples;
- PDF attachments should be used in the remaining situations, when the page layout is essential, meaning that the same function cannot be achieved with a plain text message.
The recipient should never have to open an attachment to know what a message is about. People used to sending e-mails with a subject like “Invitation” and a Word document as their sole content are partly responsible for the dreadfully quick spread of viruses like MyDoom last January.
Links of interest
1. The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
2. Outbreak: sometimes incorrectly called Outlook. See “Good Times” by John Gruber, Daring Fireball
3. Try this at home:
echo é > e
iconv -f macintosh -t utf-8 e > f ; mv f e
Run the last line 10 times and you have a 13 kB file. Run it 10 more times and you have a nice 124 MB file.