SuperMemo 2004: Bug in UTF-8 encoding
All users of SuperMemo 2004 are encouraged to update to SuperMemo 2004 Build 12.03 dated Sep 15, 2004 (or later).
The update is particularly important for users who work with languages other than English. Some text encoding strategies adopted by earlier versions will not be supported in the future. Updating at a later time may require converting some foreign-language texts to retain compatibility with future versions. We sincerely apologize for this reversal.
If you are not interested in the technical details explained below, see Recommendations.
Introduction
On September 7, 2004, three days after the release of SuperMemo 2004, we documented a bug in the UTF-8 encoding used by the final release of the program. This bug made the interpretation of UTF-8 encoded texts stored in SuperMemo registries ambiguous. It primarily affected students of Asian languages who used plain text components rather than HTML components.
Symptoms
Collections upgraded from SuperMemo 2002 might display incorrect texts. In particular, 10-20% of short Asian texts in some encodings may be converted incorrectly.
For example, when learning Japanese, the following text:
![]()
would be displayed as
![]()
Reasons
SuperMemo 2004 adopted UTF-8 encoding in the registry as the most efficient way of representing Unicode. However, some short multibyte character set (MBCS) strings stored by plain text components in earlier versions of SuperMemo happen to form valid UTF-8 byte sequences. Because such a string cannot be told apart from genuine UTF-8 by its bytes alone, it is decoded as UTF-8 and yields seemingly random text.
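The ambiguity can be reproduced in a few lines. The sketch below is not SuperMemo code; it assumes a legacy string stored in Windows code page 932 (Shift-JIS), chosen only because its half-width katakana are single bytes that make the collision easy to see. The helper `try_utf8` is hypothetical and simply attempts a strict UTF-8 decode.

```python
from typing import Optional


def try_utf8(raw: bytes) -> Optional[str]:
    """Return the UTF-8 decoding of raw, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Legacy MBCS string: two half-width katakana in code page 932 (assumed encoding).
legacy = "ﾃｱ".encode("cp932")  # b'\xc3\xb1'
decoded = try_utf8(legacy)     # 'ñ': valid UTF-8, but the wrong text

# Hiragana in code page 932 begins with byte 0x82, which UTF-8 allows only as a
# continuation byte, so this string is (correctly) rejected as UTF-8.
rejected = try_utf8("あい".encode("cp932"))  # None

print(legacy, decoded, rejected)
```

A strict decoder catches many legacy strings, like the hiragana above, but any string whose bytes happen to be valid UTF-8 slips through silently, which is why short strings are the most exposed.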
Solution
Updated SuperMemo 2004 will not attempt to convert legacy collections created with earlier versions of SuperMemo. Instead, it will allow un-encoded texts in plain text components. This implies that both UTF-8 encoded and plain ANSI strings will coexist at the registry level. Users will be able to use old-style code page fonts with plain text components to create collections for learning languages (with the added benefit of being able to display non-Latin titles in the contents window, registries, browser, etc.). At the same time, users who prefer Unicode and HTML-based incremental reading will still be able to fully benefit from UTF-8 encoding at the registry level.
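One way to let UTF-8 and ANSI strings coexist without guessing is to record, per entry, how the bytes were written. The sketch below is purely illustrative: `RegistryEntry`, its `is_utf8` flag, and its `code_page` field are hypothetical and do not describe SuperMemo's actual registry format. It shows that the very same bytes decode to different texts depending on that stored flag.

```python
from dataclasses import dataclass


@dataclass
class RegistryEntry:
    """Hypothetical registry record: raw bytes plus an explicit encoding flag."""
    raw: bytes
    is_utf8: bool              # hypothetical flag, not SuperMemo's real format
    code_page: str = "cp1252"  # ANSI code page used when is_utf8 is False

    def text(self) -> str:
        # Decode according to the stored flag instead of guessing from the bytes.
        if self.is_utf8:
            return self.raw.decode("utf-8")
        return self.raw.decode(self.code_page)


unicode_entry = RegistryEntry("ñ".encode("utf-8"), is_utf8=True)
legacy_entry = RegistryEntry("ﾃｱ".encode("cp932"), is_utf8=False, code_page="cp932")

# Both entries hold b'\xc3\xb1', yet each decodes to its intended text.
print(unicode_entry.text(), legacy_entry.text())
```

The design choice is to move the decision out of the bytes and into metadata, so no heuristic ever has to guess.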
Side effects
After the update, some texts containing non-ASCII characters may appear UTF-8 encoded. For example, the item:
Q: A set of [...] peptides are generated from ß-APP by proteases known as the ß- and gamma-secretases
A: ß-amyloid (Aß)
may show up as:
Q: A set of [...] peptides are generated from ÃŸ-APP by proteases known as the ÃŸ- and gamma-secretases
A: ÃŸ-amyloid (AÃŸ)
Right-click the encoded text and choose Text : Convert : Decode UTF-8 to revert this change.
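The Decode UTF-8 command is described here only by its menu path. As a rough sketch of the kind of conversion involved (an assumption, not SuperMemo's implementation), assume the text is shown in the Windows-1252 code page: UTF-8 bytes read as ANSI turn each "ß" into two characters, and reversing the steps restores the original. Both function names are hypothetical.

```python
def show_as_ansi(text: str, code_page: str = "cp1252") -> str:
    """Simulate UTF-8 bytes being displayed as if they were ANSI text."""
    return text.encode("utf-8").decode(code_page)


def decode_utf8(garbled: str, code_page: str = "cp1252") -> str:
    """Undo the effect: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(code_page).decode("utf-8")


answer = "ß-amyloid (Aß)"
garbled = show_as_ansi(answer)  # 'ÃŸ-amyloid (AÃŸ)'
restored = decode_utf8(garbled)

print(garbled)
print(restored)
```

The round trip is not universal: a few byte values (for example 0x81) are undefined in Windows-1252, so characters whose UTF-8 form contains them, such as "Á", cannot pass through `show_as_ansi` without an error.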
We strongly recommend that all users of SuperMemo 2004 update to the newest available version. Here are the steps: