Seach Makes Easy

Chapter 3 discusses this in detail. Here's a very informal version:

  • Unicode characters don't fit in 8 bits; deal with it.


  • Byte order is only an issue in I/O.


  • If you don't know, assume big-endian.


  • Loose surrogates have no meaning.


  • Neither do U+FFFE and U+FFFF.


  • Leave the unassigned codepoints alone.


  • It's OK to be ignorant about a character, but not plain wrong.


  • Subsets are strictly up to you.


  • Canonical equivalence matters.


  • Don't garble what you don't understand.


  • Process UTF-* by the book.


  • Ignore illegal encodings.


  • Right-to-left scripts have to go by bidi rules.
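
A few of these rules are easy to see in code. What follows is only a rough Python sketch, not a conformance implementation: the function names (decode_utf16, is_loose_surrogate, is_noncharacter, canonically_equivalent) are made up for illustration, and the noncharacter check covers only the two code points named in the list. It assumes the input is raw UTF-16 bytes that may or may not start with a byte order mark.

```python
import codecs
import unicodedata


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes read from I/O; with no BOM, assume big-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to big-endian. The strict decoder raises
    # UnicodeDecodeError on malformed input such as an unpaired surrogate.
    return data.decode("utf-16-be")


def is_loose_surrogate(ch: str) -> bool:
    """True if ch is a surrogate code point standing on its own."""
    return 0xD800 <= ord(ch) <= 0xDFFF


def is_noncharacter(ch: str) -> bool:
    """True for U+FFFE and U+FFFF (only the two named in the list)."""
    return ord(ch) in (0xFFFE, 0xFFFF)


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

For instance, canonically_equivalent("\u00e9", "e\u0301") treats the precomposed é and the sequence e + combining acute accent as the same text, which is the point of the canonical equivalence rule.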
