
Bytes, Text Encoding, and Precision in LLMs

Chen Hao posted on 01 Mar 2025

1. Why 1 Byte = 8 Bits?

  • A bit is a 0 or 1. Computers group 8 bits into a byte.
  • Why 8? IBM’s System/360 in the 1960s made the 8-bit byte standard. It’s a power of 2 (2^3), handy for binary hardware, and gives 2^8 = 256 possible values per byte. (ASCII itself needs only 7 bits, 128 characters, but fits neatly in one byte.) A quick check follows after this list.
  • Use: Basis for all data storage (kilobytes, megabytes).
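
A minimal sanity check in Python, just to make the arithmetic concrete (nothing here is specific to any library):

    # One byte = 8 bits, so it can represent 2**8 = 256 distinct values.
    print(2 ** 8)        # 256
    print(len(b"A"))     # 1 -- the ASCII letter "A" occupies exactly one byte
    print(1024 * 1024)   # 1,048,576 bytes = 1 MiB (the binary "megabyte")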

2. Bytes in Text Encoding

  • ASCII: 7 bits per character, stored in 1 byte. 128 characters (e.g., “A”, “1”). English-only.
  • UTF-8: Variable length. 1 byte for ASCII (English), 2 bytes for accented Latin and many other alphabets, 3 bytes for most Chinese characters, up to 4 for rare symbols and emoji. Supports all languages.
  • UTF-16: 2 bytes for most characters, 4 for the rest. Used internally by Windows, Java, and JavaScript.
  • UTF-32: 4 bytes always. Bulky, rare.
  • Example: “Hello 世界” (Hello World) = 12 bytes in UTF-8 (5 for “Hello” + 1 space + 3 × 2 for the two Chinese characters); the snippet below checks this.
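
In Python, str.encode("utf-8") returns the raw bytes, so the counts above are easy to verify:

    # Characters vs. UTF-8 bytes for a few strings from this section.
    for s in ["Hello 世界", "A", "é", "世"]:
        data = s.encode("utf-8")
        print(f"{s!r}: {len(s)} character(s), {len(data)} byte(s)")

    # 'Hello 世界': 8 character(s), 12 byte(s)
    # 'A': 1 character(s), 1 byte(s)
    # 'é': 1 character(s), 2 byte(s)
    # '世': 1 character(s), 3 byte(s)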

3. Chinese vs. English in UTF-8

  • English: 1 byte per letter (and per space). “Life is good” = 12 bytes.
  • Chinese: 3 bytes per character, but concise. “生活好” = 9 bytes.
  • Trend: Chinese often uses fewer bytes for the same meaning (e.g., “I enjoy reading books in the library” = 36 bytes vs. “我喜欢图书馆读书” = 24 bytes).
  • Exception: Short phrases like “Go now” (6 bytes) beat “现在走” (9 bytes).
  • Result: Chinese tends to win for longer, meaning-dense content; the script below verifies the counts.
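
The same encode trick makes the comparison concrete; these are the exact phrases from the bullets above:

    # UTF-8 byte counts for each English / Chinese pair.
    pairs = [
        ("Life is good",                         "生活好"),
        ("I enjoy reading books in the library", "我喜欢图书馆读书"),
        ("Go now",                               "现在走"),
    ]
    for en, zh in pairs:
        print(f"{len(en.encode('utf-8')):>2} bytes  vs  {len(zh.encode('utf-8')):>2} bytes")

    # 12 bytes  vs   9 bytes
    # 36 bytes  vs  24 bytes
    #  6 bytes  vs   9 bytes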

4. FP8 and FP16 in Large Language Models

  • FP16 (16-bit float): 2 bytes per number, half the memory of FP32 (4 bytes). Fast on GPUs and widely used for training and inference (e.g., GPT-3’s 175B parameters ≈ 350 GB of weights in FP16).
  • FP8 (8-bit float): 1 byte per number. Two formats: E4M3 (more mantissa bits, higher precision) and E5M2 (more exponent bits, wider range). Halves memory again (GPT-3 ≈ 175 GB) and runs 2x+ faster on H100 GPUs with native FP8 support.
  • Trade-off: FP16 is safer, FP8 needs tuning to avoid accuracy loss.
  • Use: Both shrink LLMs and speed up training/inference. FP8 is newer and still cutting-edge; the back-of-the-envelope calculation below shows the memory savings.
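
A rough calculation for weight memory only (it assumes 175 billion parameters, GPT-3 scale, and ignores activations, optimizer state, and other overhead):

    # Approximate memory for storing model weights at different precisions.
    params = 175e9  # 175 billion parameters (GPT-3 scale, for illustration)
    for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("FP8", 1)]:
        gigabytes = params * bytes_per_param / 1e9
        print(f"{name}: {gigabytes:.0f} GB")

    # FP32: 700 GB
    # FP16: 350 GB
    # FP8: 175 GB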

Summary

  • Byte: 8 bits, set by history.
  • Encoding: UTF-8 rules, 1-4 bytes per character.
  • Chinese: More efficient for meaning-heavy text.
  • FP16/FP8: 2 or 1 bytes, make AI lean and fast.