Cangjie & Chu Bong Foo - part 1

· 4 min read
Jimmy Chu
Site Author

On one occasion in Taiwan, people around me were amazed that we Hong Kong people type Chinese using the Cangjie (倉頡) method. I realized that it is uncommon for people to learn how to deconstruct Chinese characters and type them with the Cangjie encoding. Thinking about it further, deconstructing Chinese characters this way is a brilliant idea. It made me wonder how the inventor came up with this input method, so I dug into its history and found an inspiring story.

In the 1960s, the method's architect, Chu Bong Foo (朱邦復), was working at a publishing house in Brazil. He saw that the publishing house could publish an English book in 3 to 5 days. From his experience as a young man, he knew it would take about two to three months to get a Chinese book published, from the author's manuscript to a publishable form. This was because, at that time, there was no systematic way to index Chinese characters. English words are composed of 26 letters, and how to index them is well established. Chinese has no such alphabet, and there are more than 14,000 most frequently used characters and more than 40,000 frequently used characters.

So Chu Bong Foo first studied the most frequently used characters, analyzed their shapes, and deconstructed them into 24 basic shapes (radicals) and 76 auxiliary shapes, which are often rotated or transposed versions of the basic shapes. He then used the following character roots, one per keyboard letter, to represent them.

Philosophical Group

  • A: 日 - Sun
  • B: 月 - Moon
  • C: 金 - Gold
  • D: 木 - Wood
  • E: 水 - Water
  • F: 火 - Fire
  • G: 土 - Earth

Stroke Group

  • H: 竹 - apostrophe / slant
  • I: 戈 - dot
  • J: 十 - cruciform
  • K: 大 - cross
  • L: 中 - vertical
  • M: 一 - horizontal
  • N: 弓 - hook

Body Parts Group

  • O: 人 - person
  • P: 心 - heart
  • Q: 手 - hand
  • R: 口 - mouth

Character Shape Group

  • S: 尸 - corpse
  • T: 廿 - twenty
  • U: 山 - mountain
  • V: 女 - female
  • W: 田 - field
  • Y: 卜 - fortune telling

Collision / Difficult Group

  • X: 難 - difficult
  • Z: 重 - unknown/collision

Each Chinese character can be represented by a sequence of one to five of the character roots above, and the duplication rate (the share of valid sequences that return more than one character) is less than 10%. This makes Cangjie one of the more efficient ways of inputting Chinese characters.
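To make the encoding concrete, here is a minimal Python sketch of the letter-to-root table listed above, a helper that spells out a key sequence as its roots, and a function that computes the duplication rate of a code table. The names `CANGJIE_ROOTS`, `code_to_roots`, `duplication_rate`, and `SAMPLE_TABLE` are my own for illustration and do not come from any Cangjie implementation. The three sample entries are only a toy table, so it says nothing about the real duplication rate.

```python
# Letter-to-root table copied from the groups listed above.
CANGJIE_ROOTS = {
    "A": "日", "B": "月", "C": "金", "D": "木", "E": "水", "F": "火", "G": "土",
    "H": "竹", "I": "戈", "J": "十", "K": "大", "L": "中", "M": "一", "N": "弓",
    "O": "人", "P": "心", "Q": "手", "R": "口",
    "S": "尸", "T": "廿", "U": "山", "V": "女", "W": "田", "Y": "卜",
    "X": "難", "Z": "重",
}

MAX_CODE_LENGTH = 5  # the post states one to five roots per character


def code_to_roots(code: str) -> str:
    """Spell out a key sequence such as "AB" as its character roots."""
    code = code.upper()
    if not 1 <= len(code) <= MAX_CODE_LENGTH:
        raise ValueError(f"expected 1 to {MAX_CODE_LENGTH} keys, got {len(code)}")
    return "".join(CANGJIE_ROOTS[key] for key in code)


def duplication_rate(table: dict[str, list[str]]) -> float:
    """Fraction of valid codes that map to more than one character."""
    if not table:
        return 0.0
    collisions = sum(1 for chars in table.values() if len(chars) > 1)
    return collisions / len(table)


# Tiny illustrative table (code -> candidate characters); a real input
# method ships a table with thousands of entries.
SAMPLE_TABLE = {
    "AB": ["明"],   # 日 + 月
    "DD": ["林"],   # 木 + 木
    "VND": ["好"],  # 女 + 弓 + 木
}

if __name__ == "__main__":
    for code, chars in SAMPLE_TABLE.items():
        print(code, code_to_roots(code), chars)
    print(duplication_rate(SAMPLE_TABLE))
```

Typing `AB` therefore means pressing the keys for 日 and 月, which together form 明. When a code maps to several characters, an input method would show a candidate list, and `duplication_rate` counts how often that happens.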

With such a system, publishing houses can input Chinese characters much as they would English words, and Chinese characters can be sorted in a defined order and indexed efficiently. This has sped up the Chinese publishing process.
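As a rough illustration of why this helps with indexing: once every character has a code made of Latin letters, the codes can be sorted and searched like English words. The sketch below is an assumption-laden toy. The names `ENTRIES`, `INDEX`, and `lookup` are hypothetical, and the five entries are just a small sample.

```python
from bisect import bisect_left

# Illustrative (code, character) pairs; a real index would hold thousands.
ENTRIES = [("VND", "好"), ("AB", "明"), ("DD", "林"), ("A", "日"), ("D", "木")]

# Sorting by code orders the characters the way a dictionary orders words.
INDEX = sorted(ENTRIES)
CODES = [code for code, _ in INDEX]


def lookup(code: str) -> list[str]:
    """Return every character filed under `code`, using binary search."""
    code = code.upper()
    start = bisect_left(CODES, code)
    matches = []
    for entry_code, char in INDEX[start:]:
        if entry_code != code:
            break
        matches.append(char)
    return matches


if __name__ == "__main__":
    print([char for _, char in INDEX])
    print(lookup("DD"))
```

Binary search over the sorted codes finds an entry in a number of steps that grows logarithmically with the table size, which is the same property that makes alphabetical indexes practical for English.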

Later in his life, Mr. Chu Bong Foo was also involved in the following projects:

  • Integrating Chinese into modern computer systems back in the 1980s, when computers themselves were not yet widespread.

  • Launched the first Chinese e-book system in 2001, six years before the first version of the Kindle reader was launched in 2007.

  • Launched an AI system that takes the text of a Chinese poem as input and generates a 3D animation, all without human intervention. The process involves natural language processing, Chinese character interpretation, and what we now call generative AI, all back in 2011! For details, refer to his announcement post from that time and the animation 記承天寺 (original download link, or the YouTube video).

Mr. Chu Bong Foo is truly a visionary.

References