This page will soon move into the Lab Pass.
Chinese text is written as a continuous sequence of characters with no spaces between words. The same sequence can often be divided into words in more than one way, and the chosen division in turn affects pronunciation. The process of cutting the sequence into chunks that correspond to words is called word segmentation (分詞).
Computationally, segmentation requires a dataset of known words (often tens of millions of entries) and specialized algorithms, and it can be relatively slow. This is not something that can be embedded into a font.
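To make the ambiguity concrete, here is a minimal sketch of forward maximum matching, a classic textbook segmentation baseline. It is not the unreleased process the pre-processor uses. The well-known phrase 南京市長江大橋 can be split as 南京市|長江|大橋 ("Nanjing City Yangtze River Bridge") or as 南京|市長|江大橋 ("Nanjing mayor Jiang Daqiao"), and the character 長 is read coeng4 in 長江 but zoeng2 in 市長. The tiny lexicon, its Jyutping readings, and the names `LEXICON`, `forward_max_match` and `to_jyutping` are hypothetical, chosen for illustration only.

```python
# Hypothetical toy lexicon mapping words to Jyutping readings.
# A real segmenter needs a vastly larger dataset of known words.
LEXICON: dict[str, str] = {
    "南京": "naam4 ging1",
    "南京市": "naam4 ging1 si5",
    "市長": "si5 zoeng2",    # mayor: 長 read zoeng2
    "長江": "coeng4 gong1",  # Yangtze River: 長 read coeng4
    "大橋": "daai6 kiu4",
    "江": "gong1",
}


def forward_max_match(text: str, lexicon: dict[str, str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word; fall back to a single character if nothing matches."""
    words: list[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon or length == 1:
                words.append(candidate)
                i += length
                break
    return words


def to_jyutping(words: list[str], lexicon: dict[str, str]) -> str:
    """Look up each word's reading; '?' marks words missing from the lexicon."""
    return " ".join(lexicon.get(word, "?") for word in words)


if __name__ == "__main__":
    greedy = forward_max_match("南京市長江大橋", LEXICON)
    print("|".join(greedy), "->", to_jyutping(greedy, LEXICON))
    # The alternative split gives a different reading for 長.
    alternative = ["南京", "市長", "江", "大橋"]
    print("|".join(alternative), "->", to_jyutping(alternative, LEXICON))
```

Greedy matching happens to pick the more natural split here, but it has no notion of context and fails on many other strings, which is one reason practical segmenters rely on large datasets and more sophisticated algorithms.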
The pre-processor currently segments any text you paste using a state-of-the-art, unreleased process tailored for Cantonese. Supplying text that you have already segmented and corrected yourself (using the | notation designed into Canto Font) can reduce the error rate from 0.3% to 0.1%.
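The sketch below illustrates why pre-segmented input helps: when the user has already marked the word boundaries, a pre-processor can trust them instead of guessing. It assumes, for illustration only, that | simply marks a boundary between words; Canto Font's actual notation may carry more than this. The function name `segment_input` and the `auto_segment` fallback parameter are hypothetical.

```python
from typing import Callable


def segment_input(text: str, auto_segment: Callable[[str], list[str]]) -> list[str]:
    """Return the words of `text`, preferring user-supplied boundaries.

    ASSUMPTION: '|' marks a word boundary chosen by the user.
    """
    if "|" in text:
        # Trust the user's boundaries; drop empty pieces left by doubled or edge bars.
        return [piece for piece in text.split("|") if piece]
    # No boundaries supplied: fall back to the automatic segmenter.
    return auto_segment(text)


if __name__ == "__main__":
    # `list` stands in for a real segmenter: it splits a string into characters.
    print(segment_input("南京|市長|江大橋", list))
    print(segment_input("南京市", list))
```

This sketch trusts every user-marked chunk as a whole word; whether the real pre-processor also segments inside chunks that are only partially marked is not stated here.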
