The Making of Cantonese Font Braille Edition 粵語字體點字版製作誌

廣東話點字 Cantonese Braille

香港盲人使用嘅凸點字，其實係標示緊一套拼音。譬如「學」嘅點字 ⠓⠻⠄ 係對應香港語言學會粵拼嘅 hok6:

⠓ h-
⠻ -ok
⠄ 6

但係，粵拼係 1993 年推出，而香港點字背後嘅拼音系統係七、八十年代製成，最近修訂係 1990 年。所以兩者並不能直接對應。

因為點字製作牽涉漢字轉換語音，而自動、準確 (> 99.5%) 噉達成呢個目標係粵語字體背後引擎獨有嘅技術。於是，早兩個月我就嘗試製作一套「粵語字體･點字版」，睇下可否減省大量校對嘅人力需要。

Cantonese Braille, as used by blind readers in Hong Kong, is a phonetic system. For example, the character 「學」 is written ⠓⠻⠄, which corresponds to hok6 in LSHK Jyutping:

⠓ h-
⠻ -ok
⠄ 6
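To make the three cells above concrete, here is a minimal Python sketch that lists which dots are raised in each one. It relies only on how the Unicode Braille Patterns block is laid out (U+2800 plus one bit per dot); the function name braille_dots is my own and not part of any tool mentioned here.

```python
def braille_dots(cell):
    """Return the raised dot numbers (1-8) of one Unicode Braille cell."""
    offset = ord(cell) - 0x2800
    if not 0 <= offset <= 0xFF:
        raise ValueError(f"not a Braille pattern: {cell!r}")
    # In the Unicode Braille Patterns block, bit (n - 1) is set when dot n is raised.
    return [dot for dot in range(1, 9) if offset & (1 << (dot - 1))]


# The three cells of 「學」, paired with the Jyutping parts given above.
for cell, part in zip("⠓⠻⠄", ["h-", "-ok", "6"]):
    print(cell, part, braille_dots(cell))
```

The tone cell uses a single dot, which matters later when the tone system comes up.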

However, Jyutping was published in 1993, while the phonetic system underpinning Cantonese Braille was devised in the 1970s and 80s, with its last revision in 1990. The two schemes do not map onto each other directly.

Preparing Cantonese Braille means converting Chinese characters to their pronunciation, and doing so automatically and accurately (> 99.5%) is a problem solved uniquely by the technology behind the Cantonese Font. So in June I tried my hand at a Braille variant of the Cantonese Font, to see whether it could vastly reduce the proof-reading labour needed to prepare this material.

因為維基百科載有粵拼部件轉點字嘅對應表，所以唔係好難喺粵語字體加插一個後置程序嚟以點字渲染。做完之後，就用嚟做咗啲書面語、口語、文言文、圖案內嵌文字等檔案，並接觸香港服務盲人嘅機構，如心光學校、中央點字製作中心等，睇下可唔可以幫到佢哋。

Wikipedia has a table mapping each Jyutping component to Braille, so it was not difficult to add a post-processing step to the Cantonese Font that renders Braille. Once this was done, I prepared sample documents and graphics in standard written Chinese, written Cantonese, and classical Chinese, and reached out to organizations serving the blind in Hong Kong to see whether this would be of help to them.

現行點字製作 How Braille is Currently Prepared

中央點字製作中心好快就回應咗我，並邀請我到石硤尾中心參觀、了解製作流程。

盲人好多係成年之後喪失(部份)視力，而亦都唔識摸點字。基本上係先天、完全睇唔到嘅人士先會由細喺學校學點字，所以實際用點字嘅人可能全香港只有幾千。所以我有啲詫異中央點字製作中心嘅規模：五十全職人員、數千呎工場、並自家擁有各適其適嘅工業級點字印刷工具。

更加令我驚訝嘅係製作嘅系統。點字書刊大部份係用壓機 embosser 壓出嚟嘅；呢啲機接受嘅係 ascii （英文、數字、符號）編碼，每一個 ascii 代表 64 種點字嘅其中一種。

又用返「學」做例子，機器接受嘅編碼係 h]' 對應前音 h-, 後音 -ok, ‘ = 第九聲。點字嘅拼音系統係有分入聲嘅。

製作書籍嘅時候，分工包括一名職員人手將中文書刊打成漢字文字檔，然後（可能使用類似【亮點】嘅轉換器）將漢字轉成 ASCII 編碼，然後人手對比漢字-ASCII 作第一步校對。

因為廣東話點字並唔用空格分字，校對嘅過程係睇下噉樣嘅碼有冇出錯：

The Central Braille Preparation Center responded quickly to my query and invited me to visit their premises in Shek Kip Mei to better understand the preparation process.

Many blind people are not born blind but lose (part of) their sight in adulthood (accidents, diabetes, etc.), and they usually do not read Braille. Braille users are mostly people who have been totally blind from birth and learnt to touch-read at school. This population is small, perhaps a few thousand across all of Hong Kong. So I was somewhat surprised at the scale of the Center: a full-time staff of fifty and its own industrial-grade Braille production machines on premises of 5000 sq ft (?).

What was more surprising was the preparation system. Most Braille publications are pressed by an embosser, and these machines take ASCII codes (a mix of Latin letters, numerals, and symbols), with each ASCII symbol corresponding to one of the 64 possible (6-dot) Braille cells.

Using 「學」 as an example again, the machine takes h]' as input, where h stands for the initial h-, ] for the final -ok, and ' for tone 9. Yes, the Braille phonetic system uses nine tones, with the entering tones counted separately.
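The relation between the machine code h]' and the cells ⠓⠻⠄ can be sketched in a few lines of Python. This is only an illustration under assumptions: the 64-character table follows the commonly published Braille ASCII ordering (indexed by dot pattern) and should be checked against the embosser's own documentation, and the function names are mine, not those of 亮點 or any tool the Center uses.

```python
# Braille ASCII characters, indexed by 6-dot pattern (dot 1 = bit 0 ... dot 6 = bit 5).
# Assumption: this is the commonly published ordering; verify before relying on it.
BRAILLE_ASCII = " A1B'K2L@CIF/MSP\"E3H9O6R^DJG>NTQ,*5<-U8V.%[$+X!&;:4\\0Z7(_?W]#Y)="


def ascii_to_braille(text):
    """Map each Braille ASCII character to its Unicode Braille cell."""
    return "".join(chr(0x2800 + BRAILLE_ASCII.index(ch.upper())) for ch in text)


def braille_to_ascii(cells):
    """Map 6-dot Unicode Braille cells back to lowercase Braille ASCII."""
    return "".join(BRAILLE_ASCII[ord(cell) - 0x2800].lower() for cell in cells)


print(ascii_to_braille("h]'"))   # the three cells of 「學」
print(braille_to_ascii("⠓⠻⠄"))  # back to the embosser input
```

Letters are lowercased on the way out only because the samples in this post are written that way.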

To prepare a book, a staff member types the publication into a plain-text document, then (probably using a conversion tool like 亮點) converts this text to ASCII codes, and finally proof-reads the Chinese text against the ASCII as a first pass. Because Cantonese Braille does not use spaces to separate characters, this proof-reading amounts to checking for errors in… this:

j9h;'hg'j$'hg'j5'g$'h;'k_1x4'hj@n>'p.hg'm+,j$'sx>kbs41hg' )+@ )+@)+@ x$')+@)+@ x$'pk" mb,)+@ mb,)+@fqj

一行行人行入銀行㩒錢去南北行買人參參加善行

做過粵拼校對嘅人，都知道當漢字同拼音分開擺嘅時候，相對於漢字-拼音擺埋一齊，難度會增加。而做過幾千幾萬字粵拼校對嘅人會知道，要執晒所有錯處、要起碼睇兩三輪。

而「亂碼」校對真係再地獄級難度。參觀時介紹、示範嘅校對介面，係一個純文字嘅 DOS 編輯器；工具方面提供零輔助。第一浸漢字轉機器碼嘅工具係 80 年代起建，之後經歷 Big5 變成 Unicode 等，所以擴充區嘅字元未必齊；而轉換準則係以書面語文讀為基準，基本上係冇可能處理粵文/口語（如劇本）或文言文（因為呢啲等同以機器碼重新寫過）。

視全員工校對完機器碼之後，個（純文字）檔案就交棒俾盲人員工，用電子點字閲讀器進行另一次校對。

一頁上限 250 字，一年合約嘅承包量（好似係）250,000 頁，過去四十年、每年千萬字都係噉喺呢層樓揼出嚟。‡

Anyone who has proof-read Jyutping knows that it becomes much harder when the romanization is placed apart from the characters rather than alongside them; and anyone who has proof-read a few thousand characters knows that it takes at least two or three passes to catch every error. Those numeral tones are just not very intuitive.

And proof-reading machine ASCII code is truly another level of torment. On the tour I was shown the proof-reading UI: a DOS-era plain-text editor that provides zero assistance. The initial CJK-to-ASCII conversion relies on tooling inherited from the 1980s that has since been carried from Big5 to Unicode, so many characters in the Unicode CJK extension blocks are missing. The conversion is based on literary (dictionary) readings of standard written Chinese; written or colloquial Cantonese (e.g., screenplays) and classical Chinese are effectively impossible to handle, because they would have to be rewritten from scratch in ASCII.

Once the sighted proof-readers have finished with the ASCII code, the plain-text file is handed off to blind staff, who use an electronic Braille display for another round of proof-reading.

A page holds at most 250 characters, and (IIRC) the Center is contracted for 250,000 pages a year. For the past forty years, the tens of millions of characters produced every year were all done this way by this one organization (separate organizations were merged, hence the Central Braille Preparation Center). ‡

點字準則何處尋？ Decoding an Obscure Standard

中心職員細心睇咗我最初做出嚟嘅稿，話我知有好幾個系統性嘅錯處：維基百科寫嘅轉換方案有甩漏，唔係香港點字嘅標準，係唔用得嘅。但係佢哋當時冇話我知錯乜嘢，亦都俾唔到香港點字嘅標準我，往後書面查詢亦都再冇回覆。

香港公共圖書館有兩本有關香港點字教學嘅書（一本 2007 盲人中心、一本 2020 心光教學）；但係兩本都冇講嗰啲同維基百科唔同嘅地方。

嚟到呢度，我就覺得好奇怪。點解一個社會性嘅標準會咁難揾到嘅呢？

The Center staff reviewed my drafts with care and told me there were several systematic errors: the mapping described on Wikipedia has gaps, does not represent the standard used for Hong Kong Braille, and cannot be used. However, they did not tell me what was wrong, they could not give me a specification, and there were no responses to my follow-up written inquiries.

The Hong Kong Public Libraries hold two books on teaching Braille in Hong Kong, one by the Society for the Blind in 2007 and the other from a school for the blind in 2020. I tracked both down, but neither mentions where the standard differs from Wikipedia.

I thought it was really strange that a societal standard was so hard to track down.

於是，我就喺社交媒體公開徵求一本會有漢字版本嘅點字書、以及會有相關知識嘅人。有兩位盲人朋友回覆 (特別鳴謝 TY, Tim 🙇‍♂️)，為我揾咗本《小王子》嘅點字版、同埋用【亮點】做特定嘅漢字轉寫。

我要求《小王子》因為知道一定會揾到中譯本；問題係書實在有太多譯本。掙扎咗一日去做個點字轉粵拼嘅轉換器，然後擇明用咗半個鐘就搭咗個大概 🙇‍♂️。噉就可以入手呢一個版本嘅漢字《小王子》。

對比之下，其實點字拼音同粵拼大同小異，分別四行講晒：

用入聲系統、-p / -t / -k 尾嘅 1, 3, 6 調會改寫為 7, 8, 9 調
j- 喺 -i, -im, -in, -iu, -yu, -yun, -yut 前不標（-ik, -ing 照標）
w- 喺 -u, -ui, -un, -ut 前不表
漢字要最少兩個碼位，唔足就用特殊留白代替

So I asked for help on Threads: I was looking for a book embossed in Cantonese Braille that also exists in a Chinese print edition, and for people with relevant knowledge. Two blind people responded (thank you TY, Tim), found a Braille version of Le Petit Prince for me, and ran specific conversions using a converter they have access to (which may or may not be what the Center uses).

I asked for The Little Prince because a Chinese version was sure to exist; the problem is that there are many translations, so I needed to read the Braille to find the right one. Without knowing the Braille rules in their entirety, it was a struggle to parse the Braille: I spent a day wrestling with a Braille-to-Jyutping converter, and then Chaak somehow built a rough one in less than an hour. And thus I tracked down the edition.

It turns out that the romanization system underpinning Cantonese Braille is very close to Jyutping, and there are only four differences:

for “entering tones” (syllables whose coda is -p / -t / -k), tones 1, 3, 6 are written as 7, 8, 9 respectively
j- is not written before -i, -im, -in, -iu, -yu, -yun, -yut (but it is written before -ik, -ing)
w- is not written before -u, -ui, -un, -ut
each Chinese character must occupy at least two cells (ASCII codes); shorter spellings are padded with a special blank
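The first three differences can be sketched as a small Jyutping rewriting function. This is a minimal sketch under my own assumptions, not the Cantonese Font's implementation: the function name, the regular expression, and the returned (onset, final, tone) tuple are illustrative, and the fourth rule (padding to two cells) is left out because it depends on the cell table.

```python
import re

# Jyutping syllable: optional onset, then a final (vowel-initial, or syllabic m / ng), then tone 1-6.
SYLLABLE = re.compile(r"(ng|gw|kw|[bpmfdtnlgkhwzcsj])?([aeiouy][a-z]*|m|ng)([1-6])")

ENTERING_TONE = {"1": "7", "3": "8", "6": "9"}          # rule 1
DROP_J_BEFORE = {"i", "im", "in", "iu", "yu", "yun", "yut"}  # rule 2
DROP_W_BEFORE = {"u", "ui", "un", "ut"}                 # rule 3


def jyutping_to_braille_parts(syllable):
    """Rewrite one Jyutping syllable into (onset, final, tone) per the rules above."""
    match = SYLLABLE.fullmatch(syllable)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {syllable!r}")
    onset, final, tone = match.group(1) or "", match.group(2), match.group(3)
    if final[-1] in "ptk":
        # Entering tones: 1, 3, 6 become 7, 8, 9.
        tone = ENTERING_TONE.get(tone, tone)
    if onset == "j" and final in DROP_J_BEFORE:
        onset = ""
    if onset == "w" and final in DROP_W_BEFORE:
        onset = ""
    return onset, final, tone


for example in ["hok6", "wun6", "jyut6", "jik1", "jing4"]:
    print(example, jyutping_to_braille_parts(example))
```

The sets are copied from the rules exactly as listed; finals not named there (for instance -ip or -it after j-) are left untouched rather than guessed at.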

粵語字體･點字版 Cantonese Font, Braille edition

呢幾日，用啱啱知道嘅規矩做起咗一套成形嘅「粵語字體･點字版」，包括處理漢字、英文、數字、中英標點嘅能力。對比「亮點-機械碼校對」有以下優點：

書面語（既有方案嘅強項）準繩度會提高十倍，由約十餘字要改一個變成約二百字改一個。
能夠處理口語/粵文文體。粵語字體能夠自動處理廣東話變調。亮點會以 waa6 標注「廣東話」，如果唔想奇奇怪怪嘅話有需要人手修改。
能夠處理所有漢字。粵語字體內置全部可以粵語發音嘅字符，覆蓋三萬字，港臺繁體字、簡體字、日形漢字、統一碼擴充區域等。
可以睇住漢字-粵拼修改粵拼，然後一鍵生成相應嘅點字。就算文言文呢啲不常用、需要諸多修改嘅文體都可以簡便處理。
除咗可觀點字輸出，後置轉換器仲可以選擇點字、機械碼輸出純文字檔，無論係 embosser 輸出文字定係熱溶膨脹印刷圖像都適用。
點字因為背後仲係「單一漢字」，所以唔會有屬於一個字碼嘅幾個ASCII 斷開兩行嘅問題；又少一個需要校對嘅地方。
人手方面，現時需要幾個月培訓新人學習漢字轉機械碼嘅特殊、不成文規矩。字體方案，只要聘請識粵拼嘅員工，就可以即時幫到手。
所有已經用粵語字體校對過粵拼嘅項目，譬如蔡偉泉翻譯嘅《動物農莊》粵文版，都可以無縫輸出做點字版。
點字字體附有一個 ASCII 版本，可以 cmd-I 雙向切換；變相校對漢字、粵拼、機械碼、點字都可以同時進行
點字字體與標準粵語字體繪製大小相應，可以輕鬆製作視全漢字+粵拼、點字並行版本供視全視障人士共同使用。

Knowing the rules let me build a working Braille edition of the Cantonese Font over the past few days. The current version handles Chinese, English, numerals, and both CJK and English punctuation marks. Compared to a 亮點-plus-ASCII proof-reading workflow, a Font-based workflow has the following advantages:

Higher accuracy on standard written Chinese, the strong suit of the existing tooling, whose error rate was probably around 1-in-15. For well-formed standard written Chinese the Cantonese Font benchmarks at around 1-in-150 to 1-in-200, a 10x improvement.
Capacity to handle written / colloquial Cantonese. The Cantonese Font is better at Cantonese than at standard written Chinese, and it handles the irregular tone changes automatically.
Covers all glyphs. The Cantonese Font includes all 30,000 characters that can be pronounced in Cantonese, including Traditional characters (in HK / TW standards), Simplified Chinese, Japanese Kanji, and characters in the Unicode CJK extension blocks.
Chinese and Jyutping are shown together, which is far more ergonomic to edit, and edits made with the |, ～ and .jyutping mechanisms also carry over to the Braille. This makes it practical to work on unusual text forms such as classical Chinese literature.
Apart from the visual Braille output, which suits thermoform graphics, running the text through a shaper yields Braille or ASCII plain text for printing on an embosser.
Underneath each visual pair or triplet of Braille cells is a single CJK codepoint. This means the cells of one character can never be split across two lines, which is one fewer thing to proof-read.
Currently, a new hire needs several months of onboarding to learn the CJK-to-ASCII conversion (for which there does not seem to be a written standard). A Font-based workflow makes any new hire who knows Jyutping productive from day one.
All content that has already been proof-read with the (regular) Cantonese Font, such as the written Cantonese translation of Animal Farm by Thomas Tsoi, can be output directly as a Braille edition.
The Font has a sibling Braille-ASCII variant that can be toggled with cmd-I (Italics). This allows Chinese, Jyutping, ASCII, and Braille to be proof-read together in one seamless workflow.
The metrics of the Braille(-ASCII) variants match those of the standard Cantonese Font. This makes it easy to prepare parallel Chinese-plus-Jyutping and Braille publications for sighted and blind readers to share (e.g., sighted parents reading with their children).
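To put the accuracy figures in proof-reading terms, here is a rough back-of-envelope calculation in Python. It assumes a full 250-character page (the page limit mentioned earlier) and takes the quoted error rates at face value; the names are mine and the numbers are estimates, not measurements.

```python
PAGE_CHARS = 250  # maximum characters per Braille page, as quoted earlier


def corrections_per_page(one_in_n, page_chars=PAGE_CHARS):
    """Expected number of corrections on a full page at a 1-in-N error rate."""
    return page_chars / one_in_n


for label, n in [
    ("existing tooling, about 1-in-15", 15),
    ("Cantonese Font, 1-in-150", 150),
    ("Cantonese Font, 1-in-200", 200),
]:
    print(f"{label}: about {corrections_per_page(n):.1f} corrections per page")
```

On these assumptions a proof-reader goes from well over a dozen fixes per page to one or two.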

整體嚟講應該可以為現有做點字嘅流程簡化 10-20 倍，令校對嘅痛苦度大大減低，亦都令以前必須透過機構做嘅工序大眾化。

成果而家暫時未諗到點樣公開。我希望免費/廉價俾有需要人士使用；收費製作點字嘅機構就要課金購買服務。

Overall this should make the existing workflow 10-20 times more efficient, make proof-reading far less painful, and make it possible for individuals to do what could previously only be done by organizations.

I have not yet decided how to make this available. I am looking for a way for individuals who need it to access it for free or at low cost, while organizations that are paid to prepare Cantonese Braille purchase it as a paid service.

往後工作 / 感想 Future Work / Final Thoughts

近十五年開發嘅字典、輸入法、字體、讀本全部都用粵拼。

因為盲人社群接觸嘅係一套獨特嘅「點拼」系統，雖然佢哋都係用拼音，但係變相因此同所有現代工具絕緣。

亦因為「點拼」spec 唔容易揾，就算有心人想做啲嘢亦都無從入手。於是輸入法會需要用特定嘅「點寫易」、輸出用「亮點」（兩者用同一詞庫、亦綁定 Windows）。建構呢啲工具應該係十五二十年一次嘅政府/馬會/公益金撥款，所以會長期滯後。

講到呢度，睇咗一排點拼，點拼本身有啲得意嘅位，可能有關設計當年只有物理顯示、亦都使用傳統嘅音韻分析：

（1）中文分詞本身係一件複雜嘅事，而廣東話點拼就更加多「邊幾個碼組成一個字」嘅一層。呢點除咗使用者要諗多之外，軟件自動分行亦都會需要特殊處理。中國普通話嘅拼音以單詞作為單位，詞與詞之間有空格，可作參考。

（2）因為點字聲調選擇只用一點嘅點字，就只有六個聲調嘅選擇。其實將入聲合併，1-6 調就可以各有一個單獨標碼。（而家嘅 1，7 調係選擇性不標示；4/9 共用一點。）

（3）「玩」(文讀 wun6，如「玩具」）標成 un 真係怪怪地。

但我感覺上粵語點字系統牽涉太廣泛實體既有項目、改動成本太高，係類似 QWERTY、插蘇等不變不能變嘅項目。

做粵語字體･點字版有冇實際用途？直接使用應該冇（至少短期、一兩年內冇）。製作點字需要專業印刷器材，而呢啲器材唔似正常印刷噉有市場競爭。五年十年，有員工自發使用嚟輔助現有工具缺陷嘅可能。

間接用途係示範到有改變嘅可能，以及將 spec 公開、計劃聚合唔同人嘅蝴蝶效應。譬如電子點字顯示器嘅轉換有一個上游開源方案 liblouis，裏面依賴嘅廣東話點字轉換係一字一音嘅系統、亦都完全缺乏所有統一碼擴展區嘅處理。雖然受限於 louis table 嘅格式，做唔到粵語字體二三百錯一嘅效果，但係三十錯一總比六錯一好。但係呢啲就視乎實際使用者嘅需求，以及其他開發者嘅配合（C lib 攞嚟做乜嘢我識條鐵咩）。

有諗法或者想洽商嘅讀者，歡迎電郵 jon -at- canto.hk 聯絡查詢。

Phonetic tools and content developed in the last 15 years all use Jyutping: this includes dictionaries, input methods, the Cantonese Font, graded readers, and so on.

However, since the blind community of Hong Kong uses an idiosyncratic phonetic system (with no advantage over Jyutping), they are cut off from these tools and content even though they are familiar with a phonetic system.

Because the specification is not readily available to the public, it is hard for developers or linguists to help even if they want to. As a result, the available input method and the Braille output tools (for embossers and Braille displays) are produced only centrally, with government / charity funding, once every 15-20 years, and so they lag behind for long periods.

As for the format, Cantonese Braille was designed in an earlier era of phonology and in a more resource-limited environment, with physical print as the only medium (presumably saving space was important). Some of its design decisions look curious today:

Word segmentation (what constitutes a multi-character word) is already complicated in Chinese, and Cantonese Braille introduces an additional “which Braille cells together form a character” layer of complexity. This also affects software, as line-breaking algorithms were not written with unbroken character sequences in mind. PRC Mandarin Braille pre-segments words and puts spaces between them, and might serve as a reference model.
A design decision was made to mark tones with one-dot Braille cells, which leaves only six choices. This really calls for a six-tone system with an explicit mark for each tone (instead of the current scheme, which compresses tones 4 and 9 into one cell and leaves tones 1 and 7 unmarked).
As a modern Jyutping user, I find it quite awkward that w- is omitted in cases like 玩 wun6 (literary reading, as in 玩具), which ends up as un.

The Cantonese Braille format, however, has too much existing physical presence and, like QWERTY keyboards or electrical outlets, probably cannot be changed now.

Is the Cantonese Font Braille variant useful? Not directly, at least not in the short term (1-2 years). Preparing Braille prints needs embossers, binding, and other specialized processes and equipment. Unlike ordinary printers and copiers, there is no free-market competition in their production, so efficiency can be an afterthought. In 5-10 years, it is possible that individual employees will decide to use the tool on their own to make up for gaps in the existing tooling.

There may be indirect utility: showing that there are alternatives to the DOS-era heritage method, publicizing the specification, and providing a point of contact / nucleus for interested members of the public.

As an example, electronic Braille displays can be driven by software using the open-source C library liblouis, but its Cantonese mapping table assumes one reading per character (the literary reading) and ignores characters in the Unicode CJK extension blocks entirely. Its accuracy cannot be better than about 1-in-6. The limitations of the liblouis table format probably mean that a Cantonese Font-like 1-in-300 error rate will never be possible, but a five-fold improvement to 1-in-30 is valuable and plausible. This depends on the needs of actual users and on input from other developers.

‡ 其實我唔係好明白中心產量計算，但係聽落規模真係驚人。I don’t really know how to calculate the production quota of the Center, but it was well beyond my uneducated expectations.
