Opinions on Jyutcitzi

Over on Twitter there is a new script in development for one of the languages I speak, Cantonese. It is called jyutcitzi ("yuet chit chi", in a more anglophone-friendly Romanisation scheme). The people behind it are all over conventional social media, and I'll link a few of their pages here:

Twitter

Medium

As most of the resources here are in Cantonese, I will spend the first few sections of this article going over the motivation and workings behind this new script. Then I will discuss what I like about it, what I don't like about it, and how I think it should be changed.

Declaration

As of time of writing I have no direct relationship with the guy who created the script, though he does follow me on Twitter. I have read some of the documentation, but my Chinese reading is not perfect, and there is a whole lot of it that I have not read.

As befits my conlang background, however, I have devised a number of scripts for my conlangs and have some experience with fitting scripts to languages. This gives me some insight into how a script should be designed. Whether you think this is meaningful is up to you.

Motivation

The Cantonese Script Reform Society believes the best way forward, is to adopt a script that is compatible and mixed with [sinograms] — just like how the Japanese’s writing system allows for the mixing of Kana and Kanji, how the Korean’s allows for the mixed use of Hangul and Hanja. We believe, Jyutcitzi, a phonetic script that is roughly based on the phonetic principle of /faancit/, offers the best way forward.

On the whole, Jyutcitzi is preferable to Jyutping - Medium

Whether this was true in previous centuries or holds in general is questionable, but nowadays a language's primary identity in the world is indeed in its writing system, and to confer the highest status on the language its script should be tailored to the language itself. History has given us several examples, such as Korean, Cherokee and Canadian Aboriginal (for Cree and similar languages toward Greenland), but one particular example in West Africa has a succinct Wikipedia article which highlights how important it is to be able to write the language in its own strokes of the pen:

The Adlam script is a script used to write Fulani. The name Adlam is an acronym derived from the first four letters of the alphabet (A, D, L, M), [...] which means "the alphabet that protects the peoples from vanishing".

Adlam script, Wikipedia

Also note that Adlam is used vigorously enough that its inclusion in Unicode was certain, and in fact it has been encoded for a while already, so that is no obstacle.

From all this the case for Cantonese having its own script is more or less made: to ensure the language has an increased profile against northern Chinese languages (especially Mandarin, the prestige and institutional language currently being enforced against all other languages in the country) and to ensure that Cantonese has its own visual identity distinct from that of any other language, this has to be the case. There are certain other considerations that we could address, but for the sake of brevity we will ignore them. We may revisit them in a later article.

The script

Jyutcitzi (hereafter, JC) is a script that's based on three principles:

Segmentation into syllable blocks, like in Korean, to assist in integration with the existing sinogrammatic superstrate à la kana
Composition and selection of characters according to "Faancit" (fan chit), a pronunciation guide long established as the way to read sinograms before the introduction of the Latin alphabet and the IPA
Selection of graphemes by extracting from existing Chinese characters, like in katakana.

Note the lack of Latin script input.

A syllable is divided into three parts:

The onset, which is the initial consonant, which may be null;
The rime (the word "rhyme", spelt like that in this context), containing the vowel nucleus and any existing coda; and
The tone.

In Cantonese, there are also two syllabic nasals, /m/ and /ŋ/, which are treated as onsets with no rime in JC. In fact they are written as the same character.
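The three-way division above can be sketched in a few lines of Python. This is only an illustration of the segmentation, not part of the JC documentation: the function name `split_syllable` and the list `ONSET_SPELLINGS` are my own, the onset spellings are copied from the Jyutping column of the onset table in this article, and the rime is not checked against the rime table.

```python
import re

# Jyutping onset spellings, as listed in the onset table of this article.
# Two-letter onsets come first so that "gw" is not read as "g" plus "w...".
ONSET_SPELLINGS = ["ng", "gw", "kw", "b", "p", "m", "f", "d", "t", "n",
                   "l", "z", "c", "s", "j", "g", "k", "h", "w"]

def split_syllable(jyutping):
    """Split a Jyutping syllable into (onset, rime, tone).

    The tone is None when no tone digit is given. The rime is returned
    as spelt and is NOT validated (a simplification of this sketch).
    """
    match = re.fullmatch(r"([a-z]+)([1-6]?)", jyutping)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {jyutping!r}")
    letters, digit = match.groups()
    tone = int(digit) if digit else None

    # Syllabic nasals: treated as an onset with no rime.
    if letters in ("m", "ng"):
        return letters, "", tone

    for onset in ONSET_SPELLINGS:
        # Require at least one letter to be left over for the rime.
        if letters.startswith(onset) and len(letters) > len(onset):
            return onset, letters[len(onset):], tone

    # Nothing matched: null onset, the whole spelling is the rime.
    return "", letters, tone

print(split_syllable("gwong2"))  # ('gw', 'ong', 2)
print(split_syllable("m4"))      # ('m', '', 4)
print(split_syllable("aa3"))     # ('', 'aa', 3)
```

Note that the two syllabic nasals come out as distinct onsets here even though JC currently writes them with the same character.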

The characters representing each onset and rime are chosen so that they can be represented in the existing Unicode repertoire, which as far as I understand from my own reading is a compromise made so that they may be used immediately with modern computers, given some concessions. Of course, they are also mnemonic in nature, in that their source sinograms share the same onset or rime.

Onsets

The onsets are:

 Jyutping (IPA)  JC    O
-------------------------
 /Ø/             𭕄    H 
 b /p/           比    H 
 p /pʰ/          并    V 
 m               文    H 
 f               夫    V 
 d /t/           大    H 
 t /tʰ/          天    H 
 n               乃    V 
 l               力    V 
 z /ts/          止    V 
 c /tsʰ/         此    H 
 s               厶    H 
 j               央    H 
 g /k/           丩    V 
 k /kʰ/          臼    H 
 h               亾    V 
 ng /ŋ/          乂乂  H 
 gw /kʷ/         古    V 
 kw /kʷʰ/        夸    V 
 w               禾    H 
 /m̩ ŋ̩/           𠄡    F

We can arrange the onsets in a more conventional consonant chart thus:

 P.O.A. \ M.O.A.  Nasal  Plosive  Fricative  Affricate  Approx.  Liq. 
----------------------------------------------------------------------
 Labial           文     比                             禾            
 [+asp]                  并                                           
 [+syllabic]      𠄡                                                  
----------------------------------------------------------------------
 Dental           乃     大       厶         止         央       力   
 [+asp]                  天                  此                       
----------------------------------------------------------------------
 Velar            乂乂   丩                                           
 [+asp]                  臼                                           
 [+lab]                  古                                           
 [+asp][+lab]            夸                                           
 [+syllabic]      𠄡                                                  
----------------------------------------------------------------------
 Glottal                          亾

Where [+asp] means "with additional aspiration", [+lab] means "with additional labialisation", and [+syllabic] means "as a syllabic consonant".

One departure from the principle of using existing Unicode characters is the usage of ⿰乂乂, which is not in Unicode and is approximated here as the two halves (乂) put one after another, occupying two squares. The documentation states that it is derived from 爻, which is available in Unicode and has the correct onset. 𠄡 is the old form of 五 ("five"), which is indeed pronounced as the syllabic velar nasal.

Each onset has its own orientation, indicated by O in the above table, which indicates whether it occupies the top half (if H) or the left half (if V) of the syllable block. The syllabic nasal occupies the entire syllable block, so it gets the letter F in that column.

The null initial 𭕄 is used with a rime character to indicate a null onset, or it can be used with an onset character to indicate a lone consonant, which is helpful for writing foreign loanwords.

Rime

The rime is subdivided into the nucleus and the coda. The nucleus is as in conventional linguistics, whereas the coda in Cantonese can be an off-glide (the second element of a diphthong), a nasal or a stop. This is conventionally drawn as in the following table.

 N \ C   Ø  | i   u  | m   n   ng /ŋ/ | p   t   k  |
------------+--------+----------------+------------+
 aa /a/  乍 | 介  丂 | 彡  万  生     | 甲  压  百 |
 a /ɐ/      | 兮  久 | 今  云  亙     | 十  乜  仄 |
 e       旡 | 丌  了 | 壬  円  正     | 夾  叐  尺 |
 i       子 |     么 | 欠  千  丁     | 頁  必  夕 |
 o       个 | 丐  冇 |     干  王     |     匃  乇 |
 u       乎 | 会     |     本  工     |     末  玉 |
 oe /œ/  𠄒 |        |         丈     |         玉 |
 eo /ɵ/     | 句     |     卂         |     𥘅     |
 yu /y/  𠄒 |        |     元         |     乙     |

These take up the remainder of the square that the onset left, squeezing to fit.
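As a rough sketch of how an onset and a rime combine, the block can be described with the same ideographic description characters used elsewhere in this article: ⿱ (top over bottom) for an H onset and ⿰ (left beside right) for a V onset. The dictionaries below hold only a handful of entries copied from the two tables above, and the names `ONSET_GLYPHS`, `RIME_GLYPHS` and `compose_block` are mine, not the documentation's.

```python
# A few entries copied from the onset table: Jyutping -> (grapheme, orientation).
ONSET_GLYPHS = {
    "g": ("丩", "V"),
    "j": ("央", "H"),
    "d": ("大", "H"),
    "l": ("力", "V"),
}

# A few entries copied from the rime table: Jyutping -> grapheme.
RIME_GLYPHS = {
    "o": "个",
    "in": "千",
    "aa": "乍",
    "i": "子",
}

def compose_block(onset, rime):
    """Describe a syllable block as an ideographic description sequence.

    H onsets sit in the top half, so the rime goes underneath (⿱).
    V onsets sit in the left half, so the rime goes to the right (⿰).
    """
    grapheme, orientation = ONSET_GLYPHS[onset]
    operator = "⿱" if orientation == "H" else "⿰"
    return operator + grapheme + RIME_GLYPHS[rime]

print(compose_block("g", "o"))   # ⿰丩个
print(compose_block("j", "in"))  # ⿱央千
```

This only describes the layout. Actually drawing the block still depends on a font that supplies the composed glyph.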

In loanwords, there can be an additional final /s/, which does not appear in Chinese-derived vocabulary. To write final -s, write a 厶 to the right of an H-block syllable, or to the bottom of a V-block syllable, squeezing to fit inside the syllable block. Alternatively, the null initial 𭕄 may be used as a placeholder so that the final 厶 is placed below it. Phonotactics disallow adding any final to a syllabic nasal.

Tone

Tones are indicated by way of diacritical marks to the right of the top-right corner of the block. This is true regardless of the direction the text is written in, vertically or horizontally.

 Tone number  Contour  JC 
--------------------------
           1        5  ′  
           2       35  ´  
           3        4  `  
           4        1  ″  
           5       13  ˝  
           6        2  ̏   
    Toneless        –

Tone is not written unless it is needed to disambiguate between words that differ only in tone. For example, "that one" go2 go3 is normally written without tone marks, and is written 丩个´·丩个` only if context requires it. Of course, one can add tones for educational or pedantic reasons as well.

Toneless words do not exist in Chinese-derived vocabulary but might arguably appear in loanwords. Nativised loanwords frequently have tones imposed on them.

Linear writing

If technical limitations prohibit assembling the graphemes into syllable blocks, an acceptable alternative is to write all of them in a line, separated by middle dots at syllable boundaries wherever no other punctuation already marks the boundary. This linear writing was used in the tone example above. It is similar to linear Hangul, a system invented for similar reasons.

One never writes linearly using a pen and paper, because one is not limited technically like this.
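For what it's worth, the linear fallback is simple enough to sketch. The function below is a hypothetical helper of my own (`write_linear`), and it makes two assumptions: the middle dot is U+00B7, and every boundary gets a dot, so the "unless other punctuation already separates them" rule is left out.

```python
MIDDLE_DOT = "\u00b7"  # assumed code point for the syllable separator

def write_linear(syllables):
    """Join syllables into linear writing.

    Each syllable is an (onset, rime, tone_mark) tuple of strings;
    pass "" as the tone mark when the tone is left unwritten.
    """
    return MIDDLE_DOT.join(onset + rime + tone for onset, rime, tone in syllables)

# "that one", go2 go3, with the existing tone marks for tones 2 and 3
print(write_linear([("丩", "个", "\u00b4"), ("丩", "个", "\u0060")]))
```

Leaving the tone mark empty for both syllables gives the everyday toneless spelling of the same word.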

Critique

Alright, so now that we have described how the script works and also some of its hairier edge cases, we'll switch to a more personal tone and discuss my thoughts on how the script works, what details it has and how it is to be used.

The motivations are correct

This is the key point I want to front-load, because I fully agree with the idea of creating a script for Cantonese to differentiate it from other languages, with Mandarin being the primary target of differentiation. A visual identity is a key part of a language today, even though it was not necessarily so a hundred or a thousand years ago, and even if it is not inherently part of a language in the way its spoken form is.

Additionally, a Romanisation may be helpful as part of a language, but it shouldn't be a language's script, for the simple reason that Latin script is not well-adapted to any language other than Latin. Hell, it's not even well-adapted to English. A foreign script that cannot write a native language easily risks alienating its users and being left unused, as is the case of Athabaskan languages and Canadian Aboriginal Syllabics.

Issues with regards to detail

The syllabic nasals need to be differentiated

Currently the script does not differentiate between syllabic /m/ and syllabic /ŋ/, using the same grapheme 𠄡 for both of them. As it so happens, there is in fact a minimal pair: m4 is 唔 (the negation morpheme), and ng4 is 吳 (a surname). While in practice it's easy to distinguish between them, it's unsatisfactory that the script conflates them for no real reason.

I think that these syllables should be separated. Because 唔 is the /only/ character pronounced /m/ with any tone, we could just use it wholesale as the glyph for the syllable, and have 𠄡 represent /ŋ/ as it is now. Alternatively, we could break with the mnemonic nature for these characters and substitute an alternative method of extracting characters as in katakana: we use 𠄡 for /m/ only, justifying it by noting that it looks like the cross symbol ✗ frequently used to indicate incorrectness, and that it is a part extracted from 唔.

For /ŋ/, I propose 𠂉 (from 午) or 夨 (from 吳). Reasons for 𠂉 include:

No inherent pronunciation, which maintains the mnemonic system;
Easier to write

While 夨 can be chosen because:

The one pronunciation it does have (cak1) is obscure and unknown
It fills the entire square, so it looks like a syllabic consonant
It's in a more supported range of Unicode

Ultimately regardless of the choice, the fact that Cantonese is one of the few languages where a legitimate word (a lexeme, in particular) can be completely vowel-less should be emphasised in the script a little bit more than it usually is.

The tone diacritics are bad

The tone diacritics are a bit of an afterthought, and they are hard to distinguish from one another. The difference between a straight-down stroke, one tilting to the left and one tilting to the right is just a little bit too hard to handle correctly in handwriting.

Instead, tone diacritics should be more mnemonic as well. I propose the following system, which can already be written with existing Unicode characters:

 Tone number  Contour  New form 
--------------------------------
           1        5  ◌̄        
           2       35  ◌́        
           3        4  ◌¯       
           4        1  ◌̱        
           5       13  ◌̗        
           6        2  ◌_       
    Toneless        –  ◌

The dotted circle here represents the entire syllable block, or the rime part of the character in linear writing. For tones 3 and 6, one writes the mark beside the character in question. The marks are placed in these relative positions regardless of horizontal or vertical writing, but for tones 1, 2, 4 and 5, they can be written displaced slightly to the left if other characters intrude on the square or if the mark could be confused with a 丶 or a 一.

This system is notably easier to remember, is mnemonic, and significantly easier to write and decode. Level tones are indicated with flat lines and rising tones are indicated with acutes, and the position of the line roughly indicates where the tone height comes from or goes to. It also survives bad writing and low resolutions better than the existing method.
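To show that this really can be typed today, here is a minimal sketch of attaching the proposed marks in linear writing. The code points are my reading of the marks in the table above (combining macron, combining acute, spacing macron, combining macron below, combining acute below, low line), so treat them as assumptions, and `mark_tone` is just an illustrative name.

```python
# Assumed code points for the proposed marks (my reading of the table above).
COMBINING_MARKS = {
    1: "\u0304",  # combining macron, above
    2: "\u0301",  # combining acute accent, above
    4: "\u0331",  # combining macron below
    5: "\u0317",  # combining acute accent below
}
SPACING_MARKS = {
    3: "\u00af",  # spacing macron, written beside the character
    6: "_",       # low line, written beside the character
}

def mark_tone(syllable, tone=None):
    """Append the proposed tone mark to a linearly written syllable.

    The mark is added after the last grapheme, i.e. the rime, so the
    combining marks attach to the rime as described above.
    """
    if tone is None:
        return syllable  # toneless, or tone left unwritten
    if tone in COMBINING_MARKS:
        return syllable + COMBINING_MARKS[tone]
    if tone in SPACING_MARKS:
        return syllable + SPACING_MARKS[tone]
    raise ValueError(f"Cantonese has tones 1 to 6, got {tone!r}")

print(mark_tone("丩个", 2) + "\u00b7" + mark_tone("丩个", 3))
```

How well the combining marks render on top of a CJK character depends on the font, which is a separate problem from whether they can be encoded.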

Misleading sub-components

The character 大 is also part of 夸, which isn't ideal even if it is mnemonic. 亏 is a better choice and preserves the mnemonic as well as being smaller to write.

A similar triple exists with 比, 止 and 此. The first two can be replaced with 匕 (H) and 只 (V). It's not easy to find a satisfactory replacement for 此, but something should be done to reduce the stroke count of that one.

The rimes, on the other hand, I can find little to object to, noting that they are the main part of the syllable and can therefore tolerate smaller differences.

Font issues

The font supplied to write JC in its syllable block, non-linear format, which utilises the Private Use Area for all missing compounds, isn't very nice. In particular, the stroke widths are not optimised and it creates a kind of "squished feeling" that all such characters have. Additionally, the character halves don't kern with each other as well as they should.

This isn't a critique of the system itself, just of the font that goes with it. I could not supply a better font either, so I can settle for the existing one.

Accidental gaps need filling

The system is clearly meant to include foreign loanwords that violate inherited Cantonese phonotactics, but to do this there need to be extra characters that fill in some accidental gaps in Cantonese orthography. One of the biggest gaps I can identify right now is the bare vowel /ɐ/, which should have its own character for use with English words that have the vowel /ʌ/, like "must" (mas3) or "love" (laf1), which appear in Cantonese every so often.

More philosophical problems

口 is OK!

There is a sort of rejection in some of the articles with regard to the usage or even over-usage of the radical 口 to represent Cantonese words. We've seen one of them: 唔. This so happens to be the most common one in use, which is not surprising as the negation particle is in fact very much used. Other examples include:

The possessive marker, 嘅
"at", 喺
"Mark", "trademark", 嘜
"Correct", 啱
"Huh?", 吓
"Cry", 喊
"Rest", 唞
The counter word meaning "a lump of", 嚿

More complete list of such particles

The problem with using the mouth radical, so the claim goes, is that it demotes Cantonese to just regular (prescribed) Chinese with funny characters, and that it does nothing to establish the visual identity of the writing. This, the argument continues, does not bode well for the survival of the language, since the visual identity is highly critical.

While this is true, I would like to push back on one very important point: people /already know/ them. For the particularly common ones, such as 嘅喺啱 and 唔, these are much better known than the allegedly "correct" characters, if they exist in the first place. For some of these, such as 喊, using a mouth radical is arguably correct anyway because you do indeed cry with your mouth.

The desire here is to replace all the grammatical particles with JC, à la kana, and then leave all content words to remain in sinograms, which is more etymologically correct. Words that have obscure sinograms, as well as foreign loanwords, are also written in JC. Compare with Japanese and Korean:

 Situation              Japanese   Korean (KMS)  Cantonese with JC 
-------------------------------------------------------------------
 Sinitic content words  Sinograms  Sinograms     Sinograms         
 Native content words   Sinograms  Hangul        Sinograms         
 Grammatical words      Hiragana   Hangul        Jyutcitzi         
 Foreign words          Katakana   Hangul        Jyutcitzi

The table makes it look very nice and neat, but unlike Japanese and Korean there is one thing that Cantonese has: a relationship with other Chinese languages. This makes the distinction between "Sinitic" and "Native" not easy to determine, though of course there are some clear-cut cases (Cantonese retains some substrate from a long-lost language spoken in the area before the Chinese came in and occupied it two thousand or so years ago). The key is that it's not always clear whether a grammatical word is "Cantonese" or "Sinitic", or whether that distinction is even relevant. Grammatical words tend not to jump linguistic boundaries, which makes the distinction easy for Japanese and Korean; Cantonese function words, however, may well have existed in old Chinese before it differentiated, which arguably means that they should retain their sinogrammatic forms.

Certainly then some of the more obscure 口 compounds should be rejected and replaced with JC, especially if they are ad-hoc or rarely attested; further, native Cantonese morphemes with no sinogram equivalents should be written with JC no matter if they are content or structure, rather than trying to invent new ones. However, in my opinion, some of the more common ones should be left alone, especially if they have some other justification, whether it's because of some auxiliary meaning that makes a 口 semantophore correct, or just because it is already in frequent use and widely understood. We should sacrifice some level of purity for less friction in adoption, for a pure script is no use if no one uses it.

Jyutcitzi as phonophores

One interesting feature of using the syllable block as a way of organising characters is that one can then use it as a part of another character. Specifically, you can use it as part of a phono-semantic compound, and create new characters out of it. For example, the word "interview" is loaned into Cantonese as "in1", and we can write it like so:

⿰訁⿱央千

which we understand as ⟨訁|⿱央千⟩, i.e. 訁 is the semantophore, as we speak in an interview, and ⿱央千 is the phonophore, meaning we pronounce it like ·央千·.

I have mixed feelings about this feature, and can't fully commit to either liking or hating it. I'll summarise my thoughts as such.

First, languages change but writing does not. Cantonese is going to evolve no matter what, and like Hangul, if we commit to using this as a way to write words, then there will be a time when the faancit is no longer accurate and this must be learnt separately. To be sure this is also going to happen regardless of whether you put a semantophore in the syllable block or not, but this is not a simple case of forming characters like that. At risk of contradicting myself, using another sinogram that sounds the same or similar to construct new sinograms for writing loanwords is much more robust in terms of future development.

Second, re-inserting semantic meaning like this is helpful in understanding and is, as far as I can tell, unique amongst the East Asian languages, though it is attested elsewhere; ancient Egyptian writing uses this method as well to write words with a meaning component and a pronunciation component. It certainly does help to guide foreign readers already well-versed in sinogrammatic construction into the system.

Finally, this adds a massive burden on Unicode. These characters may never be included in Unicode if the vast majority of combinations remain unused and the number of possible semantophores is unchecked.

As I said above, I cannot commit to a verdict on this feature, but these are the thoughts that pull me towards one answer or the other.

How many jyutcitzi?

Another minor detail that I think should be considered. Apparently this is left unspecified in the documentation deliberately, so this is just me expressing some undirected opinion.

As the table comparing sinogram usage above shows, there is a sliding scale between how often we should use phonetic-only symbols and when we can leverage the shared meaning of sinograms to allow for some level of inter-lingual communication. I really like the fact that Japanese uses sinograms, as it allows me to leverage my existing knowledge of them to get an "in" onto the language and lets me read things immediately. It's why I have put off Korean so much: it does not use sinograms anymore, so I get no "in".

So I think Cantonese should be written with a good amount of content words and even some function words written in sinograms, so that other sinogram users can leverage their knowledge to understand some of Cantonese as well. In particular, the usage of 唔 is justified, much as if we wrote all instances of the Japanese negative particle ない as 無い, even though this is not etymologically correct.

Summary

I like the idea of a native script for Cantonese, and I think jyutcitzi is a very good proposal for it. There are some small things that I would like to change about it, both in the small details and in the general picture of how it should be used. I will probably learn this script more carefully (but with my changes, especially re the tones) and use it when I write Cantonese, which, unfortunately, is fairly rare.