( sz332 | 2017. 05. 08., Mon – 16:56 )

No, because we are talking about two separate things. UTF-8 itself (Unicode, with UTF-8 as its encoding) solved the problem of how to put characters from different languages into a single text.
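A minimal Python sketch of what "characters from different languages in one text" means in practice: one string mixing several scripts survives a UTF-8 round trip unchanged, and UTF-8 simply spends more bytes on code points further from ASCII. The sample strings and the helper `utf8_byte_lengths` are made up for illustration.

```python
def utf8_byte_lengths(text: str) -> list[int]:
    """Return how many bytes UTF-8 uses for each code point of text."""
    return [len(ch.encode("utf-8")) for ch in text]


# Hungarian, Russian, Chinese, Japanese and Korean in one string
mixed = "Árvíztűrő tükörfúrógép, Привет, 中文, 日本語, 한국어"
data = mixed.encode("utf-8")
assert data.decode("utf-8") == mixed  # lossless round trip

# 1 byte for ASCII, 2 for accented Latin letters and Cyrillic,
# 3 for CJK in the Basic Multilingual Plane,
# 4 for supplementary-plane characters such as U+20000 (CJK Extension B)
sample = "a" + "ű" + "П" + "中" + chr(0x20000)
print(utf8_byte_lengths(sample))  # [1, 2, 2, 3, 4]
```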

You are right, though, that Han unification was not necessarily such a great idea...

Just so everyone else knows what we are talking about:

https://en.wikipedia.org/wiki/Han_unification

Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the so-called CJK languages into a single set of unified characters. Han characters are a common feature of written Chinese (hanzi), Japanese (kanji), and Korean (hanja).
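To make this concrete: a unified ideograph gets one code point regardless of which language uses it, and even its official Unicode name just says "CJK UNIFIED IDEOGRAPH". A small Python sketch using the standard `unicodedata` module (the helper `describe` is hypothetical). 直 is an often-cited case because its usual Chinese and Japanese glyphs look visibly different, yet it is a single code point; choosing the right glyph is left to fonts and language tagging, not to the encoding.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# "Japan" (日本) is written with the same two code points in Chinese and
# Japanese text; a plain string carries no language tag to tell them apart.
for line in describe("日本直"):
    print(line)
# U+65E5 CJK UNIFIED IDEOGRAPH-65E5
# U+672C CJK UNIFIED IDEOGRAPH-672C
# U+76F4 CJK UNIFIED IDEOGRAPH-76F4
```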

One more addition, this time from the other side of the argument:

https://news.ycombinator.com/item?id=9222056

SiVal 782 days ago

No. Using different code points for the same character used in different languages creates big problems. It would be like having different code points for 'A' depending on whether it was used in English, Spanish, German, etc. If you somehow ended up writing "color" with both 'o' characters from the Spanish ABCs and the others from the English ABCs, you'd have a real mess when it came to sorting, searching, name matching (what language is "Hans"?) etc. It is far more convenient to allow the character sequence "color" or "Hans" to be language independent, even if the font choices, pronunciation, sort order, etc., are language dependent.

Chinese, Japanese, and Korean writers face similar issues. The characters they use to write the name of China or Japan, the ten digits, the characters for year, month, and day in dates, and so many thousands of others in Chinese characters are what they all consider to be the same characters. That is not all characters, but it includes so many that insisting on different code points by language would make a real mess. Hong Kong has many characters that are unique to HK Cantonese. So, should Cantonese have a full set of all Chinese characters that are the "Cantonese characters"? How about Shanghainese, then? Or Hakka? Teochew (Chaozhou) or a dozen Chinese languages? Full, independent sets of all Chinese characters for each? Suppose you accidentally used an input method in HK and wrote the name of some Beijing gov't ministry using characters that looked identical to their Mandarin counterparts but were entirely different codepoints? Now what? You can't find your search term? You mess up the database and have two identical-looking keys?

No, Han unification is not conceptually different from unifying ABCs used by English and Spanish speakers, Cyrillic used by Russians or Serbs, etc., except that there are many more characters, so the boundary between what should be unified and what shouldn't contains more items in the gray zone to cause debate. Having no Han unification at all wouldn't solve all problems, it would create all sorts of absurdity.
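SiVal's search and database argument can be reproduced with a pair that Unicode really does keep apart: Latin "o" (U+006F) and Cyrillic "о" (U+043E) look the same in most fonts but are different code points. A minimal Python sketch; the word and the dictionary standing in for a database index are made up for illustration.

```python
import unicodedata

CYRILLIC_O = "\u043e"  # CYRILLIC SMALL LETTER O, looks like Latin "o" in most fonts

word = "color"
lookalike = word.replace("o", CYRILLIC_O, 1)  # swap only the first "o"

print(word, lookalike)               # usually indistinguishable on screen
print(word == lookalike)             # False: different code points
print("color" in f"my {lookalike}")  # False: the search misses it

# Two identical-looking keys end up as two separate entries
index = {word: 1, lookalike: 2}
print(len(index))                    # 2

# Normalization does not merge them: they are distinct characters, not variants
print(unicodedata.normalize("NFKC", lookalike) == word)  # False
print([unicodedata.name(ch) for ch in lookalike[:2]])
# ['LATIN SMALL LETTER C', 'CYRILLIC SMALL LETTER O']
```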

One thing follows from all this: bringing different languages together is still a highly non-trivial task.