Knowledge

Han unification


feature of rich text protocols and not properly handled by the plain text goals of Unicode. However, when the change from one glyph to another constitutes a change from one grapheme to another (where a glyph cannot possibly still, for example, mean the same grapheme understood as the small letter "a"), Unicode assigns those separate code points. For Unihan the same thing is done whenever the abstract meaning changes. However, rather than speaking of the abstract meaning of a grapheme (the letter "a"), the unification of Han ideographs assigns a new code point for each different meaning, even if that meaning is expressed by distinct graphemes in different languages. Although a grapheme such as "ö" might mean something different in English (as used in the word "coördinated") than it does in German (as used in the word "schön"), it is still the same grapheme and can easily be unified so that English and German can share a common abstract Latin writing system (along with Latin itself). This example also points to another reason that "abstract character" and grapheme as an abstract unit in a written language do not necessarily map one-to-one. In English the
conflicts with the stated goal of Unicode to take away that overhead, and to allow any number of any of the world's scripts to be on the same document with one encoding system. Chapter One of the handbook states that "With Unicode, the information technology industry has replaced proliferating character sets with data stability, global interoperability and data interchange, simplified software, and reduced development costs. While taking the ASCII character set as its starting point, the Unicode Standard goes far beyond ASCII's limited ability to encode only the upper- and lowercase letters A through Z. It provides the capacity to encode all characters used for the written languages of the world – more than 1 million characters can be encoded. No escape sequence or control code is required to specify any character in any language. The Unicode character encoding treats alphabetic characters, ideographic characters, and symbols equivalently, which means they can be used in any mixture and with equal facility."
, first introduced in version 3.2 and supplemented in version 4.0. While variation selectors are treated as combining characters, they have no associated diacritic or mark. Instead, by combining with a base character, they signal that the two-character sequence selects a variation of the base character (typically in terms of grapheme, but also in terms of underlying meaning, as in the case of a location name or other proper noun). This, then, is not the selection of an alternate glyph, but the selection of a grapheme variation or a variation of the base abstract character. Such a two-character sequence, however, can easily be mapped to a separate single glyph in modern fonts. Since Unicode has assigned 256 separate variation selectors, it is capable of assigning 256 variations for any Han ideograph. Such variations can be specific to one language or another and enable the encoding of plain text that includes such grapheme variations.
(generating the combination "å") might be understood by a user as a single grapheme while being composed of multiple Unicode abstract characters. In addition, and apart from characters encoded for compatibility reasons, Unicode assigns code points to a small number of formatting characters, whitespace characters, and other abstract characters that are not graphemes but are instead used to control the breaks between lines, words, graphemes and grapheme clusters. With the unified Han ideographs, the Unicode Standard makes a departure from prior practices in assigning abstract characters not as graphemes, but according to the underlying meaning of the grapheme: what linguists sometimes call
Japanese writing systems historically, the inability to specify a particular variant was considered a significant obstacle to the use of Unicode in scholarly work. For example, the unification of "grass" (explained above) means that a historical text cannot be encoded so as to preserve its peculiar orthography. Instead, the scholar would be required to locate the desired glyph in a specific typeface in order to convey the text as written, defeating the purpose of a unified character set. Unicode has responded to these needs by assigning variation selectors so that authors can select grapheme variations of particular ideographs (or even other characters).
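A variation sequence is nothing more than a base character followed by one selector code point. The sketch below, assuming only the standard selector ranges (VS1 to VS16 at U+FE00 to U+FE0F, VS17 to VS256 at U+E0100 to U+E01EF), enumerates the 256 selectors and appends one to "grass" (U+8349). The helper names are illustrative; whether a given sequence actually renders as a distinct glyph depends on the font.

```python
# Variation selector ranges defined by Unicode:
#   VS1..VS16   -> U+FE00..U+FE0F   (16 selectors)
#   VS17..VS256 -> U+E0100..U+E01EF (240 selectors)
VS_BLOCKS = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]


def variation_selectors() -> list[int]:
    """Return all variation selector code points in VS1..VS256 order."""
    return [cp for lo, hi in VS_BLOCKS for cp in range(lo, hi + 1)]


def with_selector(base: str, n: int) -> str:
    """Append variation selector VSn (1-based) to a single base character."""
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    selectors = variation_selectors()
    if not 1 <= n <= len(selectors):
        raise ValueError("n must be between 1 and 256")
    return base + chr(selectors[n - 1])


seq = with_selector("\u8349", 17)  # "grass" (U+8349) followed by VS17
print(len(variation_selectors()))  # 256
print([f"U+{ord(c):04X}" for c in seq])  # ['U+8349', 'U+E0100']
```

The text stays plain: the selector is just a second code point, so software unaware of it still sees the base ideograph.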
displayed incorrectly. (Proper names tend to be especially orthographically conservative—compare this to changing the spelling of one's name to suit a language reform in the US or UK.) While this may be considered primarily a graphical representation or rendering problem to be overcome by more artful fonts, the widespread use of Unicode would make it difficult to preserve such distinctions. The problem of one character representing semantically different concepts is also present in the Latin part of Unicode. The Unicode character for a curved apostrophe is the same as the character for a right single quote (’). On the other hand, the capital
in another language style. (That is to say, it would be difficult to access "grass" with the four-stroke radical more typical of Traditional Chinese in a Japanese environment, whose fonts would typically depict the three-stroke radical.) Unihan proponents tend to favor markup languages for defining language strings, but this would not ensure the use of a specific variant in the case given, only the language-specific font more likely to depict a character as that variant. (At this point, merely stylistic differences do enter in, as a selection of Japanese and Chinese fonts is not likely to be visually compatible.)
Yet for a reader of Latin-script-based languages, the two variations of the "a" character are both recognized as the same grapheme. Graphemes present in national character code standards have been added to Unicode, as required by Unicode's Source Separation rule, even where they can be composed of characters already available. The national character code standards existing in CJK languages are considerably more involved, given the technological limitations under which they evolved, and so the official CJK participants in Han unification may well have been amenable to reform.
when Unicode's definition is that specialized semantic variants have the same meaning only in certain contexts. Languages use them differently. A pair whose characters are 100% drop-in replacements for each other in Japanese may not be so flexible in Chinese. Thus, any comprehensive merger of recommended code points would have to maintain some variants that differ only slightly in appearance even if the meaning is 100% the same for all contexts in one language, because in another language the two characters may not be 100% drop-in replacements.
may be the same for CJK languages, the glyphs in common use for the same characters may not be. For example, the traditional Chinese glyph for "grass" uses four strokes for the "grass" radical, whereas the simplified Chinese, Japanese, and Korean glyphs use three. But there is only one Unicode code point for the grass character (U+8349) regardless of writing system. Another example is the ideograph for "one," which is different in Chinese, Japanese, and Korean. Many people think that the three versions should be encoded differently.
which provides information about all of the unified Han characters encoded in the Unicode Standard, including mappings to various national and industry standards, indices into standard dictionaries, encoded variants, pronunciations in various languages, and an English definition. The database is available to the public as text files and via an interactive website. The latter also includes representative glyphs and definitions for compound words drawn from the free Japanese
(from a font) suitable to the specified language. (Besides actual character variation, such as differences in stroke order, number, or direction, the typefaces may also reflect different typographical styles, as with serif and non-serif alphabets.) This only works for fallback glyph selection if you have CJK fonts installed on your system and the font selected to display this article does not include glyphs for these characters.
(CJK-JRG) favored a proposal (DIS 10646) for a non-unified character set, "which was thrown out in favor of unification with the Unicode Consortium's unified character set by the votes of American and European ISO members" (even though the Japanese position was unclear). Endorsing the Unicode Han unification was a necessary step for the heated ISO 10646/Unicode merger.
whether that difference be due to simplification, international variance or intra-national variance. However, for some platforms (e.g., smartphones), a device may come with only one font pre-installed. The system font must decide on a default glyph for each code point, and these glyphs can differ greatly, indicating different underlying graphemes.
There is a reason for this that has nothing to do with how the domestic bodies view the characters themselves. China went through a process in the twentieth century that changed (if not simplified) several characters.
During this transition, there was a need to be able to encode both variants within the same document. Korean has always used the variant of
The PRC's text encoding bodies did not encode the two variants differently. The fact that almost every other change brought about by the PRC, no matter how minor, did warrant its own code point suggests that this exception may have been unintentional. Unicode copied the existing standards as is, preserving such irregularities.
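The Unihan database described above is distributed as plain text, one property per line in the form `U+XXXX<TAB>kFieldName<TAB>value`, with `#` comment lines. The parser below is a minimal sketch under that assumption; the function name is illustrative, and the sample record in the usage is made up for demonstration rather than copied from the real files.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Parse Unihan-style records of the form 'U+XXXX<TAB>kField<TAB>value'.

    Returns {code point (int): {field name: value}}.
    Blank lines and '#' comment lines are skipped.
    """
    table = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cp_text, field, value = line.split("\t", 2)
        table[int(cp_text[2:], 16)][field] = value  # strip the "U+" prefix
    return dict(table)


# Illustrative input only; not an excerpt from the real database.
sample = ["# comment line", "", "U+8349\tkDefinition\tgrass"]
records = parse_unihan(sample)
print(records[0x8349]["kDefinition"])  # grass
```

Keying by integer code point rather than by the character itself keeps lookups independent of how a given font or terminal renders the ideograph.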
such as OpenType allow for the mapping of alternate glyphs according to language so that a text rendering system can look to the user's environmental settings to determine which glyph to use. The problem with these approaches is that they fail to meet the goals of Unicode to define a consistent way of encoding multilingual text.
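Language-based font selection keeps the code point fixed and moves the distinction into metadata. The sketch below picks a font family from a BCP 47 language tag by longest-prefix match. The family names are hypothetical placeholders, and the lookup assumes tags arrive in canonical casing (real BCP 47 matching is case-insensitive and more elaborate).

```python
# Hypothetical mapping from language tags to font families (placeholder names).
FONT_BY_LANG = {
    "zh-Hans": "ExampleSans SC",
    "zh-Hant": "ExampleSans TC",
    "ja": "ExampleSans JP",
    "ko": "ExampleSans KR",
}
DEFAULT_FONT = "ExampleSans SC"


def pick_font(lang_tag: str) -> str:
    """Return the font for the longest matching prefix of a language tag.

    For "zh-Hant-TW" this tries "zh-Hant-TW", then "zh-Hant", then "zh".
    """
    subtags = lang_tag.split("-")
    for i in range(len(subtags), 0, -1):
        candidate = "-".join(subtags[:i])
        if candidate in FONT_BY_LANG:
            return FONT_BY_LANG[candidate]
    return DEFAULT_FONT


text = "\u8349"  # the same code point, U+8349, in every case
for tag in ("zh-Hant-TW", "ja-JP", "ko", "vi"):
    print(tag, f"U+{ord(text):04X}", pick_font(tag))
```

Because the string itself is identical in every row, stripping the tag (as happens in plain text) loses the information needed to choose between the glyph traditions.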
Much software (such as the MediaWiki software that hosts Knowledge) will replace all canonically equivalent characters that are discouraged (e.g., the angstrom symbol) with the recommended equivalent. Despite the name, CJK "compatibility variants" are canonically equivalent characters and not compatibility characters.
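The difference between canonical and compatibility mappings is visible in Python's standard `unicodedata` module. In the sketch below, U+212B ANGSTROM SIGN and the CJK compatibility ideograph U+F902 both carry canonical singleton decompositions (no `<compat>`-style tag), so even NFC, the most conservative normalization form, replaces them. The helper name `is_canonical_singleton` is illustrative.

```python
import unicodedata


def is_canonical_singleton(ch: str) -> bool:
    """True if ch has a canonical (untagged) decomposition to one code point."""
    d = unicodedata.decomposition(ch)
    return bool(d) and not d.startswith("<") and len(d.split()) == 1


for ch in ("\u212B", "\uF902"):
    nfc = unicodedata.normalize("NFC", ch)
    print(
        f"U+{ord(ch):04X} canonical singleton={is_canonical_singleton(ch)} -> NFC",
        " ".join(f"U+{ord(c):04X}" for c in nfc),
    )
```

This is why text that passes through normalizing software cannot preserve a distinction that was encoded only through a CJK compatibility ideograph.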
semantically identical characters that have many variants. In addition to the standard character sets in Simplified Chinese, Traditional Chinese, Korean, Vietnamese, Kyūjitai Japanese and Shinjitai Japanese, there also exist "ancient" forms of characters that are of interest to historians, linguists and philologists.
One would expect that all simplified characters would simultaneously also be z-variants or semantic variants with their traditional counterparts, but many are neither. It is easier to explain the strange case that semantic variants can be simultaneously both semantic variants and specialized variants
Some pairs of Traditional and Simplified are also considered to be semantic variants. According to Unicode's definitions, it makes sense that all simplifications (that do not result in wholly different characters being merged for their homophony) will be a form of semantic variant. Unicode classifies
These compatibility characters (excluding the twelve unified ideographs in the CJK Compatibility Ideographs block) are included for compatibility with legacy text handling systems and other legacy character sets. They include forms of characters for vertical text layout and rich text characters that
have specifically listed the system as a trade barrier in Japan. The report claimed that the adoption of the TRON-based system by the Japanese government is advantageous to Japanese manufacturers, thus excluding US operating systems from the huge new market; specifically, the report lists MS-DOS,
Unlike European versions, CJK Unicode fonts, due to Han unification, have large but irregular patterns of overlap, requiring language-specific fonts. Unfortunately, language-specific fonts also make it difficult to access a variant which, as with the "grass" example, happens to appear more typically
Since the Unihan standard encodes "abstract characters", not "glyphs", the graphical artifacts produced by Unicode have been considered temporary technical hurdles, and at most, cosmetic. However, again, particularly in Japan, due in part to the way in which Chinese characters were incorporated into
To deal with the use of different graphemes for the same Unihan sememe, Unicode has relied on several mechanisms, especially where rendering text is concerned. One has been to treat it as simply a font issue so that different fonts might be used to render Chinese, Japanese or Korean. Also font formats
may have to tag the character as "Traditional Chinese" or trust that the recipient's Japanese font uses only the Kyūjitai glyphs, but tags of Traditional Chinese and Simplified Chinese may be necessary to show the two forms side by side in a Japanese textbook. This would preclude one from using the
Almost all of the variants that the PRC developed or standardized got distinct code points owing simply to the fortune of the Simplified Chinese transition carrying through into the computing age. This privilege, however, seems to apply inconsistently, whereas most simplifications performed in Japan
Consequently, relying on language markup across the board as an approach is beset with two major issues. First, there are contexts where language markup is not available (code commits, plain text). Second, any solution would require every operating system to come pre-installed with many glyphs for
The problem stems from the fact that Unicode encodes characters rather than "glyphs," which are the visual representations of the characters. There are four basic traditions for East Asian character shapes: traditional Chinese, simplified Chinese, Japanese, and Korean. While the Han root character
In the twentieth century, East Asian countries made their own respective encoding standards. Within each standard, there coexisted variants with distinct code points, hence the distinct code points in Unicode for certain sets of variants. Taking Simplified Chinese as an example, the two character
To resolve issues brought about by Han unification, a Unicode Technical Standard known as the Unicode Ideographic Variation Database has been created to address the problem of specifying a particular glyph in a plain text environment. By registering glyph collections into the Ideographic Variation
The Unicode Consortium has recognized errors in other instances. The myriad Unicode blocks for CJK Han Ideographs have redundancies in original standards, redundancies brought about by flawed importation of the original standards, as well as accidental mergers that are later corrected, providing
For a grapheme to be represented by various glyphs means that the grapheme has glyph variations that are usually determined by selecting one font or another or using glyph substitution features where multiple glyphs are included in a single font. Such glyph variations are considered by Unicode a
A grapheme is the smallest abstract unit of meaning in a writing system. Any grapheme has many possible glyph expressions, but all are recognized as the same grapheme by those with reading and writing knowledge of a particular writing system. Although Unicode typically assigns characters to code
to cancel the Center of Educational Computing's selection of the TRON-based system for the use of educational computers. The incident is regarded as a symbolic event for the loss of momentum and eventual demise of the BTRON system, which led to the widespread adoption of MS-DOS in Japan and the
as both its compatibility variant and its z-variant. The compatibility variant field overrides the z-variant field, forcing normalization under all forms, including canonical equivalence. Despite the name, compatibility variants are actually canonically equivalent and are united in any Unicode
Japanese or Vietnamese. Instead of some variants getting distinct code points while other groups of variants have to share single code points, all variants could be reliably expressed only with metadata tags (e.g., CSS formatting in webpages). The burden would be on all those who use differing
The International Ideographs Core (IICore) is a subset of 9810 ideographs derived from the CJK Unified Ideographs tables, designed to be implemented in devices with limited memory, input/output capability, and/or applications where the use of the complete ISO 10646 ideograph repertoire is not
Unicode's Unihan database has already drawn connections between many characters. The Unicode database catalogs the connections between variant characters with distinct code points already. However, for characters with a shared code point, the reference glyph image is usually biased toward the
For native speakers, variants can be unintelligible or be unacceptable in educated contexts. English speakers may understand a handwritten note saying "4P5 kg" as "495 kg", but writing the nine backwards (so it looks like a "P") can be jarring and would be considered incorrect in any school.
Unicode claims that "Ideally, there would be no pairs of z-variants in the Unicode Standard." This would make it seem that the goal is to at least unify all minor variants, compatibility redundancies and accidental redundancies, leaving the differentiation to fonts and to language tags. This
U+4E22 for Simplified Chinese GB #2210). It is also noted that Traditional and Simplified characters should be encoded separately according to Unicode Han Unification rules, because they are distinguished in pre-existing PRC character sets. Furthermore, as with other variants, Traditional to
Small differences in graphical representation are also problematic when they affect legibility or belong to the wrong cultural tradition. Besides making some Unicode fonts unusable for texts involving multiple "Unihan languages", names or other orthographically sensitive terminology might be
Some of the controversy stems from the fact that the very decision of performing Han unification was made by the initial Unicode Consortium, which at the time was a consortium of North American companies and organizations (most of them in California), but included no East Asian government
representatives. The initial design goal was to create a 16-bit standard, and Han unification was therefore a critical step for avoiding tens of thousands of character duplications. This 16-bit requirement was later abandoned, making the size of the character set less of an issue today.
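The size pressure behind the 16-bit goal can be made concrete with a little arithmetic. This sketch uses the Version 1 figure given elsewhere in this article (20,940 code points reserved for unified ideographs) and the current Unicode code space, which ends at U+10FFFF.

```python
BMP_SIZE = 2 ** 16          # code points addressable in a 16-bit design: 65,536
HAN_RESERVED_V1 = 20_940    # Version 1 reservation for unified ideographs (from the text)
CODESPACE = 0x10FFFF + 1    # current Unicode code space: U+0000..U+10FFFF

share = HAN_RESERVED_V1 / BMP_SIZE
planes = CODESPACE // BMP_SIZE

print(f"{share:.1%} of the 16-bit space went to unified ideographs")
print(f"{CODESPACE:,} code points in {planes} planes of {BMP_SIZE:,}")
```

Roughly a third of a 16-bit space was already committed to Han ideographs without unification relief, whereas the current seventeen-plane code space makes that constraint far less binding.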
This departure therefore is not simply explained by the oft-quoted distinction between an abstract character and a glyph, but is more rooted in the difference between an abstract character assigned as a grapheme and an abstract character assigned as a sememe. In contrast, consider
has widely differing glyphs that all represent concrete instances of the same abstract grapheme. Although a native reader of any language using the Latin script recognizes these two glyphs as the same grapheme, to others they might appear to be completely unrelated.
to be near-identical z-variants while at the same time classifying them as significantly different semantic variants. There are also cases of some pairs of characters being simultaneously semantic variants, specialized semantic variants and simplified variants:
and they are, with some differences, more familiar to Korean and Japanese users.) Unicode is seen as neutral with regard to this politically charged issue, and has encoded Simplified and Traditional Chinese glyphs separately (e.g., the ideograph for "discard" is
No character variant that is exclusive to Korean or Vietnamese has received its own code point, whereas almost all Shinjitai Japanese variants or Simplified Chinese variants each have distinct code points and unambiguous reference glyphs in the Unicode standard.
OS/2 and UNIX as examples. The Office of USTR was allegedly under Microsoft's influence as its former officer Tom Robertson was then offered a lucrative position by Microsoft. While the TRON system itself was subsequently removed from the list of sanction by
(U+27EAF). If a font has glyphs encoded to both points so that one font is used for both, they should appear identical. These cases are listed as z-variants despite having no variance at all. Intentionally duplicated characters were added to facilitate
are encoded separately in Unicode, as they are not considered national variants. The first is the common form in all three countries, while the second and third are used on financial instruments to prevent tampering (they may be considered variants).
There has not been any push for full semantic unification of all semantically linked characters, though the idea would treat the respective users of East Asian languages the same, whether they write in Korean, Simplified Chinese, Traditional Chinese,
However, Han unification has also caused considerable controversy, particularly among the Japanese public, who, with the nation's literati, have a history of protesting the culling of historically and culturally significant variants. (See
This can cause problems for the language tagging strategy. There is no universal tag for the traditional and "simplified" versions of Japanese as there are for Chinese. Thus, any Japanese writer wanting to display the Kyūjitai form of
(U+6F22) does not have this equivalence listed in this entry. Unicode requires that the compatibility and equivalence mappings of an entry, once admitted, never change, so that normalization rules for already existing characters remain stable.
Because round-trip conversion was an early selling point of Unicode, this meant that if a national standard in use unnecessarily duplicated a character, Unicode had to do the same. Unicode calls these intentional duplications
as defined in Unicode, and the related but distinct idea of graphemes. Unicode assigns abstract characters (graphemes), as opposed to glyphs, which are particular visual representations of a character in a specific
attributes. However, some variants with arguably minimal differences get distinct codepoints, and not every variant with arguably substantial changes gets a unique codepoint. As an example, take a character such as
While the unification aspect of Unicode is controversial in some quarters for the reasons given above, Unicode itself does now encode a vast number of seldom-used characters of a more-or-less antiquarian nature.
Most of these, however, are legacy and obsolete characters, included in line with Unicode's objective to encode every writing system that is or has ever been used; only 2,000 to 3,000 characters are necessary to be considered
in Unicode, but only for "compatibility reasons". Any Unicode-conformant font must display the Kyūjitai and Shinjitai versions' equivalent code points in Unicode as the same. Unofficially, a font may display
Database (IVD), it is possible to use Ideographic Variation Selectors to form Ideographic Variation Sequences (IVSs) that specify or restrict the appropriate glyph in text processing in a Unicode environment.
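Structurally, an Ideographic Variation Sequence is a unified ideograph followed by one of the selectors VS17 to VS256 (U+E0100 to U+E01EF). The sketch below checks only that shape, using the character names from Python's `unicodedata`; it does not consult the IVD itself, so a structurally valid pair may still be unregistered. The function name `split_ivs` is illustrative.

```python
import unicodedata

IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF  # VS17..VS256


def split_ivs(seq: str):
    """Return (base, selector number) if seq has the shape of an IVS, else None.

    Only the structure is checked: a CJK unified ideograph followed by one of
    VS17..VS256. Registration in the IVD is NOT verified here.
    """
    if len(seq) != 2:
        return None
    base, sel = seq
    if not IVS_FIRST <= ord(sel) <= IVS_LAST:
        return None
    if not unicodedata.name(base, "").startswith("CJK UNIFIED IDEOGRAPH"):
        return None
    return base, ord(sel) - IVS_FIRST + 17


print(split_ivs("\u8349\U000E0100"))  # ('草', 17)
print(split_ivs("a\U000E0100"))       # None: base is not an ideograph
```

Whether a renderer actually shows the requested glyph still depends on the font supporting the registered sequence; otherwise it falls back to the base ideograph.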
(U+5165) radical on top. Therefore, it had no reason to encode both variants. Korean-language documents made in the twentieth century had little reason to represent both versions in the same document.
(U+7EA2) got separate code points from the PRC's text encoding standards bodies so Chinese-language documents could use both versions. The two variants received distinct code points in Unicode as well.
(U+514C/U+5151), either method can be used to display the different glyphs. In the following table, each row compares variants that have been assigned different code points. For brevity, note that
may approach or exceed 100,000 characters. Version 1 of Unicode was designed to fit into 16 bits and only 20,940 characters (32%) out of the possible 65,536 were reserved for these
"¨" and the "o" it modifies may be seen as two separate graphemes, whereas in languages such as Swedish, the letter "ö" may be seen as a single grapheme. Similarly in English
(F900–FAFF) (the twelve characters at FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27, FA28 and FA29 are actually "unified ideographs" not "compatibility ideographs")
as equivalent. Even within Japan, the variants are on different sides of a major simplification called Shinjitai. Unicode would effectively make the PRC's simplification of
and a single quotation mark) are unified because the glyphs are the same. For Unihan the characters are not unified by their appearance, but by their definition or meaning.
differently with 海 (U+6D77) as the Shinjitai version and 海 (U+FA45) as the Kyūjitai version (which is identical to the traditional version in written Chinese and Korean).
and mainland China with code points in national standards, including characters simplified differently in each country, did make it into Unicode as distinct code points.
(U+4EBA). Both variants of the first character got their own distinct code points. However, the two variants of the second character had to share the same code point.
which is now the base character set for many new standards and protocols, internationally adopted, and is built into the architecture of operating systems
on an "i" is understood as a part of the "i" grapheme whereas in other languages, such as Turkish, the dot may be seen as a separate grapheme added to the
(IRG), made up of experts from the Chinese-speaking countries, North and South Korea, Japan, Vietnam, and other countries, is responsible for the process.
-based system was adopted by the Japanese government organization "Center for Educational Computing" as the system of choice for school education including
In some cases, often where the changes are the most striking, Unicode has encoded variant characters, making it unnecessary to switch between fonts or
The justification Unicode gives is that the national standards body in the PRC made distinct code points for the two variations of the first character
This leaves the option to settle on one unified reference grapheme for all z-variants, which is contentious since few outside of Japan would recognize
There are several alternative character sets that are not encoded according to the principle of Han Unification, and are thus free from its restrictions:
(U+6F22) its compatibility variant. As long as an application uses the same font for both, they should appear identical. Sometimes, as in the case of
However, this quote refers to the fact that some graphemes are composed of several graphic elements or "characters". So, for example, the character
Simplified Chinese, Kyūjitai Japanese and Shinjitai Japanese use a three-stroke version, like two plus signs sharing their horizontal strokes
as each other's respective traditional and simplified variants and also as each other's semantic variants. However, while Unicode classifies
Sixty-two Shinjitai "simplified" characters with distinct code points in Japan got merged with their Kyūjitai traditional equivalents, like
4480: 1381:(U+4FB6) a monumental difference by comparison. Such a plan would also eliminate the very visually distinct variations for characters like 1077: 550: 197: 5975: 3675: 2595:). They will not appear here nor will the simplified Chinese characters that take consistently simplified radical components (e.g., 3936: 3445:
The libUnihan project provides a normalized SQLite Unihan database and corresponding C library. All tables in this database are in
3999: 694:
An abstract character does not necessarily correspond to what a user thinks of as a "character" and should not be confused with a
One rationale was the desire to limit the size of the full Unicode character set, where CJK characters as represented by discrete
component. However, in mainland China, the standards bodies wanted to standardize the cursive form when used in characters like
These region-dependent character sets are also seen as not affected by Han Unification because of their region-specific nature:
In each row of the following table, the same character is repeated in all six columns. However, each column is marked (by the
141: 7526: 6480: 5014: 4693: 113: 7660: 5009: 1073: 1025: 845:. One character may be represented by many distinct glyphs, for example a "g" or an "a", both of which may have one loop ( 8099: 1154:(U+4E22) are examples that Unicode gives as differing in a significant way in their abstract shapes, while Unicode lists 782: 341: 8109: 8054: 7565: 7368: 6112: 4543: 4095: 4083: 4079: 4075: 4071: 4067: 4063: 4059: 4055: 4051: 3977: 3364: 3329: 3323: 3317: 3311: 3305: 3299: 3293: 3287: 3281: 120: 819:. This is, of course, desirable for reasons of compatibility, and deals with a much smaller alphabetic character set. 7610: 7226: 7221: 6724: 6555: 6102: 6067: 3866: 638:. Today, the list of characters officially recognized for use in proper names continues to expand at a modest pace.) 600:. Unicode was later extended to 21 bits allowing many more CJK characters (97,680 are assigned, with room for more). 572: 513: 299: 281: 219: 160: 63: 4378: 4369: 263: 8129: 6644: 4842: 4473: 3806: 1247: 963: 491: 94: 49: 7862: 7797: 7551: 7531: 4847: 4762: 4663: 4123: 4107: 4091: 3411: 3405: 3399: 876:
Chinese users seem to have fewer objections to Han unification, largely because Unicode did not attempt to unify
554: 431:
unit – hence, "Han unification", with the resulting character repertoire sometimes contracted to
201: 127: 5324: 5043: 955:(based on sequence codes to switch between Chinese, Japanese, Korean character sets – hence without unification) 656: 8104: 7615: 6208: 1001: 881: 495: 443: 248: 98: 4185: 3624: 7729: 7700: 7350: 5144: 4958: 4942: 4905: 4752: 4688: 4508: 4145: 3820: 3428: 1418: 885: 877: 452: 3342:
Unicode includes support of CJKV radicals, strokes, punctuation, marks and symbols in the following blocks:
2449:(U+7EA2) are semantically identical and the glyphs differ only in the latter using a cursive version of the 703: 687: 109: 7792: 7680: 7640: 6060: 5883: 5753: 5068: 4678: 4115: 3515:
dictionary projects (which are provided for convenience and are not a formal part of the Unicode Standard).
3387: 3335: 1414: 1243: 1067:. However, in April, a report titled "1989 National Trade Estimate Report on Foreign Trade Barriers" from 865: 859: 853: 847: 8124: 8034: 7645: 7575: 7561: 7546: 7450: 7363: 7335: 7301: 6002: 5119: 4930: 4730: 4466: 4103: 4099: 3358: 3352: 1005: 997: 4158: 781:
So rather than treat the issue as a rich text problem of glyph alternates, Unicode added the concept of
8008: 7953: 7874: 7655: 7311: 7306: 6659: 5682: 5169: 5004: 4999: 4648: 4553: 3657: 831: 586: 416: 7650: 5687: 5063: 4138: 7715: 7670: 7506: 7055: 6759: 6704: 6669: 5607: 4884: 4593: 3739: 1135:
Traditional Chinese version. Also, the decision of whether to classify pairs as semantic variants or
In the formulation of Unicode, an attempt was made to unify these variants by considering them as
variants with different components will usually (and unsurprisingly) take unique codepoints (e.g.,
(U+4EBF) as each other's respective traditional and simplified variants, Unicode does not consider
(U+6F22) was and its entry informs the user of the compatibility information. On the other hand,
(U+4E2A). There are cases of non-mutual equivalence. For example, the Unihan database entry for
with U+8ECA and U+F902, the added compatibility character lists the already present version of
with two variants, the second form being simply the cursive form. The radical components of
Likewise, to users of one CJK language reading a document with "foreign" glyphs: variants of
763: 134: 7070: 5529: 2482:(U+8278) proves how arbitrary the state of affairs is. When used to compose characters like 1269:
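normalization scheme and not only under compatibility normalization. This is similar to how U+212B Å ANGSTROM SIGN is canonically equivalent to a pre-composed U+00C5 Å LATIN CAPITAL LETTER A WITH RING ABOVE.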
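The difference between canonical and compatibility equivalence can be checked with Python's standard `unicodedata` module. In this minimal sketch, U+F902 is the compatibility ideograph mentioned above. U+212B ANGSTROM SIGN and the pair 個 (U+500B) / 个 (U+4E2A) are illustrative choices: the first is another canonical singleton, and the pair stands for separately encoded variants.

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def cp(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

# CJK compatibility ideograph with a canonical singleton decomposition:
# even NFC (not only NFKC) replaces it with the unified ideograph.
print(unicodedata.decomposition("\uF902"))  # 8ECA
print(cp(nfc("\uF902")))                     # U+8ECA

# ANGSTROM SIGN behaves the same way: NFC yields the precomposed Latin letter.
print(cp(nfc("\u212B")))                     # U+00C5

# Separately encoded variants are not merged, even by compatibility normalization.
trad, simp = "\u500B", "\u4E2A"              # 個, 个
print(unicodedata.normalize("NFKC", trad) == simp)  # False
```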
An article hosted by IBM attempts to illustrate part of the motivation for Han unification:
Because this change happened relatively recently, there was a transition period. Both
Much of the controversy surrounding Han unification is based on the distinction between
summarizing major criticism against the Han Unification approach adopted by Unicode.
The controversy later extended to the internationally representative ISO: the initial
The Unihan project has always made an effort to make its build database available.
(U+5185) differ in exactly the same way as do the Korean and non-Korean variants of
The Unicode® Standard Version 15.0 – Core Specification §3.4 Characters and Encoding
may be unreadable to non-Japanese people. (In Japan, both variants are accepted.)
same font for an entire document, however. There are two distinct code points for
points to express the graphemes within a system of writing, the Unicode Standard (
Some clerical errors led to doubling of completely identical characters such as
as z-variants, differing only in font styling. Paradoxically, Unicode considers
(U+5165), for which the only way to display the variants is to change font (or
Additional compatibility (discouraged use) characters appear in these blocks:
whereas Korea never made separate code points for the different variants of
is not always consistent or clear, despite rationalizations in the handbook.
after protests by the organization in May 1989, the trade dispute caused the
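Ideographic characters assigned by Unicode appear in the following blocks:
CJK Unified Ideographs (4E00–9FFF) (Otherwise known as URO, abbreviation of Unified Repertoire and Ordering)
CJK Unified Ideographs Extension A
CJK Unified Ideographs Extension B
CJK Unified Ideographs Extension C
CJK Unified Ideographs Extension D
CJK Unified Ideographs Extension E
CJK Unified Ideographs Extension F
CJK Unified Ideographs Extension G
CJK Unified Ideographs Extension H
CJK Unified Ideographs Extension I
CJK Compatibility Ideographs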
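A simple range table is enough to check which unified or compatibility ideograph block a character falls in. This is only a sketch, with three caveats:

- The ranges below are block boundaries as of recent Unicode versions, and later versions may add blocks.
- A block range also contains unassigned code points.
- The CJK Compatibility Ideographs block holds a handful of code points that Unicode classifies as unified ideographs, so block membership is no substitute for the `Unified_Ideograph` property.

```python
from typing import Optional

# Inclusive block ranges for the unified and compatibility ideograph blocks.
CJK_IDEOGRAPH_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F"),
    (0x30000, 0x3134F, "CJK Unified Ideographs Extension G"),
    (0x31350, 0x323AF, "CJK Unified Ideographs Extension H"),
    (0x2EBF0, 0x2EE5F, "CJK Unified Ideographs Extension I"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
]

def ideograph_block(ch: str) -> Optional[str]:
    """Return the name of the ideograph block containing ch, or None."""
    code_point = ord(ch)
    for first, last, name in CJK_IDEOGRAPH_BLOCKS:
        if first <= code_point <= last:
            return name
    return None

for sample in ["\u4E2D", "\U00020000", "\uF902", "A"]:
    print(f"U+{ord(sample):04X}", ideograph_block(sample))
```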
However, none of these alternative standards has been as widely adopted as Unicode.
Nevertheless, many characters have regional variants assigned to different code points.
Simplified Chinese characters are used among Chinese speakers in the
attribute) as described in the previous table. On the other hand, for
(U+4EBA). Each respective variant of the second character has either
was obviously already in the database at the time that the entry for
(U+5168). Each respective variant of the first character has either
The Unicode Standard details the principles of Han unification. The
Traditional Chinese characters are used in Hong Kong and Taiwan
Unihan can also refer to the Unihan Database maintained by the Unicode Consortium.
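The Unihan Database is distributed as plain-text files. This sketch assumes the tab-separated layout used by those files: each line holds a `U+XXXX` code point, a `k`-prefixed field name and a value, and lines starting with `#` are comments. Under that assumption, a minimal Python reader could look like this. The sample lines are hypothetical and only mimic the layout.

```python
from collections import defaultdict

def parse_unihan(lines):
    """Parse Unihan-style lines of the form 'U+XXXX<TAB>kField<TAB>value'.

    Returns {code point (int): {field name: value}}. Blank lines and lines
    starting with '#' are skipped.
    """
    table = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cp_field, field, value = line.split("\t", 2)
        table[int(cp_field[2:], 16)][field] = value  # strip the "U+" prefix
    return dict(table)

# Hypothetical sample lines in the assumed format (not copied from a release).
sample = [
    "# comment lines are ignored",
    "U+4E1F\tkSimplifiedVariant\tU+4E22",
    "U+4E22\tkTraditionalVariant\tU+4E1F",
]
data = parse_unihan(sample)
print(data[0x4E1F]["kSimplifiedVariant"])  # U+4E22
```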
feasible. There are 9810 characters in the current standard.
(U+5185) gets a unique codepoint. For some characters, like
Han characters are a feature shared in common by written
eventual adoption of Unicode with its successor Windows.
Simplified characters is not a one-to-one relationship.
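Several traditional characters can share one simplified form, so a naive reverse lookup from simplified to traditional is ambiguous. The sketch below shows this many-to-one grouping using a small, hand-picked sample of well-known pairs. The data is illustrative and not taken from Unihan.

```python
from collections import defaultdict

# Illustrative traditional -> simplified pairs (hand-picked, not Unihan data).
TRAD_TO_SIMP = {
    "\u767C": "\u53D1",  # 發 "to emit"     -> 发
    "\u9AEE": "\u53D1",  # 髮 "hair"        -> 发
    "\u5F8C": "\u540E",  # 後 "after"       -> 后
    "\u540E": "\u540E",  # 后 "queen" stays -> 后
}

def invert(mapping):
    """Group source characters by target; groups longer than one are many-to-one."""
    reverse = defaultdict(list)
    for src, dst in mapping.items():
        reverse[dst].append(src)
    return dict(reverse)

SIMP_TO_TRAD = invert(TRAD_TO_SIMP)
for simp, trads in SIMP_TO_TRAD.items():
    print(simp, "<-", " / ".join(trads))
```

Practical converters usually resolve such ambiguities with word-level context rather than per-character tables.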
can be missing a stroke/have an extraneous stroke, and
3625:"Chapter 18: East Asia, Principles of Han Unification" 3453:, while its database, UnihanDb, is released under the 1088: 802: 678: 2494:
should be something that looks like two plus signs (
Japan Electronic Industries Development Association
Kanji § Orthographic reform and lists of kanji
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters.
Figure: Differences for the same Unicode character (U+8FD4) in regional versions of Source Han Sans.
Modern Chinese, Japanese and Korean
typically use regional or historical
representing the same "grapheme" or
In fact, the three ideographs for "one"
(JEIDA) published a pamphlet titled "未来の文字コード体系に私達は不安をもっています" ("We are anxious about the future character encoding system")
is not unified with the
and libraries (IBM
(ICU) along with the
In March 1989, a
to accept a request from
So-called semantic variants of
(U+4E80) considers
(U+9F9C) to be its z-variant, but the entry for
as a z-variant, even though
is canonically equivalent to a pre-composed
漢 (U+FA9A) was added to the database later than
to be semantic variants of each other.
Table: Examples of language-dependent glyphs
should select, for each character, a
and two types of
(U+7CF8) is used in characters like
The case of the radical
can appear as mirror images,
precedent for dis-unifying characters.
Table: Examples of some non-unified Han ideographs. This list is not exhaustive.
(U+5167), the variant of
Unicode recommends handling through other means.
libUnihan is released under the
U+4E1F for Traditional Chinese Big5 #A5E1 and
rendering engines), font formats
1920: 1917: 1914: 1911: 1907: 1906: 1903: 1898: 1895: 1892: 1889: 1886: 1883: 1879: 1878: 1877:arrive/resist 1875: 1870: 1867: 1864: 1861: 1858: 1855: 1851: 1850: 1847: 1842: 1839: 1836: 1833: 1830: 1827: 1823: 1822: 1819: 1814: 1811: 1808: 1805: 1802: 1799: 1795: 1794: 1791: 1786: 1783: 1780: 1777: 1774: 1771: 1767: 1766: 1763: 1758: 1755: 1752: 1749: 1746: 1743: 1739: 1738: 1735: 1730: 1727: 1724: 1721: 1718: 1715: 1711: 1710: 1707: 1702: 1699: 1696: 1693: 1690: 1687: 1683: 1682: 1679: 1674: 1671: 1668: 1665: 1662: 1659: 1655: 1654: 1651: 1646: 1643: 1640: 1637: 1634: 1631: 1627: 1626: 1623: 1618: 1615: 1612: 1609: 1606: 1603: 1599: 1598: 1595: 1590: 1587: 1584: 1581: 1578: 1575: 1571: 1570: 1569:cause/command 1567: 1562: 1559: 1556: 1553: 1550: 1547: 1543: 1542: 1539: 1534: 1531: 1528: 1525: 1522: 1519: 1510: 1509: 1504: 1499: 1494: 1489: 1484: 1478: 1477: 1474: 1471: 1468: 1465: 1458: 1457:(traditional) 1453: 1448: 1402: 1399: 1209:does not list 1090: 1087: 971: 970: 961: 956: 946: 945: 938: 933: 928: 918: 915: 790: 787: 723:combined with 700: 688:section 3.4 D7 667:Main article: 664: 661: 581: 580: 538: 536: 529: 522: 521: 480: 478: 471: 465: 462: 366:Han characters 362:character sets 340:, you may see 328: 327: 326: 308: 307: 290: 289: 244: 242: 235: 228: 227: 185: 183: 176: 169: 168: 83: 81: 74: 69: 43: 42: 40: 33: 15: 9: 6: 4: 3: 2: 8142: 8131: 8128: 8126: 8123: 8121: 8118: 8116: 8113: 8111: 8108: 8106: 8103: 8101: 8098: 8097: 8095: 8082: 8072: 8066: 8063: 8061: 8058: 8056: 8053: 8051: 8048: 8046: 8043: 8041: 8038: 8036: 8033: 8031: 8028: 8027: 8025: 8021: 8015: 8012: 8010: 8007: 8003: 8000: 7998: 7995: 7994: 7993: 7990: 7988: 7985: 7984: 7982: 7980: 7976: 7970: 7967: 7965: 7962: 7960: 7957: 7955: 7952: 7950: 7947: 7945: 7944: 7940: 7936: 7933: 7931: 7928: 7926: 7923: 7922: 7921: 7918: 7916: 7913: 7911: 7908: 7904: 7901: 7899: 7896: 7895: 7893: 7891: 7888: 7886: 7883: 7881: 7878: 7876: 7873: 7869: 7866: 7865: 7864: 7861: 7859: 7856: 7854: 7851: 7850: 7848: 
7844: 7838: 7835: 7833: 7830: 7828: 7825: 7823: 7820: 7818: 7815: 7814: 7812: 7809: 7805: 7799: 7796: 7794: 7791: 7789: 7786: 7784: 7781: 7779: 7776: 7774: 7771: 7769: 7766: 7764: 7761: 7759: 7756: 7754: 7751: 7749: 7746: 7744: 7741: 7739: 7736: 7735: 7733: 7731: 7730:ISO/IEC 10646 7727: 7723: 7717: 7714: 7712: 7709: 7707: 7704: 7702: 7699: 7697: 7694: 7692: 7689: 7687: 7684: 7682: 7679: 7677: 7674: 7672: 7669: 7667: 7664: 7662: 7659: 7657: 7654: 7652: 7649: 7647: 7644: 7642: 7639: 7637: 7634: 7632: 7629: 7627: 7624: 7622: 7619: 7617: 7614: 7612: 7609: 7607: 7604: 7602: 7599: 7597: 7594: 7592: 7589: 7587: 7584: 7582: 7579: 7577: 7574: 7572: 7569: 7567: 7563: 7560: 7558: 7555: 7553: 7550: 7548: 7547:Compucolor II 7545: 7543: 7540: 7538: 7535: 7533: 7530: 7528: 7525: 7523: 7520: 7518: 7515: 7513: 7510: 7508: 7505: 7503: 7502:Acorn RISC OS 7500: 7498: 7495: 7493: 7490: 7488: 7485: 7483: 7480: 7478: 7475: 7473: 7470: 7468: 7465: 7464: 7462: 7458: 7452: 7449: 7447: 7444: 7442: 7439: 7437: 7434: 7432: 7431:8-bit Turkish 7429: 7427: 7424: 7420: 7417: 7415: 7412: 7410: 7407: 7405: 7402: 7400: 7397: 7395: 7392: 7390: 7387: 7385: 7382: 7380: 7377: 7375: 7372: 7371: 7370: 7367: 7365: 7362: 7361: 7359: 7356: 7352: 7348: 7342: 7339: 7337: 7334: 7333: 7331: 7328: 7324: 7318: 7315: 7313: 7310: 7308: 7305: 7303: 7300: 7298: 7295: 7293: 7290: 7288: 7285: 7283: 7280: 7278: 7275: 7273: 7270: 7268: 7265: 7263: 7260: 7258: 7255: 7253: 7250: 7248: 7245: 7243: 7240: 7238: 7235: 7232: 7228: 7225: 7223: 7220: 7218: 7215: 7214: 7212: 7210: 7206: 7200: 7197: 7195: 7192: 7190: 7187: 7185: 7182: 7180: 7177: 7175: 7172: 7170: 7167: 7165: 7162: 7160: 7157: 7155: 7152: 7150: 7147: 7145: 7142: 7140: 7137: 7135: 7132: 7130: 7127: 7125: 7122: 7120: 7117: 7115: 7112: 7110: 7107: 7105: 7102: 7100: 7097: 7096: 7094: 7092: 7088: 7082: 7079: 7077: 7074: 7072: 7069: 7067: 7064: 7062: 7059: 7057: 7054: 7052: 7049: 7047: 7044: 7042: 7039: 7037: 7034: 7032: 7029: 7027: 7024: 7022: 7019: 7017: 7014: 7012: 
7009: 7007: 7004: 7002: 6999: 6997: 6994: 6992: 6989: 6987: 6984: 6982: 6979: 6977: 6974: 6972: 6969: 6967: 6964: 6962: 6959: 6957: 6954: 6952: 6949: 6947: 6944: 6942: 6939: 6937: 6934: 6932: 6929: 6927: 6924: 6922: 6919: 6917: 6914: 6912: 6909: 6907: 6904: 6902: 6899: 6897: 6894: 6892: 6889: 6887: 6884: 6882: 6879: 6877: 6874: 6872: 6869: 6867: 6864: 6862: 6859: 6857: 6854: 6852: 6849: 6847: 6844: 6842: 6839: 6837: 6834: 6832: 6829: 6827: 6824: 6822: 6819: 6817: 6814: 6812: 6809: 6807: 6804: 6802: 6799: 6797: 6794: 6792: 6789: 6787: 6784: 6782: 6779: 6778: 6776: 6772: 6766: 6763: 6761: 6758: 6756: 6753: 6751: 6748: 6746: 6743: 6741: 6738: 6736: 6733: 6731: 6728: 6726: 6723: 6721: 6718: 6716: 6713: 6711: 6708: 6706: 6703: 6701: 6698: 6696: 6693: 6691: 6688: 6686: 6683: 6681: 6678: 6676: 6673: 6671: 6668: 6666: 6663: 6661: 6658: 6656: 6653: 6651: 6648: 6646: 6643: 6641: 6638: 6636: 6633: 6631: 6628: 6626: 6623: 6622: 6620: 6616: 6611: 6605: 6602: 6600: 6599:ISO/IEC 10367 6597: 6595: 6592: 6591: 6589: 6587: 6583: 6577: 6574: 6572: 6569: 6567: 6564: 6562: 6559: 6557: 6554: 6552: 6549: 6547: 6544: 6542: 6539: 6537: 6534: 6532: 6529: 6527: 6524: 6522: 6519: 6517: 6514: 6512: 6509: 6507: 6504: 6502: 6499: 6497: 6494: 6492: 6489: 6487: 6484: 6482: 6479: 6477: 6474: 6472: 6469: 6467: 6464: 6462: 6459: 6457: 6454: 6452: 6449: 6447: 6444: 6442: 6439: 6437: 6434: 6432: 6429: 6427: 6424: 6423: 6421: 6417: 6411: 6408: 6406: 6403: 6401: 6398: 6396: 6393: 6391: 6388: 6386: 6383: 6379: 6376: 6374: 6371: 6370: 6369: 6366: 6365: 6363: 6359: 6351: 6348: 6346: 6343: 6341: 6338: 6336: 6333: 6332: 6330: 6326: 6323: 6321: 6318: 6317: 6315: 6311: 6308: 6307: 6305: 6301: 6298: 6296: 6293: 6291: 6288: 6286: 6283: 6281: 6278: 6276: 6273: 6271: 6268: 6266: 6263: 6261: 6258: 6256: 6253: 6251: 6250:-5 (Cyrillic) 6248: 6246: 6243: 6241: 6238: 6236: 6233: 6231: 6228: 6227: 6225: 6224: 6222: 6220: 6216: 6210: 6207: 6201: 6198: 6196: 6193: 6192: 6190: 6188: 6185: 6183: 6180: 6178: 6175: 6174: 6173: 
6169: 6165: 6162: 6160: 6157: 6153: 6150: 6149: 6148: 6145: 6143: 6140: 6136: 6133: 6129: 6126: 6124: 6121: 6119: 6116: 6114: 6111: 6110: 6109: 6106: 6104: 6101: 6100: 6099: 6096: 6095: 6093: 6089: 6085: 6078: 6073: 6071: 6066: 6064: 6059: 6058: 6055: 6039: 6030: 6028: 6019: 6018: 6015: 6009: 6006: 6004: 6001: 5997: 5994: 5993: 5992: 5989: 5987: 5984: 5982: 5979: 5977: 5974: 5973: 5971: 5967: 5961: 5958: 5956: 5953: 5952: 5950: 5946: 5940: 5937: 5935: 5932: 5930: 5927: 5925: 5922: 5920: 5919:Tulu Tigalari 5917: 5915: 5912: 5910: 5907: 5905: 5902: 5900: 5897: 5895: 5894:Sylheti Nagri 5892: 5890: 5887: 5885: 5884:South Arabian 5882: 5880: 5877: 5875: 5872: 5870: 5867: 5865: 5862: 5860: 5857: 5855: 5852: 5850: 5847: 5845: 5842: 5840: 5837: 5835: 5832: 5830: 5827: 5825: 5822: 5820: 5817: 5815: 5812: 5810: 5809:Old Hungarian 5807: 5805: 5802: 5800: 5797: 5795: 5792: 5790: 5787: 5785: 5782: 5780: 5777: 5775: 5772: 5770: 5767: 5765: 5762: 5760: 5757: 5755: 5752: 5750: 5747: 5745: 5742: 5740: 5737: 5735: 5732: 5730: 5727: 5724: 5721: 5719: 5716: 5714: 5711: 5709: 5706: 5704: 5701: 5699: 5696: 5694: 5691: 5689: 5686: 5684: 5681: 5679: 5676: 5674: 5671: 5669: 5666: 5664: 5661: 5659: 5656: 5654: 5651: 5649: 5646: 5644: 5641: 5639: 5636: 5634: 5631: 5629: 5626: 5624: 5621: 5619: 5616: 5614: 5611: 5609: 5606: 5604: 5601: 5599: 5596: 5594: 5591: 5589: 5586: 5584: 5581: 5579: 5576: 5574: 5571: 5569: 5566: 5565: 5563: 5557: 5551: 5548: 5546: 5543: 5541: 5538: 5536: 5533: 5531: 5528: 5526: 5523: 5521: 5518: 5516: 5513: 5511: 5508: 5506: 5503: 5501: 5498: 5496: 5493: 5491: 5488: 5486: 5483: 5481: 5478: 5476: 5473: 5471: 5468: 5466: 5463: 5461: 5458: 5456: 5453: 5451: 5448: 5446: 5443: 5441: 5438: 5436: 5433: 5431: 5428: 5426: 5423: 5421: 5418: 5416: 5413: 5411: 5408: 5406: 5403: 5401: 5398: 5396: 5393: 5391: 5388: 5386: 5383: 5381: 5378: 5376: 5373: 5371: 5368: 5366: 5363: 5361: 5358: 5356: 5353: 5351: 5348: 5346: 5343: 5341: 5338: 5336: 5333: 5331: 5330:Mende Kikakui 5328: 5326: 
5325:Masaram Gondi 5323: 5321: 5318: 5316: 5313: 5311: 5310:Lisu (Fraser) 5308: 5306: 5303: 5301: 5298: 5296: 5293: 5291: 5288: 5286: 5283: 5281: 5278: 5276: 5273: 5271: 5268: 5266: 5263: 5261: 5258: 5256: 5253: 5251: 5248: 5246: 5243: 5241: 5238: 5236: 5233: 5231: 5228: 5226: 5223: 5221: 5218: 5216: 5213: 5211: 5210:Gunjala Gondi 5208: 5206: 5203: 5201: 5198: 5196: 5193: 5191: 5188: 5186: 5183: 5181: 5178: 5176: 5173: 5171: 5168: 5166: 5163: 5161: 5158: 5156: 5153: 5151: 5148: 5146: 5143: 5141: 5138: 5136: 5133: 5131: 5128: 5126: 5123: 5121: 5118: 5116: 5113: 5111: 5108: 5106: 5103: 5101: 5098: 5096: 5093: 5091: 5088: 5087: 5085: 5081: 5075: 5072: 5070: 5067: 5065: 5062: 5060: 5057: 5055: 5052: 5051: 5049: 5047: 5041: 5036: 5031: 5027: 5021: 5018: 5016: 5013: 5011: 5008: 5006: 5003: 5001: 4998: 4996: 4993: 4992: 4990: 4986: 4980: 4977: 4975: 4972: 4970: 4967: 4965: 4962: 4960: 4957: 4956: 4954: 4950: 4944: 4941: 4939: 4936: 4932: 4929: 4927: 4924: 4923: 4922: 4919: 4917: 4914: 4912: 4909: 4907: 4904: 4903: 4901: 4897: 4891: 4888: 4886: 4883: 4881: 4878: 4876: 4873: 4869: 4866: 4865: 4864: 4861: 4859: 4856: 4854: 4851: 4849: 4846: 4844: 4841: 4839: 4836: 4835: 4833: 4827: 4817: 4814: 4812: 4809: 4807: 4804: 4802: 4799: 4797: 4794: 4792: 4789: 4787: 4784: 4782: 4779: 4777: 4774: 4772: 4769: 4768: 4766: 4764: 4760: 4754: 4751: 4749: 4746: 4744: 4741: 4737: 4736:ISO/IEC 14651 4734: 4733: 4732: 4729: 4727: 4724: 4723: 4721: 4717: 4714: 4710: 4700: 4697: 4695: 4692: 4690: 4687: 4685: 4682: 4680: 4677: 4675: 4672: 4670: 4667: 4665: 4662: 4660: 4657: 4655: 4652: 4650: 4647: 4646: 4644: 4640: 4634: 4631: 4629: 4626: 4624: 4621: 4619: 4616: 4614: 4611: 4609: 4606: 4604: 4600: 4597: 4595: 4592: 4590: 4587: 4586: 4584: 4582: 4578: 4575: 4571: 4565: 4562: 4560: 4557: 4555: 4552: 4550: 4547: 4545: 4542: 4538: 4535: 4534: 4533: 4530: 4529: 4527: 4525: 4521: 4515: 4512: 4510: 4507: 4505: 4502: 4501: 4499: 4495: 4491: 4484: 4479: 4477: 4472: 4470: 4465: 4464: 4461: 4444: 4440: 
4436: 4427: 4424: 4419: 4414: 4411: 4408: 4407: 4403: 4398: 4387: 4380: 4376: 4371: 4345: 4340: 4337: 4295: 4290: 4287: 4242: 4239: 4194: 4191: 4187: 4160: 4147: 4140: 4134: 4131: 4129: 4125: 4121: 4117: 4113: 4109: 4105: 4101: 4097: 4093: 4089: 4085: 4081: 4077: 4073: 4069: 4065: 4061: 4057: 4053: 4049: 4044: 4043: 4040: 4037: 4035: 4032: 4029: 4026: 4024: 4021: 4019: 4016: 4015: 4010: 4006: 4002: 3995: 3990: 3988: 3983: 3981: 3976: 3975: 3972: 3961: 3957: 3950: 3942: 3938: 3932: 3924: 3920: 3914: 3906: 3902: 3896: 3888: 3884: 3878: 3876: 3868: 3867:4-06-208718-9 3864: 3858: 3850: 3846: 3839: 3830: 3822: 3816: 3808: 3802: 3791: 3784: 3776: 3770: 3762: 3756: 3754: 3745: 3741: 3735: 3733: 3724: 3720: 3714: 3707: 3702: 3688:on 2013-12-16 3687: 3683: 3682: 3677: 3671: 3663: 3659: 3653: 3645: 3638: 3630: 3626: 3620: 3612: 3608: 3602: 3594: 3590: 3584: 3582: 3573: 3569: 3563: 3561: 3553:. 2023-09-01. 3552: 3548: 3542: 3538: 3521: 3514: 3510: 3506: 3500: 3496: 3486: 3483: 3480: 3477: 3474: 3471: 3468: 3465: 3464: 3458: 3456: 3452: 3448: 3443: 3435: 3430: 3420: 3413: 3410: 3408:(1F200–1F2FF) 3407: 3404: 3401: 3398: 3396:(2F800–2FA1F) 3395: 3392: 3389: 3386: 3383: 3380: 3377: 3374: 3373: 3372: 3366: 3363: 3360: 3357: 3354: 3351: 3348: 3345: 3344: 3343: 3337: 3334: 3332:(2EBF0–2EE5F) 3331: 3328: 3326:(31350–323AF) 3325: 3322: 3320:(30000–3134F) 3319: 3316: 3314:(2CEB0–2EBEF) 3313: 3310: 3308:(2B820–2CEAF) 3307: 3304: 3302:(2B740–2B81F) 3301: 3298: 3296:(2A700–2B73F) 3295: 3292: 3290:(20000–2A6DF) 3289: 3286: 3283: 3280: 3277: 3274: 3273: 3272: 3268: 3258: 3253: 3238: 3234: 3229: 3225: 3223: 3210: 3197: 3184: 3183: 3179: 3177: 3164: 3151: 3138: 3137: 3133: 3120: 3118: 3105: 3092: 3091: 3087: 3085: 3083: 3070: 3057: 3056: 3052: 3050: 3048: 3035: 3022: 3021: 3017: 3015: 3002: 2989: 2976: 2975: 2971: 2969: 2967: 2954: 2941: 2940: 2936: 2934: 2932: 2919: 2906: 2905: 2901: 2899: 2897: 2884: 2871: 2870: 2866: 2864: 2862: 2849: 2836: 2835: 2831: 2829: 2816: 2803: 2790: 2789: 2785: 
2772: 2759: 2746: 2733: 2732: 2728: 2715: 2702: 2689: 2676: 2675: 2671: 2669: 2667: 2654: 2641: 2640: 2636: 2633: 2630: 2627: 2624: 2623: 2620: 2588: 2539: 2517: 2513: 2474: 2467:(U+7D05) and 2443:(U+7D05) and 2417: 2388: 2384: 2351: 2314:(U+5167) and 2305: 2296: 2273: 2272: 2268: 2245: 2244: 2240: 2217: 2216: 2212: 2189: 2188: 2184: 2161: 2160: 2156: 2133: 2132: 2128: 2105: 2104: 2100: 2077: 2076: 2072: 2049: 2048: 2044: 2021: 2020: 2016: 1993: 1992: 1988: 1965: 1964: 1960: 1937: 1936: 1932: 1909: 1908: 1904: 1881: 1880: 1876: 1853: 1852: 1848: 1825: 1824: 1820: 1797: 1796: 1792: 1769: 1768: 1764: 1741: 1740: 1736: 1713: 1712: 1708: 1685: 1684: 1680: 1657: 1656: 1652: 1629: 1628: 1624: 1601: 1600: 1597:exempt/spare 1596: 1573: 1572: 1568: 1545: 1544: 1540: 1516: 1512: 1511: 1505: 1500: 1495: 1490: 1485: 1480: 1479: 1472: 1469: 1466: 1462:(traditional, 1459: 1454: 1452:(simplified) 1449: 1445: 1442: 1440: 1436: 1432: 1428: 1424: 1420: 1416: 1412: 1398: 1394: 1387:(U+76F4) and 1375:(U+4FA3) and 1355: 1351: 1332:(U+5104) and 1311: 1296: 1280:ANGSTROM SIGN 1249: 1245: 1236:(U+FA23) and 1228: 1227:was written. 1185:(U+500B) and 1148:(U+4E1F) and 1140: 1138: 1132: 1128: 1101: 1097: 1086: 1083: 1082:Masayoshi Son 1079: 1075: 1070: 1066: 1062: 1057: 1056:) and so on. 
1055: 1051: 1047: 1043: 1039: 1035: 1031: 1027: 1023: 1019: 1015: 1011: 1007: 1003: 999: 995: 991: 987: 984: 980: 976: 969: 965: 962: 960: 957: 954: 951: 950: 949: 944: 943: 939: 937: 934: 932: 929: 927: 924: 923: 922: 914: 899: 895: 891: 887: 883: 879: 874: 870: 844: 840: 835: 833: 828: 824: 820: 818: 815: 811: 808: 804: 801: 795: 786: 784: 779: 775: 773: 769: 765: 759: 757: 753: 749: 745: 741: 705: 699: 697: 691: 689: 680: 675: 670: 660: 658: 654: 644: 641:In 1993, the 639: 637: 631: 628: 622: 616: 609: 604: 601: 599: 595: 590: 588: 577: 574: 566: 563:November 2020 556: 552: 548: 542: 539:This section 537: 528: 527: 518: 515: 507: 497: 493: 487: 486: 481:This section 479: 475: 470: 469: 461: 454: 445: 441: 436: 434: 430: 426: 422: 418: 414: 409: 407: 403: 399: 395: 391: 387: 383: 379: 375: 371: 367: 363: 359: 355: 351: 343: 339: 335: 333: 323: 319: 314: 304: 301: 286: 283: 275: 272:February 2024 265: 261: 257: 251: 250: 245:This article 243: 234: 233: 224: 221: 213: 210:December 2020 203: 199: 195: 189: 186:This article 184: 175: 174: 165: 162: 154: 151:February 2010 143: 140: 136: 133: 129: 126: 122: 119: 115: 112: –  111: 107: 106:Find sources: 100: 96: 90: 89: 84:This article 82: 78: 73: 72: 67: 65: 58: 57: 52: 51: 46: 41: 32: 31: 26: 22: 8044: 7997:ISO/IEC 6429 7954:Stanford/ITS 7941: 7875:ARIB STD-B24 7656:Sega SC-3000 7557:DEC RADIX 50 6594:ISO/IEC 8859 6586:ISO/IEC 2022 6331:Adaptations 6290:-14 (Celtic) 6285:-13 (Baltic) 6275:-10 (Nordic) 6270:-9 (Turkish) 6219:ISO/IEC 8859 5774:Meetei Mayek 5725:(Chorasmian) 5628:Cypro-Minoan 5405:Pahawh Hmong 5220:Gurung Khema 5019: 4969:ISO/IEC 8859 4811:UTF-32/UCS-4 4806:UTF-16/UCS-2 4613:Variant form 4443: 4342: 4292: 4244: 4196: 4136: 4046: 4033: 3959: 3949: 3940: 3931: 3922: 3913: 3904: 3895: 3886: 3869:)pp. 285-294 3857: 3848: 3838: 3829: 3801: 3790:"Unicode 88" 3783: 3769: 3743: 3722: 3713: 3701: 3690:. 
Retrieved 3686:the original 3679: 3670: 3661: 3652: 3637: 3628: 3619: 3610: 3601: 3592: 3571: 3568:"Unihan.zip" 3541: 3520: 3511:and Chinese 3499: 3473:Sinicization 3444: 3441: 3432: 3417: 3370: 3341: 3270: 3255: 3232: 3226:to research 2628:Traditional 2545: 2518: 2514: 2475: 2419:The radical 2418: 2389: 2385: 2352: 2344:(U+5165) or 2332:(U+5165) or 2308:variants of 2306: 2302: 1404: 1395: 1356: 1352: 1312: 1297: 1229: 1141: 1133: 1129: 1103:versions of 1092: 1058: 972: 953:ISO/IEC 2022 947: 941: 920: 917:Alternatives 875: 871: 836: 829: 825: 821: 807:Greek letter 800:Latin letter 796: 792: 780: 776: 760: 709: 695: 693: 690:) cautions: 684: 640: 632: 611: 606: 602: 591: 584: 569: 560: 547:spinning off 540: 510: 501: 490:Please help 485:verification 482: 437: 432: 429:orthographic 410: 349: 348: 329: 296: 278: 269: 246: 216: 207: 194:spinning off 187: 157: 148: 138: 131: 124: 117: 105: 93:Please help 88:verification 85: 61: 54: 48: 47:Please help 44: 7716:ZX Spectrum 7671:Sinclair QL 7507:Amstrad CPC 7426:8-bit Greek 7353:terminals ( 7066:Iran System 6618:("scripts") 6265:-8 (Hebrew) 6255:-6 (Arabic) 6152:ISO/IEC 646 5960:SignWriting 5829:Old Sogdian 5799:Nandinagari 5723:Khwarezmian 5633:Dives Akuru 5559:Ancient and 5545:Warang Citi 5410:Pau Cin Hau 5365:New Tai Lue 5360:Nag Mundari 5335:Medefaidrin 5044:Common and 4853:Equivalence 4831:code points 4829:On pairs of 4743:Equivalence 4618:Word joiner 4608:Soft hyphen 4524:Code points 4335:Not unified 4333:Not unified 4331:Not unified 4327:Not unified 4325:Not unified 4323:Not unified 4321:Not unified 4319:Not unified 4317:Not unified 4315:Not unified 4104:CJK Strokes 4027:Chart range 3455:MIT License 3414:(2F00–2FDF) 3402:(3200–32FF) 3390:(F900–FAFF) 3384:(FE30–FE4F) 3378:(3300–33FF) 3367:(2FF0–2FFF) 3361:(3000–303F) 3355:(31C0–31EF) 3353:CJK Strokes 3349:(2E80–2EFF) 3284:(3400–4DBF) 2832:give birth 2625:Simplified 1737:knife edge 1473:Vietnamese 1464:Hong Kong) 1447:Code point 1419:traditional 1010:Common Lisp 988:, 
and many 772:dotless "ı" 748:punctuation 504:August 2007 444:Traditional 440:code points 8094:Categories 8002:JIS X 0211 7910:ISO-IR-169 7763:UTF-EBCDIC 7329:code pages 7056:CSX+ Indic 6660:Devanagari 6615:Code pages 6536:LST 1590-4 6506:JIS X 0213 6501:JIS X 0212 6496:JIS X 0208 6491:JIS X 0201 6456:GOST 10859 6378:CCCII/EACC 6280:-11 (Thai) 6260:-7 (Greek) 6195:background 6118:Wabun/Kana 5854:Phoenician 5839:Old Uyghur 5834:Old Turkic 5819:Old Permic 5814:Old Italic 5764:Manichaean 5658:Glagolitic 5435:Saurashtra 5180:Devanagari 5059:Diacritics 4816:UTF-EBCDIC 4719:Algorithms 4712:Processing 4649:Characters 4573:Characters 4377:, Common, 4030:Characters 4018:Block name 3692:2023-09-30 3533:References 2867:companion 2729:two, both 2213:edge/horn 2101:empty/air 1653:all/total 1492:zh-Hant-HK 1431:Vietnamese 1415:simplified 1393:(U+96C7). 1137:z-variants 1098:Japanese, 857:) or two ( 756:apostrophe 752:diacritics 551:relocating 460:(U+4E2A). 453:Simplified 442:, such as 421:allographs 402:Vietnamese 374:characters 318:code point 256:improve it 198:relocating 121:newspapers 50:improve it 8055:MICR code 7890:IEC-P27-1 7868:ISO-IR-68 7773:DIN 91379 7651:SAM Coupé 7586:GSM 03.38 7576:Galaksija 7071:Kamenický 7051:CSX Indic 6760:Ukrainian 6546:Shift JIS 6526:KS X 1002 6521:KS X 1001 6446:DIN 66003 6441:CNS 11643 6209:Transcode 6187:ITU T.101 6113:Non-Latin 5849:ʼPhags-pa 5844:Palmyrene 5794:Nabataean 5718:Khudawadi 5703:Kharosthi 5618:Cuneiform 5593:Bhaiksuki 5588:Bassa Vah 5455:Sundanese 5430:Samaritan 5345:Mongolian 5320:Malayalam 5285:Kirat Rai 4995:Anomalies 4979:ISO 15924 4974:DIN 91379 4875:Z-variant 4858:Homoglyph 4731:Collation 4379:Inherited 3525:literate. 
3479:Z-variant 3180:tortoise 2972:to leave 2631:Japanese 2587:shinjitai 2377:with the 1467:Japanese 1100:Shinjitai 1042:Uniscribe 990:Unix-like 890:Singapore 669:Allograph 594:ideograms 413:typefaces 260:verifying 56:talk page 8060:Mojibake 7915:ISO 2033 7880:Fieldata 7858:ASMO 449 7768:GB 18030 7728: / 7676:Teletext 7666:Sharp MZ 7596:HP FOCAL 7591:HP Roman 7522:Atari ST 7512:Apple II 7046:CS Indic 6740:Romanian 6715:Keyboard 6695:Gurmukhi 6690:Gujarati 6680:Georgian 6655:Cyrillic 6650:Croatian 6625:Armenian 6531:LST 1564 6516:KPS 9566 6476:GB 18030 6471:GB 12052 6466:GB 12345 6451:ELOT 927 6385:ISO 5426 6345:Estonian 6182:ITU T.61 6172:Teletext 6168:Videotex 6142:Fieldata 6128:Cyrillic 5981:Currency 5955:Duployan 5929:Vithkuqi 5924:Ugaritic 5779:Meroitic 5749:Mahajani 5734:Linear B 5729:Linear A 5520:Tifinagh 5485:Tai Viet 5480:Tai Tham 5470:Tagbanwa 5385:Ol Chiki 5275:Kayah Li 5270:Katakana 5255:Javanese 5250:Hiragana 5240:Hanunuoo 5215:Gurmukhi 5205:Gujarati 5195:Georgian 5170:Cyrillic 5160:Cherokee 5125:Bopomofo 5105:Balinese 5100:Armenian 4964:GB 18030 4781:Punycode 4669:Numerals 4601: / 4514:Versions 4399:, Common 4397:Hiragana 4388:, Common 4386:Katakana 4384:Hangul, 3815:cite web 3467:GB 18030 3461:See also 2902:to cash 2786:to ride 2672:to lose 2637:English 1961:picture 1821:feeling 1793:outside 1476:English 1460:Chinese 1455:Chinese 1450:Chinese 1423:Japanese 1277:Å 1096:Kyūjitai 1054:OpenType 1050:TrueType 1034:Graphite 894:Malaysia 843:typeface 732:◌̊ 701:—  696:grapheme 657:20985671 386:Japanese 356:and the 8115:Unicode 7949:SEASCII 7943:Mojikyō 7930:KOI8-RU 7853:ABICOMP 7726:Unicode 7636:PETSCII 7626:NEC APC 7562:DEC MCS 7517:ATASCII 7414:Swedish 7399:Finnish 7384:Spanish 7076:Mazovia 7041:ABICOMP 6750:Turkish 6705:Iceland 6613:Mac OS 6556:TIS-620 6461:GB 2312 6436:BraSCII 6426:ArmSCII 6164:Teletex 6123:Chinese 5889:Soyombo 5879:Sogdian 5874:Siddham 5869:Sharada 5789:Multani 5769:Marchen 5759:Mandaic 5754:Makasar 5668:Grantha 5653:Elymaic 
5648:Elbasan 5623:Cypriot 5583:Avestan 5525:Tirhuta 5515:Tibetan 5460:Sunuwar 5445:Sinhala 5440:Shavian 5420:Ranjana 5400:Osmanya 5390:Ol Onal 5315:Lontara 5265:Kannada 5175:Deseret 5140:Burmese 5130:Braille 5120:Bengali 5074:Numbers 5035:Scripts 4684:Symbols 4674:Scripts 4497:Unicode 4490:Unicode 4313:Unified 4311:Unified 4309:Unified 4307:Unified 4305:Unified 4303:Unified 4301:Unified 4299:Unified 4297:Unified 4294:Unified 4005:Unicode 3233:Sources 3088:hungry 2937:inside 2506:, i.e. 2274:U+9AA8 2269:employ 2246:U+96C7 2218:U+9053 2190:U+89D2 2162:U+8525 2134:U+8349 2106:U+8005 2078:U+7A7A 2050:U+795E 2022:U+793a 1994:U+771F 1966:U+76F4 1938:U+753B 1910:U+6D77 1882:U+6B21 1854:U+62B5 1849:talent 1826:U+624D 1798:U+60C5 1770:U+5916 1742:U+5316 1714:U+5203 1686:U+5177 1658:U+5173 1630:U+5168 1602:U+5165 1574:U+514D 1546:U+4EE4 1507:vi-Hani 1487:zh-Hant 1482:zh-Hans 1470:Korean 1435:browser 1411:Chinese 1061:(B)TRON 975:Unicode 942:Mojikyō 812:or the 768:the dot 740:sememes 406:chữ Hán 378:Chinese 364:of the 354:Unicode 254:Please 135:scholar 7959:Symbol 7935:KOI8-U 7925:KOI8-R 7793:TACE16 7783:CESU-8 7778:BOCU-1 7758:UTF-32 7753:UTF-16 7696:WISCII 7686:TRS-80 7606:SQUOZE 7601:HP RPL 7441:Hebrew 7436:SI 960 7404:French 7327:EBCDIC 7217:CER-GS 6700:Hebrew 6675:Gaelic 6640:Celtic 6630:Arabic 6576:YUSCII 6566:VISCII 6551:SI 960 6541:PASCII 6390:5426-2 6368:MARC-8 6103:Needle 6036:  6025:  5934:Yezidi 5914:Todhri 5909:Tangut 5744:Lydian 5739:Lycian 5713:Khojki 5693:Kaithi 5673:Hatran 5663:Gothic 5613:Coptic 5603:Carian 5598:Brāhmī 5540:Wancho 5505:Thaana 5500:Telugu 5495:Tangsa 5475:Tai Le 5465:Syriac 5425:Rejang 5300:Lepcha 5245:Hebrew 5225:Hangul 5150:Chakma 5095:Arabic 5069:Spaces 4776:CESU-8 4771:BOCU-1 4679:Spaces 4428:  4425:  4421:99,737 4412:  4409:Totals 4394:Common 4382:Common 4375:Hangul 4370:Common 4249:42,720 4245:20,992 3865:  3513:CEDICT 3212:U+7814 3199:U+784F 3186:U+7814 3166:U+4E80 3153:U+9F9C 3140:U+9F9F 3122:U+9AD9 3107:U+9AD8 3094:U+9AD8 3072:U+9913 
3059:U+997F 3053:taxes 3037:U+7A05 3024:U+7A0E 3004:U+7985 2991:U+79AA 2978:U+7985 2956:U+5225 2943:U+522B 2921:U+5167 2908:U+5185 2886:U+514C 2873:U+5151 2851:U+4FB6 2838:U+4FA3 2818:U+7523 2805:U+7522 2792:U+4EA7 2774:U+6909 2761:U+4E57 2748:U+4E58 2735:U+4E58 2717:U+34B3 2704:U+4E21 2691:U+5169 2678:U+4E24 2656:U+4E1F 2643:U+4E22 2185:onion 2157:grass 1625:enter 1433:. The 1427:Korean 1290:Å 1287: 1285:U+00C5 1274: 1272:U+212B 1044:, and 1038:Scribe 998:Python 892:, and 839:glyphs 728: 726:U+030A 718:a 715: 713:U+0061 655:  433:Unihan 425:glyphs 400:) and 394:Korean 137:  130:  123:  116:  108:  8030:CCSID 7903:8-bit 7898:7-bit 7894:INIS 7748:UTF-8 7743:UTF-7 7738:UTF-1 7616:LMBCS 7552:CP/M+ 7394:Dutch 7379:Swiss 7061:CWI-2 6765:VT100 6735:Roman 6730:Ogham 6710:Inuit 6685:Greek 6571:VSCII 6561:TSCII 6511:KOI-7 6486:ISCII 6481:HKSCS 6373:ANSEL 6335:Welsh 6159:BCDIC 6147:ASCII 6108:Morse 6008:Emoji 5904:Takri 5864:Runic 5804:Ogham 5638:Dogra 5490:Tamil 5395:Osage 5370:Nüshu 5305:Limbu 5295:Latin 5280:Khmer 5260:Kanji 5235:Hanja 5200:Greek 5190:Geʽez 5185:Garay 5135:Buhid 5115:Batak 5110:Bamum 5090:Adlam 4938:Input 4916:Fonts 4911:Email 4899:Usage 4801:UTF-8 4796:UTF-7 4791:UTF-1 4642:Lists 4559:Plane 4532:Block 4373:Han, 4261:4,192 4259:4,939 4257:7,473 4255:5,762 4251:4,154 4247:6,592 4189:2 SIP 4182:0 BMP 4180:0 BMP 4178:0 BMP 4176:0 BMP 4174:0 BMP 4172:0 BMP 4170:0 BMP 4168:0 BMP 4166:0 BMP 4164:2 SIP 4162:3 TIP 4155:2 SIP 4153:2 SIP 4151:2 SIP 4149:2 SIP 4142:0 BMP 4023:Plane 3919:"URO" 3793:(PDF) 3509:EDICT 3491:Notes 3134:high 2297:bone 2045:show 2017:true 1709:tool 1439:glyph 1429:, or 1046:ATSUI 1030:Pango 986:macOS 983:Apple 968:HKSCS 880:with 744:ASCII 624:, or 398:hanja 390:kanji 382:hanzi 142:JSTOR 128:books 7964:TRON 7817:Cork 7788:SCSU 7711:ZX81 7706:ZX80 7701:XCCS 7631:NeXT 7611:LICS 7566:NRCS 7527:BICS 7497:1058 7492:1057 7487:1056 7482:1055 7477:1054 7472:1053 7467:1052 7341:DKOI 7297:1270 7292:1258 7287:1257 7282:1256 7277:1255 7272:1254 
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.