feature of rich text protocols and not properly handled by the plain text goals of
Unicode. However, when the change from one glyph to another constitutes a change from one grapheme to another (where a glyph cannot possibly still, for example, mean the same grapheme understood as the small letter "a"), Unicode assigns those graphemes separate code points. For Unihan the same thing is done whenever the abstract meaning changes. However, rather than speaking of the abstract meaning of a grapheme (the letter "a"), the unification of Han ideographs assigns a new code point for each different meaning, even if that meaning is expressed by distinct graphemes in different languages. Although a grapheme such as "ö" might mean something different in English (as used in the word "coördinated") than it does in German (as used in the word "schön"), it is still the same grapheme and can be easily unified so that English and German can share a common abstract Latin writing system (along with Latin itself). This example also points to another reason that "abstract character" and grapheme as an abstract unit in a written language do not necessarily map one-to-one. In English the
conflicts with the stated goal of
Unicode to take away that overhead, and to allow any number of any of the world's scripts to be on the same document with one encoding system. Chapter One of the handbook states that "With Unicode, the information technology industry has replaced proliferating character sets with data stability, global interoperability and data interchange, simplified software, and reduced development costs. While taking the ASCII character set as its starting point, the Unicode Standard goes far beyond ASCII's limited ability to encode only the upper- and lowercase letters A through Z. It provides the capacity to encode all characters used for the written languages of the world – more than 1 million characters can be encoded. No escape sequence or control code is required to specify any character in any language. The Unicode character encoding treats alphabetic characters, ideographic characters, and symbols equivalently, which means they can be used in any mixture and with equal facility."
, first introduced in version 3.2 and supplemented in version 4.0. While variation selectors are treated as combining characters, they have no associated diacritic or mark. Instead, by combining with a base character, they signal that the two-character sequence selects a variation of the base character (typically in terms of grapheme, but also in terms of underlying meaning, as in the case of a location name or other proper noun). This, then, is not a selection of an alternate glyph, but the selection of a grapheme variation or a variation of the base abstract character. Such a two-character sequence can, however, easily be mapped to a separate single glyph in modern fonts. Since Unicode has assigned 256 separate variation selectors, it can in principle distinguish up to 256 variations of any Han ideograph. Such variations can be specific to one language or another and enable the encoding of plain text that includes such grapheme variations.
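To make the selector numbering concrete, here is a minimal Python sketch that maps a selector number to its code point: VS1 to VS16 sit at U+FE00..U+FE0F and VS17 to VS256 at U+E0100..U+E01EF. The helper name `variation_selector` is our own, and the base ideograph U+845B is an arbitrary illustration; whether any particular base-plus-selector pair is actually registered is a separate question.

```python
import unicodedata

def variation_selector(n: int) -> str:
    """Return the variation selector VS<n>, for 1 <= n <= 256 (illustrative helper)."""
    if 1 <= n <= 16:
        # VS1..VS16 live in the Variation Selectors block (U+FE00..U+FE0F).
        return chr(0xFE00 + n - 1)
    if 17 <= n <= 256:
        # VS17..VS256 live in the Variation Selectors Supplement (U+E0100..U+E01EF).
        return chr(0xE0100 + n - 17)
    raise ValueError(f"no variation selector VS{n}")

# A base ideograph followed by a selector is a two-code-point sequence.
seq = "\u845B" + variation_selector(17)      # base chosen only for illustration
print([f"U+{ord(c):04X}" for c in seq])      # ['U+845B', 'U+E0100']
print(unicodedata.name(seq[1]))              # VARIATION SELECTOR-17

# Canonical normalization leaves the selector in place.
print(unicodedata.normalize("NFC", seq) == seq)  # True
```

Because NFC keeps the selector, such sequences survive software that normalizes its text.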
(generating the combination "å") might be understood by a user as a single grapheme while being composed of multiple Unicode abstract characters. In addition, Unicode also assigns a small number of code points (other than for compatibility reasons) to formatting characters, whitespace characters, and other abstract characters that are not graphemes, but are instead used to control the breaks between lines, words, graphemes and grapheme clusters. With the unified Han ideographs, the Unicode Standard departs from prior practice by assigning abstract characters not as graphemes, but according to the underlying meaning of the grapheme: what linguists sometimes call
Japanese writing systems historically, the inability to specify a particular variant was considered a significant obstacle to the use of
Unicode in scholarly work. For example, the unification of "grass" (explained above) means that a historical text cannot be encoded so as to preserve its peculiar orthography. Instead, the scholar would be required to locate the desired glyph in a specific typeface in order to convey the text as written, defeating the purpose of a unified character set. Unicode has responded to these needs by assigning variation selectors so that authors can select grapheme variations of particular ideographs (or even other characters).
displayed incorrectly. (Proper names tend to be especially orthographically conservative—compare this to changing the spelling of one's name to suit a language reform in the US or UK.) While this may be considered primarily a graphical representation or rendering problem to be overcome by more artful fonts, the widespread use of
Unicode would make it difficult to preserve such distinctions. The problem of one character representing semantically different concepts is also present in the Latin part of Unicode. The Unicode character for a curved apostrophe is the same as the character for a right single quote (’). On the other hand, the capital
in another language style. (That is to say, it would be difficult to access "grass" with the four-stroke radical more typical of
Traditional Chinese in a Japanese environment, where fonts would typically depict the three-stroke radical.) Unihan proponents tend to favor markup languages for defining language strings, but this would not ensure the use of a specific variant in the case given, only the language-specific font more likely to depict a character as that variant. (At this point, merely stylistic differences do enter in, as Japanese and Chinese fonts are not likely to be visually compatible.)
). Yet for a reader of Latin-script languages, the two variations of the "a" character are both recognized as the same grapheme. Graphemes present in national character code standards have been added to Unicode, as required by Unicode's Source Separation rule, even where they can be composed of characters already available. The national character code standards existing in CJK languages are considerably more involved, given the technological limitations under which they evolved, and so the official CJK participants in Han unification may well have been amenable to reform.
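The Latin case is easy to check directly. This minimal sketch, using only Python's standard `unicodedata` module, shows that a precomposed "å" and "a" followed by a combining ring are different code point sequences that canonical normalization treats as equivalent; a reader perceives one grapheme either way.

```python
import unicodedata

precomposed = "\u00E5"   # LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030A"   # "a" + COMBINING RING ABOVE

# Different code point sequences...
print(len(precomposed), len(decomposed))                        # 1 2
# ...that are canonically equivalent: NFC composes, NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```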
when
Unicode's definition is that specialized semantic variants have the same meaning only in certain contexts. Languages use them differently. A pair whose characters are 100% drop-in replacements for each other in Japanese may not be so flexible in Chinese. Thus, any comprehensive merger of recommended code points would have to maintain some variants that differ only slightly in appearance even if the meaning is 100% the same for all contexts in one language, because in another language the two characters may not be 100% drop-in replacements.
may be the same for CJK languages, the glyphs in common use for the same characters may not be. For example, the traditional
Chinese glyph for "grass" uses four strokes for the "grass" radical, whereas the simplified Chinese, Japanese, and Korean glyphs use three. But there is only one Unicode code point for the grass character (U+8349) regardless of writing system. Another example is the ideograph for "one," which is different in Chinese, Japanese, and Korean. Many people think that the three versions should be encoded differently.
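Because U+8349 is a single code point, nothing in the plain text itself says which of those glyph traditions is meant; any hint has to travel out of band, for example as an HTML `lang` attribute. The helper `tagged` below is a hypothetical illustration, and whether a renderer actually switches glyphs for each tag depends on the fonts it has available.

```python
GRASS = "\u8349"  # 草: one code point for "grass" in every CJK language

def tagged(text: str, lang: str) -> str:
    """Wrap text in an HTML span carrying a BCP 47 language tag (illustrative helper)."""
    return f'<span lang="{lang}">{text}</span>'

spans = [tagged(GRASS, lang) for lang in ("zh-Hant", "zh-Hans", "ja", "ko")]
for s in spans:
    print(s)

# Whatever the tag, the character data is the same three UTF-8 bytes.
print(GRASS.encode("utf-8"))  # b'\xe8\x8d\x89'
```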
, which provides information about all of the unified Han characters encoded in the Unicode Standard, including mappings to various national and industry standards, indices into standard dictionaries, encoded variants, pronunciations in various languages, and an English definition. The database is available to the public as text files and via an interactive website. The latter also includes representative glyphs and definitions for compound words drawn from the free Japanese
(from a font) suitable to the specified language. (Besides actual character variation, look for differences in stroke order, number, or direction; the typefaces may also reflect different typographical styles, as with serif and non-serif alphabets.) This only works for fallback glyph selection if you have CJK fonts installed on your system and the font selected to display this article does not include glyphs for these characters.
(CJK-JRG) favored a proposal (DIS 10646) for a non-unified character set, "which was thrown out in favor of unification with the Unicode Consortium's unified character set by the votes of American and European ISO members" (even though the Japanese position was unclear). Endorsing the Unicode Han unification was a necessary step for the heated ISO 10646/Unicode merger.
, whether that difference be due to simplification, international variance or intra-national variance. However, on some platforms (e.g., smartphones), a device may come with only one font pre-installed. The system font must choose a default glyph for each code point, and these glyphs can differ greatly, indicating different underlying graphemes.
. There is a reason for this that has nothing to do with how the domestic bodies view the characters themselves. China went through a process in the twentieth century that changed (if not simplified) several characters. During this transition, there was a need to be able to encode both variants within the same document. Korean has always used the variant of
). The PRC's text encoding bodies did not encode the two variants differently. The fact that almost every other change brought about by the PRC, no matter how minor, did warrant its own code point suggests that this exception may have been unintentional. Unicode copied the existing standards as is, preserving such irregularities.
such as OpenType allow for the mapping of alternate glyphs according to language so that a text rendering system can look to the user's environmental settings to determine which glyph to use. The problem with these approaches is that they fail to meet the goals of
Unicode to define a consistent way of encoding multilingual text.
. Much software (such as the MediaWiki software that hosts Wikipedia) will replace all canonically equivalent characters that are discouraged (e.g. the angstrom symbol) with the recommended equivalent. Despite the name, CJK "compatibility variants" are canonically equivalent characters and not compatibility characters.
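The replacement described here is ordinary canonical normalization, not only compatibility normalization. A minimal Python sketch using the standard `unicodedata` module shows NFC folding both the angstrom sign and a CJK compatibility ideograph into their recommended equivalents; the two sample characters are our own choice.

```python
import unicodedata

samples = {
    "ANGSTROM SIGN": "\u212B",                     # discouraged duplicate of U+00C5
    "CJK COMPATIBILITY IDEOGRAPH-F902": "\uF902",  # duplicate of U+8ECA (車)
}

for label, ch in samples.items():
    nfc = unicodedata.normalize("NFC", ch)  # canonical composition, not NFKC
    print(f"{label}: U+{ord(ch):04X} -> U+{ord(nfc):04X}")
# ANGSTROM SIGN: U+212B -> U+00C5
# CJK COMPATIBILITY IDEOGRAPH-F902: U+F902 -> U+8ECA
```

Since the mapping is canonical, any NFC- or NFD-normalizing pipeline erases the distinction between the two code points.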
semantically identical characters that have many variants. In addition to the standard character sets in
Simplified Chinese, Traditional Chinese, Korean, Vietnamese, Kyūjitai Japanese and Shinjitai Japanese, there also exist "ancient" forms of characters that are of interest to historians, linguists and philologists.
One would expect that all simplified characters would simultaneously also be z-variants or semantic variants with their traditional counterparts, but many are neither. It is easier to explain the strange case that semantic variants can be simultaneously both semantic variants and specialized variants
Some pairs of
Traditional and Simplified are also considered to be semantic variants. According to Unicode's definitions, it makes sense that all simplifications (that do not result in wholly different characters being merged for their homophony) will be a form of semantic variant. Unicode classifies
These compatibility characters (excluding the twelve unified ideographs in the CJK Compatibility Ideographs block) are included for compatibility with legacy text handling systems and other legacy character sets. They include forms of characters for vertical text layout and rich text characters that
have specifically listed the system as a trade barrier in Japan. The report claimed that the adoption of the TRON-based system by the Japanese government was advantageous to Japanese manufacturers and thus excluded US operating systems from the huge new market; specifically, the report lists MS-DOS,
Unlike European versions, CJK Unicode fonts, due to Han unification, have large but irregular patterns of overlap, requiring language-specific fonts. Unfortunately, language-specific fonts also make it difficult to access a variant which, as with the "grass" example, happens to appear more typically
Since the Unihan standard encodes "abstract characters", not "glyphs", the graphical artifacts produced by Unicode have been considered temporary technical hurdles, and at most, cosmetic. However, again, particularly in Japan, due in part to the way in which Chinese characters were incorporated into
To deal with the use of different graphemes for the same Unihan sememe, Unicode has relied on several mechanisms, especially as they relate to rendering text. One has been to treat it as simply a font issue so that different fonts might be used to render Chinese, Japanese or Korean. Also, font formats
may have to tag the character as "Traditional Chinese" or trust that the recipient's Japanese font uses only the Kyūjitai glyphs, but tags of Traditional Chinese and Simplified Chinese may be necessary to show the two forms side by side in a Japanese textbook. This would preclude one from using the
Almost all of the variants that the PRC developed or standardized got distinct code points owing simply to the fortune of the Simplified Chinese transition carrying through into the computing age. This privilege, however, seems to apply inconsistently, whereas most simplifications performed in Japan
Consequently, relying on language markup across the board as an approach is beset with two major issues. First, there are contexts where language markup is not available (code commits, plain text). Second, any solution would require every operating system to come pre-installed with many glyphs for
The problem stems from the fact that Unicode encodes characters rather than "glyphs," which are the visual representations of the characters. There are four basic traditions for East Asian character shapes: traditional Chinese, simplified Chinese, Japanese, and Korean. While the Han root character
In the twentieth century, East Asian countries made their own respective encoding standards. Within each standard, there coexisted variants with distinct code points, hence the distinct code points in Unicode for certain sets of variants. Taking Simplified Chinese as an example, the two character
In order to resolve issues brought by Han unification, a Unicode Technical Standard known as the Unicode Ideographic Variation Database has been created to address the problem of specifying a particular glyph in a plain text environment. By registering glyph collections into the Ideographic Variation
The Unicode Consortium has recognized errors in other instances. The myriad Unicode blocks for CJK Han Ideographs have redundancies in original standards, redundancies brought about by flawed importation of the original standards, as well as accidental mergers that are later corrected, providing
For a grapheme to be represented by various glyphs means that the grapheme has glyph variations that are usually determined by selecting one font or another or using glyph substitution features where multiple glyphs are included in a single font. Such glyph variations are considered by Unicode a
A grapheme is the smallest abstract unit of meaning in a writing system. Any grapheme has many possible glyph expressions, but all are recognized as the same grapheme by those with reading and writing knowledge of a particular writing system. Although Unicode typically assigns characters to code
to cancel the Center of Educational Computing's selection of the TRON-based system for the use of educational computers. The incident is regarded as a symbolic event for the loss of momentum and eventual demise of the BTRON system, which led to the widespread adoption of MS-DOS in Japan and the
as both its compatibility variant and its z-variant. The compatibility variant field overrides the z-variant field, forcing normalization under all forms, including canonical equivalence. Despite the name, compatibility variants are actually canonically equivalent and are united in any Unicode
Japanese or Vietnamese. Instead of some variants getting distinct code points while other groups of variants have to share single code points, all variants could be reliably expressed only with metadata tags (e.g., CSS formatting in webpages). The burden would be on all those who use differing
The International Ideographs Core (IICore) is a subset of 9810 ideographs derived from the CJK Unified Ideographs tables, designed to be implemented in devices with limited memory, input/output capability, and/or applications where the use of the complete ISO 10646 ideograph repertoire is not
Unicode's Unihan database has already drawn connections between many characters. The Unicode database catalogs the connections between variant characters with distinct code points already. However, for characters with a shared code point, the reference glyph image is usually biased toward the
For native speakers, variants can be unintelligible or be unacceptable in educated contexts. English speakers may understand a handwritten note saying "4P5 kg" as "495 kg", but writing the nine backwards (so it looks like a "P") can be jarring and would be considered incorrect in any school.
Unicode claims that "Ideally, there would be no pairs of z-variants in the Unicode Standard." This would make it seem that the goal is to at least unify all minor variants, compatibility redundancies and accidental redundancies, leaving the differentiation to fonts and to language tags. This
U+4E22 for Simplified Chinese GB #2210). It is also noted that Traditional and Simplified characters should be encoded separately according to Unicode Han Unification rules, because they are distinguished in pre-existing PRC character sets. Furthermore, as with other variants, Traditional to
Small differences in graphical representation are also problematic when they affect legibility or belong to the wrong cultural tradition. Besides making some Unicode fonts unusable for texts involving multiple "Unihan languages", names or other orthographically sensitive terminology might be
Some of the controversy stems from the fact that the very decision of performing Han unification was made by the initial Unicode Consortium, which at the time was a consortium of North American companies and organizations (most of them in California), but included no East Asian government
representatives. The initial design goal was to create a 16-bit standard, and Han unification was therefore a critical step for avoiding tens of thousands of character duplications. This 16-bit requirement was later abandoned, making the size of the character set less of an issue today.
. This departure therefore is not simply explained by the oft-quoted distinction between an abstract character and a glyph, but is more rooted in the difference between an abstract character assigned as a grapheme and an abstract character assigned as a sememe. In contrast, consider
" has widely differing glyphs that all represent concrete instances of the same abstract grapheme. Although a native reader of any language using the Latin script recognizes these two glyphs as the same grapheme, to others they might appear to be completely unrelated.
to be near identical z-variants while at the same time classifying them as significantly different semantic variants. There are also cases of some pairs of characters being simultaneously semantic variants and specialized semantic variants and simplified variants:
) and they are, with some differences, more familiar to Korean and Japanese users.) Unicode is seen as neutral with regard to this politically charged issue, and has encoded Simplified and Traditional Chinese glyphs separately (e.g. the ideograph for "discard" is
No character variant that is exclusive to Korean or Vietnamese has received its own code point, whereas almost all Shinjitai Japanese variants or Simplified Chinese variants each have distinct code points and unambiguous reference glyphs in the Unicode standard.
OS/2 and UNIX as examples. The Office of USTR was allegedly under Microsoft's influence, as its former officer Tom Robertson was then offered a lucrative position by Microsoft. While the TRON system itself was subsequently removed from the list of sanctions by
(U+27EAF). If a font has glyphs encoded to both points so that one font is used for both, they should appear identical. These cases are listed as z-variants despite having no variance at all. Intentionally duplicated characters were added to facilitate
) are encoded separately in Unicode, as they are not considered national variants. The first is the common form in all three countries, while the second and third are used on financial instruments to prevent tampering (they may be considered variants).
There has not been any push for full semantic unification of all semantically linked characters, though the idea would treat the respective users of East Asian languages the same, whether they write in Korean, Simplified Chinese, Traditional Chinese,
However, Han unification has also caused considerable controversy, particularly among the Japanese public, who, with the nation's literati, have a history of protesting the culling of historically and culturally significant variants. (See
. This can cause problems for the language tagging strategy. There is no universal tag for the traditional and "simplified" versions of Japanese as there is for Chinese. Thus, any Japanese writer wanting to display the Kyūjitai form of
(U+6F22) does not have this equivalence listed in its entry. Unicode requires that entries, once admitted, never change their compatibility or equivalence mappings, so that normalization rules for already existing characters do not change.
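The non-mutual listing has a practical consequence: the character data stores only the forward mapping (compatibility ideograph to unified ideograph), so finding every compatibility ideograph that folds into a given unified ideograph means scanning. The helper `compat_sources` below is a hypothetical sketch, assuming the two CJK compatibility blocks U+F900..U+FAFF and U+2F800..U+2FA1F are the only places to look; results reflect the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Assumption: compatibility ideographs live only in these two blocks.
COMPAT_RANGES = [(0xF900, 0xFAFF), (0x2F800, 0x2FA1F)]

def compat_sources(unified: str) -> list:
    """List compatibility ideographs whose canonical decomposition is `unified`.
    The reverse direction is not recorded, so it has to be computed by scanning."""
    target = f"{ord(unified):04X}"  # decomposition strings use uppercase hex
    found = []
    for lo, hi in COMPAT_RANGES:
        for cp in range(lo, hi + 1):
            if unicodedata.decomposition(chr(cp)) == target:
                found.append(f"U+{cp:04X}")
    return found

print(unicodedata.decomposition("\uFA47"))        # 6F22 (forward mapping exists)
print(repr(unicodedata.decomposition("\u6F22")))  # '' (no mapping back)
print(compat_sources("\u6F22"))
```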
. Because round-trip conversion was an early selling point of Unicode, this meant that if a national standard in use unnecessarily duplicated a character, Unicode had to do the same. Unicode calls these intentional duplications "
, as defined in Unicode, and the related but distinct idea of graphemes. Unicode assigns abstract characters (graphemes), as opposed to glyphs, which are particular visual representations of a character in a specific
attributes. However, some variants with arguably minimal differences get distinct codepoints, and not every variant with arguably substantial changes gets a unique codepoint. As an example, take a character such as
While the unification aspect of Unicode is controversial in some quarters for the reasons given above, Unicode itself does now encode a vast number of seldom-used characters of a more-or-less antiquarian nature.
Most of these are legacy and obsolete characters, however, as per Unicode's objective to encode every writing system that is or has ever been used; only 2000 to 3000 characters are necessary to be considered
in Unicode, but only for "compatibility reasons". Any Unicode-conformant font must display the Kyūjitai and Shinjitai versions' equivalent code points in Unicode as the same. Unofficially, a font may display
Database (IVD), it is possible to use Ideographic Variation Selectors to form Ideographic Variation Sequences (IVS) that specify or restrict the appropriate glyph in text processing in a Unicode environment.
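An ideographic variation sequence is simply a base ideograph followed by one of the selectors VS17 to VS256 (U+E0100..U+E01EF). The hypothetical helper `split_ivs` below is a minimal sketch that pairs each character with a following selector; it checks only the form of the sequence, not whether the pair is registered in the IVD, and the sample characters are illustrative.

```python
from typing import List, Optional, Tuple

IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF  # VS17..VS256

def split_ivs(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with the ideographic variation selector that follows it, if any.
    Only the form is checked; IVD registration is not."""
    pairs = []
    i = 0
    while i < len(text):
        base = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt and IVS_FIRST <= ord(nxt) <= IVS_LAST:
            pairs.append((base, nxt))   # base + selector form one sequence
            i += 2
        else:
            pairs.append((base, None))  # plain character, default glyph
            i += 1
    return pairs

sample = "\u845B\U000E0101\u8449"  # U+845B + VS18, then U+8449 with no selector
for base, vs in split_ivs(sample):
    print(f"U+{ord(base):04X}", f"U+{ord(vs):X}" if vs else "-")
```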
(U+5165) radical on top. Therefore, it had no reason to encode both variants. Korean language documents made in the twentieth century had little reason to represent both versions in the same document.
(U+7EA2) got separate code points from the PRC's text encoding standards bodies so Chinese-language documents could use both versions. The two variants received distinct code points in Unicode as well.
(U+514C/U+5151), either method can be used to display the different glyphs. In the following table, each row compares variants that have been assigned different code points. For brevity, note that
may approach or exceed 100,000 characters. Version 1 of Unicode was designed to fit into 16 bits and only 20,940 characters (32%) out of the possible 65,536 were reserved for these
, "¨", and the "o" it modifies may be seen as two separate graphemes, whereas in languages such as Swedish, the letter "ö" may be seen as a single grapheme. Similarly, in English
(F900–FAFF) (the twelve characters at FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27, FA28 and FA29 are actually "unified ideographs", not "compatibility ideographs")
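The twelve exceptions can be found mechanically: unlike the true compatibility ideographs, they carry no canonical decomposition. The sketch below scans U+F900..U+FAFF with Python's `unicodedata` (which reflects the Unicode version bundled with the interpreter) and keeps assigned code points whose decomposition field is empty; the function name is our own.

```python
import unicodedata

def unified_in_compat_block() -> list:
    """Assigned code points in U+F900..U+FAFF with no canonical decomposition,
    i.e. the ones that behave as ordinary unified ideographs."""
    result = []
    for cp in range(0xF900, 0xFB00):
        ch = chr(cp)
        if unicodedata.name(ch, None) is None:
            continue  # unassigned code point
        if unicodedata.decomposition(ch) == "":
            result.append(f"{cp:04X}")
    return result

print(unified_in_compat_block())
```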
as equivalent. Even within Japan, the variants are on different sides of a major simplification called Shinjitai. Unicode would effectively make the PRC's simplification of
and a single quotation mark) are unified because the glyphs are the same. For Unihan the characters are not unified by their appearance, but by their definition or meaning.
differently with 海 (U+6D77) as the Shinjitai version and 海 (U+FA45) as the Kyūjitai version (which is identical to the traditional version in written Chinese and Korean).
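Because U+FA45 canonically decomposes to U+6D77, any text pipeline that normalizes to NFC silently replaces the Kyūjitai code point. The hypothetical helper `nfc_changes` below reports which characters NFC would replace; it checks each character in isolation, which is enough for singleton mappings like the CJK compatibility ideographs but would miss compositions that span several characters.

```python
import unicodedata

def nfc_changes(text: str) -> list:
    """Report characters that NFC would replace, as (index, before, after).
    Per-character check: reliable for singleton mappings only."""
    changes = []
    for i, ch in enumerate(text):
        nfc = unicodedata.normalize("NFC", ch)
        if nfc != ch:
            changes.append((i, ch, nfc))
    return changes

text = "\u65E5\u672C\uFA45"  # 日本 followed by the U+FA45 form of 海
for i, before, after in nfc_changes(text):
    print(i, f"U+{ord(before):04X} -> U+{ord(after):04X}")
```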
and mainland China with code points in national standards, including characters simplified differently in each country, did make it into Unicode as distinct code points.
(U+4EBA). Both variants of the first character got their own distinct code points. However, the two variants of the second character had to share the same code point.
, which is now the base character set for many new standards and protocols, internationally adopted, and is built into the architecture of operating systems (
on an "i" is understood as a part of the "i" grapheme whereas in other languages, such as Turkish, the dot may be seen as a separate grapheme added to the
(IRG), made up of experts from the Chinese-speaking countries, North and South Korea, Japan, Vietnam, and other countries, is responsible for the process.
-based system was adopted by the Japanese government organization "Center for Educational Computing" as the system of choice for school education including
In some cases, often where the changes are the most striking, Unicode has encoded variant characters, making it unnecessary to switch between fonts or
The justification Unicode gives is that the national standards body in the PRC made distinct code points for the two variations of the first character
This leaves the option to settle on one unified reference grapheme for all z-variants, which is contentious since few outside of Japan would recognize
There are several alternative character sets that are not encoded according to the principle of Han Unification, and are thus free from its restrictions:
(U+6F22) its compatibility variant. As long as an application uses the same font for both, they should appear identical. Sometimes, as in the case of
However, this quote refers to the fact that some graphemes are composed of several graphic elements or "characters". So, for example, the character
). Simplified Chinese, Kyūjitai Japanese and Shinjitai Japanese use a three-stroke version, like two plus signs sharing their horizontal strokes (
(U+8349), the radical was placed at the top, but had two different forms. Traditional Chinese and Korean use a four-stroke version. At the top of
as each other's respective traditional and simplified variants and also as each other's semantic variants. However, while Unicode classifies
Sixty-two Shinjitai "simplified" characters with distinct code points in Japan got merged with their Kyūjitai traditional equivalents, like
(U+4FB6) a monumental difference by comparison. Such a plan would also eliminate the very visually distinct variations for characters like
). They will not appear here nor will the simplified Chinese characters that take consistently simplified radical components (e.g.,
The libUnihan project provides a normalized SQLite Unihan database and corresponding C library. All tables in this database are in
An abstract character does not necessarily correspond to what a user thinks of as a "character" and should not be confused with a
One rationale was the desire to limit the size of the full Unicode character set, where CJK characters as represented by discrete
component. However, in mainland China, the standards bodies wanted to standardize the cursive form when used in characters like
These region-dependent character sets are also seen as not affected by Han Unification because of their region-specific nature:
In each row of the following table, the same character is repeated in all six columns. However, each column is marked (by the
. One character may be represented by many distinct glyphs, for example a "g" or an "a", both of which may have one loop (
(U+4E22) are examples that Unicode gives as differing in a significant way in their abstract shapes, while Unicode lists
. This is, of course, desirable for reasons of compatibility, and deals with a much smaller alphabetic character set.
. Today, the list of characters officially recognized for use in proper names continues to expand at a modest pace.)
. Unicode was later extended to 21 bits, allowing many more CJK characters (97,680 are assigned, with room for more).
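The arithmetic behind these figures is easy to check, and so is the practical cost of leaving 16 bits behind: in UTF-16, code points above U+FFFF, such as U+27EAF mentioned elsewhere in this section, need a surrogate pair. The function `utf16_units` below is a minimal sketch of the standard surrogate computation, cross-checked against Python's own `utf-16-be` codec.

```python
def utf16_units(cp: int) -> list:
    """Encode one code point as UTF-16 code units (a surrogate pair above U+FFFF)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if cp < 0x10000:
        return [cp]                  # fits in a single 16-bit unit
    v = cp - 0x10000                 # 20-bit offset into the supplementary planes
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

print(round(100 * 20940 / 65536))              # 32  (share of the 16-bit space)
print((0x10FFFF).bit_length())                 # 21  (bits needed for the full space)
print([hex(u) for u in utf16_units(0x8349)])   # ['0x8349']
print([hex(u) for u in utf16_units(0x27EAF)])  # ['0xd85f', '0xdeaf']
```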
Chinese users seem to have fewer objections to Han unification, largely because Unicode did not attempt to unify
unit – hence, "Han unification", with the resulting character repertoire sometimes contracted to
(based on sequence codes to switch between Chinese, Japanese, Korean character sets – hence without unification)
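The switching approach can be seen in a concrete member of the ISO/IEC 2022 family, which we assume is the scheme this item describes. Python's built-in `iso2022_jp` codec emits `ESC $ B` to designate JIS X 0208 before a kanji and `ESC ( B` to return to ASCII, so the character set in force is carried by the byte stream itself rather than by a unified repertoire.

```python
text = "A\u8349"                      # "A" followed by 草
encoded = text.encode("iso2022_jp")  # Python's built-in ISO-2022-JP codec

# ESC $ B switches to the JIS X 0208 set; ESC ( B switches back to ASCII.
print(encoded)
print(encoded.decode("iso2022_jp") == text)  # True
```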
Unicode includes support of CJKV radicals, strokes, punctuation, marks and symbols in the following blocks:
(U+7EA2) are semantically identical and the glyphs differ only in the latter using a cursive version of the
dictionary projects (which are provided for convenience and are not a formal part of the Unicode Standard).
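The Unihan text files use a simple tab-separated layout: a code point written as `U+XXXX`, a field name, and a value, with `#` comment lines. The function `parse_unihan` below is a minimal sketch for loading such lines into a dictionary keyed by code point. The sample field `kExampleField` and its value are hypothetical placeholders, not real Unihan data.

```python
def parse_unihan(lines):
    """Parse lines of the form 'U+XXXX<TAB>field<TAB>value' into
    {code point: {field: value}}, skipping comments and blank lines."""
    table = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cp_text, field, value = line.split("\t", 2)
        cp = int(cp_text[2:], 16)  # drop the "U+" prefix
        table.setdefault(cp, {})[field] = value
    return table

sample = [
    "# hypothetical sample; the field and value are placeholders",
    "U+8349\tkExampleField\texample value",
    "",
]
print(parse_unihan(sample))  # {33609: {'kExampleField': 'example value'}}
```

Real files would be read line by line (for example with `open(path, encoding="utf-8")`) and passed in the same way.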
. However, in April, a report titled "1989 National Trade Estimate Report on Foreign Trade Barriers" from
So rather than treat the issue as a rich text problem of glyph alternates, Unicode added the concept of
Traditional Chinese version. Also, the decision of whether to classify pairs as semantic variants or
. In the formulation of Unicode, an attempt was made to unify these variants by considering them as
7230:
6739:
6719:
6714:
6654:
6649:
6158:
4698:
2589:
variants with different components will usually (and unsurprisingly) take unique codepoints (e.g.,
1338:(U+4EBF) as each other's respective traditional and simplified variants, Unicode does not consider
1017:
1013:
7605:
5484:
3606:
8080:
8064:
7991:
7986:
7948:
7919:
7884:
7316:
7050:
6749:
6634:
6037:
5938:
5858:
5294:
5199:
4548:
4536:
4119:
4087:
3381:
3346:
1304:(U+6F22) was and its entry informs the user of the compatibility information. On the other hand,
484:
357:
337:
331:
259:
87:
24:
1191:(U+4E2A). There are cases of non-mutual equivalence. For example, the Unihan database entry for
541:
may contain an excessive amount of intricate detail that may interest only a particular audience
188:
may contain an excessive amount of intricate detail that may interest only a particular audience
7675:
7665:
7521:
7511:
7045:
6754:
6199:
6186:
6122:
5627:
5374:
5229:
5164:
4879:
4747:
4653:
4612:
4293:
4000:
3266:
3251:
1262:
with U+8ECA and U+F902, the added compatibility character lists the already present version of
597:
20:
2437:, with two variants, the second form being simply the cursive form. The radical components of
7852:
7690:
7625:
7501:
7040:
6194:
5823:
5577:
5572:
5449:
4862:
4627:
3937:"OGCIO : Download Area : International Ideographs Core (IICORE) Comparison Utility"
2520:
Likewise, to users of one CJK language reading a document with "foreign" glyphs: variants of
763:
134:
7070:
5529:
2482:(U+8278) proves how arbitrary the state of affairs is. When used to compose characters like
1269:
normalization scheme and not only under compatibility normalization. This is similar to how
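The point about canonical (not merely compatibility) equivalence can be checked with Python's standard `unicodedata` module. This sketch assumes a Unicode database recent enough to include U+F902. It shows that U+F902 carries a singleton canonical decomposition to U+8ECA, so even NFC folds it. NFC treats U+212B ANGSTROM SIGN the same way, folding it to U+00C5.

```python
import unicodedata

COMPAT = "\uF902"    # CJK COMPATIBILITY IDEOGRAPH-F902
UNIFIED = "\u8ECA"   # CJK UNIFIED IDEOGRAPH-8ECA
ANGSTROM = "\u212B"  # ANGSTROM SIGN
A_RING = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def folds_canonically(ch: str) -> bool:
    """True if NFC (canonical) normalization already replaces ch."""
    return unicodedata.normalize("NFC", ch) != ch


if __name__ == "__main__":
    # A canonical decomposition has no <tag>, so this prints just '8ECA'.
    print(describe(COMPAT), "->", unicodedata.decomposition(COMPAT))
    for ch in (COMPAT, ANGSTROM):
        nfc = unicodedata.normalize("NFC", ch)
        print(describe(ch), "NFC ->", describe(nfc))
```

Because the folding happens under NFC, any process that normalizes text loses the distinction that the compatibility code point was created to preserve.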
An article hosted by IBM attempts to illustrate part of the motivation for Han unification:
Because this change happened relatively recently, there was a transition period. Both
Much of the controversy surrounding Han unification is based on the distinction between
summarizing major criticism against the Han unification approach adopted by Unicode.
The controversy later extended to the internationally representative ISO: the initial
The Unihan project has always made an effort to make its build database available.
(4E00–9FFF) (otherwise known as URO, an abbreviation of Unified Repertoire and Ordering)
(U+5185) differ in exactly the same way as do the Korean and non-Korean variants of
The Unicode® Standard Version 15.0 – Core Specification §3.4 Characters and Encoding
may be unreadable to non-Japanese people. (In Japan, both variants are accepted.)
same font for an entire document, however. There are two distinct code points for
points to express the graphemes within a system of writing, the Unicode Standard
Some clerical errors led to doubling of completely identical characters such as
as z-variants, differing only in font styling. Paradoxically, Unicode considers
(U+5165), for which the only way to display the variants is to change font (or
Additional compatibility (discouraged use) characters appear in these blocks:
whereas Korea never made separate code points for the different variants of
is not always consistent or clear, despite rationalizations in the handbook.
after protests by the organization in May 1989, the trade dispute caused the
Ideographic characters assigned by Unicode appear in the following blocks:
However, none of these alternative standards has been as widely adopted as
Nevertheless, many characters have regional variants assigned to different
(Simplified Chinese characters are used among Chinese speakers in the
attribute) as described in the previous table. On the other hand, for
(U+4EBA). Each respective variant of the second character has either
was obviously already in the database at the time that the entry for
(U+5168). Each respective variant of the first character has either
The Unicode Standard details the principles of Han unification. The
(We are feeling anxious for the future character encoding system)
where graphemes with widely different meanings (for example, an
Unihan can also refer to the Unihan Database maintained by the
feasible. There are 9810 characters in the current standard.
(U+5185) gets a unique code point. For some characters, like
Han characters are a feature shared in common by written
eventual adoption of Unicode with its successor Windows.
Simplified characters is not a one-to-one relationship.
can be missing a stroke/have an extraneous stroke, and
while its database, UnihanDb, is released under the
should be something that looks like two plus signs
– Glyphs with minor typographical differences
Japan Electronic Industries Development Association
Kanji § Orthographic reform and lists of kanji
Unicode recommends handling through other means.
漢 (U+FA9A) was added to the database later than
(U+9F9C) to be its z-variant, but the entry for
Office of the United States Trade Representative
U+4E1F for Traditional Chinese Big5 #A5E1 and
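To see that a legacy national standard keeps the traditional form at its own code, one can round-trip the Big5 code cited above through CPython's `big5` codec. The sketch relies only on the mapping stated in the text, from U+4E1F to Big5 A5E1. The function names `to_big5_hex` and `from_big5_hex` are illustrative.

```python
def to_big5_hex(ch: str) -> str:
    """Encode one character in Big5 and return its bytes as uppercase hex."""
    return ch.encode("big5").hex().upper()


def from_big5_hex(code: str) -> str:
    """Decode a Big5 byte sequence given as a hex string."""
    return bytes.fromhex(code).decode("big5")


if __name__ == "__main__":
    ch = "\u4E1F"
    code = to_big5_hex(ch)
    print(f"U+{ord(ch):04X} -> Big5 {code}")
    print(f"Big5 {code} -> U+{ord(from_big5_hex(code)):04X}")
```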
– Assimilation into Han Chinese culture
attribute) as being in a different language:
is canonically equivalent to a precomposed
Ministry of International Trade and Industry
– Official Chinese character encoding
Examples of some non-unified Han ideographs
close (simplified) / laugh (traditional)
In fact, the three ideographs for "one"
CJK Compatibility Ideographs Supplement
precedent for disunifying characters.
to be semantic variants of each other.
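Semantic variants and z-variants are recorded as fields in the Unihan database text files. This sketch assumes a layout of one tab-separated `U+XXXX<TAB>kField<TAB>value` record per line, where values may carry `<source` annotations. It also assumes the field names `kSemanticVariant` and `kZVariant`.

The sample records are hypothetical stand-ins, not quotations from a particular Unihan release. The z-variant record mirrors the U+4E80 / U+9F9C case described in the text. The code builds a variant graph and reports entries that are not listed in both directions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical sample records in the assumed Unihan text layout:
# code point, TAB, field name, TAB, space-separated values.
SAMPLE = (
    "# lines starting with '#' are comments\n"
    "U+4E80\tkZVariant\tU+9F9C\n"
    "U+500B\tkSemanticVariant\tU+4E2A<kMatthews\n"
    "U+4E2A\tkSemanticVariant\tU+500B<kMatthews\n"
)

VARIANT_FIELDS = {"kSemanticVariant", "kZVariant"}
Graph = Dict[Tuple[int, str], Set[int]]


def parse_code_point(token: str) -> int:
    """Turn 'U+4E2A<kMatthews' into 0x4E2A, dropping any source annotation."""
    head = token.split("<", 1)[0]
    if not head.startswith("U+"):
        raise ValueError(f"not a code point: {token!r}")
    return int(head[2:], 16)


def load_variants(text: str) -> Graph:
    """Map (code point, field) to the set of code points listed as variants."""
    graph: Graph = defaultdict(set)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cp, field, value = line.split("\t", 2)
        if field in VARIANT_FIELDS:
            for token in value.split():
                graph[(parse_code_point(cp), field)].add(parse_code_point(token))
    return dict(graph)


def non_mutual(graph: Graph) -> List[Tuple[int, int, str]]:
    """Return (source, target, field) entries whose reverse entry is missing."""
    missing = []
    for (src, field), targets in graph.items():
        for dst in targets:
            if src not in graph.get((dst, field), set()):
                missing.append((src, dst, field))
    return sorted(missing)


if __name__ == "__main__":
    for src, dst, field in non_mutual(load_variants(SAMPLE)):
        print(f"U+{src:04X} lists U+{dst:04X} in {field}, but not vice versa")
```

Against a real release, one would read the relevant Unihan file instead of `SAMPLE`. Which file carries which field has changed between releases, so UAX #38 for the version in use is the authority on layout and field names.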
languages into a single set of unified
LATIN CAPITAL LETTER A WITH RING ABOVE
should select, for each character, a
Examples of language-dependent glyphs
(JEIDA) published a pamphlet titled
typically use regional or historical
Ideographic Variation Database (IVD)
(U+7CF8) is used in characters like
Section 301 of the Trade Act of 1974
International Components for Unicode
representing the same "grapheme" or
Modern Chinese, Japanese and Korean
libUnihan is released under the
Merger of all equivalent characters
Ideographic Description Characters
CJK Unified Ideographs Extension A
CJK Unified Ideographs Extension B
CJK Unified Ideographs Extension C
CJK Unified Ideographs Extension D
CJK Unified Ideographs Extension E
CJK Unified Ideographs Extension F
CJK Unified Ideographs Extension G
CJK Unified Ideographs Extension H
CJK Unified Ideographs Extension I
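The block list lends itself to a simple range lookup. This sketch hard-codes the ranges of the unified and compatibility ideograph blocks as published in the Unicode code charts. Extension I, although the newest, sits numerically between Extensions F and G.

The sketch is for illustration and is not a substitute for the `Blocks.txt` data file. The name `ideograph_block` is an assumption.

```python
from typing import Optional

# (first, last, block name); ranges as published in the Unicode code charts.
IDEOGRAPH_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F"),
    (0x30000, 0x3134F, "CJK Unified Ideographs Extension G"),
    (0x31350, 0x323AF, "CJK Unified Ideographs Extension H"),
    (0x2EBF0, 0x2EE5F, "CJK Unified Ideographs Extension I"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]


def ideograph_block(ch: str) -> Optional[str]:
    """Return the name of the ideograph block containing ch, or None."""
    cp = ord(ch)
    for first, last, name in IDEOGRAPH_BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for ch in ("\u6F22", "\u3400", "\U00020000", "\uF902", "A"):
        print(f"U+{ord(ch):04X}", ideograph_block(ch))
```

A block range also covers code points that are not yet assigned, so block membership alone says nothing about whether a character exists in a given Unicode version.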
rendering engines), font formats
as with 漢 (U+FA9A) which calls
bit-for-bit round-trip conversion
systems), programming languages
variants of a given Han character
(U+8FD4) in regional versions of
Differences for the same Unicode
This list is not exhaustive.
Enclosed Ideographic Supplement
Enclosed CJK Letters and Months
MDBG Chinese-English Dictionary
So-called semantic variants of
is an effort by the authors of
Traditional Chinese characters
International Ideographs Core
can appear as mirror images,
Simplified Chinese characters
CJK Compatibility Ideographs
as a z-variant, even though
Unihan "abstract characters"
– different
CJK Symbols and Punctuation
People's Republic of China
Ideographic Research Group
(U+5167), the variant of
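Some regional forms were given separate code points, as with U+5167 and U+5185 discussed here. For such pairs, no normalization form relates the two characters. Any folding has to come from external data, such as the Unihan variant fields. Below is a short check that assumes only Python's `unicodedata`; the helper name is illustrative. It contrasts such a pair with a compatibility ideograph, which does normalize to its unified counterpart.

```python
import unicodedata


def same_after_normalization(a: str, b: str) -> bool:
    """True if a and b coincide under any of the four normalization forms."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


if __name__ == "__main__":
    a, b = "\u5167", "\u5185"
    print(unicodedata.name(a), unicodedata.name(b))
    print(same_after_normalization(a, b))                # separately encoded variants
    print(same_after_normalization("\uF902", "\u8ECA"))  # compatibility ideograph
```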
to accept a request from
Rationale and controversy
The case of the radical
CJK Joint Research Group
is not unified with the
CJK Compatibility Forms
CJK Radicals Supplement
Graphemes versus glyphs
未来の文字コード体系に私達は不安をもっています
Universal Character Set
CJK Unified Ideographs
Variant form (Unicode)
one who does/-ist/-er
compatibility variants
and libraries (IBM
Unihan database files
(ICU) along with the
The Latin lowercase "a"
compulsory education
COMBINING RING ABOVE
LATIN SMALL LETTER A
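In Latin script, the same grapheme can be spelled with different code point sequences that Unicode treats as canonically equivalent. Here is a minimal sketch that assumes only Python's `unicodedata`. LATIN SMALL LETTER A followed by COMBINING RING ABOVE normalizes under NFC to the precomposed å. A letter followed by a combining diaeresis behaves the same way.

```python
import unicodedata


def code_points(s: str) -> list:
    """Return the code points of s as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms are equal."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    decomposed = "a\u030A"  # LATIN SMALL LETTER A + COMBINING RING ABOVE
    precomposed = "\u00E5"  # LATIN SMALL LETTER A WITH RING ABOVE
    print(code_points(decomposed), code_points(precomposed))
    print(canonically_equivalent(decomposed, precomposed))
    # o + combining diaeresis composes to U+00F6 under NFC.
    print(code_points(unicodedata.normalize("NFC", "o\u0308")))
```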
(U+4E80) considers
CCCII character set
variation selectors
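An ideographic variation sequence is a unified ideograph followed by one of the variation selectors VS17–VS256 (U+E0100–U+E01EF). The base code point stays the same; only the selector requests a particular glyph.

This sketch assumes only Python's standard library and builds such a sequence for 葛 (U+845B). That character is used purely as an example base; whether a given sequence is registered is recorded in the Ideographic Variation Database. The sketch shows that normalization leaves the selector in place, and it strips selectors to recover the base text. The helper names are illustrative.

```python
import unicodedata

VS17 = "\U000E0100"  # VARIATION SELECTOR-17


def is_ideographic_vs(ch: str) -> bool:
    """True for the supplementary variation selectors U+E0100..U+E01EF."""
    return 0xE0100 <= ord(ch) <= 0xE01EF


def make_ivs(base: str, selector_index: int) -> str:
    """Append VS17 + selector_index (0..239) to a base ideograph."""
    if not 0 <= selector_index <= 239:
        raise ValueError("selector_index must be in 0..239")
    return base + chr(0xE0100 + selector_index)


def strip_ivs(text: str) -> str:
    """Drop ideographic variation selectors, keeping the base characters."""
    return "".join(c for c in text if not is_ideographic_vs(c))


if __name__ == "__main__":
    seq = make_ivs("\u845B", 0)
    print([f"U+{ord(c):04X}" for c in seq], unicodedata.name(seq[1]))
    print(unicodedata.normalize("NFC", seq) == seq)  # the selector survives NFC
    print(strip_ivs(seq) == "\u845B")
```

Keeping the selector as a separate code point lets software that does not know a sequence fall back to the plain unified ideograph.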
combining diaeresis
and its successor
's unification of
List of CJK fonts
fifth normal form
CJK Compatibility
meditation (Zen)
secondary/follow
transform/change
and two types of
In March 1989, a
Microsoft Windows
CNS character set
direct/straight
Kangxi Radicals
Big5 extensions
Cyrillic letter
Han unification
Source Han Sans
5282:
5277:
5272:
5267:
5262:
5257:
5252:
5247:
5242:
5237:
5232:
5227:
5222:
5217:
5212:
5207:
5202:
5197:
5192:
5187:
5182:
5177:
5172:
5167:
5162:
5157:
5152:
5147:
5142:
5137:
5132:
5127:
5122:
5117:
5112:
5107:
5102:
5097:
5092:
5086:
5084:
5083:Modern scripts
5080:
5079:
5077:
5076:
5071:
5066:
5061:
5056:
5050:
5048:
5040:
5039:
5026:
5025:
5023:
5022:
5017:
5012:
5007:
5002:
4997:
4991:
4989:
4988:Related topics
4985:
4984:
4982:
4981:
4976:
4971:
4966:
4961:
4955:
4953:
4949:
4948:
4946:
4945:
4940:
4935:
4934:
4933:
4928:
4918:
4913:
4908:
4902:
4900:
4896:
4895:
4893:
4892:
4887:
4882:
4877:
4872:
4871:
4870:
4860:
4855:
4850:
4845:
4840:
4834:
4832:
4826:
4825:
4822:
4821:
4819:
4818:
4813:
4808:
4803:
4798:
4793:
4788:
4783:
4778:
4773:
4767:
4765:
4759:
4758:
4756:
4755:
4750:
4745:
4740:
4739:
4738:
4728:
4722:
4720:
4713:
4709:
4708:
4705:
4704:
4702:
4701:
4696:
4691:
4686:
4681:
4676:
4671:
4666:
4661:
4656:
4651:
4645:
4643:
4639:
4638:
4636:
4635:
4630:
4625:
4620:
4615:
4610:
4605:
4596:
4591:
4585:
4583:
4574:
4570:
4569:
4567:
4566:
4561:
4556:
4551:
4546:
4541:
4540:
4539:
4528:
4526:
4520:
4519:
4517:
4516:
4511:
4506:
4500:
4498:
4494:
4493:
4486:
4485:
4478:
4471:
4463:
4454:
4453:
4450:
4449:
4446:
4439:
4438:
4435:
4434:
4430:
4429:
4426:
4423:
4420:
4418:
4415:
4413:
4410:
4406:
4405:
4341:
4339:
4329:12 are unified
4291:
4289:
4243:
4241:
4195:
4193:
4135:
4133:
4045:
4042:
4041:
4036:
4031:
4028:
4025:
4020:
4012:
4009:
4008:
4001:CJK ideographs
3997:
3996:
3989:
3982:
3974:
3966:
3965:
3946:
3928:
3910:
3892:
3871:
3854:
3835:
3826:
3798:
3780:
3766:
3749:
3728:
3710:
3698:
3667:
3649:
3634:
3616:
3598:
3577:
3556:
3537:
3536:
3534:
3531:
3528:
3527:
3517:
3495:
3494:
3492:
3489:
3488:
3487:
3482:
3476:
3470:
3462:
3459:
3439:
3436:
3427:Main article:
3424:
3421:
3416:
3415:
3409:
3403:
3397:
3391:
3385:
3379:
3369:
3368:
3362:
3356:
3350:
3340:
3339:
3333:
3327:
3321:
3315:
3309:
3303:
3297:
3291:
3285:
3279:
3265:Main article:
3262:
3261:Unicode ranges
3259:
3250:Main article:
3247:
3244:
3241:
3240:
3228:
3227:
3224:
3222:
3215:
3209:
3202:
3196:
3189:
3182:
3181:
3178:
3176:
3169:
3163:
3156:
3150:
3143:
3136:
3135:
3132:
3125:
3119:
3117:
3110:
3104:
3097:
3090:
3089:
3086:
3084:
3082:
3075:
3069:
3062:
3055:
3054:
3051:
3049:
3047:
3040:
3034:
3027:
3020:
3019:
3016:
3014:
3007:
3001:
2994:
2988:
2981:
2974:
2973:
2970:
2968:
2966:
2959:
2953:
2946:
2939:
2938:
2935:
2933:
2931:
2924:
2918:
2911:
2904:
2903:
2900:
2898:
2896:
2889:
2883:
2876:
2869:
2868:
2865:
2863:
2861:
2854:
2848:
2841:
2834:
2833:
2830:
2828:
2821:
2815:
2808:
2802:
2795:
2788:
2787:
2784:
2777:
2771:
2764:
2758:
2751:
2745:
2738:
2731:
2730:
2727:
2720:
2714:
2707:
2701:
2694:
2688:
2681:
2674:
2673:
2670:
2668:
2666:
2659:
2653:
2646:
2639:
2638:
2635:
2634:Other variant
2632:
2629:
2626:
2543:
2540:
2299:
2298:
2295:
2290:
2287:
2284:
2281:
2278:
2275:
2271:
2270:
2267:
2262:
2259:
2256:
2253:
2250:
2247:
2243:
2242:
2241:way/path/road
2239:
2234:
2231:
2228:
2225:
2222:
2219:
2215:
2214:
2211:
2206:
2203:
2200:
2197:
2194:
2191:
2187:
2186:
2183:
2178:
2175:
2172:
2169:
2166:
2163:
2159:
2158:
2155:
2150:
2147:
2144:
2141:
2138:
2135:
2131:
2130:
2127:
2122:
2119:
2116:
2113:
2110:
2107:
2103:
2102:
2099:
2094:
2091:
2088:
2085:
2082:
2079:
2075:
2074:
2071:
2066:
2063:
2060:
2057:
2054:
2051:
2047:
2046:
2043:
2038:
2035:
2032:
2029:
2026:
2023:
2019:
2018:
2015:
2010:
2007:
2004:
2001:
1998:
1995:
1991:
1990:
1987:
1982:
1979:
1976:
1973:
1970:
1967:
1963:
1962:
1959:
1954:
1951:
1948:
1945:
1942:
1939:
1935:
1934:
1931:
1926:
1923:
1920:
1917:
1914:
1911:
1907:
1906:
1903:
1898:
1895:
1892:
1889:
1886:
1883:
1879:
1878:
1877:arrive/resist
1875:
1870:
1867:
1864:
1861:
1858:
1855:
1851:
1850:
1847:
1842:
1839:
1836:
1833:
1830:
1827:
1823:
1822:
1819:
1814:
1811:
1808:
1805:
1802:
1799:
1795:
1794:
1791:
1786:
1783:
1780:
1777:
1774:
1771:
1767:
1766:
1763:
1758:
1755:
1752:
1749:
1746:
1743:
1739:
1738:
1735:
1730:
1727:
1724:
1721:
1718:
1715:
1711:
1710:
1707:
1702:
1699:
1696:
1693:
1690:
1687:
1683:
1682:
1679:
1674:
1671:
1668:
1665:
1662:
1659:
1655:
1654:
1651:
1646:
1643:
1640:
1637:
1634:
1631:
1627:
1626:
1623:
1618:
1615:
1612:
1609:
1606:
1603:
1599:
1598:
1595:
1590:
1587:
1584:
1581:
1578:
1575:
1571:
1570:
1569:cause/command
1567:
1562:
1559:
1556:
1553:
1550:
1547:
1543:
1542:
1539:
1534:
1531:
1528:
1525:
1522:
1519:
1510:
1509:
1504:
1499:
1494:
1489:
1484:
1478:
1477:
1474:
1471:
1468:
1465:
1458:
1457:(traditional)
1453:
1448:
1402:
1399:
1209:does not list
1090:
1087:
971:
970:
961:
956:
946:
945:
938:
933:
928:
918:
915:
790:
787:
723:combined with
700:
688:section 3.4 D7
667:Main article:
664:
661:
581:
580:
538:
536:
529:
522:
521:
480:
478:
471:
465:
462:
366:Han characters
362:character sets
340:, you may see
328:
327:
326:
308:
307:
290:
289:
244:
242:
235:
228:
227:
185:
183:
176:
169:
168:
83:
81:
74:
69:
43:
42:
40:
33:
15:
9:
6:
4:
3:
2:
8142:
8131:
8128:
8126:
8123:
8121:
8118:
8116:
8113:
8111:
8108:
8106:
8103:
8101:
8098:
8097:
8095:
8082:
8072:
8066:
8063:
8061:
8058:
8056:
8053:
8051:
8048:
8046:
8043:
8041:
8038:
8036:
8033:
8031:
8028:
8027:
8025:
8021:
8015:
8012:
8010:
8007:
8003:
8000:
7998:
7995:
7994:
7993:
7990:
7988:
7985:
7984:
7982:
7980:
7976:
7970:
7967:
7965:
7962:
7960:
7957:
7955:
7952:
7950:
7947:
7945:
7944:
7940:
7936:
7933:
7931:
7928:
7926:
7923:
7922:
7921:
7918:
7916:
7913:
7911:
7908:
7904:
7901:
7899:
7896:
7895:
7893:
7891:
7888:
7886:
7883:
7881:
7878:
7876:
7873:
7869:
7866:
7865:
7864:
7861:
7859:
7856:
7854:
7851:
7850:
7848:
7844:
7838:
7835:
7833:
7830:
7828:
7825:
7823:
7820:
7818:
7815:
7814:
7812:
7809:
7805:
7799:
7796:
7794:
7791:
7789:
7786:
7784:
7781:
7779:
7776:
7774:
7771:
7769:
7766:
7764:
7761:
7759:
7756:
7754:
7751:
7749:
7746:
7744:
7741:
7739:
7736:
7735:
7733:
7731:
7730:ISO/IEC 10646
7727:
7723:
7717:
7714:
7712:
7709:
7707:
7704:
7702:
7699:
7697:
7694:
7692:
7689:
7687:
7684:
7682:
7679:
7677:
7674:
7672:
7669:
7667:
7664:
7662:
7659:
7657:
7654:
7652:
7649:
7647:
7644:
7642:
7639:
7637:
7634:
7632:
7629:
7627:
7624:
7622:
7619:
7617:
7614:
7612:
7609:
7607:
7604:
7602:
7599:
7597:
7594:
7592:
7589:
7587:
7584:
7582:
7579:
7577:
7574:
7572:
7569:
7567:
7563:
7560:
7558:
7555:
7553:
7550:
7548:
7547:Compucolor II
7545:
7543:
7540:
7538:
7535:
7533:
7530:
7528:
7525:
7523:
7520:
7518:
7515:
7513:
7510:
7508:
7505:
7503:
7502:Acorn RISC OS
7500:
7498:
7495:
7493:
7490:
7488:
7485:
7483:
7480:
7478:
7475:
7473:
7470:
7468:
7465:
7464:
7462:
7458:
7452:
7449:
7447:
7444:
7442:
7439:
7437:
7434:
7432:
7431:8-bit Turkish
7429:
7427:
7424:
7420:
7417:
7415:
7412:
7410:
7407:
7405:
7402:
7400:
7397:
7395:
7392:
7390:
7387:
7385:
7382:
7380:
7377:
7375:
7372:
7371:
7370:
7367:
7365:
7362:
7361:
7359:
7356:
7352:
7348:
7342:
7339:
7337:
7334:
7333:
7331:
7328:
7324:
7318:
7315:
7313:
7310:
7308:
7305:
7303:
7300:
7298:
7295:
7293:
7290:
7288:
7285:
7283:
7280:
7278:
7275:
7273:
7270:
7268:
7265:
7263:
7260:
7258:
7255:
7253:
7250:
7248:
7245:
7243:
7240:
7238:
7235:
7232:
7228:
7225:
7223:
7220:
7218:
7215:
7214:
7212:
7210:
7206:
7200:
7197:
7195:
7192:
7190:
7187:
7185:
7182:
7180:
7177:
7175:
7172:
7170:
7167:
7165:
7162:
7160:
7157:
7155:
7152:
7150:
7147:
7145:
7142:
7140:
7137:
7135:
7132:
7130:
7127:
7125:
7122:
7120:
7117:
7115:
7112:
7110:
7107:
7105:
7102:
7100:
7097:
7096:
7094:
7092:
7088:
7082:
7079:
7077:
7074:
7072:
7069:
7067:
7064:
7062:
7059:
7057:
7054:
7052:
7049:
7047:
7044:
7042:
7039:
7037:
7034:
7032:
7029:
7027:
7024:
7022:
7019:
7017:
7014:
7012:
7009:
7007:
7004:
7002:
6999:
6997:
6994:
6992:
6989:
6987:
6984:
6982:
6979:
6977:
6974:
6972:
6969:
6967:
6964:
6962:
6959:
6957:
6954:
6952:
6949:
6947:
6944:
6942:
6939:
6937:
6934:
6932:
6929:
6927:
6924:
6922:
6919:
6917:
6914:
6912:
6909:
6907:
6904:
6902:
6899:
6897:
6894:
6892:
6889:
6887:
6884:
6882:
6879:
6877:
6874:
6872:
6869:
6867:
6864:
6862:
6859:
6857:
6854:
6852:
6849:
6847:
6844:
6842:
6839:
6837:
6834:
6832:
6829:
6827:
6824:
6822:
6819:
6817:
6814:
6812:
6809:
6807:
6804:
6802:
6799:
6797:
6794:
6792:
6789:
6787:
6784:
6782:
6779:
6778:
6776:
6772:
6766:
6763:
6761:
6758:
6756:
6753:
6751:
6748:
6746:
6743:
6741:
6738:
6736:
6733:
6731:
6728:
6726:
6723:
6721:
6718:
6716:
6713:
6711:
6708:
6706:
6703:
6701:
6698:
6696:
6693:
6691:
6688:
6686:
6683:
6681:
6678:
6676:
6673:
6671:
6668:
6666:
6663:
6661:
6658:
6656:
6653:
6651:
6648:
6646:
6643:
6641:
6638:
6636:
6633:
6631:
6628:
6626:
6623:
6622:
6620:
6616:
6611:
6605:
6602:
6600:
6599:ISO/IEC 10367
6597:
6595:
6592:
6591:
6589:
6587:
6583:
6577:
6574:
6572:
6569:
6567:
6564:
6562:
6559:
6557:
6554:
6552:
6549:
6547:
6544:
6542:
6539:
6537:
6534:
6532:
6529:
6527:
6524:
6522:
6519:
6517:
6514:
6512:
6509:
6507:
6504:
6502:
6499:
6497:
6494:
6492:
6489:
6487:
6484:
6482:
6479:
6477:
6474:
6472:
6469:
6467:
6464:
6462:
6459:
6457:
6454:
6452:
6449:
6447:
6444:
6442:
6439:
6437:
6434:
6432:
6429:
6427:
6424:
6423:
6421:
6417:
6411:
6408:
6406:
6403:
6401:
6398:
6396:
6393:
6391:
6388:
6386:
6383:
6379:
6376:
6374:
6371:
6370:
6369:
6366:
6365:
6363:
6359:
6351:
6348:
6346:
6343:
6341:
6338:
6336:
6333:
6332:
6330:
6326:
6323:
6321:
6318:
6317:
6315:
6311:
6308:
6307:
6305:
6301:
6298:
6296:
6293:
6291:
6288:
6286:
6283:
6281:
6278:
6276:
6273:
6271:
6268:
6266:
6263:
6261:
6258:
6256:
6253:
6251:
6250:-5 (Cyrillic)
6248:
6246:
6243:
6241:
6238:
6236:
6233:
6231:
6228:
6227:
6225:
6224:
6222:
6220:
6216:
6210:
6207:
6201:
6198:
6196:
6193:
6192:
6190:
6188:
6185:
6183:
6180:
6178:
6175:
6174:
6173:
6169:
6165:
6162:
6160:
6157:
6153:
6150:
6149:
6148:
6145:
6143:
6140:
6136:
6133:
6129:
6126:
6124:
6121:
6119:
6116:
6114:
6111:
6110:
6109:
6106:
6104:
6101:
6100:
6099:
6096:
6095:
6093:
6089:
6085:
6078:
6073:
6071:
6066:
6064:
6059:
6058:
6055:
6039:
6030:
6028:
6019:
6018:
6015:
6009:
6006:
6004:
6001:
5997:
5994:
5993:
5992:
5989:
5987:
5984:
5982:
5979:
5977:
5974:
5973:
5971:
5967:
5961:
5958:
5956:
5953:
5952:
5950:
5946:
5940:
5937:
5935:
5932:
5930:
5927:
5925:
5922:
5920:
5919:Tulu Tigalari
5917:
5915:
5912:
5910:
5907:
5905:
5902:
5900:
5897:
5895:
5894:Sylheti Nagri
5892:
5890:
5887:
5885:
5884:South Arabian
5882:
5880:
5877:
5875:
5872:
5870:
5867:
5865:
5862:
5860:
5857:
5855:
5852:
5850:
5847:
5845:
5842:
5840:
5837:
5835:
5832:
5830:
5827:
5825:
5822:
5820:
5817:
5815:
5812:
5810:
5809:Old Hungarian
5807:
5805:
5802:
5800:
5797:
5795:
5792:
5790:
5787:
5785:
5782:
5780:
5777:
5775:
5772:
5770:
5767:
5765:
5762:
5760:
5757:
5755:
5752:
5750:
5747:
5745:
5742:
5740:
5737:
5735:
5732:
5730:
5727:
5724:
5721:
5719:
5716:
5714:
5711:
5709:
5706:
5704:
5701:
5699:
5696:
5694:
5691:
5689:
5686:
5684:
5681:
5679:
5676:
5674:
5671:
5669:
5666:
5664:
5661:
5659:
5656:
5654:
5651:
5649:
5646:
5644:
5641:
5639:
5636:
5634:
5631:
5629:
5626:
5624:
5621:
5619:
5616:
5614:
5611:
5609:
5606:
5604:
5601:
5599:
5596:
5594:
5591:
5589:
5586:
5584:
5581:
5579:
5576:
5574:
5571:
5569:
5566:
5565:
5563:
5557:
5551:
5548:
5546:
5543:
5541:
5538:
5536:
5533:
5531:
5528:
5526:
5523:
5521:
5518:
5516:
5513:
5511:
5508:
5506:
5503:
5501:
5498:
5496:
5493:
5491:
5488:
5486:
5483:
5481:
5478:
5476:
5473:
5471:
5468:
5466:
5463:
5461:
5458:
5456:
5453:
5451:
5448:
5446:
5443:
5441:
5438:
5436:
5433:
5431:
5428:
5426:
5423:
5421:
5418:
5416:
5413:
5411:
5408:
5406:
5403:
5401:
5398:
5396:
5393:
5391:
5388:
5386:
5383:
5381:
5378:
5376:
5373:
5371:
5368:
5366:
5363:
5361:
5358:
5356:
5353:
5351:
5348:
5346:
5343:
5341:
5338:
5336:
5333:
5331:
5330:Mende Kikakui
5328:
5326:
5325:Masaram Gondi
5323:
5321:
5318:
5316:
5313:
5311:
5310:Lisu (Fraser)
5308:
5306:
5303:
5301:
5298:
5296:
5293:
5291:
5288:
5286:
5283:
5281:
5278:
5276:
5273:
5271:
5268:
5266:
5263:
5261:
5258:
5256:
5253:
5251:
5248:
5246:
5243:
5241:
5238:
5236:
5233:
5231:
5228:
5226:
5223:
5221:
5218:
5216:
5213:
5211:
5210:Gunjala Gondi
5208:
5206:
5203:
5201:
5198:
5196:
5193:
5191:
5188:
5186:
5183:
5181:
5178:
5176:
5173:
5171:
5168:
5166:
5163:
5161:
5158:
5156:
5153:
5151:
5148:
5146:
5143:
5141:
5138:
5136:
5133:
5131:
5128:
5126:
5123:
5121:
5118:
5116:
5113:
5111:
5108:
5106:
5103:
5101:
5098:
5096:
5093:
5091:
5088:
5087:
5085:
5081:
5075:
5072:
5070:
5067:
5065:
5062:
5060:
5057:
5055:
5052:
5051:
5049:
5047:
5041:
5036:
5031:
5027:
5021:
5018:
5016:
5013:
5011:
5008:
5006:
5003:
5001:
4998:
4996:
4993:
4992:
4990:
4986:
4980:
4977:
4975:
4972:
4970:
4967:
4965:
4962:
4960:
4957:
4956:
4954:
4950:
4944:
4941:
4939:
4936:
4932:
4929:
4927:
4924:
4923:
4922:
4919:
4917:
4914:
4912:
4909:
4907:
4904:
4903:
4901:
4897:
4891:
4888:
4886:
4883:
4881:
4878:
4876:
4873:
4869:
4866:
4865:
4864:
4861:
4859:
4856:
4854:
4851:
4849:
4846:
4844:
4841:
4839:
4836:
4835:
4833:
4827:
4817:
4814:
4812:
4809:
4807:
4804:
4802:
4799:
4797:
4794:
4792:
4789:
4787:
4784:
4782:
4779:
4777:
4774:
4772:
4769:
4768:
4766:
4764:
4760:
4754:
4751:
4749:
4746:
4744:
4741:
4737:
4736:ISO/IEC 14651
4734:
4733:
4732:
4729:
4727:
4724:
4723:
4721:
4717:
4714:
4710:
4700:
4697:
4695:
4692:
4690:
4687:
4685:
4682:
4680:
4677:
4675:
4672:
4670:
4667:
4665:
4662:
4660:
4657:
4655:
4652:
4650:
4647:
4646:
4644:
4640:
4634:
4631:
4629:
4626:
4624:
4621:
4619:
4616:
4614:
4611:
4609:
4606:
4604:
4600:
4597:
4595:
4592:
4590:
4587:
4586:
4584:
4582:
4578:
4575:
4571:
4565:
4562:
4560:
4557:
4555:
4552:
4550:
4547:
4545:
4542:
4538:
4535:
4534:
4533:
4530:
4529:
4527:
4525:
4521:
4515:
4512:
4510:
4507:
4505:
4502:
4501:
4499:
4495:
4491:
4484:
4479:
4477:
4472:
4470:
4465:
4464:
4461:
4444:
4440:
4436:
4427:
4424:
4419:
4414:
4411:
4408:
4407:
4403:
4398:
4387:
4380:
4376:
4371:
4345:
4340:
4337:
4295:
4290:
4287:
4242:
4239:
4194:
4191:
4187:
4160:
4147:
4140:
4134:
4131:
4129:
4125:
4121:
4117:
4113:
4109:
4105:
4101:
4097:
4093:
4089:
4085:
4081:
4077:
4073:
4069:
4065:
4061:
4057:
4053:
4049:
4044:
4043:
4040:
4037:
4035:
4032:
4029:
4026:
4024:
4021:
4019:
4016:
4015:
4010:
4006:
4002:
3995:
3990:
3988:
3983:
3981:
3976:
3975:
3972:
3961:
3957:
3950:
3942:
3938:
3932:
3924:
3920:
3914:
3906:
3902:
3896:
3888:
3884:
3878:
3876:
3868:
3867:4-06-208718-9
3864:
3858:
3850:
3846:
3839:
3830:
3822:
3816:
3808:
3802:
3791:
3784:
3776:
3770:
3762:
3756:
3754:
3745:
3741:
3735:
3733:
3724:
3720:
3714:
3707:
3702:
3688:on 2013-12-16
3687:
3683:
3682:
3677:
3671:
3663:
3659:
3653:
3645:
3638:
3630:
3626:
3620:
3612:
3608:
3602:
3594:
3590:
3584:
3582:
3573:
3569:
3563:
3561:
3553:. 2023-09-01.
3552:
3548:
3542:
3538:
3521:
3514:
3510:
3506:
3500:
3496:
3486:
3483:
3480:
3477:
3474:
3471:
3468:
3465:
3464:
3458:
3456:
3452:
3448:
3443:
3435:
3430:
3420:
3413:
3410:
3408:(1F200–1F2FF)
3407:
3404:
3401:
3398:
3396:(2F800–2FA1F)
3395:
3392:
3389:
3386:
3383:
3380:
3377:
3374:
3373:
3372:
3366:
3363:
3360:
3357:
3354:
3351:
3348:
3345:
3344:
3343:
3337:
3334:
3332:(2EBF0–2EE5F)
3331:
3328:
3326:(31350–323AF)
3325:
3322:
3320:(30000–3134F)
3319:
3316:
3314:(2CEB0–2EBEF)
3313:
3310:
3308:(2B820–2CEAF)
3307:
3304:
3302:(2B740–2B81F)
3301:
3298:
3296:(2A700–2B73F)
3295:
3292:
3290:(20000–2A6DF)
3289:
3286:
3283:
3280:
3277:
3274:
3273:
3272:
3268:
3258:
3253:
3238:
3234:
3229:
3225:
3223:
3210:
3197:
3184:
3183:
3179:
3177:
3164:
3151:
3138:
3137:
3133:
3120:
3118:
3105:
3092:
3091:
3087:
3085:
3083:
3070:
3057:
3056:
3052:
3050:
3048:
3035:
3022:
3021:
3017:
3015:
3002:
2989:
2976:
2975:
2971:
2969:
2967:
2954:
2941:
2940:
2936:
2934:
2932:
2919:
2906:
2905:
2901:
2899:
2897:
2884:
2871:
2870:
2866:
2864:
2862:
2849:
2836:
2835:
2831:
2829:
2816:
2803:
2790:
2789:
2785:
2772:
2759:
2746:
2733:
2732:
2728:
2715:
2702:
2689:
2676:
2675:
2671:
2669:
2667:
2654:
2641:
2640:
2636:
2633:
2630:
2627:
2624:
2623:
2620:
2588:
2539:
2517:
2513:
2474:
2467:(U+7D05) and
2443:(U+7D05) and
2417:
2388:
2384:
2351:
2314:(U+5167) and
2305:
2296:
2273:
2272:
2268:
2245:
2244:
2240:
2217:
2216:
2212:
2189:
2188:
2184:
2161:
2160:
2156:
2133:
2132:
2128:
2105:
2104:
2100:
2077:
2076:
2072:
2049:
2048:
2044:
2021:
2020:
2016:
1993:
1992:
1988:
1965:
1964:
1960:
1937:
1936:
1932:
1909:
1908:
1904:
1881:
1880:
1876:
1853:
1852:
1848:
1825:
1824:
1820:
1797:
1796:
1792:
1769:
1768:
1764:
1741:
1740:
1736:
1713:
1712:
1708:
1685:
1684:
1680:
1657:
1656:
1652:
1629:
1628:
1624:
1601:
1600:
1597:exempt/spare
1596:
1573:
1572:
1568:
1545:
1544:
1540:
1516:
1512:
1511:
1505:
1500:
1495:
1490:
1485:
1480:
1479:
1472:
1469:
1466:
1462:(traditional,
1459:
1454:
1452:(simplified)
1449:
1445:
1442:
1440:
1436:
1432:
1428:
1424:
1420:
1416:
1412:
1398:
1394:
1387:(U+76F4) and
1375:(U+4FA3) and
1355:
1351:
1332:(U+5104) and
1311:
1296:
1280:ANGSTROM SIGN
1249:
1245:
1236:(U+FA23) and
1228:
1227:was written.
1185:(U+500B) and
1148:(U+4E1F) and
1140:
1138:
1132:
1128:
1101:
1097:
1086:
1083:
1082:Masayoshi Son
1079:
1075:
1070:
1066:
1062:
1057:
1056:) and so on.
1055:
1051:
1047:
1043:
1039:
1035:
1031:
1027:
1023:
1019:
1015:
1011:
1007:
1003:
999:
995:
991:
987:
984:
980:
976:
969:
965:
962:
960:
957:
954:
951:
950:
949:
944:
943:
939:
937:
934:
932:
929:
927:
924:
923:
922:
914:
899:
895:
891:
887:
883:
879:
874:
870:
844:
840:
835:
833:
828:
824:
820:
818:
815:
811:
808:
804:
801:
795:
786:
784:
779:
775:
773:
769:
765:
759:
757:
753:
749:
745:
741:
705:
699:
697:
691:
689:
680:
675:
670:
660:
658:
654:
644:
641:In 1993, the
639:
637:
631:
628:
622:
616:
609:
604:
601:
599:
595:
590:
588:
577:
574:
566:
563:November 2020
556:
552:
548:
542:
539:This section
537:
528:
527:
518:
515:
507:
497:
493:
487:
486:
481:This section
479:
475:
470:
469:
461:
454:
445:
441:
436:
434:
430:
426:
422:
418:
414:
409:
407:
403:
399:
395:
391:
387:
383:
379:
375:
371:
367:
363:
359:
355:
351:
343:
339:
335:
333:
323:
319:
314:
304:
301:
286:
283:
275:
272:February 2024
265:
261:
257:
251:
250:
245:This article
243:
234:
233:
224:
221:
213:
210:December 2020
203:
199:
195:
189:
186:This article
184:
175:
174:
165:
162:
154:
151:February 2010
143:
140:
136:
133:
129:
126:
122:
119:
115:
112: –
111:
107:
106:Find sources:
100:
96:
90:
89:
84:This article
82:
78:
73:
72:
67:
65:
58:
57:
52:
51:
46:
41:
32:
31:
26:
22:
8044:
7997:ISO/IEC 6429
7954:Stanford/ITS
7941:
7875:ARIB STD-B24
7656:Sega SC-3000
7557:DEC RADIX 50
6594:ISO/IEC 8859
6586:ISO/IEC 2022
6331:Adaptations
6290:-14 (Celtic)
6285:-13 (Baltic)
6275:-10 (Nordic)
6270:-9 (Turkish)
6219:ISO/IEC 8859
5774:Meetei Mayek
5725:(Chorasmian)
5628:Cypro-Minoan
5405:Pahawh Hmong
5220:Gurung Khema
5019:
4969:ISO/IEC 8859
4811:UTF-32/UCS-4
4806:UTF-16/UCS-2
4613:Variant form
4443:
4342:
4292:
4244:
4196:
4136:
4046:
4033:
3959:
3949:
3940:
3931:
3922:
3913:
3904:
3895:
3886:
3869:)pp. 285-294
3857:
3848:
3838:
3829:
3801:
3790:"Unicode 88"
3783:
3769:
3743:
3722:
3713:
3701:
3690:. Retrieved
3686:the original
3679:
3670:
3661:
3652:
3637:
3628:
3619:
3610:
3601:
3592:
3571:
3568:"Unihan.zip"
3541:
3520:
3511:and Chinese
3499:
3473:Sinicization
3444:
3441:
3432:
3417:
3370:
3341:
3270:
3255:
3232:
3226:to research
2628:Traditional
2545:
2518:
2514:
2475:
2419:The radical
2418:
2389:
2385:
2352:
2344:(U+5165) or
2332:(U+5165) or
2308:variants of
2306:
2302:
1404:
1395:
1356:
1352:
1312:
1297:
1229:
1141:
1133:
1129:
1103:versions of
1092:
1058:
972:
953:ISO/IEC 2022
947:
941:
920:
917:Alternatives
875:
871:
836:
829:
825:
821:
807:Greek letter
800:Latin letter
796:
792:
780:
776:
760:
709:
695:
693:
690:) cautions:
684:
640:
632:
611:
606:
602:
591:
584:
569:
560:
547:spinning off
540:
510:
501:
490:Please help
485:verification
482:
437:
432:
429:orthographic
410:
349:
348:
329:
296:
278:
269:
246:
216:
207:
194:spinning off
187:
157:
148:
138:
131:
124:
117:
105:
93:Please help
88:verification
85:
61:
54:
48:
47:Please help
44:
7716:ZX Spectrum
7671:Sinclair QL
7507:Amstrad CPC
7426:8-bit Greek
7353:terminals (
7066:Iran System
6618:("scripts")
6265:-8 (Hebrew)
6255:-6 (Arabic)
6152:ISO/IEC 646
5960:SignWriting
5829:Old Sogdian
5799:Nandinagari
5723:Khwarezmian
5633:Dives Akuru
5559:Ancient and
5545:Warang Citi
5410:Pau Cin Hau
5365:New Tai Lue
5360:Nag Mundari
5335:Medefaidrin
5044:Common and
4853:Equivalence
4831:code points
4829:On pairs of
4743:Equivalence
4618:Word joiner
4608:Soft hyphen
4524:Code points
4335:Not unified
4333:Not unified
4331:Not unified
4327:Not unified
4325:Not unified
4323:Not unified
4321:Not unified
4319:Not unified
4317:Not unified
4315:Not unified
4104:CJK Strokes
4027:Chart range
3455:MIT License
3414:(2F00–2FDF)
3402:(3200–32FF)
3390:(F900–FAFF)
3384:(FE30–FE4F)
3378:(3300–33FF)
3367:(2FF0–2FFF)
3361:(3000–303F)
3355:(31C0–31EF)
3353:CJK Strokes
3349:(2E80–2EFF)
3284:(3400–4DBF)
2832:give birth
2625:Simplified
1737:knife edge
1473:Vietnamese
1464:Hong Kong)
1447:Code point
1419:traditional
1010:Common Lisp
988:, and many
772:dotless "ı"
748:punctuation
504:August 2007
444:Traditional
440:code points
8094:Categories
8002:JIS X 0211
7910:ISO-IR-169
7763:UTF-EBCDIC
7329:code pages
7056:CSX+ Indic
6660:Devanagari
6615:Code pages
6536:LST 1590-4
6506:JIS X 0213
6501:JIS X 0212
6496:JIS X 0208
6491:JIS X 0201
6456:GOST 10859
6378:CCCII/EACC
6280:-11 (Thai)
6260:-7 (Greek)
6195:background
6118:Wabun/Kana
5854:Phoenician
5839:Old Uyghur
5834:Old Turkic
5819:Old Permic
5814:Old Italic
5764:Manichaean
5658:Glagolitic
5435:Saurashtra
5180:Devanagari
5059:Diacritics
4816:UTF-EBCDIC
4719:Algorithms
4712:Processing
4649:Characters
4573:Characters
4377:, Common,
4030:Characters
4018:Block name
3692:2023-09-30
3533:References
2867:companion
2729:two, both
2213:edge/horn
2101:empty/air
1653:all/total
1492:zh-Hant-HK
1431:Vietnamese
1415:simplified
1393:(U+96C7).
1137:z-variants
1098:Japanese,
857:) or two (
756:apostrophe
752:diacritics
551:relocating
460:(U+4E2A).
453:Simplified
442:, such as
421:allographs
402:Vietnamese
374:characters
318:code point
256:improve it
198:relocating
121:newspapers
50:improve it
8055:MICR code
7890:IEC-P27-1
7868:ISO-IR-68
7773:DIN 91379
7651:SAM Coupé
7586:GSM 03.38
7576:Galaksija
7071:Kamenický
7051:CSX Indic
6760:Ukrainian
6546:Shift JIS
6526:KS X 1002
6521:KS X 1001
6446:DIN 66003
6441:CNS 11643
6209:Transcode
6187:ITU T.101
6113:Non-Latin
5849:ʼPhags-pa
5844:Palmyrene
5794:Nabataean
5718:Khudawadi
5703:Kharosthi
5618:Cuneiform
5593:Bhaiksuki
5588:Bassa Vah
5455:Sundanese
5430:Samaritan
5345:Mongolian
5320:Malayalam
5285:Kirat Rai
4995:Anomalies
4979:ISO 15924
4974:DIN 91379
4875:Z-variant
4858:Homoglyph
4731:Collation
4379:Inherited
3525:literate.
3479:Z-variant
3180:tortoise
2972:to leave
2631:Japanese
2587:shinjitai
2377:with the
1467:Japanese
1100:Shinjitai
1042:Uniscribe
990:Unix-like
890:Singapore
669:Allograph
594:ideograms
413:typefaces
260:verifying
56:talk page
8060:Mojibake
7915:ISO 2033
7880:Fieldata
7858:ASMO 449
7768:GB 18030
7728: /
7676:Teletext
7666:Sharp MZ
7596:HP FOCAL
7591:HP Roman
7522:Atari ST
7512:Apple II
7046:CS Indic
6740:Romanian
6715:Keyboard
6695:Gurmukhi
6690:Gujarati
6680:Georgian
6655:Cyrillic
6650:Croatian
6625:Armenian
6531:LST 1564
6516:KPS 9566
6476:GB 18030
6471:GB 12052
6466:GB 12345
6451:ELOT 927
6385:ISO 5426
6345:Estonian
6182:ITU T.61
6172:Teletext
6168:Videotex
6142:Fieldata
6128:Cyrillic
5981:Currency
5955:Duployan
5929:Vithkuqi
5924:Ugaritic
5779:Meroitic
5749:Mahajani
5734:Linear B
5729:Linear A
5520:Tifinagh
5485:Tai Viet
5480:Tai Tham
5470:Tagbanwa
5385:Ol Chiki
5275:Kayah Li
5270:Katakana
5255:Javanese
5250:Hiragana
5240:Hanunuoo
5215:Gurmukhi
5205:Gujarati
5195:Georgian
5170:Cyrillic
5160:Cherokee
5125:Bopomofo
5105:Balinese
5100:Armenian
4964:GB 18030
4781:Punycode
4669:Numerals
4601: /
4514:Versions
4399:, Common
4397:Hiragana
4388:, Common
4386:Katakana
4384:Hangul,
3815:cite web
3467:GB 18030
3461:See also
2902:to cash
2786:to ride
2672:to lose
2637:English
1961:picture
1821:feeling
1793:outside
1476:English
1460:Chinese
1455:Chinese
1450:Chinese
1423:Japanese
1277:Å
1096:Kyūjitai
1054:OpenType
1050:TrueType
1034:Graphite
894:Malaysia
843:typeface
732:◌̊
701:—
696:grapheme
657:20985671
386:Japanese
356:and the
8115:Unicode
7949:SEASCII
7943:Mojikyō
7930:KOI8-RU
7853:ABICOMP
7726:Unicode
7636:PETSCII
7626:NEC APC
7562:DEC MCS
7517:ATASCII
7414:Swedish
7399:Finnish
7384:Spanish
7076:Mazovia
7041:ABICOMP
6750:Turkish
6705:Iceland
6613:Mac OS
6556:TIS-620
6461:GB 2312
6436:BraSCII
6426:ArmSCII
6164:Teletex
6123:Chinese
5889:Soyombo
5879:Sogdian
5874:Siddham
5869:Sharada
5789:Multani
5769:Marchen
5759:Mandaic
5754:Makasar
5668:Grantha
5653:Elymaic
5648:Elbasan
5623:Cypriot
5583:Avestan
5525:Tirhuta
5515:Tibetan
5460:Sunuwar
5445:Sinhala
5440:Shavian
5420:Ranjana
5400:Osmanya
5390:Ol Onal
5315:Lontara
5265:Kannada
5175:Deseret
5140:Burmese
5130:Braille
5120:Bengali
5074:Numbers
5035:Scripts
4684:Symbols
4674:Scripts
4497:Unicode
4490:Unicode
4313:Unified
4311:Unified
4309:Unified
4307:Unified
4305:Unified
4303:Unified
4301:Unified
4299:Unified
4297:Unified
4294:Unified
4005:Unicode
3233:Sources
3088:hungry
2937:inside
2506:, i.e.
2274:U+9AA8
2269:employ
2246:U+96C7
2218:U+9053
2190:U+89D2
2162:U+8525
2134:U+8349
2106:U+8005
2078:U+7A7A
2050:U+795E
2022:U+793a
1994:U+771F
1966:U+76F4
1938:U+753B
1910:U+6D77
1882:U+6B21
1854:U+62B5
1849:talent
1826:U+624D
1798:U+60C5
1770:U+5916
1742:U+5316
1714:U+5203
1686:U+5177
1658:U+5173
1630:U+5168
1602:U+5165
1574:U+514D
1546:U+4EE4
1507:vi-Hani
1487:zh-Hant
1482:zh-Hans
1470:Korean
1435:browser
1411:Chinese
1061:(B)TRON
975:Unicode
942:Mojikyō
812:or the
768:the dot
740:sememes
406:chữ Hán
378:Chinese
364:of the
354:Unicode
254:Please
135:scholar
7959:Symbol
7935:KOI8-U
7925:KOI8-R
7793:TACE16
7783:CESU-8
7778:BOCU-1
7758:UTF-32
7753:UTF-16
7696:WISCII
7686:TRS-80
7606:SQUOZE
7601:HP RPL
7441:Hebrew
7436:SI 960
7404:French
7327:EBCDIC
7217:CER-GS
6700:Hebrew
6675:Gaelic
6640:Celtic
6630:Arabic
6576:YUSCII
6566:VISCII
6551:SI 960
6541:PASCII
6390:5426-2
6368:MARC-8
6103:Needle
6036:
6025:
5934:Yezidi
5914:Todhri
5909:Tangut
5744:Lydian
5739:Lycian
5713:Khojki
5693:Kaithi
5673:Hatran
5663:Gothic
5613:Coptic
5603:Carian
5598:Brāhmī
5540:Wancho
5505:Thaana
5500:Telugu
5495:Tangsa
5475:Tai Le
5465:Syriac
5425:Rejang
5300:Lepcha
5245:Hebrew
5225:Hangul
5150:Chakma
5095:Arabic
5069:Spaces
4776:CESU-8
4771:BOCU-1
4679:Spaces
4428:
4425:
4421:99,737
4412:
4409:Totals
4394:Common
4382:Common
4375:Hangul
4370:Common
4249:42,720
4245:20,992
3865:
3513:CEDICT
3212:U+7814
3199:U+784F
3186:U+7814
3166:U+4E80
3153:U+9F9C
3140:U+9F9F
3122:U+9AD9
3107:U+9AD8
3094:U+9AD8
3072:U+9913
3059:U+997F
3053:taxes
3037:U+7A05
3024:U+7A0E
3004:U+7985
2991:U+79AA
2978:U+7985
2956:U+5225
2943:U+522B
2921:U+5167
2908:U+5185
2886:U+514C
2873:U+5151
2851:U+4FB6
2838:U+4FA3
2818:U+7523
2805:U+7522
2792:U+4EA7
2774:U+6909
2761:U+4E57
2748:U+4E58
2735:U+4E58
2717:U+34B3
2704:U+4E21
2691:U+5169
2678:U+4E24
2656:U+4E1F
2643:U+4E22
2185:onion
2157:grass
1625:enter
1433:. The
1427:Korean
1290:Å
1287:
1285:U+00C5
1274:
1272:U+212B
1044:, and
1038:Scribe
998:Python
892:, and
839:glyphs
728:
726:U+030A
718:a
715:
713:U+0061
655:
433:Unihan
425:glyphs
400:) and
394:Korean
137:
130:
123:
116:
108:
8030:CCSID
7903:8-bit
7898:7-bit
7894:INIS
7748:UTF-8
7743:UTF-7
7738:UTF-1
7616:LMBCS
7552:CP/M+
7394:Dutch
7379:Swiss
7061:CWI-2
6765:VT100
6735:Roman
6730:Ogham
6710:Inuit
6685:Greek
6571:VSCII
6561:TSCII
6511:KOI-7
6486:ISCII
6481:HKSCS
6373:ANSEL
6335:Welsh
6159:BCDIC
6147:ASCII
6108:Morse
6008:Emoji
5904:Takri
5864:Runic
5804:Ogham
5638:Dogra
5490:Tamil
5395:Osage
5370:Nüshu
5305:Limbu
5295:Latin
5280:Khmer
5260:Kanji
5235:Hanja
5200:Greek
5190:Geʽez
5185:Garay
5135:Buhid
5115:Batak
5110:Bamum
5090:Adlam
4938:Input
4916:Fonts
4911:Email
4899:Usage
4801:UTF-8
4796:UTF-7
4791:UTF-1
4642:Lists
4559:Plane
4532:Block
4373:Han,
4261:4,192
4259:4,939
4257:7,473
4255:5,762
4251:4,154
4247:6,592
4189:2 SIP
4182:0 BMP
4180:0 BMP
4178:0 BMP
4176:0 BMP
4174:0 BMP
4172:0 BMP
4170:0 BMP
4168:0 BMP
4166:0 BMP
4164:2 SIP
4162:3 TIP
4155:2 SIP
4153:2 SIP
4151:2 SIP
4149:2 SIP
4142:0 BMP
4023:Plane
3919:"URO"
3793:(PDF)
3509:EDICT
3491:Notes
3134:high
2297:bone
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.