### JavaScript port of Markus Kuhn's wcwidth() implementation

The following explanation comes from the original C implementation:

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt at defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.

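As a rough illustration of the cell model described above, the sketch below sums per-character cell widths to get the width of a string. The names `WIDE_RANGES`, `cellWidth`, and `stringCells` are hypothetical and are not this port's API, and the range table is deliberately tiny and incomplete; a real implementation relies on full tables derived from Unicode data.

```javascript
// Illustrative only: a tiny, incomplete set of ranges treated as double-width.
// These are assumptions for the example, not the tables used by this port.
const WIDE_RANGES = [
  [0x3040, 0x30ff], // Hiragana and Katakana
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xac00, 0xd7a3], // Hangul Syllables
  [0xff01, 0xff60], // Fullwidth ASCII variants
];

// Hypothetical helper: number of terminal cells for one code point.
function cellWidth(codePoint) {
  for (const [first, last] of WIDE_RANGES) {
    if (codePoint >= first && codePoint <= last) return 2;
  }
  return 1;
}

// Hypothetical helper: total cells for a string, iterating by code point
// (for...of walks code points, not UTF-16 code units).
function stringCells(text) {
  let cells = 0;
  for (const ch of text) cells += cellWidth(ch.codePointAt(0));
  return cells;
}

console.log(stringCells('abc'));    // 3
console.log(stringCells('日本語')); // 6
```

A terminal application and a terminal that agree on such a function will agree on where the cursor ends up after printing a string.
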
For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
Fullwidth (F), Wide (W), Halfwidth (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference of backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not from any typographic considerations.

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
question of which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule from the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has avoided doing so far.

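The class-based rules in the two preceding paragraphs can be summarized in a short sketch. The function name `cellsForClass` and the `ambiguousAsWide` option are hypothetical and are not this port's API; the sketch assumes the caller has already looked up a character's East Asian Width class in the Unicode data.

```javascript
// Hypothetical helper: maps an East Asian Width class (UAX #11) to a
// number of terminal cells, following the rules described in the text.
function cellsForClass(eawClass, options = {}) {
  // Assumed option: lets CJK-oriented callers treat Ambiguous as double-width.
  const ambiguousAsWide = options.ambiguousAsWide === true;
  switch (eawClass) {
    case 'F':  // Fullwidth
    case 'W':  // Wide
      return 2;
    case 'H':  // Halfwidth
    case 'Na': // Narrow
      return 1;
    case 'A':  // Ambiguous: single-width unless the caller opts in
      return ambiguousAsWide ? 2 : 1;
    case 'N':  // Neutral: single-width, in the interest of simplicity
      return 1;
    default:
      throw new RangeError('Unknown East Asian Width class: ' + eawClass);
  }
}

console.log(cellsForClass('W'));                            // 2
console.log(cellsForClass('A'));                            // 1
console.log(cellsForClass('A', { ambiguousAsWide: true })); // 2
```

Keeping the Ambiguous choice behind an explicit option mirrors the point that it is a compatibility preference, not something the Unicode data decides.
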
http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c