code point
Source: Internet | Editor: 程序博客网 | Date: 2024/05/16 15:56
A complete Unicode character is identified by a code point.
A Java char is a code unit.
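The difference between the two terms shows up as soon as a string contains a character outside the 16-bit range. The following is a minimal sketch, not taken from the original post: the class name CodeUnitVsCodePoint and its two helper methods are hypothetical, while String.length, String.codePointCount and String.codePoints are standard Java APIs.

```java
public class CodeUnitVsCodePoint {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "A" followed by U+1F600, written as an explicit surrogate pair.
        String s = "A\uD83D\uDE00";

        System.out.println(codeUnits(s));  // 3 code units
        System.out.println(codePoints(s)); // 2 code points

        // Iterate by code point rather than by char.
        s.codePoints().forEach(cp ->
                System.out.println("U+" + Integer.toHexString(cp).toUpperCase()));
    }
}
```

The string holds three chars but only two characters, which is why code that indexes with charAt can land in the middle of a supplementary character.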
The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding. A few APIs, primarily in the Character class, use 32-bit integers to represent code points as individual entities. The Java platform provides methods to convert between the two representations.
(From the JLS, third edition)
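The surrogate-pair scheme described in the quotation can be worked through by hand and checked against the conversion methods in the Character class. This is only an illustrative sketch: the class SurrogatePairs and its encode and decode methods are hypothetical names, and the arithmetic assumes the standard UTF-16 layout (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves). Character.toChars, Character.toCodePoint and Character.isSupplementaryCodePoint are existing Java APIs.

```java
import java.util.Arrays;

public class SurrogatePairs {
    // Split a supplementary code point (U+10000..U+10FFFF) into its
    // high and low surrogate code units.
    static char[] encode(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));  // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    // Recombine a surrogate pair into the code point it represents.
    static int decode(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char[] manual = encode(cp);
        char[] library = Character.toChars(cp);

        // The hand-written arithmetic agrees with the platform methods.
        System.out.println(Arrays.equals(manual, library));
        System.out.println(decode(manual[0], manual[1])
                == Character.toCodePoint(library[0], library[1]));
    }
}
```

In real code the Character methods are the better choice; the manual version is here only to make the bit layout visible.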
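An int value can represent every Unicode code point, including supplementary code points. The 21 low-order (least significant) bits of the int hold the code point, and the 11 high-order (most significant) bits must be zero.
Why are 21 bits enough?
The range of legal code points is now U+0000 to U+10FFFF.
Characters whose code points are greater than U+FFFF are called supplementary characters; they range from 0x10000 to 0x10FFFF. In binary:
0x010000 = 0000 0001 0000 0000 0000 0000
0x10FFFF = 0001 0000 1111 1111 1111 1111
The highest set bit of 0x10FFFF is the 21st from the right, so supplementary characters (and therefore all code points) use only the low 21 bits of an int.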
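The 21-bit claim can be checked mechanically. The sketch below is an illustration only: the class CodePointBits and its methods bitsNeeded and highBitsClear are hypothetical helpers, while Integer.numberOfLeadingZeros and Character.isValidCodePoint are existing Java APIs. It also shows a caveat: having the 11 high bits clear is necessary but not sufficient, because some 21-bit values lie above U+10FFFF.

```java
public class CodePointBits {
    // Number of bits needed to write a non-negative int in binary.
    static int bitsNeeded(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    // True when the 11 high-order bits of the int are all zero.
    static boolean highBitsClear(int value) {
        return (value >>> 21) == 0;
    }

    public static void main(String[] args) {
        System.out.println(bitsNeeded(0x10000));   // 17
        System.out.println(bitsNeeded(0x10FFFF));  // 21

        // 0x1FFFFF fits in 21 bits but is greater than U+10FFFF,
        // so it passes the bit test without being a legal code point.
        System.out.println(highBitsClear(0x1FFFFF));
        System.out.println(Character.isValidCodePoint(0x1FFFFF));
    }
}
```

For validation, Character.isValidCodePoint is the appropriate check; the bit test only confirms the storage argument made above.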