Unicode

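Open http://inamidst.com/stuff/unidata/
to browse Unicode code points and the characters they correspond to.

Clicking a character takes you to http://www.fileformat.info, which shows detailed information about that character, including its Unicode Data, its Encodings, and how it is written in HTML, C, C++, Java, and Python source code.
For example, here is the information for the dollar sign: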
Unicode Data
  Name                                 DOLLAR SIGN
  Block                                Basic Latin
  Category                             Symbol, Currency [Sc]
  Combine                              0
  BIDI                                 European Number Terminator [ET]
  Mirror                               N
  Index entries                        milreis
                                       DOLLAR SIGN
                                       escudo
  Comments                             milreis, escudo
                                       glyph may have one or two vertical bars
                                       other currency symbol characters: U+20A0-U+20B8
  See Also                             currency sign U+00A4
                                       heavy dollar sign U+1F4B2
  Version                              Unicode 1.1.0 (June, 1993)

Encodings
  HTML Entity (decimal)                &#36;
  HTML Entity (hex)                    &#x24;
  How to type in Microsoft Windows     Alt +0024
                                       Alt 036
                                       Alt 36
  UTF-8 (hex)                          0x24 (24)
  UTF-8 (binary)                       00100100
  UTF-16 (hex)                         0x0024 (0024)
  UTF-16 (decimal)                     36
  UTF-32 (hex)                         0x00000024 (0024)
  UTF-32 (decimal)                     36
  C/C++/Java source code               "\u0024"
  Python source code                   u"\u0024"

Java Data
  string.toUpperCase()                 $
  string.toLowerCase()                 $
  Character.UnicodeBlock               BASIC_LATIN
  Character.charCount()                1
  Character.getDirectionality()        DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR [5]
  Character.getNumericValue()          -1
  Character.getType()                  26
  Character.isDefined()                Yes
  Character.isDigit()                  No
  Character.isIdentifierIgnorable()    No
  Character.isISOControl()             No
  Character.isJavaIdentifierPart()     Yes
  Character.isJavaIdentifierStart()    Yes
  Character.isLetter()                 No
  Character.isLetterOrDigit()          No
  Character.isLowerCase()              No
  Character.isMirrored()               No
  Character.isSpaceChar()              No
  Character.isSupplementaryCodePoint() No
  Character.isTitleCase()              No
  Character.isUnicodeIdentifierPart()  No
  Character.isUnicodeIdentifierStart() No
  Character.isUpperCase()              No
  Character.isValidCodePoint()         Yes
  Character.isWhitespace()             No

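
Several of the values listed for the dollar sign can be reproduced locally with Python's standard unicodedata module and str.encode. The following is only a rough sketch, assuming Python 3.5 or later; the `info` dictionary and its keys are names made up for this illustration and are not part of either website.

```python
import unicodedata

ch = "$"
cp = ord(ch)  # the code point as an integer (36)

# Keys below are illustrative names chosen for this sketch.
info = {
    "code_point": "U+%04X" % cp,
    "name": unicodedata.name(ch),
    "category": unicodedata.category(ch),
    "combining": unicodedata.combining(ch),
    "bidi": unicodedata.bidirectional(ch),
    "mirrored": unicodedata.mirrored(ch),
    # Big-endian codecs are used so that no byte order mark is prepended.
    "utf8_hex": ch.encode("utf-8").hex(),
    "utf16_hex": ch.encode("utf-16-be").hex(),
    "utf32_hex": ch.encode("utf-32-be").hex(),
    "html_decimal": "&#%d;" % cp,
    "html_hex": "&#x%X;" % cp,
}

for key, value in info.items():
    print(key, value)
```

The same pattern should work for any other single character; only the value assigned to `ch` needs to change.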
Wikipedia's explanation of "code point":
In character encoding terminology, a code point or code position is any of 
the numerical values that make up the code space (or code page).[1] 

For example, ASCII comprises 128 code points in the range 0x0 to 0x7F,
Extended ASCII comprises 256 code points in the range 0x0 to 0xFF, and 
Unicode comprises 1,114,112 code points in the range 0x0 to 0x10FFFF.
The Unicode code space is divided into seventeen planes (the basic multilingual 
plane, and 16 supplementary planes), each with 65,536 (= 2^16) code points. 
Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
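
The arithmetic in the quoted passage can be checked with a few lines of Python. This is a minimal sketch, assuming Python 3.3 or later (where sys.maxunicode is always 0x10FFFF); `plane_of` is a hypothetical helper written for this illustration, not a standard library function.

```python
import sys

PLANES = 17
POINTS_PER_PLANE = 2 ** 16  # 65,536 code points per plane

total = PLANES * POINTS_PER_PLANE


def plane_of(code_point):
    """Return the plane index of a code point (0 is the basic multilingual plane)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    # Each plane spans 2**16 code points, so the plane is the value above the low 16 bits.
    return code_point >> 16


print(total)                 # size of the whole code space
print(hex(sys.maxunicode))   # highest code point Python accepts
print(plane_of(ord("$")))    # dollar sign, U+0024
print(plane_of(0x1F4B2))     # heavy dollar sign, U+1F4B2
```

Under these assumptions the dollar sign falls in plane 0, while the heavy dollar sign mentioned in the "See Also" entry falls in plane 1, the first supplementary plane.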

In Python, a character can be obtained from its Unicode name. For example, the name 'dollar sign'
yields the dollar symbol:
----------------------------------------------------------------------------------------------------------
>>> dollar = u"\N{dollar sign}"
>>> print(dollar)
$

----------------------------------------------------------------------------------------------------------
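
The \N{...} escape only works when the name is written directly in the source code. When the name is known only at run time, the standard unicodedata module offers lookup() and, for the reverse direction, name(). A short sketch, assuming Python 3; the variable names are arbitrary:

```python
import unicodedata

# Three ways to arrive at the same character.
by_name = unicodedata.lookup("DOLLAR SIGN")  # name supplied as a run-time string
by_escape = "\N{DOLLAR SIGN}"                # name fixed in the source code
by_code_point = chr(0x24)                    # code point U+0024

# Reverse direction: from the character back to its name.
name = unicodedata.name(by_name)

# lookup() raises KeyError when no character has the given name.
try:
    unicodedata.lookup("NO SUCH CHARACTER NAME")
    found = True
except KeyError:
    found = False

print(by_name, by_escape, by_code_point, name, found)
```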
