How to parse an ANSI-encoded TXT string on iOS

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 18:04

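Once you have determined that the txt file is not Unicode-encoded, try each of these cases in turn. If decoding succeeds, the initializer returns the decoded NSString; if it fails, it returns nil. A simple loop over the cases is enough.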


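The mainstream encodings for txt files are Unicode (UTF-8, UTF-16 big-endian, UTF-16 little-endian) and the ANSI encodings (GB2312 and so on). The leading bytes may or may not carry a BOM, so first inspect the first few bytes to work out which encoding the file uses, then decode it with the matching decoder. Unicode decoding needs no explanation. Apple also provides decoders for the ANSI encodings, and the five listed in the code below are enough to decode ANSI-encoded text.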
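                        case 1: // Chinese (GB 18030); %i prints this value as -2147482062
                            testString = [[NSString alloc] initWithData:_dataBuffer encoding:0x80000632];
                            break;
                        case 2: // Chinese (GBK); %i prints this value as -2147482063
                            testString = [[NSString alloc] initWithData:_dataBuffer encoding:0x80000631];
                            break;
                        case 3: // Chinese (ISO 2022-CN); %i prints this value as -2147481552
                            testString = [[NSString alloc] initWithData:_dataBuffer encoding:0x80000830];
                            break;
                        case 4: // Simplified Chinese (GB 2312); %i prints this value as -2147481296
                            testString = [[NSString alloc] initWithData:_dataBuffer encoding:0x80000930];
                            break;
                        case 5: // Simplified Chinese (HZ GB 2312); %i prints this value as -2147481083
                            testString = [[NSString alloc] initWithData:_dataBuffer encoding:0x80000A05];
                            break;
                        default: // no more encodings to try; testString stays nil
                            break;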
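
The BOM check mentioned above (looking at the first few bytes before choosing a decoder) could be sketched roughly as follows. This is plain C, which also compiles inside an Objective-C source file. The names `BomKind` and `detect_bom` are made up for this illustration and are not part of any Apple API, and the sketch only covers the three Unicode forms named in the post (it does not try to recognise UTF-32).

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    BOM_NONE,      /* no BOM: UTF-8 without BOM, or an ANSI code page such as GBK */
    BOM_UTF8,      /* bytes EF BB BF */
    BOM_UTF16_BE,  /* bytes FE FF */
    BOM_UTF16_LE   /* bytes FF FE */
} BomKind;

/* Inspect the first bytes of a buffer and report which BOM, if any, it starts with. */
static BomKind detect_bom(const unsigned char *bytes, size_t length)
{
    if (length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return BOM_UTF8;
    if (length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return BOM_UTF16_BE;
    if (length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return BOM_UTF16_LE;
    return BOM_NONE;
}
```

In an app you would presumably pass `_dataBuffer.bytes` and `_dataBuffer.length` to such a helper. Only when it reports `BOM_NONE` (and a UTF-8 decode attempt also fails) would you fall back to trying the ANSI encodings one by one.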


I found it. Create a Mac OS program and add the following code:


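// availableStringEncodings returns a zero-terminated array of NSStringEncoding values.
const NSStringEncoding *encodings = [NSString availableStringEncodings];
NSMutableString *str = [[NSMutableString alloc] init];
NSStringEncoding encoding;
while ((encoding = *encodings++) != 0)
{
    // The explicit (int) cast matches %i and gives the signed 32-bit numbers shown in the listing;
    // use %lu with an (unsigned long) cast to print the unsigned value instead.
    [str appendFormat:@"%@ === %i\n", [NSString localizedNameOfStringEncoding:encoding], (int)encoding];
}
NSLog(@"%@", str);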

Below are the encoding codes printed on Mac OS:
Western (Mac OS Roman) === 30
Japanese (Mac OS) === -2147483647
Traditional Chinese (Mac OS) === -2147483646
Korean (Mac OS) === -2147483645
Arabic (Mac OS) === -2147483644
Hebrew (Mac OS) === -2147483643
Greek (Mac OS) === -2147483642
Cyrillic (Mac OS) === -2147483641
Devanagari (Mac OS) === -2147483639
Gurmukhi (Mac OS) === -2147483638
Gujarati (Mac OS) === -2147483637
Thai (Mac OS) === -2147483627
Simplified Chinese (Mac OS) === -2147483623
Tibetan (Mac OS) === -2147483622
Central European (Mac OS) === -2147483619
Symbol (Mac OS) === 6
Dingbats (Mac OS) === -2147483614
Turkish (Mac OS) === -2147483613
Croatian (Mac OS) === -2147483612
Icelandic (Mac OS) === -2147483611
Romanian (Mac OS) === -2147483610
Celtic (Mac OS) === -2147483609
Gaelic (Mac OS) === -2147483608
Keyboard Symbols (Mac OS) === -2147483607
Farsi (Mac OS) === -2147483508
Cyrillic (Mac OS Ukrainian) === -2147483496
Inuit (Mac OS) === -2147483412
Unicode (UTF-16) === 10
Unicode (UTF-7) === -2080374528
Unicode (UTF-8) === 4
Unicode (UTF-32) === -1946156800
Unicode (UTF-16BE) === -1879047936
Unicode (UTF-16LE) === -1811939072
Unicode (UTF-32BE) === -1744830208
Unicode (UTF-32LE) === -1677721344
Western (ISO Latin 1) === 5
Central European (ISO Latin 2) === 9
Western (ISO Latin 3) === -2147483133
Central European (ISO Latin 4) === -2147483132
Cyrillic (ISO 8859-5) === -2147483131
Arabic (ISO 8859-6) === -2147483130
Greek (ISO 8859-7) === -2147483129
Hebrew (ISO 8859-8) === -2147483128
Turkish (ISO Latin 5) === -2147483127
Nordic (ISO Latin 6) === -2147483126
Thai (ISO 8859-11) === -2147483125
Baltic (ISO Latin 7) === -2147483123
Celtic (ISO Latin 8) === -2147483122
Western (ISO Latin 9) === -2147483121
Romanian (ISO Latin 10) === -2147483120
Latin-US (DOS) === -2147482624
Greek (DOS) === -2147482619
Baltic (DOS) === -2147482618
Western (DOS Latin 1) === -2147482608
Greek (DOS Greek 1) === -2147482607
Central European (DOS Latin 2) === -2147482606
Cyrillic (DOS) === -2147482605
Turkish (DOS) === -2147482604
Portuguese (DOS) === -2147482603
Icelandic (DOS) === -2147482602
Hebrew (DOS) === -2147482601
Canadian French (DOS) === -2147482600
Arabic (DOS) === -2147482599
Nordic (DOS) === -2147482598
Russian (DOS) === -2147482597
Greek (DOS Greek 2) === -2147482596
Thai (Windows, DOS) === -2147482595
Japanese (Windows, DOS) === 8
Simplified Chinese (Windows, DOS) === -2147482591
Korean (Windows, DOS) === -2147482590
Traditional Chinese (Windows, DOS) === -2147482589
Western (Windows Latin 1) === 12
Central European (Windows Latin 2) === 15
Cyrillic (Windows) === 11
Greek (Windows) === 13
Turkish (Windows Latin 5) === 14
Hebrew (Windows) === -2147482363
Arabic (Windows) === -2147482362
Baltic (Windows) === -2147482361
Vietnamese (Windows) === -2147482360
Western (ASCII) === 1
Japanese (Shift JIS X0213) === -2147482072
Chinese (GBK) === -2147482063
Chinese (GB 18030) === -2147482062
Japanese (ISO 2022-JP) === 21
Japanese (ISO 2022-JP-2) === -2147481567
Japanese (ISO 2022-JP-1) === -2147481566
Chinese (ISO 2022-CN) === -2147481552
Korean (ISO 2022-KR) === -2147481536
Japanese (EUC) === 3
Simplified Chinese (GB 2312) === -2147481296
Traditional Chinese (EUC) === -2147481295
Korean (EUC) === -2147481280
Japanese (Shift JIS) === -2147481087
Cyrillic (KOI8-R) === -2147481086
Traditional Chinese (Big 5) === -2147481085
Western (Mac Mail) === -2147481084
Simplified Chinese (HZ GB 2312) === -2147481083
Traditional Chinese (Big 5 HKSCS) === -2147481082
Ukrainian (KOI8-U) === -2147481080
Traditional Chinese (Big 5-E) === -2147481079
Western (NextStep) === 2
Non-lossy ASCII === 7
Western (EBCDIC Latin Core) === -2147480575
Western (EBCDIC Latin 1) === -2147480574

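However, Xcode reports the encoding parameter as unsigned long, so the positive form of each value should work as well. The negative numbers in the listing are simply the signed 32-bit rendering produced by %i; for example, -2147482062 for Chinese (GB 18030) is 2147485234 (0x80000632) when read as unsigned.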
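
To make the signed/unsigned relationship concrete, here is a minimal sketch in plain C. The function names are hypothetical. The second helper reflects an observation about the listing rather than documented behaviour: for entries with the high bit set, the low 31 bits appear to equal the corresponding CoreFoundation `CFStringEncoding` constant (for example 0x0632 for GB 18030).

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit number, as printed by %i, as its unsigned value.
   Conversion to an unsigned type is defined as modulo 2^32 arithmetic. */
static uint32_t encoding_from_signed(int32_t printed)
{
    return (uint32_t)printed;
}

/* Assumption for illustration: strip the high bit to get the low 31 bits,
   which appear to match the CFStringEncoding constant for such encodings. */
static uint32_t encoding_low_bits(uint32_t encoding)
{
    return encoding & 0x7FFFFFFFu;
}
```

For instance, `encoding_from_signed(-2147482063)` yields 0x80000631 for Chinese (GBK), which can be written directly as the `encoding:` argument instead of the negative number.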