Things to know about writing files with a specified encoding in C#

Source: Internet | Published under: oracle database users | Editor: 程序博客网 | Time: 2024/05/06 09:43

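When writing files in C#, StreamWriter accepts an optional Encoding parameter that sets the file's encoding. The choice of encoding (ASCII, UTF-8, UTF-32, Unicode, GB2312, ...) matters a great deal for how the file contents are stored. (Search online for background on the individual encodings.)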

The test code is as follows:

using System;
using System.IO;
using System.Text;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            // Note: on .NET Core / .NET 5+, "GB2312" is only available after calling
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance), and UTF-7 is obsolete.
            try
            {
                // sw1 uses the constructor's default encoding; the others name one explicitly.
                using (StreamWriter sw1 = new StreamWriter("1.txt"))
                using (StreamWriter sw2 = new StreamWriter("2.txt", false, Encoding.GetEncoding("ASCII")))
                using (StreamWriter sw3 = new StreamWriter("3.txt", false, Encoding.GetEncoding("UTF-8")))
                using (StreamWriter sw4 = new StreamWriter("4.txt", false, Encoding.GetEncoding("UTF-7")))
                using (StreamWriter sw5 = new StreamWriter("5.txt", false, Encoding.GetEncoding("UTF-32")))
                using (StreamWriter sw6 = new StreamWriter("6.txt", false, Encoding.GetEncoding("Unicode")))
                using (StreamWriter sw7 = new StreamWriter("7.txt", false, Encoding.GetEncoding("GB2312")))
                {
                    sw1.WriteLine("test 测试");
                    sw2.WriteLine("test 测试");
                    sw3.WriteLine("test 测试");
                    sw4.WriteLine("test 测试");
                    sw5.WriteLine("test 测试");
                    sw6.WriteLine("test 测试");
                    sw7.WriteLine("test 测试");
                } // Dispose() flushes and closes every writer, even if an exception is thrown
            }
            catch (IOException ex)
            {
                // Report I/O errors instead of silently swallowing them.
                Console.WriteLine(ex.Message);
            }
        }
    }
}

Results:

The program produces 7 files. Notepad++ reports their encodings as follows:

1.txt  ANSI as UTF-8

Displayed as: test 测试

File size: 13 bytes


2.txt  ANSI as UTF-8

Displayed as: test ??

File size: 9 bytes


3.txt  UTF-8

Displayed as: test 测试

File size: 16 bytes


4.txt  ANSI as UTF-8

Displayed as: test +bUuL1Q-

File size: 15 bytes


5.txt  UCS-2 Little Endian

Displayed as: test 测试

File size: 40 bytes


6.txt  UCS-2 Little Endian

Displayed as: test 测试

File size: 20 bytes


7.txt  ANSI

Displayed as: test 测试

File size: 11 bytes
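The file sizes above can be checked without writing any files: for a new file, StreamWriter first writes the encoding's preamble (its BOM, which may be empty) and then the encoded text, so the size is GetPreamble().Length + GetByteCount(text). The sketch below assumes Windows line endings ("\r\n", which is what WriteLine produced in the test); EncodingSizes and ExpectedFileSize are illustrative names. GB2312 is left out because on .NET Core and later it needs an extra encoding provider, and UTF-7 is left out because it is obsolete as of .NET 5.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // The line the test program writes, with an explicit Windows "\r\n"
    // so the result does not depend on Environment.NewLine.
    public const string Line = "test 测试\r\n";

    // Bytes StreamWriter puts into a new, empty file:
    // the encoding's preamble (BOM, possibly empty) followed by the encoded text.
    public static int ExpectedFileSize(Encoding encoding, string text)
    {
        return encoding.GetPreamble().Length + encoding.GetByteCount(text);
    }

    static void Main()
    {
        var cases = new (string Name, Encoding Enc)[]
        {
            ("1.txt default (UTF-8, no BOM)", new UTF8Encoding(false)),
            ("2.txt ASCII", Encoding.ASCII),
            ("3.txt UTF-8 with BOM", Encoding.UTF8),
            ("5.txt UTF-32 LE", Encoding.UTF32),
            ("6.txt Unicode (UTF-16 LE)", Encoding.Unicode),
        };
        foreach (var c in cases)
            Console.WriteLine($"{c.Name}: {ExpectedFileSize(c.Enc, Line)} bytes");

        // ASCII cannot represent 测试, so each of those characters becomes '?'.
        byte[] ascii = Encoding.ASCII.GetBytes(Line);
        Console.WriteLine(Encoding.ASCII.GetString(ascii).TrimEnd()); // test ??
    }
}
```

Running it prints 13, 9, 16, 40 and 20 bytes, matching 1.txt, 2.txt, 3.txt, 5.txt and 6.txt, and the ASCII round trip shows where the "??" in 2.txt comes from.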


Two notes:

1. ANSI

When I first saw ANSI, I misread it as ASCII (too young too simple TT).

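As for the difference between ANSI and ASCII: ASCII needs no explanation, while "ANSI" is Microsoft's name for the Windows code pages. The "ANSI" that Notepad++ reports above means the system's default code page (here GB2312, in which an English character takes one byte and a Chinese character takes two). For details, see: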

http://en.wikipedia.org/wiki/Code_page#Windows_.28ANSI.29_code_pages
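To see the "one byte per English character, two per Chinese character" rule directly, the sketch below encodes the test strings with GB2312, which .NET maps to code page 936. It assumes .NET Core 3.0 or later, where legacy code pages must first be enabled by registering CodePagesEncodingProvider (on .NET Framework GB2312 is available out of the box, so the static constructor can be dropped there); AnsiDemo and Gb2312 are illustrative names.

```csharp
using System;
using System.Text;

static class AnsiDemo
{
    // Assumption: .NET Core 3.0+ / .NET 5+. Legacy Windows code pages such as
    // GB2312 are only available after this provider has been registered.
    static AnsiDemo()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding Gb2312()
    {
        return Encoding.GetEncoding("GB2312");
    }

    static void Main()
    {
        Encoding gb = Gb2312();
        Console.WriteLine(gb.CodePage);                         // 936
        Console.WriteLine(gb.GetByteCount("test"));             // 4: one byte per English letter
        Console.WriteLine(gb.GetByteCount("测试"));             // 4: two bytes per Chinese character
        Console.WriteLine(gb.GetByteCount("test 测试\r\n"));    // 11, the size of 7.txt
        Console.WriteLine(BitConverter.ToString(gb.GetBytes("测试"))); // B2-E2-CA-D4
    }
}
```

No BOM is involved here: a GB2312 file carries no marker of its encoding, which is why Notepad++ can only report it as "ANSI", i.e. the system code page.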


2. ANSI as UTF-8 vs. UTF-8

In the absence of a BOM, Notepad++ looks for bytes that can't represent ASCII characters because their values are greater than 127 (or 7F hex). If it finds any, but they all conform to the patterns required by UTF-8, it decodes the file as UTF-8 and reports the encoding in the status bar as "ANSI as UTF-8".
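The quoted rule can be sketched as a small detector: look for a BOM first; if there is none and some byte is above 0x7F, accept the data as UTF-8 only if it decodes without errors, otherwise fall back to the ANSI code page. This only illustrates the heuristic described in the quote and is not Notepad++'s actual code; EncodingGuess, Guess and the returned labels are made up. The 4-byte UTF-32 LE BOM (FF FE 00 00) has to be tested before the 2-byte UTF-16 LE BOM (FF FE), because it starts with the same two bytes.

```csharp
using System;
using System.Linq;
using System.Text;

static class EncodingGuess
{
    // Illustrative sketch of the BOM / UTF-8 heuristic; NOT Notepad++'s implementation.
    public static string Guess(byte[] data)
    {
        // UTF-32 LE first: its BOM begins with the UTF-16 LE BOM.
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE (BOM)";
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16 LE (BOM)";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16 BE (BOM)";

        // No BOM and no byte above 0x7F: plain ASCII, valid as both ANSI and UTF-8.
        if (data.All(b => b <= 0x7F)) return "ASCII";

        // Some bytes above 0x7F: accept UTF-8 only if every sequence is valid UTF-8.
        var strictUtf8 = new UTF8Encoding(false, true); // throwOnInvalidBytes: true
        try
        {
            strictUtf8.GetString(data);
            return "UTF-8 (no BOM)"; // what Notepad++ labels "ANSI as UTF-8"
        }
        catch (DecoderFallbackException)
        {
            return "ANSI (system code page)";
        }
    }

    static bool StartsWith(byte[] data, params byte[] prefix)
    {
        return data.Length >= prefix.Length && prefix.SequenceEqual(data.Take(prefix.Length));
    }

    static void Main()
    {
        byte[] utf8 = new UTF8Encoding(false).GetBytes("test 测试");
        Console.WriteLine(Guess(utf8));                            // UTF-8 (no BOM)
        Console.WriteLine(Guess(new byte[] { 0x74, 0xB2, 0xE2 })); // ANSI (system code page)
    }
}
```

The second call uses the GB2312 bytes of 测 after a "t": 0xB2 is not a valid UTF-8 lead byte, so the strict decoder throws and the data is treated as ANSI.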

My understanding: in Notepad++, "ANSI as UTF-8" simply means "encoded as UTF-8 without a BOM".

For details, see:

http://stackoverflow.com/questions/1380690/what-is-ansi-as-utf-8-and-how-can-i-make-fputcsv-generate-utf-8-w-bom
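Since "ANSI as UTF-8" just means UTF-8 without a BOM, the BOM can be controlled explicitly with the UTF8Encoding(bool encoderShouldEmitUTF8Identifier) constructor. Encoding.GetEncoding("UTF-8"), used for 3.txt, returns a UTF-8 encoding that does emit a BOM, which accounts for the 3 extra bytes (16 vs. 13). The sketch writes into a MemoryStream instead of a file so the bytes can be inspected; Utf8BomDemo and Write are illustrative names.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8BomDemo
{
    // Writes text through a StreamWriter into memory and returns the raw bytes,
    // i.e. what would end up in a new file written with the same encoding.
    public static byte[] Write(string text, Encoding encoding)
    {
        var ms = new MemoryStream();
        using (var sw = new StreamWriter(ms, encoding))
        {
            sw.Write(text);
        } // disposing flushes the preamble and text, then closes ms
        return ms.ToArray(); // ToArray still works on a closed MemoryStream
    }

    static void Main()
    {
        byte[] noBom = Write("test 测试", new UTF8Encoding(false));  // like 1.txt
        byte[] withBom = Write("test 测试", new UTF8Encoding(true)); // like 3.txt
        Console.WriteLine(BitConverter.ToString(noBom, 0, 3));   // 74-65-73 ("tes")
        Console.WriteLine(BitConverter.ToString(withBom, 0, 3)); // EF-BB-BF (the BOM)
        Console.WriteLine(withBom.Length - noBom.Length);        // 3
    }
}
```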


Open question:

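Why do 1.txt (written without specifying an encoding) and 7.txt (written with "GB2312") differ? System.Console.WriteLine(Encoding.Default.EncodingName) prints "简体中文(GB2312)" (Chinese Simplified (GB2312)), which suggests the default is "GB2312". This needs further investigation.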
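One answer, based on the .NET documentation: the StreamWriter(string path) constructor does not use Encoding.Default at all; when no encoding is given, it uses UTF-8 without a BOM. That matches 1.txt exactly (13 bytes, reported as "ANSI as UTF-8"), whereas 7.txt really is GB2312 (11 bytes). The sketch below reads the encoding reported by a StreamWriter constructed without one; DefaultEncodingDemo and DefaultWriterEncoding are illustrative names. Note also that Encoding.Default itself depends on the runtime: on .NET Framework it is the system ANSI code page, while on .NET Core and .NET 5+ it is always UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

static class DefaultEncodingDemo
{
    // The encoding a StreamWriter uses when none is passed to its constructor.
    public static Encoding DefaultWriterEncoding()
    {
        using (var sw = new StreamWriter(new MemoryStream()))
        {
            return sw.Encoding;
        }
    }

    static void Main()
    {
        Encoding e = DefaultWriterEncoding();
        Console.WriteLine(e.WebName);              // utf-8
        Console.WriteLine(e.GetPreamble().Length); // 0, i.e. no BOM is written
        // Encoding.Default is unrelated to the above; its value depends on the
        // runtime (and, on .NET Framework, on the system's ANSI code page).
        Console.WriteLine(Encoding.Default.WebName);
    }
}
```

So to get a GB2312 file, the encoding has to be passed explicitly, as was done for 7.txt.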


References:

http://blog.csdn.net/hjsunj/article/details/2223766

http://www.imkevinyang.com/2010/06/%E5%85%B3%E4%BA%8E%E5%AD%97%E7%AC%A6%E7%BC%96%E7%A0%81%EF%BC%8C%E4%BD%A0%E6%89%80%E9%9C%80%E8%A6%81%E7%9F%A5%E9%81%93%E7%9A%84.html
