Summary: Fixing Garbled Text When Reading Files

Source: Internet | Published under: 软件开发源代码管理 | Editor: 程序博客网 | Time: 2024/05/16 07:54

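I recently needed to read a txt file and find and replace parts of its contents, and ran into garbled text when reading and writing. Below is a method I found for determining a file's encoding, though it does not work very well.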

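// For a UTF-8 text file that starts with a byte order mark (BOM), the first three bytes are -17, -69, -65 (0xEF 0xBB 0xBF read as signed Java bytes), so a snippet that checks for UTF-8 looks like this: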

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

public class Test {
    public static void main(String[] args) {
        File f = new File("name-of-the-text-file-to-check");
        // try-with-resources closes the stream even if read() throws
        try (InputStream in = new FileInputStream(f)) {
            byte[] b = new byte[3];
            int n = in.read(b);
            // -17, -69, -65 are the signed-byte values of the UTF-8 BOM (EF BB BF)
            if (n == 3 && b[0] == -17 && b[1] == -69 && b[2] == -65)
                System.out.println(f.getName() + " is encoded as UTF-8");
            else
                System.out.println(f.getName() + " may be GBK");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

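The above is the encoding check I found online. It did not work well (a UTF-8 file saved without a BOM is reported as possibly GBK), so I kept searching and found the following: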
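Since many UTF-8 files are saved without a BOM, one alternative that needs only the JDK is to decode the whole byte array strictly as UTF-8 and treat a decoding error as "not UTF-8". The sketch below assumes GBK is the only other candidate, as in the snippet above; the class name Utf8Check and its method names are made up for illustration. Pure ASCII content is valid in both encodings, and a GBK file can occasionally happen to be well-formed UTF-8, so the result is a guess rather than a guarantee.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns true if the bytes are well-formed UTF-8 (with or without a BOM). */
    public static boolean isValidUtf8(byte[] data) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Hypothetical helper: chooses between the two assumed candidates only. */
    public static String guessCharset(byte[] data) {
        return isValidUtf8(data) ? "UTF-8" : "GBK";
    }
}
```

The bytes can be obtained with java.nio.file.Files.readAllBytes; for very large files, checking only a leading chunk is possible, but a multi-byte character cut off at the chunk boundary would then be reported as malformed.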

For more sophisticated encoding detection, the open-source project cpdetector can be used. The code is as follows:

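detector is the detection proxy; it delegates the detection task to instances of concrete detector implementations. cpdetector ships with several common implementations, such as ParsingDetector, JChardetFacade, ASCIIDetector, and UnicodeDetector, whose instances can be added with the add method. detector returns the detected charset on a "first non-null result wins" basis.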

info.monitorenter.cpdetector.io.CodepageDetectorProxy detector = info.monitorenter.cpdetector.io.CodepageDetectorProxy.getInstance();

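ParsingDetector can be used to check the encoding of HTML, XML, and similar files or character streams. The constructor argument indicates whether to print details of the detection process; false suppresses them.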

detector.add(new info.monitorenter.cpdetector.io.ParsingDetector(false));

JChardetFacade wraps JChardet, provided by the Mozilla organization, and can determine the encoding of most files, so this detector alone generally meets the needs of most projects. Further detectors such as ASCIIDetector and UnicodeDetector can be added as well.
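The "first non-null result wins" rule can be illustrated without the library. The following is only a sketch of the idea, not cpdetector's actual source: the Detector interface, the FirstMatchProxy class, and the asciiOnly helper are hypothetical names invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FirstMatchProxy {
    /** Hypothetical detector contract: return a charset, or null if undecided. */
    public interface Detector {
        Charset detect(byte[] data);
    }

    private final List<Detector> detectors = new ArrayList<>();

    /** Detectors are consulted in the order in which they were added. */
    public void add(Detector d) {
        detectors.add(d);
    }

    /** Returns the first non-null answer, or null if every detector is undecided. */
    public Charset detect(byte[] data) {
        for (Detector d : detectors) {
            Charset result = d.detect(data);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    /** Example detector: claims US-ASCII only when every byte is below 0x80. */
    public static Charset asciiOnly(byte[] data) {
        for (byte b : data) {
            if (b < 0) { // a negative signed byte means the high bit is set
                return null;
            }
        }
        return StandardCharsets.US_ASCII;
    }
}
```

Because the first answer wins, the order of the add calls matters: a detector that always returns something should be added last, otherwise it shadows every detector after it.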

Below is a test I wrote myself:

import info.monitorenter.cpdetector.io.ASCIIDetector;
import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
import info.monitorenter.cpdetector.io.JChardetFacade;
import info.monitorenter.cpdetector.io.UnicodeDetector;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;

public class CharsetUtil {
    /**
     * Detects the encoding of a file.
     * @param path path of the file to check
     * @return name of the detected charset, or null if detection fails
     */
    public static String getCharset(String path) {
        CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();
        detector.add(JChardetFacade.getInstance());
        detector.add(ASCIIDetector.getInstance());
        detector.add(UnicodeDetector.getInstance());
        File file = new File(path);
        try {
            // File.toURL() is deprecated; go through toURI() instead
            Charset charset = detector.detectCodepage(file.toURI().toURL());
            return charset != null ? charset.name() : null;
        } catch (IOException e) {
            // also covers MalformedURLException, a subclass of IOException
            e.printStackTrace();
            return null;
        }
    }
}

Note: two jars also need to be on the classpath, cpdetector_1.0.8.jar and jchardet-1.0.jar; otherwise errors are reported.
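Once a charset name is known, the original task (read a txt file, find and replace part of it, write it back) can avoid garbling by passing that charset explicitly to both the read and the write. A minimal sketch, assuming the file fits in memory; the class name ReplaceInFile and its parameters are illustrative, and the charset name is expected to come from a detection step such as getCharset.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReplaceInFile {
    /**
     * Replaces every occurrence of target in the file, decoding and
     * re-encoding with the same charset so the bytes round-trip cleanly.
     */
    public static void replace(Path file, String charsetName,
                               String target, String replacement) throws IOException {
        Charset cs = Charset.forName(charsetName);
        // Decode with the detected charset instead of the platform default
        String text = new String(Files.readAllBytes(file), cs);
        String updated = text.replace(target, replacement);
        // Encode with the same charset when writing back
        Files.write(file, updated.getBytes(cs));
    }
}
```

Since getCharset returns null when detection fails, a caller would need to pick a fallback charset name before calling replace, because Charset.forName rejects a null name.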