Getting a File's Encoding in Java

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 16:38

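When reading a file, we usually specify the encoding used by the text or string. Sometimes, however, the encoding is unknown and has to be inferred from the file's bytes. The following code does this heuristically: it first checks for a byte order mark (BOM), then scans the content for UTF-8 multi-byte patterns, and otherwise falls back to GBK.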

// Required imports; place the method below inside any class.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * Guesses the encoding of a file.
 *
 * @param file the file to analyze
 * @return "UTF-16LE" or "UTF-8" when detected, otherwise the default "GBK"
 */
public static String getCharset(File file) {
    String charset = "GBK"; // default encoding
    byte[] first3Bytes = new byte[3];
    // try-with-resources closes the stream exactly once, even on error
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
        bis.mark(3); // read limit must cover the 3 bytes read before reset()
        int read = bis.read(first3Bytes, 0, 3);
        if (read == -1) {
            return charset; // empty file
        }
        // Check for a byte order mark (BOM) first.
        if (read >= 2
                && first3Bytes[0] == (byte) 0xFF
                && first3Bytes[1] == (byte) 0xFE) {
            return "UTF-16LE";
        }
        if (read == 3
                && first3Bytes[0] == (byte) 0xEF
                && first3Bytes[1] == (byte) 0xBB
                && first3Bytes[2] == (byte) 0xBF) {
            return "UTF-8";
        }
        bis.reset();
        // No BOM: scan for the first non-ASCII byte sequence.
        while ((read = bis.read()) != -1) {
            if (read >= 0xF0) {
                break;
            }
            // A lone byte in 0x80-0xBF is also treated as GBK.
            if (0x80 <= read && read <= 0xBF) {
                break;
            }
            if (0xC0 <= read && read <= 0xDF) {
                read = bis.read();
                if (0x80 <= read && read <= 0xBF) {
                    // Two bytes (0xC0-0xDF)(0x80-0xBF) may also occur in GB encodings.
                    continue;
                }
                break; // may misjudge here, but the probability is small
            } else if (0xE0 <= read && read <= 0xEF) {
                // Lead byte of a three-byte UTF-8 sequence: expect two continuation bytes.
                read = bis.read();
                if (0x80 <= read && read <= 0xBF) {
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF) {
                        charset = "UTF-8";
                    }
                }
                break;
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return charset;
}
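The method above decides on the first three-byte pattern it meets, so it can misjudge some files. As a cross-check, here is a minimal sketch of a different technique (not the method from this post): decode the whole content strictly as UTF-8 with `CharsetDecoder` and fall back to GBK if decoding fails. The class name `StrictUtf8Check` and the methods `isValidUtf8` and `guessCharset` are hypothetical names chosen for illustration, and the sketch assumes the file is small enough to load into memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; names are illustrative only.
public class StrictUtf8Check {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes)); // throws on any malformed sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Guesses "UTF-8" if the whole file decodes strictly; otherwise assumes "GBK". */
    public static String guessCharset(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path); // assumption: file fits in memory
        return isValidUtf8(bytes) ? "UTF-8" : "GBK";
    }

    public static void main(String[] args) {
        // Two Chinese characters written as Unicode escapes, encoded as UTF-8.
        byte[] utf8 = "\u7f16\u7801".getBytes(StandardCharsets.UTF_8);
        // Sample bytes from the GBK double-byte range; 0xB1 cannot start a UTF-8 sequence.
        byte[] gbk = {(byte) 0xB1, (byte) 0xE0, (byte) 0xC2, (byte) 0xEB};
        System.out.println(isValidUtf8(utf8));
        System.out.println(isValidUtf8(gbk));
    }
}
```

Note that pure ASCII content is valid UTF-8, so this sketch reports "UTF-8" for it, whereas the method above returns its default of "GBK". Either result decodes ASCII correctly.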
