Fixing garbled text when reading HTML content

Source: Internet  Editor: 程序博客网  Time: 2024/05/22 01:53
1. Garbled text: read all of the raw bytes first, then convert them into the string you need.
Correct approach:
 // Collect all raw bytes first; decode only once, after the stream is fully read.
 ByteArrayOutputStream outHtml = new ByteArrayOutputStream();
 InputStream inn = conn.getInputStream();
 byte[] buffer = new byte[1024];
 int len = 0;
 while ((len = inn.read(buffer)) != -1) {
     outHtml.write(buffer, 0, len);
 }
 byte[] data = outHtml.toByteArray();
 logger.info("Before conversion, utf-8: " + new String(data, "utf-8"));
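
The same idea can be tried in isolation with the minimal sketch below. The class name `ReadAllThenDecode` and the helper `readAllAsString` are hypothetical names chosen for illustration, and a `ByteArrayInputStream` stands in for `conn.getInputStream()`; it assumes the content really is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadAllThenDecode {

    // Hypothetical helper: collect every byte first, then decode exactly once.
    static String readAllAsString(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        // Decoding happens here, on the complete byte array,
        // so no multi-byte character can be cut in half.
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // "\u4e71\u7801" is the two-character word for "garbled text"; each character is 3 bytes in UTF-8.
        byte[] raw = "\u4e71\u7801 test".getBytes(StandardCharsets.UTF_8);
        // Stand-in for the network stream in the snippet above.
        InputStream in = new ByteArrayInputStream(raw);
        System.out.println(readAllAsString(in, StandardCharsets.UTF_8));
    }
}
```

On Java 9 and later, `InputStream.readAllBytes()` can replace the manual loop; the explicit loop is kept here so it matches the snippet above.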

 

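Incorrect approach: What causes the garbled text? Why does it look fine in the local environment but come out garbled in the production environment? Is it simply that a read can return an incomplete byte sequence, so converting it to a String produces a decoding error? Most likely, yes. A multi-byte UTF-8 character can straddle the boundary between two read() calls; new String(...) then decodes each half separately and replaces the malformed bytes with U+FFFD. How many bytes each read() returns depends on how the data arrives over the network, which would explain why the result differs between environments.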

 InputStream inn = conn.getInputStream();
 InputStream inputStream = new BufferedInputStream(inn);
 StringBuffer htmlContent = new StringBuffer();
 byte[] b = new byte[1024];
 for (int n; (n = inputStream.read(b)) != -1;) {
     // Problem: each chunk is decoded on its own, so a character split across two reads is corrupted.
     htmlContent.append(new String(b, 0, n, "utf-8"));
 }
 logger.info("When fetched: " + htmlContent.toString());
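
The failure can be reproduced without a network connection. The sketch below is only an illustration: the class `ChunkDecodeDemo` and its two methods are hypothetical names, and a deliberately tiny chunk size forces a character to be split across reads. It also shows one streaming alternative, an `InputStreamReader`, whose decoder carries incomplete byte sequences over to the next read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ChunkDecodeDemo {

    // Reproduces the problem: every chunk of bytes is decoded independently.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] b = new byte[chunkSize];
        for (int n; (n = in.read(b)) != -1;) {
            sb.append(new String(b, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Streaming alternative: the Reader decodes across chunk boundaries.
    static String decodeWithReader(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] c = new char[chunkSize];
        for (int n; (n = reader.read(c)) != -1;) {
            sb.append(c, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two characters, 3 bytes each in UTF-8 (6 bytes total).
        String text = "\u4e71\u7801";
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);

        // A 4-byte chunk cuts the second character after its first byte.
        String broken = decodePerChunk(new ByteArrayInputStream(raw), 4);
        String intact = decodeWithReader(new ByteArrayInputStream(raw), 4);

        System.out.println("per-chunk equals original: " + broken.equals(text));
        System.out.println("reader equals original:    " + intact.equals(text));
    }
}
```

With a real connection the split point is not fixed, so the per-chunk version may appear to work in one environment and fail in another.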