Handling garbled text when reading HTML file content

Source: Internet  Published by: 网站seo方案  Editor: 程序博客网  Time: 2024/05/22 01:53

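1. To avoid garbled text, read all of the bytes first, then convert them to the string you need in a single step.
Correct approach: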
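ByteArrayOutputStream outHtml = new ByteArrayOutputStream();
InputStream inn = conn.getInputStream();
byte[] buffer = new byte[1024];
int len;
// Collect the raw bytes first; do not decode anything yet.
while ((len = inn.read(buffer)) != -1) {
    outHtml.write(buffer, 0, len);
}
// Decode once, after every byte has been read, so no character is split.
byte[] data = outHtml.toByteArray();
logger.info("Before conversion, utf-8: " + new String(data, "utf-8"));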
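
If holding the whole response in a byte array is not desirable, another option is to let a Reader do the decoding, since InputStreamReader keeps an incomplete multi-byte sequence until the remaining bytes arrive. The following is only a minimal sketch: the class name ReaderDecodeDemo and the helper trickle (which hands out at most a few bytes per read to imitate a slow connection) are made up for illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReaderDecodeDemo {
    /** Hypothetical helper: returns at most `max` bytes per read, like a slow network. */
    static InputStream trickle(InputStream in, int max) {
        return new FilterInputStream(in) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, max));
            }
        };
    }

    /** Decodes the stream as UTF-8 through a Reader, which carries partial characters across reads. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            for (int n; (n = reader.read(buf)) != -1;) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two CJK characters (3 bytes each in UTF-8) followed by ASCII text.
        String text = "\u4f60\u597d, html";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // Reads arrive 4 bytes at a time, so characters are split across reads.
        String decoded = readAll(trickle(new ByteArrayInputStream(bytes), 4));
        System.out.println(decoded.equals(text)); // prints "true"
    }
}
```

In real code the argument to readAll would be conn.getInputStream(); the trickle wrapper is there only to force awkward read boundaries in the demo.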

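Incorrect approach: what causes the garbled text here, and why does it show up in the deployed environment but not locally? The likely cause is the one suspected: read() can return before a multi-byte UTF-8 character is complete, so the bytes of one character end up in two different chunks. Decoding each chunk separately with new String(...) then turns both halves into replacement characters. Where the chunks break depends on how the data arrives over the network, which would explain why the same code can look fine locally and fail elsewhere.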

InputStream inn = conn.getInputStream();
InputStream inputStream = new BufferedInputStream(inn);
StringBuffer htmlContent = new StringBuffer();
byte[] b = new byte[1024];
for (int n; (n = inputStream.read(b)) != -1;) {
    // BUG: each chunk is decoded on its own, so a character whose bytes
    // straddle two reads is decoded as garbage.
    htmlContent.append(new String(b, 0, n, "utf-8"));
}
logger.info("When fetched: " + htmlContent.toString());
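
The difference between the two approaches can be reproduced without a network connection. The sketch below is an illustration only: the class name ChunkDecodeDemo, the method names, and the deliberately tiny chunk size of 4 are assumptions chosen so that the chunk boundary lands inside a 3-byte character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ChunkDecodeDemo {
    /** Problematic pattern: decodes every chunk separately. */
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] b = new byte[chunkSize];
        for (int n; (n = in.read(b)) != -1;) {
            sb.append(new String(b, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** Safer pattern: collects all bytes, then decodes once. */
    static String decodeAll(InputStream in, int chunkSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] b = new byte[chunkSize];
        for (int n; (n = in.read(b)) != -1;) {
            out.write(b, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Two CJK characters, 3 bytes each in UTF-8 (6 bytes in total).
        String text = "\u4f60\u597d";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // With 4-byte chunks the second character is split across two reads.
        String perChunk = decodePerChunk(new ByteArrayInputStream(bytes), 4);
        String all = decodeAll(new ByteArrayInputStream(bytes), 4);
        System.out.println(perChunk.equals(text)); // prints "false"
        System.out.println(all.equals(text));      // prints "true"
    }
}
```

With a 1024-byte buffer the same split happens less often, but it can still occur whenever a read ends in the middle of a multi-byte character.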

0 0