Garbled Chinese characters with dom4j

Source: Internet; Published by: 销售数据; Editor: 程序博客网; Time: 2024/05/17 11:36

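After generating an XML document with dom4j using UTF-8 encoding, I found that the parts of the document containing Chinese were malformed and the file could not be read. With no better idea, I fell back on an ugly hack: re-encoding every string as UTF-8 before writing it into the XML: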

str = new String(str.getBytes(), "UTF8");

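At last the XML error went away and the document could be read normally. However, the Chinese parts were now garbled. What a tragedy. Why did it turn out this way?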
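To see why the re-encoding hack garbles the text, here is a minimal sketch using only the JDK. `str.getBytes()` encodes with the platform default charset, and decoding those bytes as UTF-8 only round-trips if that default is already UTF-8. The class name `ReencodeDemo`, the helper `reencode`, and the choice of GBK as the platform default are my assumptions for illustration, not something taken from the original setup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReencodeDemo {
    // Encodes text with one charset and decodes the bytes with another,
    // which is what new String(str.getBytes(), "UTF8") does when the
    // platform default charset is not UTF-8.
    public static String reencode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Assumption: GBK plays the role of the platform default charset.
        // Fall back to ISO-8859-1 if this JDK does not ship GBK.
        Charset legacy = Charset.isSupported("GBK")
                ? Charset.forName("GBK")
                : StandardCharsets.ISO_8859_1;
        String original = "汤姆猫";

        // Mismatched charsets: the text does not survive.
        System.out.println(reencode(original, legacy, StandardCharsets.UTF_8).equals(original)); // false
        // Matching charsets: the text survives.
        System.out.println(reencode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

A Java String is already a sequence of Unicode characters, so there is nothing to "convert to UTF-8" at this stage. The encoding only matters at the point where characters become bytes.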

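Eventually I found the answer: the problem was misuse of the FileWriter class. After replacing FileWriter with FileOutputStream, the problem was solved. A FileWriter created from just a file name encodes characters with the platform default charset (presumably not UTF-8 on my machine) and knows nothing about the encoding set on the OutputFormat, so the bytes on disk did not match the UTF-8 declared in the XML header.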
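The mismatch can be reproduced without dom4j. The following is a rough sketch, assuming an in-memory stream in place of a file and a hand-written XML declaration in place of the one dom4j emits. `DeclaredVsActualEncoding` and `writeXml` are hypothetical names, and GBK again stands in for the platform default that FileWriter would pick up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DeclaredVsActualEncoding {
    // Writes an XML snippet whose declaration says UTF-8, while the characters
    // are actually encoded with writerCharset, as a Writer with its own
    // charset would do.
    public static byte[] writeXml(String text, Charset writerCharset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, writerCharset)) {
            writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            writer.write("<color>" + text + "</color>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: GBK stands in for a non-UTF-8 platform default.
        Charset legacy = Charset.isSupported("GBK")
                ? Charset.forName("GBK")
                : StandardCharsets.ISO_8859_1;
        String text = "黄色";

        // A reader trusts the declaration and decodes the bytes as UTF-8.
        String mismatched = new String(writeXml(text, legacy), StandardCharsets.UTF_8);
        String matched = new String(writeXml(text, StandardCharsets.UTF_8), StandardCharsets.UTF_8);

        System.out.println(mismatched.contains(text)); // false
        System.out.println(matched.contains(text));    // true
    }
}
```

In the mismatched case the declaration itself survives because it is plain ASCII, which is why the problem only shows up in the Chinese parts of the document.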

1 How XMLWriter in dom4j handles the output file:

    public XMLWriter(OutputStream out) throws UnsupportedEncodingException
    {
        this.format = DEFAULT_FORMAT;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException
    {
        this.format = format;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    protected Writer createWriter(OutputStream outStream, String encoding) throws UnsupportedEncodingException
    {
        return new BufferedWriter( new OutputStreamWriter( outStream, encoding ));
    }

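Conclusion: when generating an XML document, construct the XMLWriter from an OutputStream rather than a Writer. With an OutputStream, dom4j creates the underlying writer itself using the encoding from the OutputFormat. XMLWriter does accept a Writer too, but then the bytes are encoded with whatever charset that Writer uses, regardless of the format's encoding.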
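If a Writer has to be used for some reason, one option is to build it the same way createWriter does above, with the charset chosen explicitly rather than inherited from the platform. This is only a sketch with the JDK alone. `ExplicitCharsetWriter`, `openWriter`, and `roundTrip` are hypothetical names, and the temporary file is just for the demonstration.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetWriter {
    // Same shape as dom4j's createWriter: a raw byte stream plus an
    // explicitly chosen charset.
    public static Writer openWriter(Path file, Charset charset) throws IOException {
        return new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file.toFile()), charset));
    }

    // Writes text to a temporary file as UTF-8, then reads the bytes back as UTF-8.
    public static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".xml");
            try {
                try (Writer writer = openWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<animal name=\"汤姆猫\"/>";
        System.out.println(roundTrip(xml).equals(xml)); // true
    }
}
```

The charset passed to such a writer has to agree with the one set on the OutputFormat, since that is what ends up in the XML declaration. Passing an OutputStream avoids having to keep the two in sync by hand.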

2 Example:

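import java.io.FileOutputStream;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public void createXML(String fileName) {
    // Build a small document that contains Chinese text.
    Document doc = DocumentHelper.createDocument();
    Element rootElement = doc.addElement("animal");
    rootElement.addAttribute("name", "汤姆猫");
    Element ageElement = rootElement.addElement("age");
    ageElement.setText("3岁");
    Element colorElement = rootElement.addElement("color");
    colorElement.setText("黄色");
    try
    {
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setEncoding("UTF-8");
        // Wrong: FileWriter encodes with its own charset and ignores the format's encoding.
        // XMLWriter xmlWriter = new XMLWriter(new FileWriter(fileName), format);
        // Right: XMLWriter wraps the stream in a writer that uses format.getEncoding().
        XMLWriter xmlWriter = new XMLWriter(new FileOutputStream(fileName), format);
        xmlWriter.write(doc);
        xmlWriter.close();
    }
    catch (Exception e)
    {
        System.out.println(e);
    }
}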
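One way to check that the generated file really is UTF-8, without going through an XML parser, is to decode its bytes strictly and see whether the decoder complains. The sketch below uses only the JDK. `Utf8Check` and `isValidUtf8` are hypothetical names, and the invalid sample bytes are made up for the demonstration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the bytes form well-formed UTF-8.
    // The decoder is set to report errors instead of silently
    // substituting replacement characters.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "汤姆猫".getBytes(StandardCharsets.UTF_8);
        // A UTF-8 lead byte followed by a byte that is not a continuation byte.
        byte[] bad = {(byte) 0xC3, (byte) 0x28};

        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(bad));  // false
    }
}
```

For the file written by createXML, the bytes could be loaded with `Files.readAllBytes(Paths.get(fileName))` and passed to `isValidUtf8`. A `false` result points at a mismatch between the declared encoding and the bytes actually written, although a `true` result alone does not prove the text is the intended one.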