Java: converting Word .doc to XML and parsing a tree drawn in Word

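Source: Internet  Editor: 程序博客网  Time: 2024/06/05 04:22
A project I worked on recently needed a tree drawn in a Word document to be imported into a database, so I decided to convert the .doc file to XML and then parse the XML into the database. The tree in Word looks like this: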

[Image: the tree as drawn in the Word document]

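After conversion to XML, the tree appears as the following relation structure:

<o:relationtable v:ext="edit">
    <o:rel v:ext="edit" idsrc="#_s1028" iddest="#_s1028"/>
    <o:rel v:ext="edit" idsrc="#_s1029" iddest="#_s1028" idcntr="#_s1032"/>
    <o:rel v:ext="edit" idsrc="#_s1030" iddest="#_s1028" idcntr="#_s1033"/>
    <o:rel v:ext="edit" idsrc="#_s1117" iddest="#_s1028" idcntr="#_s1118"/>
    <o:rel v:ext="edit" idsrc="#_s1161" iddest="#_s1028" idcntr="#_s1162"/>
</o:relationtable>

For the format conversion itself, I tried many methods found online and none of them worked well. In the end I came across an approach that records a macro in Word and then calls that macro through jacob to convert files in batch.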

Macro code:

Sub hong1()
'
' hong1 macro: saves D:\doc\01.doc .. 04.doc as Flat XML in D:\doc2xml\
'
    Dim i As Integer
    Dim name As String
    For i = 1 To 4
        ' Zero-padded base name: 01, 02, 03, 04
        name = Format(i, "00")
        ChangeFileOpenDirectory "D:\doc\"
        Documents.Open FileName:=name & ".doc", ConfirmConversions:=False, ReadOnly:= _
            False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
            "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
            Format:=wdOpenFormatAuto, XMLTransform:=""
        ChangeFileOpenDirectory "D:\doc2xml\"
        ActiveDocument.SaveAs2 FileName:=name & ".xml", FileFormat:=wdFormatFlatXML, _
            LockComments:=False, Password:="", AddToRecentFiles:=True, WritePassword _
            :="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
            SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
            False, CompatibilityMode:=11
        ActiveWindow.Close
    Next i
End Sub

Java code that calls the macro:

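import java.io.File;

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

static void runMacros(String path) {
    // Start Word through COM via jacob.
    ActiveXComponent word = new ActiveXComponent("Word.Application");
    try {
        Dispatch documents = word.getProperty("Documents").toDispatch();
        File[] files = new File(path).listFiles();
        if (files == null) {
            return; // path is not a readable directory
        }
        for (File tf : files) {
            String fileName = tf.getName();
            int dot = fileName.lastIndexOf('.');
            if (dot < 0) {
                continue; // skip entries without an extension
            }
            // Open the document, then run the macro against it.
            Dispatch.call(documents, "Open", tf.getAbsolutePath());
            // "macro1" is called with (source dir, source name, target dir, target base name).
            // The hong1 macro listed above takes no arguments, so this call presumably
            // targets a parameterised variant of it that is not shown here.
            Dispatch.call(word, "Run", new Variant("macro1"), new Variant(path), new Variant(fileName),
                    new Variant(path), new Variant(fileName.substring(0, dot)));
        }
    } finally {
        // Quit Word so that no hidden WINWORD process is left running.
        word.invoke("Quit", new Variant[0]);
    }
}

Once the conversion is done, I parse the XML tree with dom4j. That basically solved the problem.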
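As a rough illustration of the parsing step, the sketch below reads the relation table shown earlier and groups shapes by parent. It is not the original dom4j code: it uses the JDK's built-in DOM parser instead of dom4j so that it runs without extra dependencies, and the class name `RelationTableParser` and method `parseChildren` are hypothetical. It also assumes that `idsrc` is the child shape, `iddest` is its parent, and that a row where both are equal marks the root, which is how the sample data reads.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RelationTableParser {

    // The relation table from the converted file, copied from the article.
    static final String SAMPLE =
        "<o:relationtable v:ext=\"edit\">"
        + "<o:rel v:ext=\"edit\" idsrc=\"#_s1028\" iddest=\"#_s1028\"/>"
        + "<o:rel v:ext=\"edit\" idsrc=\"#_s1029\" iddest=\"#_s1028\" idcntr=\"#_s1032\"/>"
        + "<o:rel v:ext=\"edit\" idsrc=\"#_s1030\" iddest=\"#_s1028\" idcntr=\"#_s1033\"/>"
        + "<o:rel v:ext=\"edit\" idsrc=\"#_s1117\" iddest=\"#_s1028\" idcntr=\"#_s1118\"/>"
        + "<o:rel v:ext=\"edit\" idsrc=\"#_s1161\" iddest=\"#_s1028\" idcntr=\"#_s1162\"/>"
        + "</o:relationtable>";

    /** Returns a map of parent shape id to child shape ids, in document order. */
    static Map<String, List<String>> parseChildren(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace processing stays off, so "o:rel" is matched as a plain element name.
        factory.setNamespaceAware(false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList rels = doc.getElementsByTagName("o:rel");
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            String child = stripHash(rel.getAttribute("idsrc"));   // assumed: child shape
            String parent = stripHash(rel.getAttribute("iddest")); // assumed: parent shape
            List<String> siblings = children.computeIfAbsent(parent, k -> new ArrayList<>());
            if (!child.equals(parent)) { // a self-reference is treated as the root row
                siblings.add(child);
            }
        }
        return children;
    }

    // Turns "#_s1028" into "_s1028".
    private static String stripHash(String ref) {
        return ref.startsWith("#") ? ref.substring(1) : ref;
    }

    public static void main(String[] args) throws Exception {
        // prints {_s1028=[_s1029, _s1030, _s1117, _s1161]}
        System.out.println(parseChildren(SAMPLE));
    }
}
```

The resulting map only holds shape ids. To store the tree in the database, each id still has to be matched with the text of the shape carrying that id elsewhere in the XML, which this sketch does not cover.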