Using the IReplacingCallback Interface in Aspose.Words


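For a project at my company, we needed to use Word documents as templates and replace certain parts of their content to produce the final output.

I ran into a few pitfalls along the way, so I am writing them down to avoid falling into the same ones again later.

2017/10/27 Revised the Markdown formatting to improve the presentation.
2017/11/17 Revised some details again.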

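1. Detailed Explanation

The ReplaceEvaluatorFindAndHighlight class in this section is code I found on Stack Overflow, so I will use it as the example for the walkthrough.
1. The code was originally written for .NET; I ported it to Java.
2. Its purpose is to highlight the specified text.
3. The main pitfall is that some methods do not behave the way I expected. Once I finally worked it out and reread the comments in the code, I realized I had understood them completely differently.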
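One instance of the pitfall in point 3 can be reproduced without Aspose at all. The .NET original computes the match length with e.Match.Value.Length, and it is tempting to port that as Matcher.end(), but end() is the index just past the match in the searched text, not the length of the match. The following is a minimal sketch using only java.util.regex; the class name MatchLengthDemo and the helper describeFirstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchLengthDemo {
    // Returns {start, end, length} of the first match, or null if nothing matches.
    static int[] describeFirstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        // end() is an index into the searched text; group().length() is the match length.
        return new int[] { m.start(), m.end(), m.group().length() };
    }

    public static void main(String[] args) {
        // The match "[a   ]" starts at index 2, so end() and length differ by 2.
        int[] r = describeFirstMatch("\\[\\w+\\s+\\]", "xx[a   ]yy");
        System.out.println("start=" + r[0] + " end=" + r[1] + " length=" + r[2]);
        // prints: start=2 end=8 length=6
    }
}
```

Whenever the match does not start at index 0, end() is larger than the match length, so using it as the remaining length would let the highlighting run past the end of the match.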

// Imports needed at the top of the enclosing source file
// (the class below is a nested class of your own outer class).
import com.aspose.words.IReplacingCallback;
import com.aspose.words.Node;
import com.aspose.words.NodeType;
import com.aspose.words.ReplaceAction;
import com.aspose.words.ReplacingArgs;
import com.aspose.words.Run;

import java.awt.Color;
import java.util.ArrayList;
import java.util.List;

/**
 * Official API reference for ReplacingArgs:
 *   https://apireference.aspose.com/java/words/com.aspose.words/ReplacingArgs
 *
 * e.getMatch().group() : the entire matched text.
 * e.getMatchNode()     : the "first Node" that contains matched text. Note that only
 *                        part of this first Node may match the regular expression.
 * e.getMatchOffset()   : the index, within the text of that first Node, at which the
 *                        matched text begins. A value greater than 0 means the Node really
 *                        does contain only part of the match: the text to the left of the
 *                        index is not part of the match. (That is the part "behind" the
 *                        index, in the sense used by regex lookaround assertions: left is
 *                        behind, right is ahead.) The splitRun method below exists to cut
 *                        the matching part out.
 *
 * ---------------------------------------------------
 * So a complete match typically looks like this:
 *   \[\w+\s+\]  ---- the regular expression; we want to match text such as [ab           ]
 *
 * What actually gets matched is  {糊涂[} {a   } {     } {     } {     ]哈}
 * Why this happens:
 *   1. Keep in mind that every time you save, the text you have just typed continuously
 *      becomes one Run. (I know this description is not rigorous.)
 *   2. As a result, what looks like one uninterrupted stretch of blanks is actually made
 *      up of several Runs, each of which contains only blanks.
 *   3. With that in mind the rest is easy: there are five Runs above because the document
 *      was saved five times while typing, and each Run was formed at the moment of a save.
 *   4. This is how the first Run ends up containing only part of the match, and the last
 *      Run likewise.
 */
private static class ReplaceEvaluatorFindAndHighlight implements IReplacingCallback {

    /**
     * This method is called by the Aspose.Words find and replace engine for each match.
     * This method highlights the match string, even if it spans multiple runs.
     */
    @Override
    public int replacing(ReplacingArgs e) throws Exception {
        // This is a Run node that contains either the beginning or the complete match.
        // In other words, the return value is the first Node of the whole match.
        Node currentNode = e.getMatchNode();

        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        // Only part of the first Run may belong to the match, so we split it into two Runs:
        // one holding the non-matching part and one holding the matching part.
        // splitRun returns the Run that holds the matching part.
        if (e.getMatchOffset() > 0) {
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());
        }

        // This list stores all nodes of the match for further highlighting.
        // Every Node in it matches exactly; none of them is only partially matched.
        List<Node> runs = new ArrayList<Node>();

        // Find all runs that contain parts of the match string.
        // .NET original: int remainingLength = e.Match.Value.Length;
        // Matcher.end() returns the index just past the matched substring in the searched text.
        // TODO: int remainingLength = e.getMatch().end(); would highlight the matched
        //       string AND the text after it.
        // remainingLength is the total length of the exact match, excluding any surrounding
        // text (see the class comment for where that surrounding text comes from).
        // For example, when matching [a       ], Aspose may hand us xx[a       ]yy;
        // the length here is that of [a       ], not of the longer text.
        int remainingLength = e.getMatch().group().length();

        // Keep going while the current Run lies entirely within the rest of the match.
        while ((remainingLength > 0) && (currentNode != null)
                && (currentNode.getText().length() <= remainingLength)) {
            runs.add(currentNode);
            // Subtract the length of the Run just consumed from what is left of the match.
            remainingLength = remainingLength - currentNode.getText().length();

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do {
                currentNode = currentNode.getNextSibling();
            } while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        // After the loop, the next Run may contain only the tail of the match,
        // so the matching part has to be separated from the rest.
        if ((currentNode != null) && (remainingLength > 0)) {
            splitRun((Run) currentNode, remainingLength);
            // currentNode can be added directly: splitRun has just truncated its text
            // to the matching part.
            runs.add(currentNode);
        }

        // Now highlight all runs in the sequence.
        // This is safe because runs contains only fully matching nodes.
        for (Node run : runs) {
            final Run currentRun = (Run) run;
            currentRun.getFont().setHighlightColor(Color.RED);
        }

        // Signal to the replace engine to do nothing because we have already done all we wanted.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits the text of the specified run into two runs.
     * Inserts the new run just after the specified run.
     * Returns the new run, which holds the text from position onward.
     */
    private static Run splitRun(Run run, int position) throws Exception {
        Run afterRun = (Run) run.deepClone(true);
        // Split the text into two parts:
        //   run      : the text before position
        //   afterRun : the text from position onward
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}
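To make the split-and-collect logic easier to follow without an Aspose document at hand, here is a rough simulation of it on plain strings. It is only a sketch under simplifying assumptions: each String in the list stands in for the text of one Run, there are no non-Run siblings such as BookmarkStart, and the names RunSplitSketch and collectMatch are hypothetical rather than part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunSplitSketch {
    // Splits runs.get(index) at position, inserts the tail right after it,
    // and returns the index of the tail (mirrors splitRun returning afterRun).
    static int splitRun(List<String> runs, int index, int position) {
        String text = runs.get(index);
        runs.set(index, text.substring(0, position));
        runs.add(index + 1, text.substring(position));
        return index + 1;
    }

    // Returns the run texts that would be highlighted. Mutates runs the same way
    // the callback mutates the document.
    static List<String> collectMatch(List<String> runs, int firstRun,
                                     int matchOffset, int matchLength) {
        int current = firstRun;
        if (matchOffset > 0) {
            // The first run starts with unmatched text: keep only its tail.
            current = splitRun(runs, current, matchOffset);
        }
        List<String> matched = new ArrayList<>();
        int remaining = matchLength;
        // Consume every run that lies entirely inside the rest of the match.
        while (remaining > 0 && current < runs.size()
                && runs.get(current).length() <= remaining) {
            matched.add(runs.get(current));
            remaining -= runs.get(current).length();
            current++;
        }
        // The next run holds the tail of the match plus trailing text: keep only its head.
        if (current < runs.size() && remaining > 0) {
            splitRun(runs, current, remaining);
            matched.add(runs.get(current));
        }
        return matched;
    }

    public static void main(String[] args) {
        // Four runs whose concatenation is "xx[a    ]yy"; the match "[a    ]"
        // starts at offset 2 of the first run and is 7 characters long.
        List<String> runs = new ArrayList<>(Arrays.asList("xx[", "a ", "  ", " ]yy"));
        List<String> matched = collectMatch(runs, 0, 2, 7);
        System.out.println("matched text: " + String.join("", matched));
        System.out.println("runs after splitting: " + runs.size());
    }
}
```

In this example the first and last fragments are each split once, so the list grows from four entries to six, and the four collected fragments join back into exactly the matched text.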

2. Links

The IReplacingCallback interface
http://blog.csdn.net/liuwen718/article/details/26586081
highlight-string-in-a-word-document

0 0