Some small issues with editing PDFs in Java

Source: Internet; Published: 编程笔记本推荐; Editor: 程序博客网; Time: 2024/06/18 07:25

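I was recently assigned a task: take a PDF file, extract the content that needs to be replaced so that the file becomes a reusable template, and do the editing in Java.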

A few problems come up:

1) Styled text in a PDF is hard to change. A recommended tool is Adobe Acrobat Pro DC: http://jingyan.baidu.com/article/e6c8503c7b1ab1e54f1a1819.html#

2) Write the replacement code in Java, as follows.

// Targets the PDFBox 1.8.x API (getAllPages, PDFOperator, COSVisitorException).
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFOperator;

public static void editPDF(String sourceFile, String destinationFile,
                           Map<String, String> chars, String encoding) {
    PDDocument document = null;
    try {
        document = PDDocument.load(new File(sourceFile));
        List<?> pages = document.getDocumentCatalog().getAllPages();
        for (int i = 0; i < pages.size(); i++) {
            PDPage page = (PDPage) pages.get(i);
            PDStream contents = page.getContents();
            if (contents == null) {
                continue; // page without a content stream
            }
            PDFStreamParser parser = new PDFStreamParser(contents.getStream());
            parser.parse();
            List<Object> tokens = parser.getTokens();
            // Start at 1: an operator's operand is the token just before it.
            for (int j = 1; j < tokens.size(); j++) {
                Object next = tokens.get(j);
                if (!(next instanceof PDFOperator)) {
                    continue;
                }
                // Tj and TJ are the two operators that display strings in a PDF.
                String name = ((PDFOperator) next).getOperation();
                Object operand = tokens.get(j - 1);
                if ("Tj".equals(name) && operand instanceof COSString) {
                    replaceIn((COSString) operand, chars, encoding);
                } else if ("TJ".equals(name) && operand instanceof COSArray) {
                    COSArray array = (COSArray) operand;
                    for (int k = 0; k < array.size(); k++) {
                        Object element = array.getObject(k);
                        if (element instanceof COSString) {
                            replaceIn((COSString) element, chars, encoding);
                        }
                    }
                }
            }
            // Now that the tokens are updated, replace the page content stream.
            PDStream updatedStream = new PDStream(document);
            OutputStream out = updatedStream.createOutputStream();
            try {
                new ContentStreamWriter(out).writeTokens(tokens);
            } finally {
                out.close();
            }
            page.setContents(updatedStream);
        }
        // Save once, after every page has been processed.
        document.save(destinationFile);
    } catch (IOException e) {
        e.printStackTrace();
    } catch (COSVisitorException e) {
        e.printStackTrace();
    } finally {
        if (document != null) {
            try {
                document.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

// Applies every key -> value pair in chars to one PDF string.
private static void replaceIn(COSString cosString, Map<String, String> chars,
                              String encoding) throws IOException {
    String original = cosString.getString();
    String replaced = original;
    for (Map.Entry<String, String> entry : chars.entrySet()) {
        replaced = replaced.replace(entry.getKey(), entry.getValue());
    }
    if (replaced.equals(original)) {
        // Debug aid: print strings that contain "$" but matched no key.
        if (original.indexOf("$") >= 0) {
            System.out.println(original);
        }
        return; // leave untouched strings byte-for-byte intact
    }
    cosString.reset();
    cosString.append(replaced.getBytes(encoding));
}

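The Map<String, String> chars above exists only because I have many strings to replace; it holds the pairs of text to find and text to substitute.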
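As a rough illustration of how such a map might be filled and applied, here is a minimal sketch that runs on a plain string, without PDFBox. The class name PlaceholderDemo, the helper applyReplacements, and the placeholder names $name and $date are all hypothetical; the "$" prefix is only inferred from the debug check in the code above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PlaceholderDemo {

    // Applies every key -> value replacement in the map to the given text.
    static String applyReplacements(String text, Map<String, String> chars) {
        String result = text;
        for (Map.Entry<String, String> entry : chars.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so replacements run predictably.
        Map<String, String> chars = new LinkedHashMap<>();
        chars.put("$name", "Zhang San");   // hypothetical placeholder
        chars.put("$date", "2024-06-18");  // hypothetical placeholder
        System.out.println(applyReplacements("Dear $name, issued on $date", chars));
    }
}
```

If one key is a prefix of another (say $name and $name2), insert the longer key first so it is replaced before the shorter one can match inside it. Note also that a PDF may split one visible word across several strings in a TJ array, in which case a placeholder will not match as a whole.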

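3) The crux of the matter: a PDF may display text in a font that is not installed on your own system, and in that case Java reads the text as garbage. Solutions: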

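Use a PDF editing tool to change the unrecognized font to one that exists on the system (Java may still fail to recognize it, although the few basic fonts are recognized).

Or download the font from the internet and install it on the system.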
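Before installing anything, it may help to check which font families the JVM can actually see. The sketch below uses the standard java.awt.GraphicsEnvironment API; the class name FontCheck, its helpers, and the default family "SimSun" are hypothetical, and it is an assumption that the PDF library looks fonts up from the same list.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class FontCheck {

    // Case-insensitive lookup of a family name in a list of available families.
    static boolean containsFamily(List<String> available, String family) {
        String wanted = family.toLowerCase(Locale.ROOT);
        for (String name : available) {
            if (name.toLowerCase(Locale.ROOT).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Font families the JVM reports for this machine.
    static List<String> installedFamilies() {
        return Arrays.asList(GraphicsEnvironment
                .getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames());
    }

    public static void main(String[] args) {
        // "SimSun" is only an example; pass the font name used by your PDF.
        String family = args.length > 0 ? args[0] : "SimSun";
        System.out.println(family + " installed: "
                + containsFamily(installedFamilies(), family));
    }
}
```

Run it with the font name reported by the PDF editor as the first argument. If it prints false, the font is missing from the JVM's point of view, and either of the two fixes above applies.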

0 0