Some small problems when editing PDFs with Java

Source: Internet  Time: 2024/06/18 07:25

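I was recently assigned a task: take an existing PDF file, pick out the content that needs to be replaced so the file can serve as a shared template, and then fill it in with Java.

A few problems came up:

1) Styled text in a PDF is hard to change by hand. Recommended tool: Adobe Acrobat Pro DC, see http://jingyan.baidu.com/article/e6c8503c7b1ab1e54f1a1819.html#

2) The Java replacement code is below. It uses Apache PDFBox 1.x classes (PDDocument, PDFStreamParser, ContentStreamWriter).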

    import java.io.File;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.List;
    import java.util.Map;

    import org.apache.pdfbox.cos.COSArray;
    import org.apache.pdfbox.cos.COSString;
    import org.apache.pdfbox.exceptions.COSVisitorException;
    import org.apache.pdfbox.pdfparser.PDFStreamParser;
    import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.common.PDStream;
    import org.apache.pdfbox.util.PDFOperator;

    public class PdfEditor {

        public static void editPDF(String sourceFile, String destinationFile,
                                   Map<String, String> chars, String encoding) {
            PDDocument helloDocument = null;
            try {
                helloDocument = PDDocument.load(new File(sourceFile));
                List<?> pages = helloDocument.getDocumentCatalog().getAllPages();
                for (int i = 0; i < pages.size(); i++) {
                    PDPage page = (PDPage) pages.get(i);
                    PDStream contents = page.getContents();
                    if (contents == null) {
                        continue; // page without a content stream
                    }
                    PDFStreamParser parser = new PDFStreamParser(contents.getStream());
                    parser.parse();
                    List<Object> tokens = parser.getTokens();
                    // Start at 1: the operand is the token just before the operator.
                    for (int j = 1; j < tokens.size(); j++) {
                        if (!(tokens.get(j) instanceof PDFOperator)) {
                            continue;
                        }
                        // Tj and TJ are the two operators that display strings in a PDF.
                        // Tj takes a single string, TJ takes an array that contains strings.
                        Object operand = tokens.get(j - 1);
                        if (operand instanceof COSString) {
                            replaceText((COSString) operand, chars, encoding);
                        } else if (operand instanceof COSArray) {
                            COSArray array = (COSArray) operand;
                            for (int k = 0; k < array.size(); k++) {
                                Object element = array.getObject(k);
                                if (element instanceof COSString) {
                                    replaceText((COSString) element, chars, encoding);
                                }
                            }
                        }
                    }
                    // Now that the tokens are updated, replace the page content stream.
                    PDStream updatedStream = new PDStream(helloDocument);
                    OutputStream out = updatedStream.createOutputStream();
                    ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
                    tokenWriter.writeTokens(tokens);
                    out.close();
                    page.setContents(updatedStream);
                }
                // Save once, after every page has been processed.
                helloDocument.save(destinationFile);
            } catch (IOException e) {
                e.printStackTrace();
            } catch (COSVisitorException e) {
                e.printStackTrace();
            } finally {
                if (helloDocument != null) {
                    try {
                        helloDocument.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        // Applies every key -> value replacement to one string token in place.
        private static void replaceText(COSString cosString, Map<String, String> chars,
                                        String encoding) throws IOException {
            String string = cosString.getString();
            for (Map.Entry<String, String> entry : chars.entrySet()) {
                string = string.replace(entry.getKey(), entry.getValue());
            }
            // Debug aid: a "$" left over after replacement is likely an unmatched placeholder.
            if (string.indexOf("$") >= 0) {
                System.out.println(string);
            }
            cosString.reset();
            cosString.append(string.getBytes(encoding));
        }
    }

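The Map<String, String> chars parameter above exists only because I had many strings to replace; it holds each string to find together with its replacement.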
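
To make the role of the map concrete, here is a minimal sketch of the same substitution applied to an ordinary string, with no PDF library involved. The class name PlaceholderDemo, the method replaceAll, and the placeholder names $name and $date are made up for this example and are not part of the code above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlaceholderDemo {

    // Hypothetical helper: applies every key -> value substitution to one string,
    // the same way editPDF treats the text of each string token.
    public static String replaceAll(String text, Map<String, String> chars) {
        String result = text;
        for (Map.Entry<String, String> entry : chars.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder names and values are invented for illustration.
        Map<String, String> chars = new LinkedHashMap<>();
        chars.put("$name", "Zhang San");
        chars.put("$date", "2017-01-01");

        System.out.println(replaceAll("Dear $name, signed on $date", chars));
    }
}
```

Note that this only mirrors the string handling. In a real PDF the match succeeds only when the whole placeholder sits inside a single string token; if the PDF producer split it across several tokens, no key will be found.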


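3) The real crux: a PDF may display text in a font that is not installed on your own system. When that happens, Java reads the text back as garbled characters. Two ways around it:

Use a PDF editing tool to change the unrecognized font to one that exists on your system (Java may still fail to recognize it, although the few basic fonts are recognized).

Or download the font from the internet and install it on your system.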
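
Before switching fonts or installing one, it can help to check which font families the JVM actually sees. The following is a rough sketch using the standard java.awt.GraphicsEnvironment API; the class name FontCheck and the default font name "SimSun" are only examples, and the name stored inside a PDF may not match the installed family name exactly.

```java
import java.awt.GraphicsEnvironment;

public class FontCheck {

    // Returns true if fontName equals one of the given family names, ignoring case.
    public static boolean isInstalled(String fontName, String[] installedFamilies) {
        for (String family : installedFamilies) {
            if (family.equalsIgnoreCase(fontName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Font family names visible to the JVM on this machine.
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        // "SimSun" is just an example; pass the font you care about as an argument.
        String wanted = args.length > 0 ? args[0] : "SimSun";
        System.out.println(wanted + " installed: " + isInstalled(wanted, families));
    }
}
```

If the font is reported as missing, either of the two workarounds above applies; after installing a font, the JVM usually has to be restarted before it shows up in the list.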
