PDF text extraction error: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)

Source: Internet  Editor: 程序博客网  Date: 2024/06/05 17:25

I recently ran into the following error while extracting text from a PDF:

java.lang.NullPointerException
   at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
   at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
   at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
   at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
   at com.index.extractor.impl.PdfFileTextExtractor.getText(PdfFileTextExtractor.java:46)
   at test.TextConvert.convert(TextConvert.java:147)
   at test.TextConvert.getEFiles(TextConvert.java:111)
   at test.TextConvert.getEFiles(TextConvert.java:130)
   at test.TextConvert.getEFiles(TextConvert.java:130)
   at test.TextConvert.getEFiles(TextConvert.java:130)
   at test.TextConvert.getEFiles(TextConvert.java:130)
   at test.TextConvert.go(TextConvert.java:47)
   at test.TextConvert.main(TextConvert.java:42)

 

java.lang.Throwable: Warning: You did not close the PDF Document
   at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:418)
   at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
   at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
   at java.lang.ref.Finalizer.access$100(Unknown Source)
   at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
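
The second trace is most likely a side effect of the first: if the NullPointerException escapes before the document is closed, the finalizer later logs the "You did not close the PDF Document" warning. Below is a minimal sketch of closing in a finally block so that a failed extraction still releases the document. StubDocument and SafeExtract are hypothetical stand-ins written for this illustration, not PDFBox classes; the stub only simulates the failure.

```java
import java.io.Closeable;

// Hypothetical stand-in for a PDF document object; this is NOT a PDFBox class.
class StubDocument implements Closeable {
    boolean closed = false;

    // Simulates a text extraction call that fails on a broken page tree.
    String extractText() {
        throw new NullPointerException("simulated failure in page tree");
    }

    @Override
    public void close() {
        closed = true;
    }
}

class SafeExtract {
    // Returns the extracted text, or null if extraction failed.
    // The finally block runs on both paths, so the document is always closed.
    static String extract(StubDocument doc) {
        try {
            return doc.extractText();
        } catch (RuntimeException e) {
            System.err.println("extraction failed: " + e);
            return null;
        } finally {
            doc.close();
        }
    }

    public static void main(String[] args) {
        StubDocument doc = new StubDocument();
        String text = extract(doc);
        // prints "text=null, closed=true"
        System.out.println("text=" + text + ", closed=" + doc.closed);
    }
}
```

In the real extractor the same shape would wrap the document object handed to PDFTextStripper.writeText (presumably a PDDocument), calling its close() in the finally block. This silences the warning but does not fix the NullPointerException itself.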

 

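A web search showed that someone had already reported this bug and that it was fixed in version 0.8. My project, however, was still using FontBox-0.1.0-dev.jar and PDFBox-0.7.3.jar, while the latest releases are already fontbox-1.4.0.jar and pdfbox-1.4.0.jar, and the project has since been contributed to Apache.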

Download the latest packages, change the file extension from zip to jar, and import them into the project.
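
After swapping the jars it can help to confirm which PDFBox is actually on the classpath, because the Apache releases use the org.apache.pdfbox package prefix instead of the org.pdfbox prefix seen in the stack trace, so existing import statements need the same change. The following is a small sketch of such a check using only the JDK; the class name PdfBoxVersionCheck and the labels it returns are made up for this example.

```java
// Hypothetical helper for this post; uses only the JDK, no PDFBox imports.
class PdfBoxVersionCheck {
    // Returns true if the named class can be found on the classpath.
    // The class is looked up without being initialized.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PdfBoxVersionCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Reports which PDFBox package family is visible:
    // "legacy" (org.pdfbox), "apache" (org.apache.pdfbox), "both", or "none".
    static String detect() {
        boolean legacy = isPresent("org.pdfbox.pdmodel.PDDocument");
        boolean apache = isPresent("org.apache.pdfbox.pdmodel.PDDocument");
        if (legacy && apache) return "both";
        if (apache) return "apache";
        if (legacy) return "legacy";
        return "none";
    }

    public static void main(String[] args) {
        System.out.println("PDFBox on classpath: " + detect());
    }
}
```

A result of "both" would suggest the old PDFBox-0.7.3.jar is still in the build path next to the new one and should be removed.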
