PDF Report Project: Preliminary Research

Source: Internet. Time: 2024/06/06 13:13

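Background:

       Business requirement: parse PDF reports, extract the data, and correct errors automatically

       Possible future requirement: fill in reports

       PDF: templates designed with the Adobe LiveCycle Designer form design tool

       Open-source projects considered: iText and PDFBox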


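After several days of experimenting, I found that neither PDFBox nor iText supports parsing or filling PDF forms designed with Adobe LiveCycle Designer. The main reason is that forms designed with Designer use the XFA (XML Forms Architecture) format, while the mainstream libraries currently support AcroForm forms, that is, forms built with Acrobat's basic form features. I won't go into the differences between XFA and AcroForm here; Adobe's own description follows: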

The interactive forms that you create in LiveCycle Designer are different than the interactive forms that you create in Adobe Acrobat. If you create an interactive form in Acrobat, your form is based on Adobe’s Acroform technology. This technology dates back to Acrobat version 3, and Adobe provides the “Acrobat Forms API Reference” to provide the technical details for this technology. I would not recommend using Acroform technology because XFA is the better technology.


       iText is still working on a solution for this kind of form; the official statement is as follows:

We've all read to much questions saying: "I've created a form using 
Adobe LiveCycle Designer, then I took some code from the web to fill out 
such a form, and... it doesn't work." 

We were able to reduce the number of questions like that by providing 
documentation ( http://itextpdf.com/book/ ) and examples ( 
http://itextpdf.com/book/chapter.php?id=8 ) explaining the difference 
between AcroForm technology and the XML Forms Architecture (XFA). We've 
always explained: iText has full support for AcroForms, partial support 
for XML. 

More recently, We've been investing in the development of an XFA to PDF 
convertor. You can read more about it on the iText blog: 
http://lowagie.com/xfa2pdf
You can watch a demo on YouTube: http://www.youtube.com/watch?v=qxtAy2Czsh0
You can register to be kept up-to-date: 
http://itextpdf.com/themes/betalist.php
And what's more: you can try out the functionality yourself: 
http://demo.itextsupport.com/xfademo/

Please be aware that the functionality isn't finished yet; we welcome 
all feedback because your suggestions and samples will help us improve 
the product. 

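     That concludes the preliminary research for now; I will look into other feasible approaches when I have time. My personal feeling is that exporting the XML data with iText and then parsing it with an XML parsing tool (JDOM, DOM4J, SAX, etc.) to obtain the PDF report content could also be a workable approach, but this still needs to be confirmed.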
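To make the parsing half of that idea concrete, here is a minimal sketch. It assumes the form data has already been exported from the PDF as an XML string (the export step is not shown and is still unverified), and it uses the JDK's built-in DOM parser rather than JDOM, DOM4J, or SAX so that it runs without extra dependencies. The class name XfaDataExtractor, the method extractFields, and the sample element names (report, title, amount) are all hypothetical and only serve the illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens exported form data into path -> value pairs.
class XfaDataExtractor {

    // Parses the XML and returns every leaf element (an element with no child
    // elements) keyed by its slash-separated path of local names.
    // Assumption: paths are unique; a repeated path overwrites the earlier value.
    static Map<String, String> extractFields(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // so getLocalName() drops prefixes such as "xfa:"
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Map<String, String> fields = new LinkedHashMap<>();
        collect(doc.getDocumentElement(), "", fields);
        return fields;
    }

    // Depth-first walk over the element tree, recording leaf elements only.
    private static void collect(Element element, String parentPath, Map<String, String> out) {
        String name = element.getLocalName();
        String path = parentPath.isEmpty() ? name : parentPath + "/" + name;

        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collect((Element) child, path, out);
            }
        }
        if (!hasChildElement) {
            out.put(path, element.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data; a real report would have its own element names.
        String xml = "<xfa:data xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
                + "<report><title>Q1</title><amount> 1200.50 </amount></report>"
                + "</xfa:data>";
        extractFields(xml).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Flattening to path/value pairs keeps the later validation and auto-correction steps independent of the XML library, so swapping in JDOM or DOM4J would only change this one class.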


Finally, some related links:

iText official site: http://itextpdf.com/

<iText in Action> code examples: http://itextpdf.com/book/examples.php

PDFBox official site: http://pdfbox.apache.org/

