iText parse html with RichText and images to pdf
Source: Internet | Editor: 程序博客网 | Time: 2024/05/16 16:17
I used iText (itextpdf) to convert rich text to PDF and ran into several issues. This post covers three of them:
1. Tables in the rich text turn into black boxes when using XMLWorkerHelper.
2. Line spacing in the PDF does not match the HTML shown in the UI when <p> tags are used.
3. Images in the PDF do not appear where they do in the UI when <img/> tags are handled with the Image class and the remaining content is converted as a single HTML block.
Issue 1: Tables in the rich text turn into black boxes when using XMLWorkerHelper.
Former code (with the issue):
Document doc = new Document(PageSize.LETTER); // create a new document
PdfWriter writer = PdfWriter.getInstance(doc, os); // create a writer associated with doc
doc.open(); // open the document
String content = getContent(paper.getContentId());
// XMLWorker approach
InputStream is = IOUtils.toInputStream(content);
XMLWorkerHelper helper = XMLWorkerHelper.getInstance();
helper.parseXHtml(writer, doc, is);
Changed code (fixes the issue):
Document doc = new Document(PageSize.LETTER); // create a new document
PdfWriter writer = PdfWriter.getInstance(doc, os); // create a writer associated with doc
doc.open(); // open the document
String content = getContent(paper.getContentId());
// HTMLWorker approach
HTMLWorker htmlWorker = new HTMLWorker(doc);
htmlWorker.parse(new StringReader(content));
Summary: HTMLWorker is deprecated and XMLWorkerHelper is newer, but in my case XMLWorkerHelper handled plain text well and did not work well with certain elements such as tables. The easiest way is to treat the content you want to convert to PDF as HTML, because the rich text is displayed as HTML in the UI and so looks exactly the same.
Issue 2: Line spacing in the PDF does not match the HTML shown in the UI when <p> tags are used.
This happens because in HTML a <p> tag takes more vertical space than a <br/>, while in the PDF a <p> tag takes the same space as a <br/>.
Solution: pad each <p> element with an extra <br/>.
private static String handlePTag(String content) {
    // drop empty paragraphs
    content = content.replaceAll("<p></p>", "").replaceAll("<P></P>", "");
    // put a <br> in front of every opening <p ...> tag
    content = content.replaceAll("<p", "<br><p").replaceAll("<P", "<br><P");
    // put a <br> after every closing </p> tag
    content = content.replaceAll("</p>", "</p><br>").replaceAll("</P>", "</P><br>");
    // between two adjacent paragraphs keep only one of the two <br> tags
    content = content.replaceAll("</p><br>\\s*<br><p", "</p><br><p").replaceAll("</P><br>\\s*<br><P", "</P><br><P");
    return content;
}
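One thing to watch in handlePTag: "<p" is a prefix match, so it also hits any tag whose name merely starts with p, such as <pre>, and mixed-case input needs a second replaceAll for each step. The following is a minimal sketch of the same four steps using case-insensitive patterns. The class name PTagSpacing and the method name padParagraphs are made up for this illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

final class PTagSpacing {
    // Opening <p> or <p ...>, but not <pre>, <param>, and so on.
    private static final Pattern OPEN_P = Pattern.compile("(?i)<p(?=[\\s>])");
    private static final Pattern CLOSE_P = Pattern.compile("(?i)</p>");
    private static final Pattern EMPTY_P = Pattern.compile("(?i)<p></p>");
    // A "</p><br>" followed by the "<br>" that was put in front of the next paragraph.
    private static final Pattern DOUBLE_BR =
            Pattern.compile("(?i)(</p><br>)\\s*<br>(?=<p[\\s>])");

    static String padParagraphs(String html) {
        String s = EMPTY_P.matcher(html).replaceAll("");   // drop empty paragraphs
        s = OPEN_P.matcher(s).replaceAll("<br>$0");        // <br> before each <p
        s = CLOSE_P.matcher(s).replaceAll("$0<br>");       // <br> after each </p>
        return DOUBLE_BR.matcher(s).replaceAll("$1");      // keep one <br> between paragraphs
    }

    public static void main(String[] args) {
        System.out.println(
                padParagraphs("<p>one</p> <P class=\"x\">two</P><pre>code</pre>"));
    }
}
```

With this input the <pre> element is left alone, while both paragraphs get a <br> before and after, with a single <br> between them.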
Issue 3: Images in the PDF do not appear where they do in the UI when <img/> tags are handled with the Image class and the remaining content is converted as a single HTML block.
Description: This happens because the <img/> tags are handled separately with the Image class while the rest of the rich text is converted to PDF as a single HTML block. As a result, all the images end up either in front of all the other content (e.g., text) or after it, and their positions do not follow what the user entered in the rich text editor in the UI.
Solution: ImageProvider. We supply a class that handles <img/> tags. The ImageProvider interface is invoked for every <img/> tag the parser encounters and receives that tag's parameters. From the src attribute of each <img/> tag we get the image id, and from the id we get the Image object. Returning that Image object from the provider lets the document add the image at the right position wherever an <img/> tag (with that id) appears.
Note: The ImageProvider approach comes from the book 'iText in Action'. There is one tricky problem when trying it. The natural first test is content like content = "<img src=\"a.jpg\"/>", but that does not work, for two reasons. First, a.jpg has to be in the right directory (sometimes directly under the folder of your model project, and sometimes directly under the root of the drive your classes are on, e.g. E:\). Second, you have to set a height and width on the img; otherwise the image never shows up in the PDF when its height or width is larger than the page, e.g. content = "<img height=\"300\" width=\"300\" src=\"a.jpg\"/>".
Later you can see I set it in my ImageProvider:
image.scaleToFit(300f, 300f);
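To make the effect of that call concrete, here is a small sketch of the arithmetic I understand scaleToFit to perform: one scale factor is chosen so that the image fits inside the given box while keeping its aspect ratio. This is plain Java written for illustration, not iText code; the class name FitBox is hypothetical, and the 1897 x 2160 size is taken from the sample <img> tag further down.

```java
final class FitBox {
    /** Returns {width, height} after scaling to fit inside maxW x maxH, keeping the aspect ratio. */
    static float[] scaleToFit(float width, float height, float maxW, float maxH) {
        // The smaller of the two ratios is the one that makes both sides fit.
        float factor = Math.min(maxW / width, maxH / height);
        return new float[] { width * factor, height * factor };
    }

    public static void main(String[] args) {
        float[] size = scaleToFit(1897f, 2160f, 300f, 300f);
        System.out.printf("%.1f x %.1f%n", size[0], size[1]);
    }
}
```

For the 1897 x 2160 image the height is the limiting side, so the result is roughly 263 x 300, which fits comfortably on a LETTER page.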
Former code (with the issue):
Document doc = new Document(PageSize.LETTER); // create a new document
PdfWriter writer = PdfWriter.getInstance(doc, os); // create a writer associated with doc
doc.open(); // open the document
String content = getContent(paper.getContentId());
content = handleImageContent(doc, content);
// HTMLWorker approach
HTMLWorker htmlWorker = new HTMLWorker(doc);
htmlWorker.parse(new StringReader(content));
Some other related methods:
private String handleImageContent(Document doc, String content)
        throws BadElementException, MalformedURLException, IOException, DocumentException {
    // String content1 = "<br/><img src=\"/mps/imageServlet?id=1131\"/><br/><br/><br/><a href=\"/mps/attachmentServlet?id=1132&name=a.jpeg\" type=\"image/jpeg\" target=\"_blank\">a.jpeg</a><br/>";
    List<String> ids = getImageIdList(content); // get the list of all image ids
    List<Image> images = new ArrayList<Image>();
    RichTextService richTextService = (RichTextService) BeanLocator.getBean("richTextService");
    for (String id : ids) {
        logger.debug("IUploadService.handleImageContent->ids.id:" + id);
        OutputStream os = new ByteArrayOutputStream();
        richTextService.getImage(Integer.parseInt(id), os);
        images.add(getImgObjFromIO(os));
    }
    // handle <img/> tags with the Image class and add all the images to doc
    addImagesToPDF(doc, images);
    // remove the src attribute from each <img/> tag, so <img/> shows no image
    // and raises no error if the image location doesn't exist
    return removeImageFromContent(content, ids);
}

private void addImagesToPDF(Document doc, List<Image> images) throws DocumentException {
    for (Image img : images) {
        doc.add(img);
    }
}
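The helpers getImageIdList and removeImageFromContent are called above but not shown. Purely as an illustration of what they might do, here is a sketch built on regular expressions. It assumes every image src has the shape seen in the sample content (imageServlet?id= followed by digits) and that src is written with double quotes; the class name ImageTags and the simplified removeImageSrc (which takes no id list) are my own stand-ins, not the original implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageTags {
    // Assumed src shape, taken from the sample content: ...imageServlet?id=<digits>
    private static final Pattern IMG_ID = Pattern.compile(
            "(?i)<img\\b[^>]*?\\bsrc=\"[^\"]*imageServlet\\?id=(\\d+)[^\"]*\"");
    // Everything in an <img ...> tag up to its src attribute, then the attribute itself.
    private static final Pattern IMG_SRC = Pattern.compile(
            "(?i)(<img\\b[^>]*?)\\s+src=\"[^\"]*\"");

    /** Collects the image ids in the order the <img/> tags appear. */
    static List<String> getImageIdList(String content) {
        List<String> ids = new ArrayList<>();
        Matcher m = IMG_ID.matcher(content);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    /** Strips the src attribute from every <img/> tag and leaves other tags alone. */
    static String removeImageSrc(String content) {
        return IMG_SRC.matcher(content).replaceAll("$1");
    }
}
```

Note that the id list only records which images exist and in what order. Once the src attributes are stripped and the images are added with doc.add, nothing ties an image back to its place in the text, which is exactly the problem described above.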
Changed code (fixes the issue):
Document doc = new Document(PageSize.LETTER); // create a new document
PdfWriter writer = PdfWriter.getInstance(doc, os); // create a writer associated with doc
doc.open(); // open the document
String content = getContent(paper.getContentId());
// ImageProvider approach
Paragraph p = new Paragraph();
HashMap<String, Object> map = new HashMap<String, Object>();
map.put(HTMLWorker.IMG_PROVIDER, new ImgProvider());
List<Element> list = HTMLWorker.parseToList(new StringReader(handlePTag(content)), null, map);
for (Element e : list) {
    p.add(e);
}
doc.add(p);
ImageProvider:
package sg.gov.cpf.corpapp.iwr.mps.common.util;

import com.itextpdf.text.DocListener;
import com.itextpdf.text.Image;
import com.itextpdf.text.html.simpleparser.ChainedProperties;
import com.itextpdf.text.html.simpleparser.ImageProvider;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Map;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import sg.gov.cpf.corpapp.iwr.mps.common.base.BeanLocator;
import sg.gov.cpf.corpapp.iwr.mps.service.richtext.RichTextService;

public class ImgProvider implements ImageProvider {
    private static Log logger = LogFactory.getLog(ImgProvider.class);

    public Image getImage(String string, Map<String, String> map,
            ChainedProperties chainedProperties, DocListener docListener) {
        System.out.println("ImgProvider.getImage()->string : " + string);
        logger.debug("ImgProvider.getImage()->string" + string);
        String id = string.substring(string.indexOf("id=") + 3);
        System.out.println("ImgProvider.getImage()->id : " + id);
        logger.debug("ImgProvider.getImage()->id : " + id);
        RichTextService richTextService = (RichTextService) BeanLocator.getBean("richTextService");
        OutputStream os = new ByteArrayOutputStream();
        richTextService.getImage(Integer.parseInt(id), os);
        Image image = null;
        byte by[] = ((ByteArrayOutputStream) os).toByteArray();
        try {
            image = Image.getInstance(by);
            image.scaleToFit(300f, 300f);
            os.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return image;
    }
}
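The id extraction in getImage takes everything after the first "id=" in the src. That works for the sample URLs, but it would pass a non-numeric string to Integer.parseInt if another parameter followed the id, and it would also match inside a longer parameter name such as uuid=. A more defensive variant might look like the sketch below; the class name ImageSrcId and the choice of OptionalInt are assumptions made for this example.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageSrcId {
    // "id" must be a whole parameter name, directly after '?' or '&',
    // and its value must be digits up to the next '&', '#', or end of string.
    private static final Pattern ID_PARAM = Pattern.compile("[?&]id=(\\d+)(?=&|#|$)");

    /** Returns the numeric id parameter of an image src, or empty if there is none. */
    static OptionalInt parse(String src) {
        Matcher m = ID_PARAM.matcher(src);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // more digits than an int can hold
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("/mps/imageServlet?id=1131"));
        System.out.println(parse("a.jpg"));
    }
}
```

A provider using this could return null when the result is empty instead of letting a NumberFormatException escape from getImage.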
Example test content:
content = "<img src=\"/mps/imageServlet?id=1131\"/>";
content = "<p><img style=\"WIDTH: 429px; HEIGHT: 1402px\" src=\"http://intrauat.cpf.gov.sg/mps/imageServlet?id=1397\" width=\"1897\" height=\"2160\"/></p>";
Runnable Test Class for Issue 3:
package sg.gov.cpf.corpapp.iwr.mps.service;

import com.itextpdf.text.DocListener;
import com.itextpdf.text.Document;
import com.itextpdf.text.Element;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.html.simpleparser.ChainedProperties;
import com.itextpdf.text.html.simpleparser.HTMLWorker;
import com.itextpdf.text.html.simpleparser.ImageProvider;
import com.itextpdf.text.html.simpleparser.StyleSheet;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TestPdf implements ImageProvider {

    public Image getImage(String string, Map<String, String> map,
            ChainedProperties chainedProperties, DocListener docListener) {
        System.out.println(string);
        Image image = null;
        try {
            image = Image.getInstance(string);
            image.scaleToFit(300f, 300f);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return image;
    }

    public static void testGeneratePdf() throws Exception {
        String content = "Testing Img<br/><p><img src=\"a.jpg\" width='300' height='300'/></p><p><br/>middle<br/><img src=\"b.jpg\" width='300' height='300'/></p>end";
        // String content = "Testing Img<br/><p><img src=\"/mps/imageServlet?id=1397\"/>";
        Document doc = new Document(PageSize.LETTER);
        File f = new File("e:/dayna.pdf");
        System.out.println(f.getAbsolutePath());
        PdfWriter.getInstance(doc, new FileOutputStream(f));
        doc.open();
        StyleSheet ss = new StyleSheet();
        HashMap<String, Object> map = new HashMap<String, Object>();
        map.put(HTMLWorker.IMG_PROVIDER, new TestPdf());
        List<Element> list = HTMLWorker.parseToList(new StringReader(content), ss, map);
        for (Element e : list) {
            doc.add(e);
        }
        doc.close();
    }

    public static void main(String[] args) throws Exception {
        testGeneratePdf();
    }
}
a.jpg and b.jpg are placed in: E:\Workspace\mywork\MPS_SG_LOCAL\CpfAppsMpsModel