Alfresco: File Conversion and Metadata Extraction

Source: Internet    Editor: 程序博客网    Date: 2024/04/30 07:01

TextToPdfContentTransformer            text -> PDF            http://www.pdfbox.org/ PDFBox
TextMiningContentTransformer           DOC -> text            http://www.textmining.org/ TextMining
StringExtractingContentTransformer     textual formats (text/plain, application/x-javascript, text/*) -> text
RuntimeExecutableContentTransformer    RuntimeExec()          runs external operating-system command-line commands at run time
PoiHssfContentTransformer              XLS -> text            http://jakarta.apache.org/poi/ POI
PdfToImageContentTransformer           PDF -> PNG             http://www.pdfbox.org/ PDFBox
PdfBoxContentTransformer               PDF -> text            http://www.pdfbox.org/ PDFBox

OpenOfficeContentTransformer           converts between the formats OpenOffice supports:
                                         Word/RTF/OpenDocument Text -> PDF/Word/RTF/OpenDocument Text
                                         Excel/OpenDocument Spreadsheet -> PDF/Excel/OpenDocument Spreadsheet
                                         PowerPoint/OpenDocument Presentation -> PDF/Flash/PowerPoint/OpenDocument Presentation
                                       http://sourceforge.net/projects/joott/ JODConverter
MediaWikiContentTransformer               MEDIAWIKI -> HTML   http://matheclipse.org/en/Java_Wikipedia_API
MailContentTransformer                         MSG -> TEXT
HtmlParserContentTransformer              HTML -> TEXT
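
The list above is essentially a mapping from a (source format, target format) pair to the transformer that handles it. A minimal sketch of that idea in plain Java follows. It does not use the Alfresco API: the class TransformerTable, its methods, and the MIME type strings chosen for DOC, XLS, MSG and MediaWiki are all assumptions made for illustration. Only the transformer names come from the list.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup table for illustration; not part of the Alfresco API. */
public class TransformerTable {
    // Key: "source -> target" MIME types (assumed values); value: transformer name from the list.
    private static final Map<String, String> TABLE = new LinkedHashMap<>();

    static {
        register("text/plain", "application/pdf", "TextToPdfContentTransformer");
        register("application/msword", "text/plain", "TextMiningContentTransformer");
        register("application/vnd.ms-excel", "text/plain", "PoiHssfContentTransformer");
        register("application/pdf", "image/png", "PdfToImageContentTransformer");
        register("application/pdf", "text/plain", "PdfBoxContentTransformer");
        register("text/mediawiki", "text/html", "MediaWikiContentTransformer");
        register("application/vnd.ms-outlook", "text/plain", "MailContentTransformer");
        register("text/html", "text/plain", "HtmlParserContentTransformer");
    }

    private static String key(String source, String target) {
        return source + " -> " + target;
    }

    private static void register(String source, String target, String transformer) {
        TABLE.put(key(source, target), transformer);
    }

    /** Returns the transformer name registered for the pair, or empty if there is none. */
    public static Optional<String> find(String source, String target) {
        return Optional.ofNullable(TABLE.get(key(source, target)));
    }

    public static void main(String[] args) {
        // prints "PdfBoxContentTransformer"
        System.out.println(find("application/pdf", "text/plain").orElse("none"));
        // prints "none": no PNG -> PDF transformer appears in the list
        System.out.println(find("image/png", "application/pdf").orElse("none"));
    }
}
```

A real repository would likely resolve transformers through its own registry and might chain several of them (for example DOC -> PDF -> PNG). This sketch only covers the direct, single-step lookups listed above.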