Reading doc, pdf and ppt files in C#
Source: Internet  Editor: 程序博客网  Date: 2024/05/16 11:55
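Converting doc, pdf and ppt to txt:
These components generally read a file's content out as character data. This is not just a matter of changing the file extension, so you have to write the extracted text to a txt file yourself.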
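All of the code below writes and reads the intermediate txt files with `Encoding.GetEncoding("gb2312")`. On .NET Framework that encoding is available out of the box. On .NET Core / .NET 5 and later, the same call throws unless the code-page provider is registered first. The following is a minimal sketch of the write-then-read round trip, assuming .NET 5 or later. The class name `TxtRoundTrip` and the demo file name are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class TxtRoundTrip
{
    // Writes extracted text to a .txt file as GB2312, then reads it back with the same encoding.
    public static string WriteAndReadBack(string text, string path)
    {
        // Required on .NET Core / .NET 5+ before GB2312 (code page 936) can be used.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb2312 = Encoding.GetEncoding("gb2312");

        using (StreamWriter writer = new StreamWriter(path, false, gb2312))
        {
            writer.Write(text);
        }
        using (StreamReader reader = new StreamReader(path, gb2312))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "roundtrip_demo.txt");
        Console.WriteLine(WriteAndReadBack("读取doc文件", path));
        File.Delete(path);
    }
}
```

Each Chinese character takes two bytes in GB2312 and each ASCII character takes one. So the sample string above produces an 11-byte file with no byte-order mark.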
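Adding the Office references
When programming against Word and PowerPoint from .NET, make sure the Word and PowerPoint programmability components were installed along with Office (you can check this during a custom install). Alternatively, install the "Microsoft Office 2003 Primary Interop Assemblies".
After installation, add the references to the project:
Add Reference → COM → Microsoft PowerPoint 11.0 Object Library / Microsoft Word 11.0 Object Library;
You also need the Office core component (Microsoft Office 11.0 Object Library), which provides the Microsoft.Office.Core namespace used by the PowerPoint code.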
using System.IO;
using System.Text;
using Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.PowerPoint;
// PDFBox for .NET (IKVM build): reference the PDFBox DLL together with its IKVM runtime DLLs
using org.pdfbox.pdmodel;
using org.pdfbox.util;
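public void pdf2txt(FileInfo file, FileInfo txtfile)
{
    // Load the PDF with PDFBox and extract all of its text
    PDDocument doc = PDDocument.load(file.FullName);
    try
    {
        PDFTextStripper pdfStripper = new PDFTextStripper();
        string text = pdfStripper.getText(doc);
        // Save as GB2312 so text2reader can decode it with the same encoding
        using (StreamWriter swPdfChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312")))
        {
            swPdfChange.Write(text);
        }
    }
    finally
    {
        doc.close(); // release the PDF file even if extraction fails
    }
}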
For tables in a doc file, the extracted text loses the grid lines, and the cell contents come out row by row.
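The grid lines disappear because `Range.Text` flattens a table into plain characters. As far as I know, Word ends every table cell, and every row, with a carriage return followed by a BEL character (`\r\a`), and ends ordinary paragraphs with a bare `\r`. When written straight to a txt file, the BEL characters show up as junk and the bare CRs may not display as line breaks.

The sketch below is a hedged example that assumes that convention. It drops the BEL markers and converts each CR to CRLF. `WordTextCleanup` is a hypothetical helper, not part of the Word interop API.

```csharp
using System;

public static class WordTextCleanup
{
    // Hypothetical helper. Assumes Word's Range.Text conventions:
    // "\r" ends a paragraph, "\r\a" (CR + BEL) ends a table cell or a table row.
    public static string Normalize(string wordText)
    {
        // Remove the BEL markers, then turn every bare CR into a Windows line break.
        // Result: each cell on its own line, and a blank line after each table row.
        return wordText.Replace("\a", "").Replace("\r", "\r\n");
    }

    public static void Main()
    {
        // One paragraph followed by a one-row, two-cell table.
        string raw = "Title\rName\r\aAge\r\a\r\a";
        Console.Write(Normalize(raw));
    }
}
```

In word2text, `doc.Content.Text` could be passed through such a function before it is written out.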
public void word2text(FileInfo file, FileInfo txtfile)
{
    object readOnly = true;
    object missing = System.Reflection.Missing.Value;
    object doNotSave = Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges;
    object fileName = file.FullName;
    Microsoft.Office.Interop.Word.ApplicationClass wordapp = new Microsoft.Office.Interop.Word.ApplicationClass();
    string text;
    try
    {
        // Open read-only (3rd parameter); the other optional parameters keep their defaults
        Microsoft.Office.Interop.Word.Document doc = wordapp.Documents.Open(ref fileName,
            ref missing, ref readOnly, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing);
        text = doc.Content.Text;
        doc.Close(ref doNotSave, ref missing, ref missing);
    }
    finally
    {
        // Always quit Word, otherwise WINWORD.EXE keeps running in the background
        wordapp.Quit(ref doNotSave, ref missing, ref missing);
    }
    using (StreamWriter swWordChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312")))
    {
        swWordChange.Write(text);
    }
}
public void ppt2txt(FileInfo file, FileInfo txtfile)
{
    Microsoft.Office.Interop.PowerPoint.Application pa = new Microsoft.Office.Interop.PowerPoint.ApplicationClass();
    StringBuilder pps = new StringBuilder();
    try
    {
        // ReadOnly = msoTrue, Untitled = msoFalse, WithWindow = msoFalse
        Microsoft.Office.Interop.PowerPoint.Presentation pp = pa.Presentations.Open(file.FullName,
            Microsoft.Office.Core.MsoTriState.msoTrue,
            Microsoft.Office.Core.MsoTriState.msoFalse,
            Microsoft.Office.Core.MsoTriState.msoFalse);
        foreach (Microsoft.Office.Interop.PowerPoint.Slide slide in pp.Slides)
        {
            foreach (Microsoft.Office.Interop.PowerPoint.Shape shape in slide.Shapes)
            {
                // Pictures, lines, etc. have no text frame; reading TextFrame on them throws
                if (shape.HasTextFrame == Microsoft.Office.Core.MsoTriState.msoTrue)
                    pps.AppendLine(shape.TextFrame.TextRange.Text);
            }
        }
        pp.Close();
    }
    finally
    {
        pa.Quit(); // shut down POWERPNT.EXE even if reading fails
    }
    using (StreamWriter swPPtChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312")))
    {
        swPPtChange.Write(pps.ToString());
    }
}
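Reading files of different types: .txt files are opened directly, while doc, pdf and ppt files are first converted to an intermediate txt file in App_Data and then opened with a StreamReader.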
public StreamReader text2reader(FileInfo file)
{
    StreamReader st = null;
    // Relative paths resolve against the process working directory, not the site folder,
    // so build an absolute App_Data path from the application root instead of hard-coding one
    string dataDir = System.IO.Path.Combine(System.AppDomain.CurrentDomain.BaseDirectory, "App_Data");
    switch (file.Extension.ToLower())
    {
        case ".txt":
            st = new StreamReader(file.FullName, Encoding.GetEncoding("gb2312"));
            break;
        case ".doc":
            FileInfo wordfile = new FileInfo(System.IO.Path.Combine(dataDir, "word2txt.txt"));
            word2text(file, wordfile);
            st = new StreamReader(wordfile.FullName, Encoding.GetEncoding("gb2312"));
            break;
        case ".pdf":
            FileInfo pdffile = new FileInfo(System.IO.Path.Combine(dataDir, "pdf2txt.txt"));
            pdf2txt(file, pdffile);
            st = new StreamReader(pdffile.FullName, Encoding.GetEncoding("gb2312"));
            break;
        case ".ppt":
            FileInfo pptfile = new FileInfo(System.IO.Path.Combine(dataDir, "ppt2txt.txt"));
            ppt2txt(file, pptfile);
            st = new StreamReader(pptfile.FullName, Encoding.GetEncoding("gb2312"));
            break;
        // Any other extension is unsupported and returns null
    }
    return st;
}
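text2reader reuses one fixed intermediate file per type. If it is called from several web requests at once, two conversions of the same type can overwrite each other's output.

The following is a sketch of one alternative, and it is only a sketch. It uses a lookup table from extension to converter, with converters taking the same `(FileInfo source, FileInfo txtTarget)` shape as pdf2txt/word2text/ppt2txt. Each conversion writes to a fresh temp file from `Path.GetTempFileName()`. The class `TextReaderFactory` and its methods are hypothetical names, and the demo converter only stands in for the real Office/PDFBox ones.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical dispatcher: extension -> converter with the (source, txtTarget) shape used above.
public class TextReaderFactory
{
    private readonly Dictionary<string, Action<FileInfo, FileInfo>> converters =
        new Dictionary<string, Action<FileInfo, FileInfo>>(StringComparer.OrdinalIgnoreCase);
    private readonly Encoding encoding;

    public TextReaderFactory(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public void Register(string extension, Action<FileInfo, FileInfo> converter)
    {
        converters[extension] = converter;
    }

    // Returns null for unsupported extensions, like text2reader does.
    public StreamReader Open(FileInfo file)
    {
        if (string.Equals(file.Extension, ".txt", StringComparison.OrdinalIgnoreCase))
            return new StreamReader(file.FullName, encoding);

        Action<FileInfo, FileInfo> convert;
        if (!converters.TryGetValue(file.Extension, out convert))
            return null;

        // A unique temp file per call, so concurrent conversions never share an output file.
        FileInfo txt = new FileInfo(Path.GetTempFileName());
        convert(file, txt);
        string text = File.ReadAllText(txt.FullName, encoding);
        txt.Delete(); // the caller gets an in-memory reader, so the temp file can go
        return new StreamReader(new MemoryStream(encoding.GetBytes(text)), encoding);
    }
}

public static class Demo
{
    public static void Main()
    {
        var factory = new TextReaderFactory(Encoding.UTF8);
        // Stand-in converter; in the article this would be word2text, pdf2txt or ppt2txt.
        factory.Register(".doc", (src, txt) =>
            File.WriteAllText(txt.FullName, "converted " + src.Name, Encoding.UTF8));

        string input = Path.Combine(Path.GetTempPath(), "sample.doc");
        File.WriteAllText(input, "");
        using (StreamReader reader = factory.Open(new FileInfo(input)))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
        File.Delete(input);
    }
}
```

The demo uses UTF-8 to stay self-contained. The article's code would pass `Encoding.GetEncoding("gb2312")` instead.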