PyPDF2
Source: Internet | Editor: 程序博客网 | Date: 2024/05/16 01:46
Installation
PyPDF2 can be installed directly with pip:
pip install PyPDF2
PyPDF2 provides four main, commonly used classes: PdfFileReader, PdfFileMerger, PageObject, and PdfFileWriter.
Basic PDF reading and writing
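from PyPDF2 import PdfFileReader, PdfFileWriter

infn = 'infn.pdf'
outfn = 'outfn.pdf'

# Keep the input file open until the output has been written,
# because page contents are read from it on demand
with open(infn, 'rb') as fin:
    # Create a PdfFileReader object
    pdf_input = PdfFileReader(fin)
    # Get the number of pages in the PDF
    page_count = pdf_input.getNumPages()
    print(page_count)
    # getPage(i) returns a PageObject; page indices start at 0
    i = 0
    page = pdf_input.getPage(i)
    # Create a PdfFileWriter object
    pdf_output = PdfFileWriter()
    # Add the PageObject to the PdfFileWriter
    pdf_output.addPage(page)
    # Write the result to the output file
    with open(outfn, 'wb') as fout:
        pdf_output.write(fout)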
Example: merging and splitting PDFs
from PyPDF2 import PdfFileReader, PdfFileWriter


def split_pdf(infn, outfn):
    pdf_output = PdfFileWriter()
    with open(infn, 'rb') as fin:
        pdf_input = PdfFileReader(fin)
        # Get the total number of pages in the PDF
        page_count = pdf_input.getNumPages()
        print(page_count)
        # Write every page after the fifth (index 5 onward) to a new file
        for i in range(5, page_count):
            pdf_output.addPage(pdf_input.getPage(i))
        with open(outfn, 'wb') as fout:
            pdf_output.write(fout)


def merge_pdf(infnList, outfn):
    pdf_output = PdfFileWriter()
    # Keep every input file open until the output has been written
    handles = [open(infn, 'rb') for infn in infnList]
    try:
        for fin in handles:
            pdf_input = PdfFileReader(fin)
            # Get the total number of pages in the PDF
            page_count = pdf_input.getNumPages()
            print(page_count)
            for i in range(page_count):
                pdf_output.addPage(pdf_input.getPage(i))
        with open(outfn, 'wb') as fout:
            pdf_output.write(fout)
    finally:
        for fin in handles:
            fin.close()


if __name__ == '__main__':
    infn = 'infn.pdf'
    outfn = 'outfn.pdf'
    split_pdf(infn, outfn)
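Since getPage takes a zero-based index, range(5, page_count) in split_pdf starts at the sixth page. The sketch below only illustrates that index arithmetic and does not touch any PDF file; the helper name pages_after is hypothetical and not part of PyPDF2.

```python
def pages_after(skip, page_count):
    """Return the zero-based indices of the pages that follow the first `skip` pages.

    Hypothetical helper for illustration only; not part of PyPDF2.
    """
    return list(range(skip, page_count))


# For an 8-page document, skipping the first five pages keeps indices 5, 6, 7.
indices = pages_after(5, 8)
print(indices)
# Adding 1 turns each index into the page number a reader would see.
print([i + 1 for i in indices])
# A document with five pages or fewer yields no indices at all.
print(pages_after(5, 3))
```

In the last case split_pdf would add no pages, so it may be worth checking page_count before writing the output file.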
The source code for the split and merge example is available at https://github.com/xchaoinfo/Py-example-by-xchaoinfo.
Reference: PyPDF2 Documentation
Reposted from: https://zhuanlan.zhihu.com/p/26647491
Easy Concatenation with pdfcat
PyPDF2 contains a growing variety of sample programs meant to demonstrate its features. It also contains useful scripts such as pdfcat, located within the Scripts folder. This script makes it easy to concatenate PDF files by using Python slicing syntax. Because we are slicing PDF pages, we refer to the slices as page ranges.
Page range expression examples:
:         all pages
-1        last page
22        just the 23rd page
:-1       all but the last page
0:3       the first three pages
-2        second-to-last page
:3        the first three pages
-2:       last two pages
5:        from the sixth page onward
-3:-1     third & second to last

The third stride or step number is also recognized:

::2       0 2 4 ... to the end
1:10:2    1 3 5 7 9
::-1      all pages in reverse order
3:0:-1    3 2 1 but not 0
2::-1     2 1 0

Usage for pdfcat is as follows:
You can add as many input files as you like. You may also specify as many page ranges as needed for each file.
Optional arguments:
- -h, --help: Show the help message and exit
- -o, --output: Follow this argument with the output PDF file. Will be created if it doesn’t exist.
- -v, --verbose: Show page ranges as they are being read
Examples:
Concatenates all of head.pdf, all but page seven of content.pdf, and the last page of tail.pdf, producing output.pdf.
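Because page ranges follow Python slicing, their meaning can be checked without any PDF at hand. The following is a minimal stand-alone sketch of that mapping, not pdfcat's actual implementation; the helper names parse_page_range and select_pages are hypothetical. It also shows how "all but page seven" can be expressed with the two ranges :6 and 7:.

```python
def parse_page_range(expr):
    """Convert a page range expression such as '0:3' or '-1' into a slice.

    Hypothetical helper for illustration only; not part of PyPDF2.
    """
    parts = expr.split(":")
    if len(parts) > 3:
        raise ValueError(f"bad page range: {expr!r}")
    if len(parts) == 1:
        # A single integer selects exactly one page.
        start = int(parts[0])
        # For -1, start + 1 would be 0 and give an empty slice,
        # so the last page needs an open-ended stop instead.
        stop = None if start == -1 else start + 1
        return slice(start, stop)
    # Empty fields fall back to the slice defaults, as in seq[a:b:c].
    return slice(*(int(p) if p else None for p in parts))


def select_pages(expr, page_count):
    """Return the zero-based page indices that `expr` selects."""
    return list(range(page_count)[parse_page_range(expr)])


# "All but page seven" of a 10-page file: indices 0-5, then 7 onward.
print(select_pages(":6", 10) + select_pages("7:", 10))
# The last page of the same file.
print(select_pages("-1", 10))
```

Applying the slice to range(page_count) keeps the sketch independent of any PDF library while reusing Python's own handling of negative indices and steps.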
You can specify the output file by redirection.
In case you don’t want chapter 10 before chapter 2.
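The ordering concern arises because file names sorted as plain text place "10" before "2". When the list of input files is built in a script, one possible way around this is a natural sort key, sketched below. The file names and the helper natural_key are hypothetical and only serve the illustration.

```python
import re


def natural_key(name):
    """Split a name into text and digit chunks so numbers compare by value.

    Hypothetical helper for illustration only; not part of PyPDF2.
    """
    # re.split with a capturing group keeps the digit runs, so the result
    # always alternates text, number, text, ... starting with text.
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


# Assumed file names, chosen to reproduce the chapter 10 / chapter 2 problem.
files = ["chapter10.pdf", "chapter2.pdf", "chapter1.pdf"]

# Plain text order puts chapter10.pdf before chapter2.pdf.
print(sorted(files))
# Natural order compares the chapter numbers as integers.
print(sorted(files, key=natural_key))
```

The naturally sorted list can then be passed to a merge routine, or to pdfcat as its input files, in the intended chapter order.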