PyPDF2

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 01:46

Installation

Install it directly with pip:
pip install PyPDF2

PyPDF2 provides four commonly used main classes: PdfFileReader, PdfFileMerger, PageObject, and PdfFileWriter.

Reading and Writing a PDF

from PyPDF2 import PdfFileReader, PdfFileWriter

infn = 'infn.pdf'
outfn = 'outfn.pdf'

# Create a PdfFileReader object
pdf_input = PdfFileReader(open(infn, 'rb'))

# Get the number of pages in the PDF
page_count = pdf_input.getNumPages()
print(page_count)

# Get a PageObject by its zero-based index (here, the first page)
i = 0
page = pdf_input.getPage(i)

# Create a PdfFileWriter object
pdf_output = PdfFileWriter()

# Add the PageObject to the PdfFileWriter
pdf_output.addPage(page)

# Write the result to a file
with open(outfn, 'wb') as f:
    pdf_output.write(f)

Example: Merging and Splitting PDFs

from PyPDF2 import PdfFileReader, PdfFileWriter


def split_pdf(infn, outfn):
    pdf_output = PdfFileWriter()
    pdf_input = PdfFileReader(open(infn, 'rb'))
    # Get the total number of pages in the PDF
    page_count = pdf_input.getNumPages()
    print(page_count)
    # Write every page after the fifth page to a new file
    # (indexes are zero-based, so index 5 is the sixth page)
    for i in range(5, page_count):
        pdf_output.addPage(pdf_input.getPage(i))
    with open(outfn, 'wb') as f:
        pdf_output.write(f)


def merge_pdf(infnList, outfn):
    pdf_output = PdfFileWriter()
    for infn in infnList:
        pdf_input = PdfFileReader(open(infn, 'rb'))
        # Get the total number of pages in this PDF
        page_count = pdf_input.getNumPages()
        print(page_count)
        for i in range(page_count):
            pdf_output.addPage(pdf_input.getPage(i))
    with open(outfn, 'wb') as f:
        pdf_output.write(f)


if __name__ == '__main__':
    infn = 'infn.pdf'
    outfn = 'outfn.pdf'
    split_pdf(infn, outfn)

The source code for these examples can be found at github.com/xchaoinfo/Py.

Reference: PyPDF2 Documentation

Reposted from: https://zhuanlan.zhihu.com/p/26647491

Easy Concatenation with pdfcat

PyPDF2 contains a growing variety of sample programs meant to demonstrate its features. It also contains useful scripts such as pdfcat, located within the Scripts folder. This script makes it easy to concatenate PDF files by using Python slicing syntax. Because we are slicing PDF pages, we refer to the slices as page ranges.

Page range expression examples:

:        all pages
-1       last page
22       just the 23rd page
:-1      all but the last page
0:3      the first three pages
-2       second-to-last page
:3       the first three pages
-2:      last two pages
5:       from the sixth page onward
-3:-1    third & second to last

The third stride or step number is also recognized:

::2      0 2 4 ... to the end
1:10:2   1 3 5 7 9
::-1     all pages in reverse order
3:0:-1   3 2 1 but not 0
2::-1    2 1 0
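
Since page ranges follow Python slicing syntax, the expressions above can be tried out on an ordinary list of page indexes. The following is a minimal sketch, not the parser that pdfcat itself uses: the function name parse_page_range is hypothetical, and the 30-page document is an assumption made only for the demonstration.

```python
def parse_page_range(expr):
    """Turn a page range expression such as '0:3' or '-1' into a slice.

    Hypothetical helper for illustration only; it is not part of PyPDF2.
    """
    parts = expr.split(":")
    if len(parts) > 3:
        raise ValueError("too many ':' in page range: %r" % expr)
    if len(parts) == 1:
        # A bare index selects exactly one page.
        index = int(parts[0])
        # For -1 the stop must be open-ended, because slice(-1, 0) is empty.
        stop = index + 1 if index != -1 else None
        return slice(index, stop)
    # Empty fields mean "use the default", exactly as in Python slicing.
    numbers = [int(p) if p.strip() else None for p in parts]
    return slice(*numbers)


# Assumed 30-page document, represented by its zero-based page indexes.
pages = list(range(30))

print(pages[parse_page_range("22")])      # [22], the 23rd page
print(pages[parse_page_range("-3:-1")])   # [27, 28]
print(pages[parse_page_range("1:10:2")])  # [1, 3, 5, 7, 9]
print(pages[parse_page_range("3:0:-1")])  # [3, 2, 1]
```

Returning a slice object keeps the sketch short, because the list itself then handles negative indexes, open ends, and strides.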

Usage for pdfcat is as follows:

>>> pdfcat [-h] [-o output.pdf] [-v] input.pdf [page_range...] ...

You can add as many input files as you like. You may also specify as many page ranges as needed for each file.

Optional arguments:
-h, --help       Show the help message and exit
-o, --output     Follow this argument with the output PDF file. Will be created if it doesn’t exist.
-v, --verbose    Show page ranges as they are being read

Examples:

>>> pdfcat -o output.pdf head.pdf content.pdf :6 7: tail.pdf -1

Concatenates all of head.pdf, all but page seven of content.pdf, and the last page of tail.pdf, producing output.pdf.
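
To see why this command line reads that way, it may help to pair each input file with the page ranges that follow it. The sketch below is only an illustration under stated assumptions: group_arguments is a hypothetical helper, it recognizes input files simply by a ".pdf" suffix (pdfcat may decide this differently), and content.pdf is assumed to have 10 pages.

```python
def group_arguments(args):
    """Pair each input file with the page ranges that follow it.

    Hypothetical helper; treating anything ending in '.pdf' as an input
    file is a simplifying assumption made for this sketch.
    """
    groups = []
    for arg in args:
        if arg.lower().endswith(".pdf"):
            groups.append((arg, []))
        elif groups:
            groups[-1][1].append(arg)
        else:
            raise ValueError("a page range must follow an input file")
    # A file with no explicit page range contributes all of its pages.
    return [(name, ranges or [":"]) for name, ranges in groups]


plan = group_arguments(["head.pdf", "content.pdf", ":6", "7:", "tail.pdf", "-1"])
for name, ranges in plan:
    print(name, ranges)

# Assumed 10-page content.pdf: ':6' keeps indexes 0-5 and '7:' keeps 7-9,
# so index 6 (the seventh page) is the only one left out.
content_pages = list(range(10))
selected = content_pages[:6] + content_pages[7:]
print(selected)  # [0, 1, 2, 3, 4, 5, 7, 8, 9]
```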

>>> pdfcat chapter*.pdf >book.pdf

You can specify the output file by redirection.

>>> pdfcat chapter?.pdf chapter10.pdf >book.pdf

In case you don’t want chapter 10 before chapter 2.
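
The ordering problem comes from how file names sort as text. The sketch below imitates the two glob patterns with Python's fnmatch module, assuming ten files named chapter1.pdf to chapter10.pdf and a shell that expands globs in plain lexicographic order; actual ordering can vary with the shell and locale.

```python
from fnmatch import fnmatchcase

# Assumed file names for the demonstration.
names = ["chapter%d.pdf" % n for n in range(1, 11)]

# 'chapter*.pdf' matches every file; sorted as text, chapter10 lands
# between chapter1 and chapter2.
star_order = sorted(n for n in names if fnmatchcase(n, "chapter*.pdf"))
print(star_order[:3])  # ['chapter1.pdf', 'chapter10.pdf', 'chapter2.pdf']

# 'chapter?.pdf' matches single-character chapter numbers only, so
# chapter10.pdf can be appended explicitly at the end.
single_digit = sorted(n for n in names if fnmatchcase(n, "chapter?.pdf"))
fixed_order = single_digit + ["chapter10.pdf"]
print(fixed_order[-2:])  # ['chapter9.pdf', 'chapter10.pdf']
```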


