In this tutorial, we'll walk through a Python script that merges multiple PDF files into one. This functionality is helpful in various scenarios, such as combining reports, invoices, or documents into a single file. To achieve this, we'll use the popular Python library PyPDF2, which allows for easy manipulation of PDF files.
If you're interested in the source code for this project, you can find it on GitHub: Merge_pdf GitHub Repository.
Step 1: Install the Required Library
To begin, you need to install the PyPDF2 library. This library simplifies tasks like merging, splitting, and rotating PDF pages. Open your Command Prompt or Terminal and run the following command:
pip install PyPDF2
This will install the necessary dependency for the script. If you see any warning related to the pip version, it's a good idea to update it, although it is not strictly required for the script to work.
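If you want to confirm that the installation worked before moving on, a small check like the one below can help. It is only a sketch: the helper name is_installed is made up for this example, and it relies on nothing beyond the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found in the current environment."""
    # find_spec returns None when a top-level module cannot be located
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("PyPDF2"):
        print("PyPDF2 is available")
    else:
        print("PyPDF2 is missing; run: pip install PyPDF2")
```

Running this with the same Python interpreter you plan to use for the script also catches the common case where pip installed the package into a different environment.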
Step 2: Writing the Code
Now let's dive into the code that merges the PDF files.
import PyPDF2
import glob
from pathlib import Path

# Create a list of PDF filepaths
filepaths = glob.glob("files/*.pdf")

# Create a PDF merger object
pdf_merger = PyPDF2.PdfMerger()

# Go through each PDF file and append to the merger object
for filepath in filepaths:
    # Append the current PDF file to the merger
    pdf_merger.append(filepath)

# Output the merged PDF
with open("merged.pdf", "wb") as output_pdf:
    pdf_merger.write(output_pdf)

print("PDFs merged successfully into 'merged.pdf'")
Explanation of the Code:
Import Libraries:
PyPDF2: This library is used to manipulate PDF files in Python.
glob: It helps in searching for PDF files in a specified folder.
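Path: Imported from the pathlib module, which provides an object-oriented way to handle file paths. The script as written never uses it, so this import can be removed without changing the behavior.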
Gather PDF File Paths:
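glob.glob("files/*.pdf") finds all PDF files in the files/ directory (relative to the folder you run the script from) and stores their paths in the filepaths list. The order of the results is not guaranteed, and the files are merged in whatever order the list holds them.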
Merge PDFs:
A PyPDF2.PdfMerger() object is created to merge the PDFs.
For each file in the filepaths list, the append() method is used to add the PDF to the merger.
Output the Merged PDF:
The file merged.pdf is opened in binary write mode ("wb"), and the pdf_merger.write(output_pdf) method writes the merged PDF into it.
Print a Confirmation:
Once the file has been written, the print() call displays the message PDFs merged successfully into 'merged.pdf'.
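One detail worth knowing about the file-gathering step: glob.glob() returns paths in arbitrary order, so the pages of the merged PDF may not follow the order of the file names. A minimal sketch of one way to make the order predictable is shown below. The helper name find_pdfs and the demo file names are assumptions made for this example, and the demo uses empty placeholder files rather than real PDFs.

```python
import glob
import os
import tempfile


def find_pdfs(folder):
    """Return the PDF paths in `folder`, sorted alphabetically."""
    pattern = os.path.join(folder, "*.pdf")
    return sorted(glob.glob(pattern))


# Demo: create a few empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.pdf", "a.pdf", "notes.txt"):
        open(os.path.join(tmp, name), "wb").close()

    names = [os.path.basename(path) for path in find_pdfs(tmp)]
    print(names)  # ['a.pdf', 'b.pdf']
```

In the script, the same idea amounts to wrapping the existing call as sorted(glob.glob("files/*.pdf")). Note that the sort is alphabetical, so a file named 10.pdf comes before 2.pdf unless the names are zero-padded.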
Step 3: Running the Script
Once the script is saved as main.py in the folder that contains the files/ directory, you can execute it from your Command Prompt or Terminal with the following command:
python main.py
If the script runs successfully, you will see the following output:
C:\projects\merge_pdf>python main.py
PDFs merged successfully into 'merged.pdf'
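The success message only tells you that the script reached its last line. If you would like a quick sanity check on the result, one option is to compare page counts: the merged file should have as many pages as all the input files combined. The sketch below assumes a PyPDF2 version that provides PdfReader (2.x or 3.x); the function names count_pages and check_merge are invented for this example.

```python
def count_pages(pdf_path):
    """Return the number of pages in the PDF at `pdf_path`."""
    # Imported here so the rest of this module loads even without PyPDF2
    import PyPDF2

    reader = PyPDF2.PdfReader(pdf_path)
    return len(reader.pages)


def check_merge(input_paths, merged_path, page_counter=count_pages):
    """Return True if the merged PDF has as many pages as the inputs combined.

    `page_counter` is a hypothetical hook: any function that maps a path
    to a page count, which makes the check easy to try without real PDFs.
    """
    expected = sum(page_counter(path) for path in input_paths)
    return page_counter(merged_path) == expected
```

After running main.py, a call such as check_merge(filepaths, "merged.pdf") with the same filepaths list should return True if every input page made it into the output.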
Conclusion:
By following this tutorial, you should now be able to merge multiple PDF files into a single PDF document using Python and PyPDF2. This approach is ideal for combining reports, documents, or invoices with minimal code. The script is simple yet effective for working with multiple PDFs, and you can easily adapt it for other PDF manipulation tasks.
The source code for this project is available on GitHub: Merge_pdf GitHub Repository.
Feel free to modify the script for tasks like splitting PDFs, rotating pages, or extracting text. The possibilities are endless!
Happy coding!