There are many reasons to convert a PDF to Excel. Probably the most important is that Microsoft Excel's calculation, visualization, and analysis tools make the data in a PDF easy to manipulate and analyze. For example, you can apply formulas to the data, build charts from it, and apply conditional formatting. In this article, I will explain how to programmatically convert PDF to Excel with formatting in Java.
- Convert PDF to Excel with Formatting in Java
- Convert a Multi-Page PDF to a Single Excel Worksheet in Java
Add Dependencies
To convert PDF to Excel, this article uses a third-party API named Spire.PDF for Java. Before coding, you need to add Spire.PDF for Java to your Java project as a dependency. There are two ways to do that.
Method 1: If you are using Maven, you can import the Spire.PDF for Java JAR file into your application by adding the following code to your project’s pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>8.12.6</version>
    </dependency>
</dependencies>
Method 2: If you are not using Maven, download the latest version of Spire.PDF for Java from this link, extract the zip file, and then import the Spire.Pdf.jar file from the lib folder into your project as a dependency.
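Whichever method you choose, a quick way to confirm that the JAR actually reached your classpath is to try loading one of its classes by name. The following is a minimal sketch using only the Java standard library; the class name DependencyCheck and the helper isOnClasspath are hypothetical names chosen for illustration and are not part of Spire.PDF. The class name com.spire.pdf.PdfDocument is the one imported by the examples in this article.

```java
// Hypothetical helper (not part of Spire.PDF): checks whether a class
// can be located on the classpath of the running program.
final class DependencyCheck {

    // Returns true if the named class can be found, false otherwise.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the import statements used in this article
        String spireClass = "com.spire.pdf.PdfDocument";
        if (isOnClasspath(spireClass)) {
            System.out.println(spireClass + " found: the dependency is set up.");
        } else {
            System.out.println(spireClass + " not found: check pom.xml or the imported JAR.");
        }
    }
}
```

Running this once after setting up the project can save time compared with discovering a missing dependency through a compile error in the conversion code.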
Convert PDF to Excel with Formatting in Java
The PdfDocument.saveToFile(String, FileFormat) method in Spire.PDF for Java is used to convert a PDF document to other file formats. You can use this method to easily convert a PDF to Excel with formatting by specifying the FileFormat as XLSX.
The following are the detailed steps:
- Initialize an instance of the PdfDocument class.
- Load a PDF document using the PdfDocument.loadFromFile() method.
- Save the PDF document to Excel XLSX format using the PdfDocument.saveToFile(String, FileFormat) method.
import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;

public class ConvertPdfToExcel {
    public static void main(String[] args) {
        //Initialize an instance of the PdfDocument class
        PdfDocument pdf = new PdfDocument();
        //Load a PDF document
        pdf.loadFromFile("C:\\Users\\Administrator\\Desktop\\Sample.pdf");
        //Save the PDF document to XLSX format
        pdf.saveToFile("PdfToExcel.xlsx", FileFormat.XLSX);
    }
}
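The example above hard-codes both file paths. In practice you may want to check that the input file is readable and derive the output name from it. The sketch below shows one possible way to do this with the standard java.nio.file API; the class PdfPaths and its methods toXlsxPath and requireReadablePdf are hypothetical helpers introduced here for illustration, not part of Spire.PDF. The Spire.PDF calls are left as comments so that the sketch runs without the library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper class (not part of Spire.PDF)
final class PdfPaths {

    // Returns a path in the same folder as the input, with the extension replaced by .xlsx
    static Path toXlsxPath(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".xlsx");
    }

    // Fails early with a clear message if the path is not a readable .pdf file
    static void requireReadablePdf(Path pdf) {
        String name = pdf.getFileName().toString().toLowerCase(Locale.ROOT);
        if (!name.endsWith(".pdf")) {
            throw new IllegalArgumentException("Not a .pdf file: " + pdf);
        }
        if (!Files.isReadable(pdf)) {
            throw new IllegalArgumentException("Cannot read file: " + pdf);
        }
    }

    public static void main(String[] args) {
        // Input path from the command line, falling back to Sample.pdf in the working folder
        Path input = Paths.get(args.length > 0 ? args[0] : "Sample.pdf");
        requireReadablePdf(input);
        Path output = toXlsxPath(input);
        System.out.println("Converting " + input + " to " + output);

        // With Spire.PDF on the classpath, the calls from the example above would go here:
        // PdfDocument pdf = new PdfDocument();
        // pdf.loadFromFile(input.toString());
        // pdf.saveToFile(output.toString(), FileFormat.XLSX);
    }
}
```

Validating the path first means a wrong file name produces a readable error message before any conversion is attempted.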
Convert a Multi-Page PDF to a Single Excel Worksheet in Java
If your PDF document has multiple pages and you want to convert it to a single Excel worksheet, first set the PDF to XLSX conversion options using the PdfDocument.getConvertOptions().setPdfToXlsxOptions() method, and then call the PdfDocument.saveToFile(String, FileFormat) method to convert the PDF to XLSX format with those options.
The following are the detailed steps:
- Initialize an instance of the PdfDocument class.
- Load a PDF document using the PdfDocument.loadFromFile() method.
- Set the PDF to XLSX conversion options using the PdfDocument.getConvertOptions().setPdfToXlsxOptions() method.
- Save the PDF document to Excel XLSX format using the PdfDocument.saveToFile(String, FileFormat) method.
import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;
import com.spire.pdf.conversion.XlsxLineLayoutOptions;

public class ConvertMultiPagePdfToSingleExcelWorksheet {
    public static void main(String[] args) throws Exception {
        //Initialize an instance of the PdfDocument class
        PdfDocument pdf = new PdfDocument();
        //Load a PDF document
        pdf.loadFromFile("C:\\Users\\Administrator\\Desktop\\Members.pdf");
        //Set the PDF to XLSX conversion options: rendering multiple pages on a single worksheet
        pdf.getConvertOptions().setPdfToXlsxOptions(new XlsxLineLayoutOptions(false, true, true));
        //Save the PDF document to XLSX format
        pdf.saveToFile("PdfToOneSheet.xlsx", FileFormat.XLSX);
    }
}
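If one program needs to support both conversions described in this article, the choice can be exposed as a command-line switch. The following sketch only parses the arguments and uses the standard library alone; the flag name --single-sheet, the class ConvertArgs, and its fields are hypothetical and not part of Spire.PDF. The comments in main show where the Spire.PDF calls from the examples above would go.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical argument holder (not part of Spire.PDF)
final class ConvertArgs {
    final String input;        // path of the PDF to convert
    final String output;       // path of the XLSX file to write
    final boolean singleSheet; // true if all pages should go to one worksheet

    ConvertArgs(String input, String output, boolean singleSheet) {
        this.input = input;
        this.output = output;
        this.singleSheet = singleSheet;
    }

    // Parses: [--single-sheet] <input.pdf> <output.xlsx>
    static ConvertArgs parse(String[] args) {
        boolean singleSheet = false;
        List<String> paths = new ArrayList<>();
        for (String arg : args) {
            if (arg.equals("--single-sheet")) {
                singleSheet = true;
            } else {
                paths.add(arg);
            }
        }
        if (paths.size() != 2) {
            throw new IllegalArgumentException(
                    "Usage: [--single-sheet] <input.pdf> <output.xlsx>");
        }
        return new ConvertArgs(paths.get(0), paths.get(1), singleSheet);
    }

    public static void main(String[] args) {
        ConvertArgs a = parse(args);
        System.out.println("Input: " + a.input + ", output: " + a.output
                + ", single sheet: " + a.singleSheet);

        // With Spire.PDF on the classpath, the calls from this article would go here:
        // PdfDocument pdf = new PdfDocument();
        // pdf.loadFromFile(a.input);
        // if (a.singleSheet) {
        //     pdf.getConvertOptions().setPdfToXlsxOptions(new XlsxLineLayoutOptions(false, true, true));
        // }
        // pdf.saveToFile(a.output, FileFormat.XLSX);
    }
}
```

Keeping the argument parsing separate from the conversion makes it possible to test the switch without a PDF file at hand.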
Conclusion
This article demonstrated how to convert a PDF to Excel, as well as how to convert a multi-page PDF to a single Excel worksheet, in Java using the Spire.PDF for Java API. Apart from PDF to Excel conversion, you can also use the API to convert PDFs to a variety of other file formats, such as DOCX, HTML and PPTX, by specifying the corresponding FileFormat.