How to Convert PDF to Excel XLSX with Python — A Complete Guide

Andrew Wilson
2 min read · Apr 10, 2024


When a PDF document contains tabular data that you want to sort, filter, or run calculations on, you can convert the PDF to an Excel file and work with the data there.

In this article, we will explore how to convert a PDF to Excel XLSX in Python using a third-party library.

Python Library for PDF to Excel Conversion

The third-party library we use is Spire.PDF for Python, a library for creating, processing, and converting PDF files. It can be installed with the following pip command.

pip install Spire.PDF
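
Before running the conversion, it can help to confirm that the package is visible to the interpreter you are using. The following is a minimal sketch that relies only on the standard library; the helper name installed_version is hypothetical, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="Spire.PDF"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("Spire.PDF is not installed; run: pip install Spire.PDF")
    else:
        print(f"Spire.PDF {version} is installed")
```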

Convert PDF to Excel File

The library provides the PdfDocument.SaveToFile() method, which saves a loaded PDF in Excel XLSX format when it is called with FileFormat.XLSX. To control how the PDF content is laid out in the resulting worksheets, you can set conversion options through the XlsxLineLayoutOptions class.

The constructor of this class accepts the following five parameters.

  1. convertToMultipleSheet (bool): Indicates whether multiple PDF pages are rendered into multiple Excel worksheets (True) or into a single worksheet (False).
  2. rotatedText (bool): Indicates whether to display rotated text.
  3. splitCell (bool): Indicates whether a PDF table cell containing multiple lines of text will be split into multiple rows in Excel.
  4. wrapText (bool): Indicates whether to wrap text in the Excel cells.
  5. overlapText (bool): Indicates whether to display overlapping text.
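
Because the constructor takes five positional booleans, it is easy to lose track of which flag is which. One way to keep them readable is sketched below: a small dataclass that names each flag and returns them in the order listed above. XlsxLayoutFlags is a hypothetical helper, not part of Spire.PDF, and its defaults simply mirror the values used in this article's example rather than any library default.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class XlsxLayoutFlags:
    """Hypothetical helper: names the five flags in constructor order."""
    convert_to_multiple_sheet: bool = True
    rotated_text: bool = True
    split_cell: bool = False
    wrap_text: bool = True
    overlap_text: bool = False

    def as_args(self):
        # astuple() preserves field order, which matches the parameter list above.
        return astuple(self)


flags = XlsxLayoutFlags(split_cell=True)
print(flags.as_args())  # (True, True, True, True, False)

# With Spire.PDF installed, the tuple can be unpacked into the constructor:
# options = XlsxLineLayoutOptions(*flags.as_args())
```

Changing a single option then becomes a keyword argument instead of an edit to one position in a row of booleans.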

Python Code to Convert PDF to Excel:

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load a PDF document
pdf.LoadFromFile("quote.pdf")

# Create an XlsxLineLayoutOptions object to specify the conversion options
# Parameters: convertToMultipleSheet, rotatedText, splitCell, wrapText, overlapText
convertOptions = XlsxLineLayoutOptions(True, True, False, True, False)

# Set the conversion options
pdf.ConvertOptions.SetPdfToXlsxOptions(convertOptions)

# Save the PDF document to Excel XLSX format
pdf.SaveToFile("PdfToExcel.xlsx", FileFormat.XLSX)
pdf.Close()
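
If you convert more than one file, the same steps can be wrapped in a function. The sketch below reuses only the calls shown in the example above; the function name convert_pdf_to_xlsx, the up-front file check, and the try/finally around Close() are additions for illustration, not something the library requires. The imports are guarded so the file check still works on a machine where Spire.PDF is not installed.

```python
try:
    # Same imports as the example above.
    from spire.pdf.common import *
    from spire.pdf import *
    SPIRE_AVAILABLE = True
except ImportError:
    SPIRE_AVAILABLE = False

from pathlib import Path


def convert_pdf_to_xlsx(src, dst):
    """Convert one PDF to XLSX, closing the document even if saving fails."""
    src_path = Path(src)
    if not src_path.is_file():
        raise FileNotFoundError(f"PDF not found: {src_path}")
    if not SPIRE_AVAILABLE:
        raise RuntimeError("Spire.PDF is not installed; run: pip install Spire.PDF")

    pdf = PdfDocument()
    try:
        pdf.LoadFromFile(str(src_path))
        # Parameters: convertToMultipleSheet, rotatedText, splitCell, wrapText, overlapText
        options = XlsxLineLayoutOptions(True, True, False, True, False)
        pdf.ConvertOptions.SetPdfToXlsxOptions(options)
        pdf.SaveToFile(str(dst), FileFormat.XLSX)
    finally:
        # Release the document whether or not the conversion succeeded.
        pdf.Close()
    return Path(dst)
```

A call such as convert_pdf_to_xlsx("quote.pdf", "PdfToExcel.xlsx") would then perform the same conversion as the example above.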

The library also supports converting PDF to other file formats.
