Python Tutorial: Converting HTML to PDF, Images, XML and XPS

Andrew Wilson
2 min readMar 28, 2024

--

Converting HTML files to other formats like PDF, images, XML, or XPS is a common task in the world of document management and digital publishing. Each format has its own advantages and use cases, ranging from creating print-ready documents to archiving HTML files for offline viewing or sharing on different platforms.

In this article, we will explore a solution for converting HTML to PDF, images (jpg/png), XML, or XPS formats using Python.

Install the Python Library for Converting HTML Files

The solution to convert HTML to the above-mentioned file formats requires a third-party library Spire.Doc for Python. We can install it easily via the below pip command.

pip install Spire.Doc

Spire.Doc for Python is a Word document processing library based on the Python language. It provides a rich set of APIs to read, write, modify and create Doc or Docx documents.

With it, we can load HTML files using theLoadFromFile(fileName: string, FileFormat.Html, XHTMLValidationType.none) method, and then convert them to the specified file format through the SaveToFile(fileName: string, fileFormat: FileFormat) method. To convert HTML to JPG/PNG images you need to use the SaveImageToStreams() method.

The following are the code samples: 👇

1. Converting HTML to PDF in Python

from spire.doc import *
from spire.doc.common import *

# Load an HTML file
document = Document()
document.LoadFromFile("input.html", FileFormat.Html, XHTMLValidationType.none)

# Save the HTML file as PDF
document.SaveToFile("HtmltoPdf.pdf", FileFormat.PDF)
document.Close()

2. Convert HTML to JPN/PNG Images in Python

from spire.doc import *
from spire.doc.common import *
import io

inputFile = "Template.html"
outputFile = "HtmlToImage.png"

# Load an HTML file
document = Document()
document.LoadFromFile(inputFile, FileFormat.Html, XHTMLValidationType.none)

# Save the HTML file as image streams
imageStream = document.SaveImageToStreams(0, ImageType.Bitmap)

# Save the image stream to a specified image format
with open(outputFile,'wb') as imageFile:
imageFile.write(imageStream.ToArray())
document.Close()

3. Convert HTML to XML in Python

from spire.doc import *
from spire.doc.common import *

# Load an HTML file
document = Document()
document.LoadFromFile("input.html")

# Save the HTML file as a XML file
document.SaveToFile("HtmltoXml.xml", FileFormat.Xml)
document.Close()

4. Convert HTML to XPS in Python

from spire.doc import *
from spire.doc.common import *

# Load an HTML file
document = Document()
document.LoadFromFile("input.html", FileFormat.Html, XHTMLValidationType.none)

# Save the HTML file as XPS
document.SaveToFile("HtmltoXps.xps", FileFormat.XPS)
document.Close()

The above 4 examples show how to use Python to convert HTML to PDF, images, XML and XPS formats. If you need to convert HTML to Word documents, refer to:

Explore other features of converting or processing Word documents with the Spire.Doc for Python library:

--

--

Andrew Wilson

Explore C#, Java and Python solutions for processing Word/Excel/PowerPoint/PDF files.