Python Tutorial: Converting HTML to PDF, Images, XML and XPS
Converting HTML files to other formats like PDF, images, XML, or XPS is a common task in the world of document management and digital publishing. Each format has its own advantages and use cases, ranging from creating print-ready documents to archiving HTML files for offline viewing or sharing on different platforms.
In this article, we will explore a solution for converting HTML to PDF, images (jpg/png), XML, or XPS formats using Python.
Install the Python Library for Converting HTML Files
The solution to convert HTML to the above-mentioned file formats requires a third-party library Spire.Doc for Python. We can install it easily via the below pip command.
pip install Spire.Doc
Spire.Doc for Python is a Word document processing library based on the Python language. It provides a rich set of APIs to read, write, modify and create Doc or Docx documents.
With it, we can load HTML files using theLoadFromFile(fileName: string, FileFormat.Html, XHTMLValidationType.none)
method, and then convert them to the specified file format through the SaveToFile(fileName: string, fileFormat: FileFormat)
method. To convert HTML to JPG/PNG images you need to use the SaveImageToStreams()
method.
The following are the code samples: 👇
1. Converting HTML to PDF in Python
from spire.doc import *
from spire.doc.common import *
# Load an HTML file
document = Document()
document.LoadFromFile("input.html", FileFormat.Html, XHTMLValidationType.none)
# Save the HTML file as PDF
document.SaveToFile("HtmltoPdf.pdf", FileFormat.PDF)
document.Close()
2. Convert HTML to JPN/PNG Images in Python
from spire.doc import *
from spire.doc.common import *
import io
inputFile = "Template.html"
outputFile = "HtmlToImage.png"
# Load an HTML file
document = Document()
document.LoadFromFile(inputFile, FileFormat.Html, XHTMLValidationType.none)
# Save the HTML file as image streams
imageStream = document.SaveImageToStreams(0, ImageType.Bitmap)
# Save the image stream to a specified image format
with open(outputFile,'wb') as imageFile:
imageFile.write(imageStream.ToArray())
document.Close()
3. Convert HTML to XML in Python
from spire.doc import *
from spire.doc.common import *
# Load an HTML file
document = Document()
document.LoadFromFile("input.html")
# Save the HTML file as a XML file
document.SaveToFile("HtmltoXml.xml", FileFormat.Xml)
document.Close()
4. Convert HTML to XPS in Python
from spire.doc import *
from spire.doc.common import *
# Load an HTML file
document = Document()
document.LoadFromFile("input.html", FileFormat.Html, XHTMLValidationType.none)
# Save the HTML file as XPS
document.SaveToFile("HtmltoXps.xps", FileFormat.XPS)
document.Close()
The above 4 examples show how to use Python to convert HTML to PDF, images, XML and XPS formats. If you need to convert HTML to Word documents, refer to:
Explore other features of converting or processing Word documents with the Spire.Doc for Python library: