Read/Extract Text & Images from Word in C#

4 min read · Nov 20, 2024

When processing Word documents, you may occasionally need to extract document content, including text and images, for reuse in other projects, documents, or marketing materials.

Manual extraction is tedious and slow, and automating it pays off quickly for large or repetitive jobs. This article shows how to extract text and images from a Word document programmatically in C# using a free third-party library.

Extract Text from a Specified Paragraph in C#
Extract Text from a Word document in C#
Extract Images from a Word document in C#

Free .NET Word Library

The library used here is Free Spire.Doc for .NET. You can either download it from the link below and add a reference to your project manually, or install it directly via NuGet.

Downloads - Free Spire.Doc

Download free .NET/Wpf Word library to read, create, manipulate, convert & print Microsoft Word documents.

www.e-iceblue.com
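
If you prefer the command line, the NuGet route might look like the following. This is a sketch that assumes the package is published under the ID FreeSpire.Doc; check the NuGet gallery for the exact name before running it.

```shell
# Run from the folder that contains your .csproj file.
# Assumption: the NuGet package ID is FreeSpire.Doc.
dotnet add package FreeSpire.Doc
```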

Extract Text from a Specified Paragraph in C#

The Paragraph.Text property returns the text content of a given paragraph. Section and paragraph indexes are zero-based, so Paragraphs[1] is the second paragraph. The following steps extract the text of one paragraph and write it to a .txt file.

Import the necessary namespaces;
Load a Word document through the LoadFromFile() method;
Create a StringBuilder instance to store extracted text;
Access a specified section, and then access a specified paragraph in the section;
Get the text of the paragraph using the Paragraph.Text property;
Append the extracted text to the StringBuilder instance;
Write the text in the StringBuilder instance to a .txt file.

C# code:

using Spire.Doc;
using Spire.Doc.Documents;
using System.Text;
using System.IO;

namespace ExtractParagraphText
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load a Word document 
            Document doc = new Document();
            doc.LoadFromFile("Roche Limit.docx");

            // Create a StringBuilder instance to store extracted text
            StringBuilder sb = new StringBuilder();

            // Get the first section
            Section section = doc.Sections[0];

            // Get the second paragraph in the section
            Paragraph paragraph = section.Paragraphs[1];

            // Get text from the paragraph and append to the StringBuilder instance
            sb.AppendLine(paragraph.Text);

            // Write to a text file
            File.WriteAllText("ParagraphText.txt", sb.ToString());
        }
    }
}

Extract the text of the second paragraph in Word with C#
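
The example above assumes the document really has a first section with at least two paragraphs. For documents you do not control, a bounds check may be worth adding. The following is a minimal sketch under that assumption; ParagraphReader and TryGetParagraphText are hypothetical names, not part of Spire.Doc.

```csharp
using System;
using System.IO;
using Spire.Doc;
using Spire.Doc.Documents;

namespace ExtractParagraphTextSafely
{
    public static class ParagraphReader
    {
        // Hypothetical helper (not part of Spire.Doc): returns false instead of
        // throwing when either zero-based index is out of range.
        public static bool TryGetParagraphText(Document doc, int sectionIndex, int paragraphIndex, out string text)
        {
            text = string.Empty;

            if (sectionIndex < 0 || sectionIndex >= doc.Sections.Count)
            {
                return false;
            }

            Section section = doc.Sections[sectionIndex];

            if (paragraphIndex < 0 || paragraphIndex >= section.Paragraphs.Count)
            {
                return false;
            }

            text = section.Paragraphs[paragraphIndex].Text;
            return true;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Load a Word document
            Document doc = new Document();
            doc.LoadFromFile("Roche Limit.docx");

            // Ask for the second paragraph of the first section
            if (ParagraphReader.TryGetParagraphText(doc, 0, 1, out string text))
            {
                File.WriteAllText("ParagraphText.txt", text);
            }
            else
            {
                Console.WriteLine("The first section has no second paragraph.");
            }
        }
    }
}
```

Returning a bool keeps the calling code free of try/catch blocks and makes the "paragraph not found" case explicit.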

Extract Text from a Word document in C#

The library also provides the Document.GetText() method, which returns the text content of an entire Word document. The following steps extract the text of a Word document and write it to a .txt file.

Import the necessary namespaces;
Load a Word document through the LoadFromFile() method;
Create a StringBuilder instance to store extracted text;
Get the text of the Word document using the Document.GetText() method;
Append the extracted text to the StringBuilder instance;
Write the text in the StringBuilder instance to a .txt file.

C# code:

using Spire.Doc;
using System.Text;
using System.IO;

namespace ExtractWordText
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load a Word document 
            Document doc = new Document();
            doc.LoadFromFile("Roche Limit.docx");

            // Create a StringBuilder instance to store extracted text
            StringBuilder sb = new StringBuilder();

            // Get text from the Word document
            string text = doc.GetText();

            // Append the extracted text to the StringBuilder instance
            sb.AppendLine(text);

            // Write to a text file
            File.WriteAllText("ExtractWordText.txt", sb.ToString());
        }
    }
}

Extract the text of the entire Word document with C#
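
When you need the text paragraph by paragraph rather than as one string, the Paragraph.Text property from the previous section can be combined with a loop over every section. The following is a minimal sketch, not the library's recommended approach; TextCollector and CollectParagraphTexts are hypothetical names, and the output file name is made up for illustration. It only visits paragraphs exposed by Section.Paragraphs, so its output may differ from Document.GetText() for text held elsewhere, such as in tables or headers.

```csharp
using System.Collections.Generic;
using System.IO;
using Spire.Doc;
using Spire.Doc.Documents;

namespace ExtractTextByParagraph
{
    public static class TextCollector
    {
        // Hypothetical helper (not part of Spire.Doc): returns the trimmed text
        // of every non-blank paragraph, in document order.
        public static List<string> CollectParagraphTexts(Document doc)
        {
            List<string> lines = new List<string>();

            foreach (Section section in doc.Sections)
            {
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    string text = paragraph.Text;

                    // Skip paragraphs that are empty or contain only whitespace
                    if (!string.IsNullOrWhiteSpace(text))
                    {
                        lines.Add(text.Trim());
                    }
                }
            }

            return lines;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Load a Word document
            Document doc = new Document();
            doc.LoadFromFile("Roche Limit.docx");

            // Write one paragraph per line
            List<string> lines = TextCollector.CollectParagraphTexts(doc);
            File.WriteAllLines("ExtractWordTextByParagraph.txt", lines);
        }
    }
}
```

Keeping the paragraphs in a list makes it easy to filter, number, or search them before writing the file.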

Extract Images from a Word document in C#

To extract images from a Word document, iterate through the child objects of each paragraph and check whether each one is a DocPicture. If it is, save its image to disk. The following steps extract images from a Word document and save them to a specified file path.

Import the necessary namespaces;
Load a Word document through the LoadFromFile() method;
Iterate through each section and then each paragraph of each section;
Iterate through each child object of a paragraph;
Determine whether a child object is a DocPicture. If so, save the image to disk using the DocPicture.Image.Save(String, ImageFormat) method.

C# code:

using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using System;

namespace ExtractImages
{
    class Program
    {
        static void Main(string[] args)
        {
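            // Load a Word document
            Document doc = new Document();
            doc.LoadFromFile("Roche Limit.docx");

            // Make sure the output folder exists, since Image.Save does not create it
            System.IO.Directory.CreateDirectory("Images");

            int index = 0;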
            
            // Iterate through each section of document
            foreach (Section section in doc.Sections)
            {
                // Iterate through each paragraph of section
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    // Iterate through each document object of a specific paragraph
                    foreach (DocumentObject docObject in paragraph.ChildObjects)
                    {
                        // Determine whether the child object is a picture
                        if (docObject.DocumentObjectType == DocumentObjectType.Picture)
                        {
                            // If yes, save the image out of the document
                            DocPicture picture = docObject as DocPicture;
                            picture.Image.Save(string.Format("Images\\image_{0}.png", index), System.Drawing.Imaging.ImageFormat.Png);
                            index++;
                        }
                    }
                }
            }
        }
    }
}

Extract all images from a Word document in C#
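
The nested loops above only look at paragraphs that Section.Paragraphs exposes, so pictures placed elsewhere, for example inside table cells, may be missed. A breadth-first walk over the whole document tree is one way to reach them. This sketch assumes that container objects, including Document itself, implement ICompositeObject from the Spire.Doc.Interface namespace and expose their children through ChildObjects; verify this against the version you have installed. PictureFinder and FindPictures are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using Spire.Doc.Interface;

namespace ExtractNestedImages
{
    public static class PictureFinder
    {
        // Hypothetical helper (not part of Spire.Doc): walks the document tree
        // breadth-first and returns every DocPicture it encounters.
        public static List<DocPicture> FindPictures(Document doc)
        {
            List<DocPicture> pictures = new List<DocPicture>();
            Queue<ICompositeObject> pending = new Queue<ICompositeObject>();

            // Assumption: Document implements ICompositeObject
            pending.Enqueue(doc);

            while (pending.Count > 0)
            {
                ICompositeObject node = pending.Dequeue();

                foreach (IDocumentObject child in node.ChildObjects)
                {
                    if (child is DocPicture picture)
                    {
                        // Found a picture, keep it
                        pictures.Add(picture);
                    }
                    else if (child is ICompositeObject composite)
                    {
                        // A container (section, table, paragraph...), visit it later
                        pending.Enqueue(composite);
                    }
                }
            }

            return pictures;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Load a Word document
            Document doc = new Document();
            doc.LoadFromFile("Roche Limit.docx");

            // Image.Save does not create missing folders
            Directory.CreateDirectory("Images");

            int index = 0;
            foreach (DocPicture picture in PictureFinder.FindPictures(doc))
            {
                string path = Path.Combine("Images", string.Format("image_{0}.png", index));
                picture.Image.Save(path, System.Drawing.Imaging.ImageFormat.Png);
                index++;
            }
        }
    }
}
```

A queue is used instead of recursion so that deeply nested documents cannot exhaust the call stack.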

If you want to explore more Word document processing features with C# or VB.NET, check here:

Spire.Doc for .NET Program Guide Content

Spire.Doc for .NET is a professional Word .NET library specifically designed for developers to create, read, write…

Written by Andrew Wilson
