Extract Text from Word Using Java

Andrew Wilson
1 min read · Apr 25, 2022

TXT is a plain-text format that can be opened on virtually any computer or mobile device. TXT files are small, which makes text content convenient to store. This article demonstrates how to extract the text from a Word document and save it as a .txt file using Free Spire.Doc for Java.

Import JAR Dependency

Method 1: Download the free library and unzip it. Then add the Spire.Doc.jar file to your Java application as a dependency.
Method 2: Add the JAR dependency directly to a Maven project by adding the following configuration to pom.xml.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>http://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc.free</artifactId>
        <version>5.2.0</version>
    </dependency>
</dependencies>

Sample Code

import com.spire.doc.Document;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractText {

    public static void main(String[] args) throws IOException {

        // Load the Word document
        Document document = new Document();
        document.loadFromFile("Island.docx");

        // Get the document's text as a single string
        String text = document.getText();

        // Write the string to a .txt file
        writeStringToTxt(text, "Extracted.txt");
    }

    public static void writeStringToTxt(String content, String txtFileName) throws IOException {

        // try-with-resources flushes and closes the writer, even if write() throws.
        // The file is overwritten on each run, so repeated runs do not duplicate the text.
        try (FileWriter fWriter = new FileWriter(txtFileName)) {
            fWriter.write(content);
        }
    }
}
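
The file-writing step does not depend on Spire.Doc, so it can be tried on its own. FileWriter, when constructed with only a file name, uses the platform's default character encoding, which may garble non-ASCII text on some systems. The following is a minimal sketch of an alternative that writes UTF-8 explicitly using java.nio. The class name Utf8TextWriter, the method writeUtf8, and the sample string are hypothetical and used only for illustration; they are not part of Spire.Doc.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class for illustration; not part of Spire.Doc
class Utf8TextWriter {

    // Writes content to the given file as UTF-8, creating the file
    // if it does not exist and overwriting it if it does
    static void writeUtf8(String content, String txtFileName) throws IOException {
        Path path = Paths.get(txtFileName);
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by document.getText()
        String text = "Sample text with non-ASCII characters: café, naïve";

        // Write to a temporary file so the demo does not touch the working directory
        Path tmp = Files.createTempFile("Extracted", ".txt");
        writeUtf8(text, tmp.toString());
        System.out.println("Wrote " + Files.size(tmp) + " bytes to " + tmp);
    }
}
```

In the sample above, the call writeStringToTxt(text, "Extracted.txt") could be swapped for Utf8TextWriter.writeUtf8(text, "Extracted.txt") if a fixed encoding is preferred.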
