PDFBox Reading Text

One of the main features of the PDFBox library is its ability to quickly and accurately extract text from an existing PDF document. In this section, we will learn how to read text from an existing document with a Java program. A PDF document may contain text, images, animations, and other content; here we are interested only in its text. We can extract that text by using the getText() method of the PDFTextStripper class, which returns the document's text as a single String.

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ReadingText {

   public static void main(String[] args) throws IOException {

      //Loading an existing document
      //(PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF instead)
      File file = new File("C:/PdfBox_Examples/new.pdf");

      //try-with-resources closes the document even if extraction fails
      try (PDDocument document = PDDocument.load(file)) {

         //Instantiate PDFTextStripper class
         PDFTextStripper pdfStripper = new PDFTextStripper();

         //Retrieving text from PDF document
         String text = pdfStripper.getText(document);
         System.out.println(text);
      }
   }
}
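
Because getText() returns an ordinary String, the extracted text can be tidied with standard Java before it is used further. The sketch below is one possible way to do that, not part of PDFBox: the class name ExtractedTextCleaner and the method nonBlankLines are hypothetical names chosen for illustration, and the sample string in main only stands in for the value that pdfStripper.getText(document) would return.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a PDFBox class): tidies a String such as the one
// returned by PDFTextStripper.getText().
class ExtractedTextCleaner {

    // Splits the text into lines, trims each line, and drops blank lines.
    static List<String> nonBlankLines(String extractedText) {
        List<String> lines = new ArrayList<>();
        // \R matches any line break (\n, \r\n or \r), so the platform does not matter.
        for (String line : extractedText.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for: String text = pdfStripper.getText(document);
        String text = "First line  \r\n\r\n   Second line\n";
        for (String line : nonBlankLines(text)) {
            System.out.println(line);
        }
    }
}
```

In the program above, the same helper could be applied to the text variable in place of printing it directly, for example to process the document line by line.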

About Mariano

I'm Ethan Mariano, a software engineer by profession and a reader and writer by passion. I have a good understanding of AngularJS, databases, JavaScript, web development, and digital marketing, and I enjoy exploring other technologies related to software development.
