Class pdfFileProcessor

java.lang.Object
edu.odu.cs.cs350.pdfFileProcessor

public class pdfFileProcessor extends Object
A service class that extracts text from PDF files using Apache PDFBox. Adapted from https://github.com/tvalva/pdfwordscan.
  • Constructor Details

    • pdfFileProcessor

      public pdfFileProcessor()
  • Method Details

    • processFile

      public static Document processFile(File inFile) throws IOException
      Extracts the text from the given PDF file and returns it as a Document object.
      Parameters:
      inFile - the PDF file to be processed
      Returns:
      a Document object containing the extracted text and file name
      Throws:
      IOException - if there is an error reading the PDF
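
The contract above (a File in, a Document out, an IOException on a read failure) can be sketched roughly as follows. This is an illustrative sketch, not the class's actual implementation: the nested Document class, the TextExtractor interface, the two-argument processFile, and the plain-text sample file are all hypothetical stand-ins, introduced so the example runs without PDFBox on the classpath. The real method takes only the File and delegates extraction to Apache PDFBox.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ProcessFileSketch {

    /** Hypothetical stand-in for Document: holds the extracted text and the file name. */
    static final class Document {
        final String fileName;
        final String text;

        Document(String fileName, String text) {
            this.fileName = fileName;
            this.text = text;
        }
    }

    /** Hypothetical extraction step; the documented class uses Apache PDFBox here. */
    interface TextExtractor {
        String extract(File file) throws IOException;
    }

    /** Mirrors the documented contract: returns a Document, or throws IOException on read errors. */
    static Document processFile(File inFile, TextExtractor extractor) throws IOException {
        if (!inFile.isFile()) {
            throw new FileNotFoundException("Not a readable file: " + inFile);
        }
        String text = extractor.extract(inFile);
        return new Document(inFile.getName(), text);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a plain-text temp file stands in for a real PDF.
        File sample = File.createTempFile("sample", ".pdf");
        sample.deleteOnExit();
        Files.write(sample.toPath(), "hello pdf".getBytes(StandardCharsets.UTF_8));

        // Assumption: reads the bytes as UTF-8 instead of parsing PDF content.
        TextExtractor plainText =
                f -> new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);

        try {
            Document doc = processFile(sample, plainText);
            System.out.println(doc.fileName + ": " + doc.text.length() + " characters");
        } catch (IOException e) {
            // Callers of the real processFile need the same handling.
            System.err.println("Could not read PDF: " + e.getMessage());
        }
    }
}
```

Because processFile is static, callers of the real class would not need to construct a pdfFileProcessor instance; they only need to handle or propagate the IOException.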