Package edu.odu.cs.cs350
Class pdfFileProcessor
java.lang.Object
edu.odu.cs.cs350.pdfFileProcessor
A service class to extract text from PDF files using Apache PDFBox. Modified
from https://github.com/tvalva/pdfwordscan
Constructor Summary
Constructors
pdfFileProcessor()
Method Summary
Modifier and Type: static Document
Method: processFile(File inFile)
Description: Extract text from the PDF file and return it as a Document object.
Constructor Details
pdfFileProcessor
public pdfFileProcessor()
Method Details
processFile
Extract text from the PDF file and return it as a Document object.
Parameters:
inFile - the PDF file to be processed
Returns:
a Document object containing the extracted text and file name
Throws:
IOException - if there is an error reading the PDF
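
Because processFile declares IOException, every caller has to decide what to do when a PDF cannot be read. The sketch below shows one possible caller-side pattern and is not part of this package: PdfLoadingSketch, FileProcessor, and tryProcess are hypothetical names introduced for illustration. The helper is generic, so it compiles without the Document class or PDFBox on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.Optional;

/** Hypothetical caller-side helper; an illustration, not part of edu.odu.cs.cs350. */
final class PdfLoadingSketch {

    /**
     * Mirrors the documented shape of processFile: takes a File,
     * returns a result, and may throw IOException.
     */
    @FunctionalInterface
    interface FileProcessor<D> {
        D process(File inFile) throws IOException;
    }

    /**
     * Runs the processor on the file. An IOException is reported on
     * standard error and turned into an empty Optional, so one unreadable
     * file does not abort a larger batch.
     */
    static <D> Optional<D> tryProcess(FileProcessor<D> processor, File inFile) {
        try {
            return Optional.ofNullable(processor.process(inFile));
        } catch (IOException e) {
            System.err.println("Could not read " + inFile + ": " + e.getMessage());
            return Optional.empty();
        }
    }
}
```

With this package on the classpath, a call might look like PdfLoadingSketch.tryProcess(pdfFileProcessor::processFile, new File("report.pdf")), which would yield an Optional<Document>. Here report.pdf is a placeholder path. Swallowing the exception suits batch processing; a caller that must fail fast could simply let the IOException propagate instead.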