All Classes and Interfaces
Class
Description
Builds a training set from the TrainingData PDFs and trains a FilteredClassifier.
Represents a collection (corpus) of Document objects.
Represents a text document containing a collection of words.
Main class for the ACM Classifier application.
Identifies the type of a document file and determines whether text files contain only ASCII characters.
Enumeration for supported document types.
Checks whether a PDF file has fewer than 50 pages.
Reads a file path or file name and verifies that the specified file exists.
The main entry point for building a Corpus from a set of text files and computing term frequency (TF) and TF-IDF values for analysis.
A service class to extract text from PDF files using Apache PDFBox.
A simple utility class for measuring elapsed execution time.
Calculates term frequency (TF) for words in a given document.
Calculates TF-IDF (Term Frequency–Inverse Document Frequency) for words across multiple documents.
A utility class for processing plain text (.txt) files.
Represents a single word and its frequency count in a document.
Class to test a String against a list of common words.
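
The TF and TF-IDF descriptions above do not state which formulas the application uses. The following is a minimal sketch, assuming the common definitions: TF is a word's count divided by the document's total word count, and IDF is the natural log of the number of documents divided by the number of documents containing the word. The class TfIdfSketch and its method names are hypothetical and are not part of the listed classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not one of the classes listed above.
public class TfIdfSketch {

    /** Term frequency: occurrences of each word divided by the total word count. */
    static Map<String, Double> termFrequency(List<String> words) {
        Map<String, Double> tf = new HashMap<>();
        if (words.isEmpty()) {
            return tf;
        }
        for (String word : words) {
            tf.merge(word, 1.0, Double::sum); // raw count per word
        }
        final double total = words.size();
        tf.replaceAll((word, count) -> count / total); // normalize counts
        return tf;
    }

    /** Inverse document frequency: ln(N / number of documents containing the word). */
    static double inverseDocumentFrequency(String word, List<List<String>> corpus) {
        long containing = corpus.stream().filter(doc -> doc.contains(word)).count();
        // Assumption: a word found in no document gets an IDF of 0.
        return containing == 0 ? 0.0 : Math.log((double) corpus.size() / containing);
    }

    /** TF-IDF score for every word of one document, relative to the whole corpus. */
    static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> scores = termFrequency(doc);
        scores.replaceAll((word, tf) -> tf * inverseDocumentFrequency(word, corpus));
        return scores;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("apple", "banana", "apple"),
                List.of("banana", "cherry"));
        System.out.println(tfIdf(corpus.get(0), corpus));
    }
}
```

Under these assumed definitions, a word that appears in every document (here "banana") scores 0, while a word concentrated in one document (here "apple") scores higher. Implementations often differ in details such as log base or smoothing, so the application's actual values may not match this sketch.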