Package edu.odu.cs.cs350
package edu.odu.cs.cs350
Class descriptions:
- Build a training set based on TrainingData PDFs and train a FilteredClassifier.
- Represents a collection (corpus) of Document objects.
- Represents a text document containing a collection of words.
- Main class for the ACM Classifier application.
- Class that identifies the type of a document file and determines whether text files contain only ASCII characters.
- Enumeration for supported document types.
- Checks if a PDF file is less than 50 pages.
- This class reads the file path or file name and makes sure that the specified file exists.
- The main entry point for building a Corpus from a set of text files, and computing term frequency (TF) and TF-IDF values for analysis.
- A service class to extract text from PDF files using Apache PDFBox.
- A simple utility class for measuring elapsed execution time.
- Calculates term frequency (TF) for words in a given document.
- Calculates TF-IDF (Term Frequency–Inverse Document Frequency) for words across multiple documents.
- A utility class for processing plain text (.txt) files.
- Represents a single word and its frequency count in a document.
- Class to test a String against a list of common words.
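
The TF and TF-IDF descriptions above can be illustrated with a minimal sketch. It is not the package's actual code: the class name TfIdfSketch and its method names are hypothetical, documents are modeled as plain lists of words rather than Document objects, and the weighting shown (relative term frequency multiplied by the natural log of N over document frequency) is one common textbook variant that this package may or may not use.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not the package's TF or TF-IDF classes. */
class TfIdfSketch {

    /** Term frequency: occurrences of each word divided by the document's word count. */
    static Map<String, Double> termFrequency(List<String> words) {
        Map<String, Double> tf = new HashMap<>();
        if (words.isEmpty()) {
            return tf; // avoid dividing by zero for an empty document
        }
        for (String w : words) {
            tf.merge(w, 1.0, Double::sum); // raw count per word
        }
        tf.replaceAll((w, count) -> count / words.size()); // normalize by length
        return tf;
    }

    /** Inverse document frequency: ln(N / number of documents containing the word). */
    static double inverseDocumentFrequency(String word, List<List<String>> corpus) {
        long containing = corpus.stream().filter(doc -> doc.contains(word)).count();
        if (containing == 0) {
            return 0.0; // assumption: a word absent from the corpus gets weight 0
        }
        return Math.log((double) corpus.size() / containing);
    }

    /** TF-IDF of a word in one document, relative to the whole corpus. */
    static double tfIdf(String word, List<String> doc, List<List<String>> corpus) {
        double tf = termFrequency(doc).getOrDefault(word, 0.0);
        return tf * inverseDocumentFrequency(word, corpus);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "dog", "ran"));
        List<String> first = corpus.get(0);
        System.out.println("tf(cat)     = " + termFrequency(first).get("cat"));
        System.out.println("tf-idf(cat) = " + tfIdf("cat", first, corpus));
        System.out.println("tf-idf(the) = " + tfIdf("the", first, corpus));
    }
}
```

Under this variant, a word that appears in every document (such as "the" in the sample corpus) has an inverse document frequency of zero, so its TF-IDF is zero no matter how often it occurs. The common-word test described above addresses a similar concern through an explicit list of words.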