Package edu.odu.cs.cs350


  • Class descriptions
    Builds a training set from the TrainingData PDFs and trains a FilteredClassifier.
    Represents a collection (corpus) of Document objects.
    Represents a text document containing a collection of words.
    Main class for the ACM Classifier application.
    Identifies the type of a document file and determines whether text files contain only ASCII characters.
    Enumeration for supported document types.
    Checks whether a PDF file has fewer than 50 pages.
    Reads a file path or file name and verifies that the specified file exists.
    The main entry point for building a Corpus from a set of text files and computing term frequency (TF) and TF-IDF values for analysis.
     
    A service class to extract text from PDF files using Apache PDFBox.
    A simple utility class for measuring elapsed execution time.
    Calculates term frequency (TF) for words in a given document.
    Calculates TF-IDF (Term Frequency–Inverse Document Frequency) for words across multiple documents.
    A utility class for processing plain text (.txt) files.
    Represents a single word and its frequency count in a document.
    Tests a String against a list of common words.
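
The TF and TF-IDF descriptions above can be illustrated with a minimal sketch. It is not the package's actual code: the class name TfIdfSketch and its method names are hypothetical, and the formulas are assumptions, since this summary does not state which variants the package uses. Here TF is a word's count divided by the document's total word count, and IDF is the natural log of (number of documents / number of documents containing the word).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not a class from edu.odu.cs.cs350. */
public class TfIdfSketch {

    /** Assumed TF variant: occurrences of each word divided by total words in the document. */
    public static Map<String, Double> termFrequency(List<String> words) {
        Map<String, Double> tf = new HashMap<>();
        if (words.isEmpty()) {
            return tf; // avoid dividing by zero for an empty document
        }
        for (String word : words) {
            tf.merge(word, 1.0, Double::sum); // raw count per word
        }
        double total = words.size();
        tf.replaceAll((word, count) -> count / total); // normalize counts to frequencies
        return tf;
    }

    /** Assumed IDF variant: ln(N / df); returns 0 when no document contains the word. */
    public static double inverseDocumentFrequency(String word, List<List<String>> corpus) {
        long documentsWithWord = corpus.stream()
                .filter(doc -> doc.contains(word))
                .count();
        if (documentsWithWord == 0) {
            return 0.0;
        }
        return Math.log((double) corpus.size() / documentsWithWord);
    }

    /** TF-IDF of one word in one document, relative to the whole corpus. */
    public static double tfIdf(String word, List<String> doc, List<List<String>> corpus) {
        double tf = termFrequency(doc).getOrDefault(word, 0.0);
        return tf * inverseDocumentFrequency(word, corpus);
    }

    public static void main(String[] args) {
        // Each document is already tokenized into lowercase words.
        List<List<String>> corpus = List.of(
                List.of("neural", "network", "training", "network"),
                List.of("database", "query", "network"),
                List.of("database", "index"));
        List<String> first = corpus.get(0);
        for (String word : List.of("neural", "network", "database")) {
            System.out.println(word + " -> " + tfIdf(word, first, corpus));
        }
    }
}
```

Under these assumptions, a word that appears in every document gets an IDF of zero, so it contributes nothing to the score. Filtering common words beforehand, as the last class above does, addresses the same concern from the other side.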