Index
All Classes and Interfaces|All Packages
A
- ACMTrainingSetBuilder - Class in edu.odu.cs.cs350
-
Build a training set based on TrainingData PDFs and train a FilteredClassifier.
- ACMTrainingSetBuilder() - Constructor for class edu.odu.cs.cs350.ACMTrainingSetBuilder
- addDocument(Document) - Method in class edu.odu.cs.cs350.Corpus
-
Adds a Document to this corpus.
- addStringInstancesFromRepository(Instances, List<String>, List<String>) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
-
Gets PDFs from Maven repo, extracts text, and adds Instances.
- addWord(String) - Method in class edu.odu.cs.cs350.Document
-
Adds a word to this document with an initial count of 1 or increments its count if the word has already been seen.
- ASCII_TEXT - Enum constant in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
-
ASCII text document type
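The counting behaviour described for addWord(String) (start at 1, then increment on repeats) can be sketched as follows. This is a minimal illustration, not the project's Document class: the class name WordCountSketch, the helper countOf, and the use of a plain Map<String, Integer> instead of Word objects are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for Document; not the project's actual class. */
final class WordCountSketch {
    // Maps each word to the number of times it has been added.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Stores the word with a count of 1, or adds 1 to its existing count. */
    void addWord(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    /** Returns the count for a word, or 0 if it was never added (hypothetical helper). */
    int countOf(String word) {
        return counts.getOrDefault(word, 0);
    }

    /** Sums all word counts, in the spirit of getTotalWordCount(). */
    int totalWordCount() {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }
}
```

Map.merge covers both cases in one call: it inserts 1 when the key is absent and otherwise combines the old value with 1 using Integer::sum.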
B
- buildTrainingFilter(Instances) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
-
Build a StringToWordVector filter for training and classification.
C
- classifyAndPrint(Corpus) - Static method in class edu.odu.cs.cs350.DocumentClassifier
-
Classify documents in a Corpus, print results, and record classification times.
- computeIDF(List<Document>) - Static method in class edu.odu.cs.cs350.TFIDFCalculator
-
Computes the Inverse Document Frequency (IDF) for all words in a collection of documents.
- computeTF(Document) - Static method in class edu.odu.cs.cs350.TFCalculator
-
Computes the term frequency map for a document.
- computeTFIDF(Document, Map<String, Double>) - Static method in class edu.odu.cs.cs350.TFIDFCalculator
-
Computes the TF-IDF score for each word in a document.
- contains(String) - Static method in class edu.odu.cs.cs350.WordFilter
-
Check if a word is in the stop word list.
- Corpus - Class in edu.odu.cs.cs350
-
Represents a collection (corpus) of Document objects.
- Corpus() - Constructor for class edu.odu.cs.cs350.Corpus
-
Constructs an empty Corpus.
- createExampleCorpus(String[]) - Static method in class edu.odu.cs.cs350.DocumentClassifier
-
Create a Corpus of Documents from input files and record processing times.
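The relationship between computeTF, computeIDF, and computeTFIDF can be sketched as below. The index does not state which formulas the project uses, so this example assumes one common variant: TF as count divided by total words, and IDF as the natural log of (number of documents / number of documents containing the word). The class TfIdfSketch and its method names are hypothetical, and documents are represented as plain word-count maps rather than Document objects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; the project's TFCalculator and TFIDFCalculator may differ. */
final class TfIdfSketch {
    /** TF (assumed form): each word's count divided by the document's total word count. */
    static Map<String, Double> tf(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> tf = new HashMap<>();
        if (total == 0) {
            return tf; // empty document: no terms to score
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            tf.put(e.getKey(), e.getValue() / (double) total);
        }
        return tf;
    }

    /** IDF (assumed form): ln(N / df), where df counts the documents containing the word. */
    static Map<String, Double> idf(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String word : doc.keySet()) {
                df.merge(word, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log(docs.size() / (double) e.getValue()));
        }
        return idf;
    }

    /** TF-IDF: the product of a word's TF in this document and its corpus-wide IDF. */
    static Map<String, Double> tfidf(Map<String, Integer> counts, Map<String, Double> idf) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> e : tf(counts).entrySet()) {
            scores.put(e.getKey(), e.getValue() * idf.getOrDefault(e.getKey(), 0.0));
        }
        return scores;
    }
}
```

Under this variant, a word that appears in every document gets an IDF of 0 and therefore a TF-IDF of 0, which is why very common words contribute nothing to the score.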
D
- determineCategoryFromPath(String, List<String>) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
-
Determine ACM category from document path within repo.
- Document - Class in edu.odu.cs.cs350
-
Represents a text document containing a collection of words.
- Document(String) - Constructor for class edu.odu.cs.cs350.Document
-
Constructs a new Document with the specified name.
- DocumentClassifier - Class in edu.odu.cs.cs350
-
Main class for the ACM Classifier application.
- DocumentClassifier() - Constructor for class edu.odu.cs.cs350.DocumentClassifier
- DocumentIdentifier - Class in edu.odu.cs.cs350
-
Identifies the type of a document file and determines whether text files contain only ASCII characters.
- DocumentIdentifier() - Constructor for class edu.odu.cs.cs350.DocumentIdentifier
- DocumentIdentifier.DocumentType - Enum Class in edu.odu.cs.cs350
-
Enumeration for supported document types.
- DocumentLength - Class in edu.odu.cs.cs350
-
Checks whether a PDF file has fewer than 50 pages.
- DocumentLength() - Constructor for class edu.odu.cs.cs350.DocumentLength
E
- edu.odu.cs.cs350 - package edu.odu.cs.cs350
- elapsedMilliseconds() - Method in class edu.odu.cs.cs350.Stopwatch
-
Returns the elapsed time in milliseconds since the last call to Stopwatch.start().
- elapsedSeconds() - Method in class edu.odu.cs.cs350.Stopwatch
-
Returns the elapsed time in seconds since the last call to Stopwatch.start().
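A stopwatch with the start/elapsed behaviour described above might look like the following sketch. The class name StopwatchSketch, the return types (long for milliseconds, double for seconds), and the choice of System.nanoTime() as the clock are assumptions; the index does not show how the project's Stopwatch is implemented.

```java
/** Hypothetical sketch of a restartable stopwatch; not the project's Stopwatch class. */
final class StopwatchSketch {
    // Time of the most recent start() call, taken from the monotonic clock.
    private long startNanos = System.nanoTime();

    /** Starts or restarts timing from now. */
    void start() {
        startNanos = System.nanoTime();
    }

    /** Whole milliseconds elapsed since the last start(). */
    long elapsedMilliseconds() {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    /** Seconds elapsed since the last start(), including the fractional part. */
    double elapsedSeconds() {
        return (System.nanoTime() - startNanos) / 1_000_000_000.0;
    }
}
```

System.nanoTime() is used here rather than System.currentTimeMillis() because it is intended for measuring elapsed time and is not affected by wall-clock adjustments.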
F
- FilenameReader - Class in edu.odu.cs.cs350
-
This class reads the file path or file name and makes sure that the specified file exists.
- FilenameReader() - Constructor for class edu.odu.cs.cs350.FilenameReader
- FileProcessor - Class in edu.odu.cs.cs350
-
The main entry point for building a Corpus from a set of text files, and computing term frequency (TF) and TF-IDF values for analysis.
- FileProcessor() - Constructor for class edu.odu.cs.cs350.FileProcessor
G
- getACMClasses(List<String>) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
-
Get list of ACM categories from document paths (using the subfolder names).
- getCount() - Method in class edu.odu.cs.cs350.Word
-
Returns the current count of occurrences for this word.
- getDocumentCountContaining(String) - Method in class edu.odu.cs.cs350.Corpus
-
Counts how many documents in the corpus contain a given word.
- getDocumentListing() - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
-
Get list of document paths from TrainingData repository.
- getDocuments() - Method in class edu.odu.cs.cs350.Corpus
-
Returns the list of all Document objects in this corpus.
- getFilename(String) - Static method in class edu.odu.cs.cs350.FilenameReader
-
Reads the file, validates that the referenced file exists, and returns the filename as a string.
- getFilePath() - Method in class edu.odu.cs.cs350.txtFileProcessor
-
Returns the file path associated with this text file processor.
- getLogger() - Static method in class edu.odu.cs.cs350.LoggerUtil
-
Returns a shared application logger.
- getName() - Method in class edu.odu.cs.cs350.Document
-
Returns the name of this document.
- getTextFromDocument(String) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
-
Extract text from a PDF document in the TrainingData repository.
- getTotalDocuments() - Method in class edu.odu.cs.cs350.Corpus
-
Returns the total number of documents in this corpus.
- getTotalWordCount() - Method in class edu.odu.cs.cs350.Document
-
Calculates the total number of words (sum of all word counts) in this document.
- getWord() - Method in class edu.odu.cs.cs350.Word
-
Returns the string representation of this word.
- getWords() - Method in class edu.odu.cs.cs350.Document
-
Returns the map of words contained in this document.
I
- identify(File) - Static method in class edu.odu.cs.cs350.DocumentIdentifier
-
Identifies the type of document based on its file extension.
- incrementCount() - Method in class edu.odu.cs.cs350.Word
-
Increments the count of occurrences for this word by 1.
- isTextDocument(File) - Static method in class edu.odu.cs.cs350.DocumentIdentifier
-
Checks whether a given text file contains only ASCII characters.
- isValidLength(File) - Static method in class edu.odu.cs.cs350.DocumentLength
-
Returns true if the PDF has fewer than 50 pages.
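The two checks described for identify(File) and isTextDocument(File), extension-based typing and an ASCII-only test, can be sketched as below. The constant names mirror DocumentIdentifier.DocumentType, but the extension-to-type mapping (.pdf and .txt), the class DocumentTypeSketch, the nested enum Kind, and the byte-array helper isAscii are assumptions made for illustration; the index does not list which extensions the project recognises.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical sketch; not the project's DocumentIdentifier. */
final class DocumentTypeSketch {
    /** Same constant names as DocumentIdentifier.DocumentType, redeclared for the example. */
    enum Kind { ASCII_TEXT, PDF, UNSUPPORTED }

    /** Classifies a file by extension only; the file is never opened. */
    static Kind identify(File file) {
        String name = file.getName().toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) {
            return Kind.PDF;
        }
        if (name.endsWith(".txt")) { // assumed mapping: .txt is treated as ASCII text
            return Kind.ASCII_TEXT;
        }
        return Kind.UNSUPPORTED;
    }

    /** Returns true if every byte is in the 7-bit ASCII range (0x00 to 0x7F). */
    static boolean isAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) { // Java bytes are signed, so 0x80 to 0xFF appear as negative values
                return false;
            }
        }
        return true;
    }
}
```

To apply the ASCII check to a real file, the bytes could be read with java.nio.file.Files.readAllBytes(file.toPath()) and passed to isAscii.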
L
- LoggerUtil - Class in edu.odu.cs.cs350
- LoggerUtil() - Constructor for class edu.odu.cs.cs350.LoggerUtil
M
- main(String[]) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
-
Main method to build training set and train classifier.
- main(String[]) - Static method in class edu.odu.cs.cs350.DocumentClassifier
-
Main method that calls the classifier with input arguments.
- main(String[]) - Static method in class edu.odu.cs.cs350.DocumentIdentifier
-
Main method for identifying a document's type from the command line.
- main(String[]) - Static method in class edu.odu.cs.cs350.FileProcessor
-
Main method that runs the file processing pipeline.
P
- PDF - Enum constant in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
-
PDF document type
- pdfFileProcessor - Class in edu.odu.cs.cs350
-
A service class to extract text from PDF files using Apache PDFBox.
- pdfFileProcessor() - Constructor for class edu.odu.cs.cs350.pdfFileProcessor
- printFileNames(Corpus) - Static method in class edu.odu.cs.cs350.DocumentClassifier
-
Print the names and word counts of documents in a corpus.
- processFile() - Method in class edu.odu.cs.cs350.txtFileProcessor
-
Reads the text file and converts it into a Document object, using tokenization and removing common stopwords.
- processFile(File) - Static method in class edu.odu.cs.cs350.pdfFileProcessor
-
Extract text from the PDF file and return it as a Document object.
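The tokenization and stop word removal mentioned for txtFileProcessor.processFile() can be illustrated with the sketch below. It is not the project's code: the class TokenizerSketch, the splitting rule (lowercase, then split on anything that is not a letter a to z), and the five-word stop list are all assumptions, since the index shows neither the tokenizer nor the contents of WordFilter's list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of tokenizing text and dropping stop words. */
final class TokenizerSketch {
    // Tiny illustrative stop word list; the project's real list is not shown in this index.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the");

    /** Lowercases the text, splits on non-letters, and keeps words not in the stop list. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            // split() can yield an empty first element when the text starts with a separator
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }
}
```

Each surviving token could then be passed to a method such as Document.addWord(String) to build up the word counts.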
S
- start() - Method in class edu.odu.cs.cs350.Stopwatch
-
Starts or restarts the stopwatch.
- Stopwatch - Class in edu.odu.cs.cs350
-
A simple utility class for measuring elapsed execution time.
- Stopwatch() - Constructor for class edu.odu.cs.cs350.Stopwatch
T
- TFCalculator - Class in edu.odu.cs.cs350
-
Calculates term frequency (TF) for words in a given document.
- TFCalculator() - Constructor for class edu.odu.cs.cs350.TFCalculator
- TFIDFCalculator - Class in edu.odu.cs.cs350
-
Calculates TF-IDF (Term Frequency–Inverse Document Frequency) for words across multiple documents.
- TFIDFCalculator() - Constructor for class edu.odu.cs.cs350.TFIDFCalculator
- txtFileProcessor - Class in edu.odu.cs.cs350
-
A utility class for processing plain text (.txt) files.
- txtFileProcessor(String) - Constructor for class edu.odu.cs.cs350.txtFileProcessor
-
Constructor for txtFileProcessor.
U
- UNSUPPORTED - Enum constant in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
-
Unsupported document type
V
- valueOf(String) - Static method in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
-
Returns the enum constant of this class with the specified name.
- values() - Static method in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
-
Returns an array containing the constants of this enum class, in the order they are declared.
W
- Word - Class in edu.odu.cs.cs350
-
Represents a single word and its frequency count in a document.
- Word(String) - Constructor for class edu.odu.cs.cs350.Word
-
Constructs a new Word with an initial count of 1.
- WordFilter - Class in edu.odu.cs.cs350
-
Class to test a String against a list of common words.
- WordFilter() - Constructor for class edu.odu.cs.cs350.WordFilter
- writeModels(Instances, Instances, FilteredClassifier) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
-
Write trained models to disk.