Index

A B C D E F G I L M P S T U V W 
All Classes and Interfaces|All Packages

A

ACMTrainingSetBuilder - Class in edu.odu.cs.cs350
Build a training set based on TrainingData PDFs and train a FilteredClassifier.
ACMTrainingSetBuilder() - Constructor for class edu.odu.cs.cs350.ACMTrainingSetBuilder
 
addDocument(Document) - Method in class edu.odu.cs.cs350.Corpus
Adds a Document to this corpus.
addStringInstancesFromRepository(Instances, List<String>, List<String>) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
Gets PDFs from Maven repo, extracts text, and adds Instances.
addWord(String) - Method in class edu.odu.cs.cs350.Document
Adds a word to this document with an initial count of 1 or increments its count if the word has already been seen.
ASCII_TEXT - Enum constant in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
ASCII text document type.
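
The addWord(String) entry above describes a count-or-increment rule: a new word starts at 1, and a repeated word has its count raised by 1. The following is a minimal sketch of that rule only. The class WordCountSketch and its countOf and totalWordCount helpers are hypothetical names used for illustration, and the sketch assumes a plain HashMap of integer counts, whereas the project's Document class appears to hold Word objects and may be organized differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Document; not the project's actual class.
class WordCountSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // First sighting stores 1; every later sighting adds 1 to the stored count.
    void addWord(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Hypothetical helper: count for one word, or 0 if it was never added.
    int countOf(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Sum of all counts, in the spirit of the getTotalWordCount() summary.
    int totalWordCount() {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }
}
```

Map.merge handles both the first-insert and the increment case in a single call, which avoids a separate containsKey check.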

B

buildTrainingFilter(Instances) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
Build a StringToWordVector filter for training and classification.

C

classifyAndPrint(Corpus) - Static method in class edu.odu.cs.cs350.DocumentClassifier
Classify documents in a Corpus, print results, and record classification times.
computeIDF(List<Document>) - Static method in class edu.odu.cs.cs350.TFIDFCalculator
Computes the Inverse Document Frequency (IDF) for all words in a collection of documents.
computeTF(Document) - Static method in class edu.odu.cs.cs350.TFCalculator
Computes the term frequency map for a document.
computeTFIDF(Document, Map<String, Double>) - Static method in class edu.odu.cs.cs350.TFIDFCalculator
Computes the TF-IDF score for each word in a document.
contains(String) - Static method in class edu.odu.cs.cs350.WordFilter
Check if a word is in the stop word list.
Corpus - Class in edu.odu.cs.cs350
Represents a collection (corpus) of Document objects.
Corpus() - Constructor for class edu.odu.cs.cs350.Corpus
Constructs an empty Corpus.
createExampleCorpus(String[]) - Static method in class edu.odu.cs.cs350.DocumentClassifier
Create a Corpus of Documents from input files and record processing times.
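
The computeTF, computeIDF, and computeTFIDF entries above name the three steps of a TF-IDF calculation without stating the formulas. The sketch below shows one common formulation, assuming TF = count / total words in the document and IDF = ln(N / df), where N is the number of documents and df is the number of documents containing the word. The class TfIdfSketch and its use of Map<String, Integer> in place of Document are assumptions for illustration; the project's TFCalculator and TFIDFCalculator may use a different weighting or smoothing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; word-count maps stand in for the project's Document class.
class TfIdfSketch {
    // TF: each word's count divided by the document's total word count.
    static Map<String, Double> computeTF(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> tf = new HashMap<>();
        if (total == 0) {
            return tf; // empty document: no terms, no frequencies
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            tf.put(e.getKey(), e.getValue() / (double) total);
        }
        return tf;
    }

    // IDF (assumed form): ln(N / df), with df = number of documents containing the word.
    static Map<String, Double> computeIDF(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String word : doc.keySet()) {
                df.merge(word, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log(docs.size() / (double) e.getValue()));
        }
        return idf;
    }

    // TF-IDF: product of the two; words missing from the IDF map score 0.
    static Map<String, Double> computeTFIDF(Map<String, Integer> counts,
                                            Map<String, Double> idf) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> e : computeTF(counts).entrySet()) {
            scores.put(e.getKey(), e.getValue() * idf.getOrDefault(e.getKey(), 0.0));
        }
        return scores;
    }
}
```

Under this formulation a word that occurs in every document gets an IDF of 0, so it contributes nothing to the TF-IDF score regardless of how often it appears.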

D

determineCategoryFromPath(String, List<String>) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
Determine ACM category from document path within repo.
Document - Class in edu.odu.cs.cs350
Represents a text document containing a collection of words.
Document(String) - Constructor for class edu.odu.cs.cs350.Document
Constructs a new Document with the specified name.
DocumentClassifier - Class in edu.odu.cs.cs350
Main class for the ACM Classifier application.
DocumentClassifier() - Constructor for class edu.odu.cs.cs350.DocumentClassifier
 
DocumentIdentifier - Class in edu.odu.cs.cs350
Identifies the type of a document file and determines whether text files contain only ASCII characters.
DocumentIdentifier() - Constructor for class edu.odu.cs.cs350.DocumentIdentifier
 
DocumentIdentifier.DocumentType - Enum Class in edu.odu.cs.cs350
Enumeration for supported document types.
DocumentLength - Class in edu.odu.cs.cs350
Checks whether a PDF file has fewer than 50 pages.
DocumentLength() - Constructor for class edu.odu.cs.cs350.DocumentLength
 

E

edu.odu.cs.cs350 - package edu.odu.cs.cs350
 
elapsedMilliseconds() - Method in class edu.odu.cs.cs350.Stopwatch
Returns the elapsed time in milliseconds since the last call to Stopwatch.start().
elapsedSeconds() - Method in class edu.odu.cs.cs350.Stopwatch
Returns the elapsed time in seconds since the last call to Stopwatch.start().
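
The two Stopwatch entries above both measure from the most recent start() call. A minimal sketch of that behavior follows, assuming the timer is based on System.nanoTime(); the class name StopwatchSketch is hypothetical, and the project's Stopwatch may use a different clock or return types.

```java
// Hypothetical sketch of a restartable stopwatch; not the project's actual class.
class StopwatchSketch {
    // Reference point for all elapsed-time readings.
    private long startNanos = System.nanoTime();

    // Starts or restarts timing by resetting the reference point.
    void start() {
        startNanos = System.nanoTime();
    }

    // Whole milliseconds since the last start() (or since construction).
    long elapsedMilliseconds() {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    // Fractional seconds since the last start() (or since construction).
    double elapsedSeconds() {
        return (System.nanoTime() - startNanos) / 1_000_000_000.0;
    }
}
```

System.nanoTime() is monotonic within a JVM, so readings are not affected by wall-clock adjustments the way System.currentTimeMillis() can be.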

F

FilenameReader - Class in edu.odu.cs.cs350
This class reads the file path or file name and makes sure that the specified file exists.
FilenameReader() - Constructor for class edu.odu.cs.cs350.FilenameReader
 
FileProcessor - Class in edu.odu.cs.cs350
The main entry point for building a Corpus from a set of text files, and computing term frequency (TF) and TF-IDF values for analysis.
FileProcessor() - Constructor for class edu.odu.cs.cs350.FileProcessor
 

G

getACMClasses(List<String>) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
Get list of ACM categories from document paths (using the subfolder names).
getCount() - Method in class edu.odu.cs.cs350.Word
Returns the current count of occurrences for this word.
getDocumentCountContaining(String) - Method in class edu.odu.cs.cs350.Corpus
Counts how many documents in the corpus contain a given word.
getDocumentListing() - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
Get list of document paths from TrainingData repository.
getDocuments() - Method in class edu.odu.cs.cs350.Corpus
Returns the list of all Document objects in this corpus.
getFilename(String) - Static method in class edu.odu.cs.cs350.FilenameReader
Reads the file, validates that the referenced file exists, and returns the filename as a string.
getFilePath() - Method in class edu.odu.cs.cs350.txtFileProcessor
Returns the file path associated with this text file processor.
getLogger() - Static method in class edu.odu.cs.cs350.LoggerUtil
Returns a shared application logger.
getName() - Method in class edu.odu.cs.cs350.Document
Returns the name of this document.
getTextFromDocument(String) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
Extract text from a PDF document in the TrainingData repository.
getTotalDocuments() - Method in class edu.odu.cs.cs350.Corpus
Returns the total number of documents in this corpus.
getTotalWordCount() - Method in class edu.odu.cs.cs350.Document
Calculates the total number of words (sum of all word counts) in this document.
getWord() - Method in class edu.odu.cs.cs350.Word
Returns the string representation of this word.
getWords() - Method in class edu.odu.cs.cs350.Document
Returns the map of words contained in this document.
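
The getDocumentCountContaining(String) entry above supplies the document-frequency figure that an IDF calculation needs. The sketch below illustrates that lookup under a simplifying assumption: each document is represented as a Set<String> of its distinct words. CorpusSketch is a hypothetical name, and the project's Corpus holds Document objects instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Corpus; each document is reduced to its set of distinct words.
class CorpusSketch {
    private final List<Set<String>> documents = new ArrayList<>();

    void addDocument(Set<String> words) {
        documents.add(words);
    }

    int getTotalDocuments() {
        return documents.size();
    }

    // Counts documents that contain the word at least once.
    int getDocumentCountContaining(String word) {
        int matches = 0;
        for (Set<String> doc : documents) {
            if (doc.contains(word)) {
                matches++;
            }
        }
        return matches;
    }
}
```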

I

identify(File) - Static method in class edu.odu.cs.cs350.DocumentIdentifier
Identifies the type of document based on its file extension.
incrementCount() - Method in class edu.odu.cs.cs350.Word
Increments the count of occurrences for this word by 1.
isTextDocument(File) - Static method in class edu.odu.cs.cs350.DocumentIdentifier
Checks whether a given text file contains only ASCII characters.
isValidLength(File) - Static method in class edu.odu.cs.cs350.DocumentLength
Returns true if the PDF has fewer than 50 pages.
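
The identify(File) and isTextDocument(File) entries above describe two checks: a type decision based on file extension and an ASCII-only test on text content. The sketch below shows one way each check could work. It operates on a file name and a byte array rather than a File so that it runs without touching disk. The class DocumentTypeSketch, the isAscii helper, and the mapping of ".pdf" to PDF and ".txt" to ASCII_TEXT are assumptions; only the three enum constant names come from the index.

```java
import java.util.Locale;

// Hypothetical sketch; the extension-to-type mapping is assumed, not taken from the project.
class DocumentTypeSketch {
    enum DocumentType { PDF, ASCII_TEXT, UNSUPPORTED }

    // Decides the type from the file name's extension, ignoring letter case.
    static DocumentType identify(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) {
            return DocumentType.PDF;
        }
        if (lower.endsWith(".txt")) {
            return DocumentType.ASCII_TEXT;
        }
        return DocumentType.UNSUPPORTED;
    }

    // True when every byte is in the 7-bit ASCII range (0x00 to 0x7F).
    static boolean isAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) {
                return false; // Java bytes are signed, so 0x80 to 0xFF read as negative
            }
        }
        return true;
    }
}
```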

L

LoggerUtil - Class in edu.odu.cs.cs350
 
LoggerUtil() - Constructor for class edu.odu.cs.cs350.LoggerUtil
 

M

main(String[]) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
Main method to build training set and train classifier.
main(String[]) - Static method in class edu.odu.cs.cs350.DocumentClassifier
Main method that calls the classifier with input arguments.
main(String[]) - Static method in class edu.odu.cs.cs350.DocumentIdentifier
Main method for identifying a document's type from the command line.
main(String[]) - Static method in class edu.odu.cs.cs350.FileProcessor
Main method that runs the file processing pipeline.

P

PDF - Enum constant in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
PDF document type.
pdfFileProcessor - Class in edu.odu.cs.cs350
A service class to extract text from PDF files using Apache PDFBox.
pdfFileProcessor() - Constructor for class edu.odu.cs.cs350.pdfFileProcessor
 
printFileNames(Corpus) - Static method in class edu.odu.cs.cs350.DocumentClassifier
Print the names and word counts of documents in a corpus.
processFile() - Method in class edu.odu.cs.cs350.txtFileProcessor
Reads the text file and converts it into a Document object, using tokenization and removing common stopwords.
processFile(File) - Static method in class edu.odu.cs.cs350.pdfFileProcessor
Extract text from the PDF file and return it as a Document object.

S

start() - Method in class edu.odu.cs.cs350.Stopwatch
Starts or restarts the stopwatch.
Stopwatch - Class in edu.odu.cs.cs350
A simple utility class for measuring elapsed execution time.
Stopwatch() - Constructor for class edu.odu.cs.cs350.Stopwatch
 

T

TFCalculator - Class in edu.odu.cs.cs350
Calculates term frequency (TF) for words in a given document.
TFCalculator() - Constructor for class edu.odu.cs.cs350.TFCalculator
 
TFIDFCalculator - Class in edu.odu.cs.cs350
Calculates TF-IDF (Term Frequency–Inverse Document Frequency) for words across multiple documents.
TFIDFCalculator() - Constructor for class edu.odu.cs.cs350.TFIDFCalculator
 
txtFileProcessor - Class in edu.odu.cs.cs350
A utility class for processing plain text (.txt) files.
txtFileProcessor(String) - Constructor for class edu.odu.cs.cs350.txtFileProcessor
Constructor for txtFileProcessor.

U

UNSUPPORTED - Enum constant in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
Unsupported document type.

V

valueOf(String) - Static method in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
Returns the enum constant of this class with the specified name.
values() - Static method in enum class edu.odu.cs.cs350.DocumentIdentifier.DocumentType
Returns an array containing the constants of this enum class, in the order they are declared.

W

Word - Class in edu.odu.cs.cs350
Represents a single word and its frequency count in a document.
Word(String) - Constructor for class edu.odu.cs.cs350.Word
Constructs a new Word with an initial count of 1.
WordFilter - Class in edu.odu.cs.cs350
Class to test a String against a list of common words.
WordFilter() - Constructor for class edu.odu.cs.cs350.WordFilter
 
writeModels(Instances, Instances, FilteredClassifier) - Static method in class edu.odu.cs.cs350.ACMTrainingSetBuilder
Write trained models to disk.
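
The WordFilter entries above, together with contains(String) in section C, describe a membership test against a list of common words. A minimal sketch of such a filter follows. The class StopWordSketch, the six-word list, and the case-insensitive comparison are all assumptions made for illustration; the project's actual stop word list and matching rules are not given in this index.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a stop word check; the word list here is illustrative only.
class StopWordSketch {
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "of", "the", "to");

    // Assumed behavior: lower-case the input so "The" and "the" both match.
    static boolean contains(String word) {
        return STOP_WORDS.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

A hash-based set keeps each lookup at constant expected cost, which matters when the check runs once per token during document processing.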