Package edu.odu.cs.cs350
Class Corpus
java.lang.Object
edu.odu.cs.cs350.Corpus
Represents a collection (corpus) of Document objects.
The Corpus class manages a list of documents and provides utility methods for corpus-level statistics, such as counting how many documents contain a given word.
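As an illustration of the behavior described above, here is a minimal sketch. It does not use the real classes: SketchDocument and SketchCorpus are hypothetical stand-ins, because this page shows neither the actual Document API nor how Corpus stores its documents. The tokenization (lowercased, whitespace-delimited words) is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Document; the real class's API is not shown on this page.
class SketchDocument {
    private final Set<String> words;

    SketchDocument(String text) {
        // Assumption: words are lowercased, whitespace-delimited tokens.
        this.words = new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
    }

    boolean contains(String word) {
        return words.contains(word.toLowerCase());
    }
}

// Hypothetical stand-in that mirrors the four methods documented for Corpus.
class SketchCorpus {
    private final List<SketchDocument> documents = new ArrayList<>();

    void addDocument(SketchDocument doc) {
        documents.add(doc);
    }

    List<SketchDocument> getDocuments() {
        return documents;
    }

    int getTotalDocuments() {
        return documents.size();
    }

    // Each document counts at most once, however often the word occurs in it.
    int getDocumentCountContaining(String word) {
        int count = 0;
        for (SketchDocument doc : documents) {
            if (doc.contains(word)) {
                count++;
            }
        }
        return count;
    }
}

class CorpusSketchDemo {
    public static void main(String[] args) {
        SketchCorpus corpus = new SketchCorpus();
        corpus.addDocument(new SketchDocument("the cat sat"));
        corpus.addDocument(new SketchDocument("the dog ran"));
        corpus.addDocument(new SketchDocument("a bird flew"));

        System.out.println(corpus.getTotalDocuments());               // prints 3
        System.out.println(corpus.getDocumentCountContaining("the")); // prints 2
    }
}
```

The count is a document frequency, not a term frequency: a word that appears several times in one document still adds only one to the result.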
Constructor Summary
Constructors
Constructor | Description
Corpus() | Constructs an empty Corpus.
Method Summary
Modifier and Type | Method | Description
void | addDocument(Document doc) | Adds a Document to this corpus.
int | getDocumentCountContaining(word) | Counts how many documents in the corpus contain a given word.
 | getDocuments() | Returns the list of all Document objects in this corpus.
int | getTotalDocuments() | Returns the total number of documents in this corpus.
Constructor Details
Corpus
public Corpus()
Constructs an empty Corpus.
Method Details
addDocument
Adds a Document to this corpus.
Parameters:
doc - the document to add
getDocuments
Returns the list of all Document objects in this corpus.
Returns:
a list of documents
getTotalDocuments
public int getTotalDocuments()
Returns the total number of documents in this corpus.
Returns:
the number of documents
getDocumentCountContaining
Counts how many documents in the corpus contain a given word.
This is useful for computing the inverse document frequency (IDF) part of the TF-IDF calculation.
Parameters:
word - the word to search for
Returns:
the number of documents containing the given word
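To show how the two counts could feed an IDF value, here is a hedged sketch. It uses one common formulation, idf = ln(N / df), where N would come from getTotalDocuments() and df from getDocumentCountContaining(word). This page does not say which IDF variant the project uses, so the formula, the helper name inverseDocumentFrequency, and the choice to return 0.0 when a count is zero are all assumptions.

```java
class IdfSketch {
    /**
     * Hypothetical helper, not part of Corpus.
     * Computes ln(totalDocuments / documentsContaining).
     */
    static double inverseDocumentFrequency(int totalDocuments, int documentsContaining) {
        // Assumption: an empty corpus or an unseen word gets weight 0.0
        // rather than a division by zero.
        if (totalDocuments <= 0 || documentsContaining <= 0) {
            return 0.0;
        }
        return Math.log((double) totalDocuments / documentsContaining);
    }

    public static void main(String[] args) {
        // These values stand in for corpus.getTotalDocuments() and
        // corpus.getDocumentCountContaining(word).
        int totalDocuments = 4;
        int documentsContaining = 1;
        System.out.println(inverseDocumentFrequency(totalDocuments, documentsContaining));
    }
}
```

Under this formulation, a word found in every document gets an IDF of 0.0, and rarer words get larger values.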