Class Corpus

java.lang.Object
edu.odu.cs.cs350.Corpus

public class Corpus extends Object
Represents a collection (corpus) of Document objects.

The Corpus class manages a list of documents and provides utility methods for corpus-level statistics, such as counting how many documents contain a given word.

  • Constructor Details

    • Corpus

      public Corpus()
      Constructs an empty Corpus.
  • Method Details

    • addDocument

      public void addDocument(Document doc)
      Adds a Document to this corpus.
      Parameters:
      doc - the document to add
    • getDocuments

      public List<Document> getDocuments()
      Returns the list of all Document objects in this corpus.
      Returns:
      the documents in this corpus
    • getTotalDocuments

      public int getTotalDocuments()
      Returns the total number of documents in this corpus.
      Returns:
      the number of documents
    • getDocumentCountContaining

      public int getDocumentCountContaining(String word)
      Counts how many documents in the corpus contain a given word.

      This count is the word's document frequency, the quantity needed to compute the inverse document frequency (IDF) term of a TF-IDF calculation.

      Parameters:
      word - the word to search for
      Returns:
      the number of documents containing the given word
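
As a rough illustration of how the count returned by getDocumentCountContaining could feed an IDF calculation, the sketch below computes idf = ln(N / df), where N comes from getTotalDocuments() and df from getDocumentCountContaining(word). The classes SketchDocument, SketchCorpus, and IdfSketch are hypothetical stand-ins written for this example, not the real edu.odu.cs.cs350 classes. The whitespace tokenization, the case-insensitive matching, the natural logarithm, and the choice to return 0.0 for a word that appears in no document are all assumptions rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Document: keeps the set of lower-cased words in a text.
class SketchDocument {
    private final Set<String> words;

    SketchDocument(String text) {
        // Assumption: words are separated by whitespace and compared case-insensitively.
        this.words = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    boolean contains(String word) {
        return words.contains(word.toLowerCase());
    }
}

// Hypothetical stand-in that mirrors the method names documented for Corpus.
class SketchCorpus {
    private final List<SketchDocument> documents = new ArrayList<>();

    void addDocument(SketchDocument doc) {
        documents.add(doc);
    }

    int getTotalDocuments() {
        return documents.size();
    }

    int getDocumentCountContaining(String word) {
        int count = 0;
        for (SketchDocument doc : documents) {
            if (doc.contains(word)) {
                count++;
            }
        }
        return count;
    }
}

class IdfSketch {
    // idf = ln(N / df). Returning 0.0 when df is 0 is one possible convention,
    // chosen here only to avoid dividing by zero.
    static double idf(SketchCorpus corpus, String word) {
        int df = corpus.getDocumentCountContaining(word);
        if (df == 0) {
            return 0.0;
        }
        return Math.log((double) corpus.getTotalDocuments() / df);
    }

    public static void main(String[] args) {
        SketchCorpus corpus = new SketchCorpus();
        corpus.addDocument(new SketchDocument("the cat sat"));
        corpus.addDocument(new SketchDocument("the dog ran"));

        // "cat" appears in one of two documents; "the" appears in both.
        System.out.println("idf(cat) = " + IdfSketch.idf(corpus, "cat"));
        System.out.println("idf(the) = " + IdfSketch.idf(corpus, "the"));
    }
}
```

A word found in every document gets an IDF of zero under this formula, which is why the document count rather than the raw occurrence count is the relevant statistic. Smoothed variants such as ln((1 + N) / (1 + df)) + 1 are also common; nothing in this page indicates which form the surrounding project uses.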