Enter the size of the document (N) and the parameters k and b into the calculator to estimate the number of distinct words (V).

Heaps' Law Formula

The following formula is used to estimate the number of distinct words in a document according to Heaps' Law.

V = k * (N^b)

Variables:

  • V is the number of distinct words in the document
  • N is the size of the document (number of words)
  • k and b are parameters that depend on the language and the text source. For English text, k is typically between 10 and 100, and b is typically between 0.4 and 0.6.

To estimate the number of distinct words in a document, multiply the parameter k by the document size N raised to the power b. Because b is less than 1, V grows sublinearly: doubling the document size multiplies the estimated vocabulary by only 2^b (about 1.41 when b = 0.5), not by 2. This slow growth of the vocabulary relative to the document length is the characteristic behavior described by Heaps' Law.
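The formula translates directly into code. The following Python sketch is a minimal illustration; the function name heaps_vocabulary and the sample parameters k = 40 and b = 0.5 are hypothetical choices, not values taken from the calculator.

```python
def heaps_vocabulary(n: float, k: float, b: float) -> float:
    """Estimate the number of distinct words V = k * N**b (Heaps' Law)."""
    if n < 0:
        raise ValueError("document size N must be non-negative")
    return k * n ** b

# Hypothetical parameters k = 40 and b = 0.5, applied to growing document sizes.
for n in (10_000, 20_000, 40_000):
    print(n, round(heaps_vocabulary(n, 40, 0.5)))
```

Quadrupling N from 10,000 to 40,000 only doubles the estimate, from 4,000 to 8,000, because 4^0.5 = 2.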

What is Heaps' Law?

Heaps' Law is an empirical relationship, observed in linguistics and computer science, that describes the number of distinct words in a document, or a collection of documents, as a function of its length. It states that the vocabulary grows much more slowly than the text itself and can be approximated by a power law. In information retrieval and natural language processing, it is used to estimate how large a vocabulary (for example, the dictionary of a search index) will become as a collection grows.
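To see this relationship in actual text, one can count distinct words while reading a document and record the count at regular intervals. The Python sketch below is a simplified illustration: it assumes the text is split on whitespace and treats words case-insensitively, whereas real systems use more careful tokenization. The helper name vocabulary_growth is hypothetical.

```python
def vocabulary_growth(tokens, step):
    """Return (N, V) pairs: tokens read so far and distinct words seen so far."""
    seen = set()
    points = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token.lower())          # case-insensitive distinct words
        if i % step == 0:
            points.append((i, len(seen)))
    return points

text = "the cat sat on the mat and the dog sat on the log"
print(vocabulary_growth(text.split(), step=4))  # prints [(4, 4), (8, 6), (12, 7)]
```

Even in this tiny sample, the number of distinct words falls further behind the number of words read as the text gets longer.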

How to Calculate Heaps' Law?

The following steps outline how to apply Heaps' Law using the formula V = k * (N^b).


  1. First, determine the size of the document (N) in terms of the number of words.
  2. Next, determine the parameter k, which depends on the language and the text source (typically between 10 and 100).
  3. Next, determine the parameter b, which depends on the language and the text source (typically between 0.4 and 0.6).
  4. Finally, calculate the number of distinct words in the document (V) using the formula V = k * (N^b).
  5. After substituting the values and calculating the result, check your answer with the calculator above.
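Steps 2 and 3 leave open how k and b are determined for a given text source. A common approach, assumed here rather than prescribed by the text, is to measure (N, V) pairs on sample text and fit a straight line in log-log space: taking logarithms of V = k * (N^b) gives log V = log k + b * log N, so the slope of the line is b and its intercept is log k. The Python sketch below does an ordinary least-squares fit; fit_heaps is a hypothetical name, and it needs at least two distinct values of N.

```python
import math

def fit_heaps(points):
    """Fit k and b in V = k * N**b by least squares on log V = log k + b * log N."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    m = len(points)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the log-log line
    k = math.exp(mean_y - b * mean_x)  # intercept, mapped back from log space
    return k, b

# Synthetic points generated from k = 30, b = 0.5, so the fit should recover them.
points = [(n, 30 * n ** 0.5) for n in (1_000, 10_000, 100_000, 1_000_000)]
k, b = fit_heaps(points)
print(round(k, 3), round(b, 3))  # prints 30.0 0.5
```

On real measurements the points will not lie exactly on a line, so the fitted k and b are estimates that depend on the sample text used.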

Example Problem:

Use the following values as an example problem to test your knowledge.

Size of the document (N) = 500

Parameter k = 50

Parameter b = 0.5
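Substituting the example values gives V = 50 * (500^0.5) = 50 * √500 ≈ 50 * 22.36 ≈ 1118.03. The same arithmetic in Python, as a quick check:

```python
n, k, b = 500, 50, 0.5   # values from the example problem
v = k * n ** b           # V = k * (N^b) = 50 * sqrt(500)
print(round(v, 2))       # prints 1118.03
print(v > n)             # prints True: the estimate exceeds the document size
```

The estimate is larger than N = 500, which real text cannot produce, since a document cannot contain more distinct words than words in total. Treat the example as an arithmetic exercise: with k = 50 and b = 0.5, k * (N^b) stays below N only when N is greater than 2,500, because 50 * N^0.5 < N exactly when N^0.5 > 50.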