Lexical Diversity Calculator - Calculator Academy

Enter the number of unique words and the total number of words into the Lexical Diversity Calculator. The calculator returns the lexical diversity, which is the type-token ratio expressed as a percentage.

Understanding Lexical Diversity

Lexical diversity measures how varied the vocabulary is in a passage of text. It compares the number of distinct words used to the total number of words used. A higher value generally indicates more vocabulary variation, while a lower value indicates more repetition. This calculator reports the type-token ratio as a percentage.

Lexical Diversity Formula

TTR = \frac{UW}{TW}

LD = \frac{UW}{TW} \times 100

0 \le LD \le 100

Symbol	Meaning
LD	Lexical diversity expressed as a percentage
UW	Total number of unique words, also called types
TW	Total number of words, also called tokens
TTR	Type-token ratio in decimal form

How to Use the Calculator

Count the total number of words in the passage. This is the number of tokens.
Count how many different words appear at least once. This is the number of types.
Enter both values into the calculator.
The calculator returns the lexical diversity percentage.

If a text contains 1,000 total words and 250 distinct words, the lexical diversity is:

LD = \frac{250}{1000} \times 100 = 25\%
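The formula and worked example above can be sketched in a few lines of Python. The function name lexical_diversity and its input checks are illustrative assumptions, not the calculator's actual implementation.

```python
def lexical_diversity(unique_words: int, total_words: int) -> float:
    """Return lexical diversity: the type-token ratio as a percentage.

    Hypothetical helper mirroring LD = UW / TW * 100.
    """
    if total_words <= 0:
        raise ValueError("total_words must be greater than zero")
    if not 0 <= unique_words <= total_words:
        raise ValueError("unique_words must be between 0 and total_words")
    return unique_words / total_words * 100


# The worked example: 250 distinct words out of 1,000 total words.
print(lexical_diversity(250, 1000))  # 25.0
```

Rejecting a unique-word count larger than the total keeps the result inside the 0 to 100 range stated in the formula section.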

What Counts as a Unique Word?

The lexical diversity value depends on how the text is prepared before counting. The same text can produce different results depending on the rules used. For consistent analysis, define your counting method before comparing documents.

Case sensitivity: Decide whether Word and word are treated as the same type.
Punctuation: Strip punctuation so that a word followed by a comma or period is not counted as a different type from the bare word.
Contractions: Decide whether forms like don’t remain one token or are split.
Numbers and symbols: Choose whether dates, percentages, and symbols count as tokens.
Lemmatization or stemming: Some analyses group forms like run, runs, and running; others count them separately.
Hyphenated terms: Be consistent about whether they are one word or multiple words.
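As a rough illustration of how these choices change the counts, here is a minimal sketch of one possible counting method. The function count_types_and_tokens, its parameters, and the specific rules it applies (contractions and hyphenated terms kept as single tokens, punctuation dropped, no lemmatization) are assumptions chosen for the example; other rule sets are equally valid as long as they are applied consistently.

```python
from __future__ import annotations

import re


def count_types_and_tokens(
    text: str, lowercase: bool = True, keep_numbers: bool = True
) -> tuple[int, int]:
    """Return (unique_words, total_words) under one assumed rule set."""
    if lowercase:
        # Case sensitivity: treat "Word" and "word" as the same type.
        text = text.lower()
    # Treat the typographic apostrophe like the plain one.
    text = text.replace("\u2019", "'")
    # A token is a run of letters or digits, optionally joined by internal
    # apostrophes or hyphens, so "don't" and "well-known" stay whole and
    # surrounding punctuation is dropped.
    tokens = re.findall(r"[^\W_]+(?:['-][^\W_]+)*", text)
    if not keep_numbers:
        # Numbers: optionally ignore tokens made only of digits.
        tokens = [t for t in tokens if not t.isdigit()]
    return len(set(tokens)), len(tokens)


sample = "The cat saw the Cat. Don\u2019t stop!"
print(count_types_and_tokens(sample))                   # (5, 7)
print(count_types_and_tokens(sample, lowercase=False))  # (7, 7)
```

The same sentence yields 5 types with case folding and 7 without, which is why the counting rules should be fixed before two texts are compared.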

How to Interpret the Result

There is no single “good” lexical diversity score for every context. A result is most meaningful when it is compared to other texts of similar length, genre, and preprocessing rules.

Result Pattern	General Meaning	What to Check
Higher LD	More varied vocabulary and less repetition	Whether the text is very short, which can inflate the score
Lower LD	More repeated vocabulary or narrower word choice	Whether the subject matter naturally repeats key terms
Similar LD across texts	Comparable vocabulary variety	Whether tokenization and text length were standardized

Why Text Length Matters

The simple type-token ratio is sensitive to passage length. As a text gets longer, repeated words usually accumulate faster than brand-new words, so lexical diversity often decreases even if the writing remains sophisticated. This means short texts frequently appear more diverse than long texts.

For better comparisons:

Compare samples with similar word counts.
Use the same cleaning and tokenization rules for every text.
Be cautious when comparing a paragraph to a full article or chapter.
For advanced research, consider segment-based measures such as mean segmental TTR (MSTTR) or moving-average TTR (MATTR), or length-corrected measures such as root TTR.
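One segment-based option is the mean segmental type-token ratio (MSTTR), which averages the TTR of consecutive fixed-size segments so that every ratio is computed over the same number of tokens. The sketch below is a simplified version under stated assumptions: the function name, the default segment size of 50, and the choice to discard a trailing partial segment are illustrative, and this calculator does not compute MSTTR.

```python
from __future__ import annotations


def mean_segmental_ttr(tokens: list[str], segment_size: int = 50) -> float:
    """Average the decimal TTR over consecutive full segments."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    n_segments = len(tokens) // segment_size
    if n_segments == 0:
        raise ValueError("text is shorter than one segment")
    ratios = []
    for i in range(n_segments):
        # Tokens left over after the last full segment are ignored.
        segment = tokens[i * segment_size:(i + 1) * segment_size]
        ratios.append(len(set(segment)) / segment_size)
    return sum(ratios) / n_segments


# A deliberately repetitive toy text: four words repeated 25 times.
tokens = ["a", "b", "c", "d"] * 25

# Plain TTR falls as the sample grows, although the vocabulary is unchanged.
print(len(set(tokens[:20])) / 20)   # 0.2
print(len(set(tokens)) / 100)       # 0.04

# MSTTR over 10-token segments gives about 0.4 for either sample length.
print(mean_segmental_ttr(tokens[:20], segment_size=10))
print(mean_segmental_ttr(tokens, segment_size=10))
```

Because each segment has the same length, the averaged ratio does not shrink simply because more text was analyzed.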

Lexical Diversity vs. Lexical Density

Lexical diversity and lexical density are related but different measurements:

Measure	What It Evaluates	Main Focus
Lexical Diversity	How many different words are used	Vocabulary variation
Lexical Density	How much of the text is made up of content words	Information load

A text may be lexically dense but not highly diverse if it repeats many technical terms. Likewise, a text may be diverse without being especially dense if it uses many different everyday words.
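The contrast can be made concrete with a small sketch. Lexical density needs a way to separate content words from function words; the tiny FUNCTION_WORDS set below is a hypothetical stand-in for that step (real analyses typically use part-of-speech tagging), and the function name and sample sentences are invented for illustration.

```python
from __future__ import annotations

# Hypothetical, deliberately incomplete list of function words.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and",
    "is", "are", "it", "with", "by", "for",
}


def diversity_and_density(tokens: list[str]) -> tuple[float, float]:
    """Return (lexical diversity %, lexical density %) for a token list."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    total = len(tokens)
    # Diversity: distinct words over all words.
    diversity = len(set(tokens)) / total * 100
    # Density: words not in the function-word list over all words.
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    density = len(content_words) / total * 100
    return diversity, density


technical = (
    "enzyme binds substrate enzyme cleaves substrate enzyme releases substrate"
).split()
everyday = "it is on the table and the cat is in a box by the door".split()

# Repeated technical terms: every word is a content word, but few are distinct.
print(diversity_and_density(technical))
# Varied everyday wording: many distinct words, mostly function words.
print(diversity_and_density(everyday))
```

In this toy comparison the technical sentence scores higher on density and lower on diversity than the everyday sentence, matching the distinction described above.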

Common Applications

Writing analysis: evaluate repetition and vocabulary range in essays, blogs, and reports
Language learning: track vocabulary growth over time
Education: compare drafts, assignments, or reading responses
Corpus linguistics: examine stylistic variation across authors or genres
Speech and discourse analysis: study spoken vocabulary variety
NLP and text mining: use as a descriptive feature in language datasets

Best Practices for Reliable Comparison

Use samples of similar length.
Apply the same normalization rules to every text.
State whether the result is reported as a decimal ratio or a percentage.
Do not interpret the score in isolation; consider subject matter, genre, and audience.
When analyzing specialized texts, expect repeated terminology to lower the score naturally.

Frequently Asked Questions

Can lexical diversity be more than 100%? No. The number of unique words cannot exceed the total number of words, so the percentage stays between 0% and 100%.
Is a higher score always better? Not necessarily. A higher score means more variation, but effective writing often repeats important terms for clarity, cohesion, and emphasis.
What is the difference between types and tokens? Tokens are all words in the text, including repeats. Types are the distinct words counted once each.
Should I remove stop words before calculating? That depends on your goal. Keeping stop words reflects the full text as written. Removing them may highlight content vocabulary more clearly.
Why do two tools give different lexical diversity values? They may handle punctuation, capitalization, contractions, numbers, or stemming differently. Consistent preprocessing is essential for fair comparison.