An initial feature engineering notebook for a Kaggle student essay scoring competition suggested, somewhat surprisingly, a negative correlation between the number of unique words and essay score. This notebook shows that misspelled words contributed significantly to that result. It also outlines how using a fixed number of words can prevent multicollinearity with other features.
The notebook can be found here.
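As a rough illustration of the two ideas above, the sketch below separates unique words into known and unknown (possibly misspelled) words, and counts unique words over a fixed-size window so the feature does not simply grow with essay length. It is a minimal sketch, not the notebook's actual code: the function names, the simple regex tokenizer, the `vocabulary` set, and the choice to take the first `window` tokens are all assumptions made for illustration.

```python
import re


def tokenize(text):
    # Assumption: a deliberately simple tokenizer that lowercases the text
    # and keeps runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def split_unique_words(text, vocabulary):
    """Return (known, unknown) counts of distinct words.

    `vocabulary` is a hypothetical set of correctly spelled words; distinct
    words not found in it are treated as likely misspellings.
    """
    unique = set(tokenize(text))
    known = sum(1 for word in unique if word in vocabulary)
    return known, len(unique) - known


def unique_words_fixed_window(text, window=100):
    """Count distinct words among the first `window` tokens.

    Returns None when the essay has fewer than `window` tokens, so short
    essays are not compared on a different basis.
    """
    tokens = tokenize(text)
    if len(tokens) < window:
        return None
    return len(set(tokens[:window]))


if __name__ == "__main__":
    vocab = {"the", "cat", "sat", "on", "mat"}
    print(split_unique_words("The cat sat on teh matt", vocab))  # (4, 2)

    short_essay = "one two three four five six"
    long_essay = short_essay + " seven eight nine ten eleven twelve"
    # The raw unique-word count doubles with length (6 vs 12),
    # but the fixed-window count is the same for both essays.
    print(unique_words_fixed_window(short_essay, window=6))  # 6
    print(unique_words_fixed_window(long_essay, window=6))   # 6
```

Because every essay is measured over the same number of tokens, the windowed count is less likely to move in lockstep with length-based features such as total word count.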