Cross-entropy is commonly used to quantify the difference between two probability distributions.
Cross-entropy loss measures how close the predicted distribution is to the true distribution.
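Below is a minimal sketch (the function name cross_entropy and the example distributions are illustrative assumptions, not from the original text) of how cross-entropy between a true distribution and a predicted one can be computed:

```python
import numpy as np

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i)."""
    p_pred = np.clip(p_pred, eps, 1.0)   # clip to avoid log(0)
    return -np.sum(p_true * np.log(p_pred))

# One-hot true label (class 1) vs. two candidate predictions.
p_true    = np.array([0.0, 1.0, 0.0])
good_pred = np.array([0.1, 0.8, 0.1])
bad_pred  = np.array([0.6, 0.2, 0.2])

print(cross_entropy(p_true, good_pred))  # ~0.22: close to the true distribution, low loss
print(cross_entropy(p_true, bad_pred))   # ~1.61: far from the true distribution, high loss
```

The closer the predicted distribution is to the true one, the smaller the cross-entropy.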
Why the Negative Sign?
Log loss uses the negative log to provide a convenient metric for comparison. It takes this approach because the log of a probability (a number less than 1) is negative, which is confusing to work with when comparing the performance of two models; negating it yields a positive quantity that grows as predictions get worse.
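A small illustration (the specific probability values are assumed for demonstration) of why the negative sign helps: the plain log of a probability is negative, while the negative log is positive and larger for worse predictions.

```python
import math

for p in [0.9, 0.5, 0.1]:
    print(f"p={p:.1f}  log(p)={math.log(p):+.3f}  -log(p)={-math.log(p):.3f}")
# p=0.9  log(p)=-0.105  -log(p)=0.105   <- confident correct prediction, low loss
# p=0.5  log(p)=-0.693  -log(p)=0.693
# p=0.1  log(p)=-2.303  -log(p)=2.303   <- confident wrong prediction, high loss
```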
Bias refers to the simplifying assumptions made by the model to make the target function easier to approximate.
Variance is the amount by which the estimate of the target function would change given different training data.
The bias–variance trade-off is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set.
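The sketch below (an assumed setup, not from the original: a sine target, noisy samples, and polynomial fits via np.polyfit) illustrates the trade-off by comparing a low-degree model, which has high bias and low variance, against a high-degree model, which shows the reverse, across many resampled training sets.

```python
import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(0, 1, 50)
true_fn = lambda x: np.sin(2 * np.pi * x)

def fit_predictions(degree, n_sets=200, n_points=25, noise=0.3):
    """Fit `n_sets` independent noisy training sets; return predictions on x_test."""
    preds = []
    for _ in range(n_sets):
        x = rng.uniform(0, 1, n_points)
        y = true_fn(x) + rng.normal(0, noise, n_points)
        coefs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coefs, x_test))
    return np.array(preds)

for degree in (1, 10):
    preds = fit_predictions(degree)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)  # squared bias: average fit vs. target
    variance = np.mean(preds.var(axis=0))                  # variance: spread across training sets
    print(f"degree={degree:2d}  bias^2={bias_sq:.3f}  variance={variance:.3f}")

# The degree-1 fit has high bias and low variance; the degree-10 fit has low bias
# but its predictions change much more from one training set to the next.
```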