From 7007a9fac37ec232b0e00f68fe99b38a62d37e4b Mon Sep 17 00:00:00 2001
From: PeerHerholz
Date: Fri, 25 Oct 2024 13:23:09 -0400
Subject: [PATCH] fix description of n < p problem

---
 content/haxby_data.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/haxby_data.ipynb b/content/haxby_data.ipynb
index 18300f5..fac2051 100644
--- a/content/haxby_data.ipynb
+++ b/content/haxby_data.ipynb
@@ -51,7 +51,7 @@
 "```{admonition} Bonus question: ever heard of the \"small-n-high-p\" (p >> n) problem?\n",
 ":class: tip, dropdown\n",
 "\n",
-"\"Classical\" `machine learning`/`decoding` models and the underlying algorithms operate on the assumption that are more `predictors` or `features` than there are `sample`. In fact many more. Why is that?\n",
+"\"Classical\" `machine learning`/`decoding` models and the underlying algorithms operate on the assumption that there are more `samples` than there are `predictors` or `features`. In fact, many more. Why is that?\n",
 "Consider a high-dimensional `space` whose `dimensions` are defined by the number of `features` (e.g. `10 features` would result in a space with `10 dimensions`. The resulting `volume` of this `space` is the amount of `samples` that could be drawn from the `domain` and the number of `samples` entail the `samples` you need to address your `learning problem`, ie `decoding` outcome. That is why folks say: \"get more data\", `machine learning` is `data`-hungry: our `sample` needs to be as representative of the high-dimensional domain as possible. Thus, as the number of `features` increases, so should the number of `samples` so to capture enough of the `space` for the `decoding model` at hand.\n",
 "\n",
 "This referred to as the [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) and poses as a major problem in many fields that aim to utilize `machine learning`/`decoding` on unsuitable data. Why is that?\n",
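
A minimal sketch (not part of the patch) of the p >> n issue the corrected wording describes: when there are far more features than samples, an ordinary least-squares model can interpolate the training data yet generalize poorly to unseen data from the same domain. The sizes (20 training samples, 500 features, 5 informative features) and noise level are illustrative assumptions, not values from the notebook.

```python
# Illustrative sketch of the "small-n-high-p" (p >> n) problem.
# Assumed sizes: n_train=20, n_test=200, p=500 features (5 informative).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_train, n_test, p = 20, 200, 500  # p >> n

# Only the first 5 features carry signal; the rest are noise dimensions.
true_coef = np.zeros(p)
true_coef[:5] = 1.0

X_train = rng.standard_normal((n_train, p))
y_train = X_train @ true_coef + 0.1 * rng.standard_normal(n_train)

X_test = rng.standard_normal((n_test, p))
y_test = X_test @ true_coef + 0.1 * rng.standard_normal(n_test)

model = LinearRegression().fit(X_train, y_train)

# With p >> n the model can fit the training data almost perfectly
# (train R^2 close to 1.0) while performing much worse on test data.
print("train R^2:", r2_score(y_train, model.predict(X_train)))
print("test  R^2:", r2_score(y_test, model.predict(X_test)))
```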