Rename "perfect" into "optimal"
Optimal in the sense of having to read all relevant records: the optimal solution to the problem is always bounded by the number of relevant records in the dataset.
J535D165 committed Sep 28, 2024
1 parent 1620e80 commit 1d10ecb
Showing 4 changed files with 27 additions and 27 deletions.
26 changes: 13 additions & 13 deletions README.md
@@ -41,9 +41,9 @@ extension can plot or compute the values for such metrics from ASReview
project files. [O'Mara-Eves et al.
(2015)](https://doi.org/10.1186/2046-4053-4-5) provides a comprehensive
overview of different metrics used in the field of active learning. Below we
describe the metrics available in the software.

### Recall

The recall is the proportion of relevant records that have been found at a
certain point during the screening phase. It is sometimes also called the
@@ -58,12 +58,12 @@ The confusion matrix consists of the True Positives (TP), False Positives (FP),
True Negatives (TN), and False Negatives (FN). Definitions are provided in the
following table retrieved at a certain recall (r%).

|                      | Definition                                                                               | Calculation                      |
|----------------------|------------------------------------------------------------------------------------------|----------------------------------|
| True Positives (TP)  | The number of relevant records found at recall level                                      | Relevant Records * r%            |
| False Positives (FP) | The number of irrelevant records reviewed at recall level                                 | Records Reviewed – TP            |
| True Negatives (TN)  | The number of irrelevant records correctly not reviewed at recall level                   | Irrelevant Records – FP          |
| False Negatives (FN) | The number of relevant records not reviewed at recall level (missing relevant records)    | Relevant Records – TP            |
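As an illustrative sketch (not code from the ASReview codebase; the function name and counts are hypothetical), the table's calculations can be expressed in Python:

```python
def confusion_at_recall(n_relevant, n_irrelevant, n_reviewed, recall):
    """Return (TP, FP, TN, FN) at a given recall level (fraction 0-1)."""
    tp = round(n_relevant * recall)   # relevant records found so far
    fp = n_reviewed - tp              # irrelevant records reviewed so far
    fn = n_relevant - tp              # relevant records still missing
    tn = n_irrelevant - fp            # irrelevant records correctly skipped
    return tp, fp, tn, fn

# Hypothetical dataset: 100 relevant and 900 irrelevant records, with 95%
# recall reached after reviewing 400 records.
print(confusion_at_recall(100, 900, 400, 0.95))  # (95, 305, 595, 5)
```

Note that the four counts always partition the dataset: TP + FN equals the number of relevant records, and FP + TN the number of irrelevant ones.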

### Work saved over sampling

@@ -81,7 +81,7 @@ normalize the WSS for class imbalance (denoted as the nWSS). Moreover, Kusa et
al. showed that nWSS is equal to the True Negative Rate (TNR). The TNR is the
proportion of irrelevant records that were correctly not reviewed at level of
recall. The nWSS is useful to compare performance in terms of work saved
across datasets and models while controlling for dataset class imbalance.
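A minimal sketch of these two metrics, assuming the standard definitions (WSS at recall level r as the fraction of records left unreviewed minus the 1 − r that random screening would leave, and nWSS as the TNR per Kusa et al.); function names and counts here are illustrative, not from the package:

```python
def wss(tn, fn, n_total, recall):
    # Work saved over sampling at recall level r: fraction of records not
    # reviewed, minus the (1 - r) random screening would leave unreviewed.
    return (tn + fn) / n_total - (1 - recall)

def nwss(tn, fp):
    # Normalised WSS, which Kusa et al. show equals the True Negative Rate.
    return tn / (tn + fp)

# Continuing the hypothetical counts TP=95, FP=305, TN=595, FN=5:
print(round(wss(tn=595, fn=5, n_total=1000, recall=0.95), 4))  # 0.55
print(round(nwss(tn=595, fp=305), 4))                          # 0.6611
```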

The following table provides a hypothetical dataset example:

@@ -262,11 +262,11 @@ related to the steep recall curve.
Optional arguments for the command line are `--priors` to include prior
knowledge, `--x_absolute` and `--y_absolute` to use absolute axes.

See `asreview plot -h` for all command line arguments.

### Plotting multiple files
It is possible to show the curves of multiple files in one plot. Use this
syntax (replace `YOUR_ASREVIEW_FILE_1` and `YOUR_ASREVIEW_FILE_2` by the
asreview_files that you want to include in the plot):

```bash
Expand Down Expand Up @@ -365,9 +365,9 @@ with open_state("example.asreview") as s:
![Recall with absolute
axes](https://github.com/asreview/asreview-insights/blob/main/docs/example_absolute_axes.png)

- #### Example: Adjusting the Random and Perfect curves
+ #### Example: Adjusting the random and optimal curves

- By default, each plot will have a curve representing perfect performance, and a
+ By default, each plot will have a curve representing optimal performance, and a
curve representing random sampling performance. Both curves can be removed from
the graph.

Expand All @@ -380,7 +380,7 @@ from asreviewcontrib.insights.plot import plot_recall
with open_state("example.asreview") as s:

fig, ax = plt.subplots()
-     plot_recall(ax, s, show_random=False, show_perfect=False)
+     plot_recall(ax, s, show_random=False, show_optimal=False)

fig.savefig("example_without_curves.png")
```
24 changes: 12 additions & 12 deletions asreviewcontrib/insights/plot.py
@@ -13,7 +13,7 @@ def plot_recall(
x_absolute=False,
y_absolute=False,
show_random=True,
-     show_perfect=True,
+     show_optimal=True,
show_legend=True,
legend_values=None,
legend_kwargs=None,
@@ -35,8 +35,8 @@ def plot_recall(
If False, the fraction of all included records found is on the y-axis.
show_random: bool
Show the random curve in the plot.
-     show_perfect: bool
-         Show the perfect curve in the plot.
+     show_optimal: bool
+         Show the optimal curve in the plot.
show_legend: bool
If state_obj contains multiple states, show a legend in the plot.
legend_values: list[str]
@@ -64,7 +64,7 @@
x_absolute=x_absolute,
y_absolute=y_absolute,
show_random=show_random,
-     show_perfect=show_perfect,
+     show_optimal=show_optimal,
show_legend=show_legend,
legend_values=legend_values,
legend_kwargs=legend_kwargs,
@@ -241,7 +241,7 @@ def _plot_recall(
x_absolute=False,
y_absolute=False,
show_random=True,
-     show_perfect=True,
+     show_optimal=True,
show_legend=True,
legend_values=None,
legend_kwargs=None,
@@ -263,10 +263,10 @@
ax = _add_recall_info(ax, labels, x_absolute, y_absolute)

if show_random:
ax = _add_random_curve(ax, labels, x_absolute, y_absolute)

- if show_perfect:
-     ax = _add_perfect_curve(ax, labels, x_absolute, y_absolute)
+ if show_optimal:
+     ax = _add_optimal_curve(ax, labels, x_absolute, y_absolute)

if show_legend:
if legend_kwargs is None:
@@ -406,13 +406,13 @@ def _add_random_curve(ax, labels, x_absolute, y_absolute):
return ax


- def _add_perfect_curve(ax, labels, x_absolute, y_absolute):
-     """Add a perfect curve to a plot using step-wise increments.
+ def _add_optimal_curve(ax, labels, x_absolute, y_absolute):
+     """Add an optimal curve to a plot using step-wise increments.
Returns
-------
plt.axes.Axes
-         Axes with perfect curve added.
+         Axes with optimal curve added.
"""
# get total amount of positive labels
if isinstance(labels[0], list):
@@ -426,7 +426,7 @@ def _add_perfect_curve(ax, labels, x_absolute, y_absolute):
x = np.arange(0, n_pos_docs + 1) if x_absolute else np.arange(0, n_pos_docs + 1) / n_docs # noqa: E501
y = np.arange(0, n_pos_docs + 1) if y_absolute else np.arange(0, n_pos_docs + 1) / n_pos_docs # noqa: E501

-     # Plot the stepwise perfect curve
+     # Plot the stepwise optimal curve
ax.step(x, y, color="grey", where="post")

return ax
2 changes: 1 addition & 1 deletion docs/example_multiple_lines.py
@@ -19,7 +19,7 @@
plot_recall(ax, s2)

# Set the labels for the legend. Both plots add the recall line, the random
- # curve, and the perfect curve. Hence the recall lines are the 0th and 3rd line.
+ # curve, and the optimal curve. Hence the recall lines are the 0th and 3rd line.
ax.lines[0].set_label("Naive Bayes")
ax.lines[3].set_label("Logistic")
ax.legend()
2 changes: 1 addition & 1 deletion docs/example_without_curves.py
@@ -5,6 +5,6 @@

with open_state("tests/asreview_files/sim_van_de_schoot_2017_logistic.asreview") as s:
fig, ax = plt.subplots()
-     plot_recall(ax, s, show_random=False, show_perfect=False)
+     plot_recall(ax, s, show_random=False, show_optimal=False)

fig.savefig("docs/example_without_curves.png")
