✨ metadata: HTML -> markdown #1422

Merged
12 commits merged on Sep 20, 2023
@@ -5,7 +5,7 @@ dataset:

• Data from 1270 to 1870 is taken from Table 3.06 of Broadberry et al. (2015). The data in this table is based on the Medieval Accounts Database, the Early Modern Probate Inventories Database and the Modern Farm Accounts Database. Seed sown per acre is taken from the Medieval and Modern Databases. Pulses for the modern period and all seeds sown for the early modern period are taken from Overton and Campbell (1996) and Allen (2005).
This comprises crop yield estimates only for England. For this dataset, we have assumed that yields in England are also representative of average UK yields. The data was given as decadal averages, and we have assumed, for each value, the middle year in each decade.
All values of yield in bushels per acre have been converted to tonnes per hectare, using the conversion factors given by <a href="https://www.ers.usda.gov/webdocs/publications/41880/33132_ah697_002.pdf">the USDA</a> for the different commodities.
All values of yield in bushels per acre have been converted to tonnes per hectare, using the conversion factors given by [the USDA](https://www.ers.usda.gov/webdocs/publications/41880/33132_ah697_002.pdf) for the different commodities.

• Data from 1870 to 1960 is taken from Table 4 of Brassley (2000). The data in this table is based on the book "A hundred Years of British food and farming: a statistical survey", by H. F. Marks (ed. D. K. Britton, 1989). The data is provided over 5-year periods. We have assumed, for each value, the middle year in each 5-year set.
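The bushels-per-acre to tonnes-per-hectare conversion described above can be sketched in a few lines. The pounds-per-bushel values below are the standard USDA test weights for three commodities (assumed here for illustration; the linked USDA handbook gives the full per-commodity list actually used):

```python
# Sketch of the bushels/acre -> tonnes/hectare conversion.
# Pounds-per-bushel factors are assumed USDA standard test weights;
# see the linked USDA handbook for the authoritative per-commodity values.
LB_PER_BUSHEL = {"wheat": 60, "barley": 48, "oats": 32}
KG_PER_LB = 0.453592
HA_PER_ACRE = 0.404686

def bushels_per_acre_to_tonnes_per_hectare(crop: str, bu_per_acre: float) -> float:
    kg_per_acre = bu_per_acre * LB_PER_BUSHEL[crop] * KG_PER_LB
    return kg_per_acre / 1000 / HA_PER_ACRE

# e.g. a historical wheat yield of ~10 bu/acre is roughly 0.67 t/ha:
print(round(bushels_per_acre_to_tonnes_per_hectare("wheat", 10), 2))
```

Note the two unit steps: bushels → kilograms (commodity-specific weight), then per-acre → per-hectare (a fixed area factor).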

@@ -5,7 +5,7 @@ dataset:

• Data from 1270 to 1870 is taken from Table 3.06 of Broadberry et al. (2015). The data in this table is based on the Medieval Accounts Database, the Early Modern Probate Inventories Database and the Modern Farm Accounts Database. Seed sown per acre is taken from the Medieval and Modern Databases. Pulses for the modern period and all seeds sown for the early modern period are taken from Overton and Campbell (1996) and Allen (2005).
This comprises crop yield estimates only for England. For this dataset, we have assumed that yields in England are also representative of average UK yields. The data was given as decadal averages, and we have assumed, for each value, the middle year in each decade.
All values of yield in bushels per acre have been converted to tonnes per hectare, using the conversion factors given by <a href="https://www.ers.usda.gov/webdocs/publications/41880/33132_ah697_002.pdf">the USDA</a> for the different commodities.
All values of yield in bushels per acre have been converted to tonnes per hectare, using the conversion factors given by [the USDA](https://www.ers.usda.gov/webdocs/publications/41880/33132_ah697_002.pdf) for the different commodities.

• Data from 1870 to 1960 is taken from Table 4 of Brassley (2000). The data in this table is based on the book "A hundred Years of British food and farming: a statistical survey", by H. F. Marks (ed. D. K. Britton, 1989). The data is provided over 5-year periods. We have assumed, for each value, the middle year in each 5-year set.

@@ -15,7 +15,7 @@ tables:

For example, 'Start in 2010' marks the necessary future emissions pathway to have a >66% chance of keeping global average temperatures below 1.5°C warming if global CO2 emissions mitigation had started in 2010, very quickly peaking then falling.

Data is sourced from Robbie Andrew, and available for download <a href="http://folk.uio.no/roberan/t/global_mitigation_curves.shtml">here</a>.
Data is sourced from Robbie Andrew, and available for download [here](http://folk.uio.no/roberan/t/global_mitigation_curves.shtml).

Historical emissions to 2017 are sourced from CDIAC/Global Carbon Project, projection to 2018 from Global Carbon Project (Le Quéré et al. 2018).

@@ -34,7 +34,7 @@ tables:

For example, 'Start in 2010' marks the necessary future emissions pathway to have a >66% chance of keeping global average temperatures below 2°C warming if global CO2 emissions mitigation had started in 2010, very quickly peaking then falling.

Data is sourced from Robbie Andrew, and available for download <a href="http://folk.uio.no/roberan/t/global_mitigation_curves.shtml">here</a>.
Data is sourced from Robbie Andrew, and available for download [here](http://folk.uio.no/roberan/t/global_mitigation_curves.shtml).

Historical emissions to 2017 are sourced from CDIAC/Global Carbon Project, projection to 2018 from Global Carbon Project (Le Quéré et al. 2018).

@@ -1,12 +1,13 @@
dataset:
title: DeepFake detection (AI Index, 2023)
description: >
Data from Li et al. (2022) via AI Index report on Celeb-DF, presently one of the most challenging deepfake detection benchmarks.
Data from Li et al. (2022) via AI Index report on Celeb-DF, presently one of the most challenging deepfake detection benchmarks.


The AI Index is an independent initiative at the Stanford University Institute for Human-Centered Artificial Intelligence.
The mission of the AI Index is “to provide unbiased, rigorously vetted, and globally sourced data for policymakers, researchers, executives, journalists, and the general public to develop intuitions about the complex field of AI.”
Their flagship output is the annual AI Index Report, which has been published since 2017.
The AI Index is an independent initiative at the Stanford University Institute for Human-Centered Artificial Intelligence.
The mission of the AI Index is “to provide unbiased, rigorously vetted, and globally sourced data for policymakers, researchers,
executives, journalists, and the general public to develop intuitions about the complex field of AI.” Their flagship output
is the annual AI Index Report, which has been published since 2017.
licenses:
- name: Public domain
url: https://aiindex.stanford.edu/wp-content/uploads/2023/04/HAI_AI-Index-Report_2023.pdf
@@ -17,33 +18,44 @@ dataset:
date_accessed: '2023-06-19'
publication_date: '2023-05-19'
publication_year: 2023
published_by: Li et al. (2022) via the AI Index 2023 Annual Report, AI Index Steering Committee, Institute
for Human-Centered AI, Stanford University, Stanford, CA, April 2023
published_by: Li et al. (2022) via the AI Index 2023 Annual Report, AI Index Steering Committee, Institute for Human-Centered
AI, Stanford University, Stanford, CA, April 2023
tables:
ai_deepfakes:
variables:
area_under_curve_score__auc:
title: Area Under Curve Score (AUC)
description: >
The Area Under Curve Score (AUC), also known as the AUC-ROC (Receiver Operating Characteristic) score, is a popular evaluation metric used in machine learning and statistics to assess the performance of binary classification models.

In binary classification, the goal is to predict whether an instance belongs to one class (positive) or another (negative) based on its features. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various classification thresholds. The TPR is the ratio of true positives to the total number of actual positives, while the FPR is the ratio of false positives to the total number of actual negatives.

The AUC is a measure of the overall performance of the classifier across all possible classification thresholds. It represents the probability that a randomly chosen positive instance will be ranked higher than a randomly chosen negative instance according to the classifier's predicted probabilities. The AUC score ranges from 0 to 1, where a score of 1 indicates a perfect classifier, and a score of 0.5 represents a classifier with no discriminatory power (equivalent to random guessing).

Interpreting the AUC score:

AUC = 1: Perfect classifier. The model has a clear separation between the positive and negative classes, correctly ranking all instances.
AUC > 0.5: Better than random guessing. The model has some discriminatory power and performs better than a random classifier.
AUC = 0.5: Random classifier. The model performs no better than flipping a coin and has no ability to distinguish between the classes.
AUC < 0.5: Inverted classifier. The model performs worse than random guessing, meaning it is making incorrect predictions.
The AUC score is widely used because it is insensitive to class imbalance and classification thresholds. It provides a single scalar value to compare different classifiers or evaluate the performance of a single classifier. Higher AUC scores generally indicate better classifier performance in terms of the trade-off between true positives and false positives.

It's important to note that the AUC score is specific to binary classification problems and cannot be directly applied to multi-class classification tasks without modification.

unit: 'Area under curve'
The Area Under Curve Score (AUC), also known as the AUC-ROC (Receiver Operating Characteristic) score, is a popular
evaluation metric used in machine learning and statistics to assess the performance of binary classification models.

In binary classification, the goal is to predict whether an instance belongs to one class (positive) or another
(negative) based on its features. The ROC curve is created by plotting the true positive rate (TPR) against the
false positive rate (FPR) at various classification thresholds. The TPR is the ratio of true positives to the total
number of actual positives, while the FPR is the ratio of false positives to the total number of actual negatives.

The AUC is a measure of the overall performance of the classifier across all possible classification thresholds.
It represents the probability that a randomly chosen positive instance will be ranked higher than a randomly chosen
negative instance according to the classifier's predicted probabilities. The AUC score ranges from 0 to 1, where
a score of 1 indicates a perfect classifier, and a score of 0.5 represents a classifier with no discriminatory power
(equivalent to random guessing).

Interpreting the AUC score:

AUC = 1: Perfect classifier. The model has a clear separation between the positive and negative classes, correctly
ranking all instances.
AUC > 0.5: Better than random guessing. The model has some discriminatory power and performs better than a random
classifier.
AUC = 0.5: Random classifier. The model performs no better than flipping a coin and has no ability to distinguish
between the classes.
AUC < 0.5: Inverted classifier. The model performs worse than random guessing, meaning it is making incorrect predictions.
The AUC score is widely used because it is insensitive to class imbalance and classification thresholds. It provides
a single scalar value to compare different classifiers or evaluate the performance of a single classifier. Higher
AUC scores generally indicate better classifier performance in terms of the trade-off between true positives and
false positives.

It's important to note that the AUC score is specific to binary classification problems and cannot be directly applied
to multi-class classification tasks without modification.
unit: Area under curve
display:
numDecimalPlaces: 0
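The ranking interpretation of AUC given in the description above (the probability that a random positive instance is scored higher than a random negative one, with ties counting half) can be sketched directly in plain Python. This is a minimal illustration of the metric itself, not how the AI Index or Li et al. compute their benchmark scores:

```python
def auc_score(labels, scores):
    """AUC via its pairwise-ranking interpretation: the probability that a
    randomly chosen positive is scored above a randomly chosen negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking gives AUC = 1.0; reversing the scores gives 0.0,
# and uninformative (constant) scores give 0.5.
print(auc_score([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # -> 1.0
```

This O(P·N) pairwise form is equivalent to integrating the ROC curve; production libraries compute it more efficiently from a single sort of the scores.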



@@ -10,9 +10,9 @@ dataset:
date_accessed: 2021-07-08
url: https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html
description: |
BP's region definitions sometimes differ from Our World in Data's definitions. For example, BP's North America includes only Canada, Mexico and United States, whereas Our World in Data's North America includes countries in Central America (see a map with <a href="https://ourworldindata.org/world-region-map-definitions">our region definitions</a>). For this reason, we include in the dataset regions like "North America (BP)" to refer to BP's original data using their definition of the region, as well as "North America", which is data aggregated by Our World in Data using our definition. These aggregates are constructed by adding up (when possible) the contributions from the countries in the region.
BP's region definitions sometimes differ from Our World in Data's definitions. For example, BP's North America includes only Canada, Mexico and United States, whereas Our World in Data's North America includes countries in Central America (see a map with [our region definitions](https://ourworldindata.org/world-region-map-definitions)). For this reason, we include in the dataset regions like "North America (BP)" to refer to BP's original data using their definition of the region, as well as "North America", which is data aggregated by Our World in Data using our definition. These aggregates are constructed by adding up (when possible) the contributions from the countries in the region.

<a href="https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy/using-the-review/definitions-and-explanatory-notes.html#accordion_Regional%20definitions">BP's region definitions</a>, denoted with "(BP)", are:
[BP's region definitions](https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy/using-the-review/definitions-and-explanatory-notes.html#accordion_Regional%20definitions), denoted with "(BP)", are:
* "Asia Pacific (BP)": Brunei, Cambodia, China (Mainland), China Hong Kong SAR (Special Administrative Region), China Macau SAR (Special Administrative Region), Indonesia, Japan, Laos, Malaysia, Mongolia, North Korea, Philippines, Singapore, South Asia (Afghanistan, Bangladesh, India, Myanmar, Nepal, Pakistan and Sri Lanka), South Korea, Taiwan, Thailand, Vietnam, Australia, New Zealand, Papua New Guinea and Oceania.
* "Australasia (BP)": Australia, New Zealand.
* "CIS (BP)" - Commonwealth of Independent States: Armenia, Azerbaijan, Belarus, Kazakhstan, Kyrgyzstan, Moldova, Russian Federation, Tajikistan, Turkmenistan, Uzbekistan.
@@ -38,4 +38,4 @@ dataset:
* "North America" - All North American countries + "Other Caribbean (BP)" + "Other North America (BP)".
* "Oceania" - All Oceanian countries.
* "South America" - All South American countries + "Other South America (BP)".
Where the individual countries in each region are defined <a href="https://ourworldindata.org/world-region-map-definitions">in this map</a>. Additional BP regions are ignored, since they belong to other regions already included (e.g. the data for "Other Western Africa (BP)" is included in "Other Africa (BP)"). Finally, income groups are constructed following the definitions <a href="https://ourworldindata.org/grapher/world-banks-income-groups">in this map</a>.
Where the individual countries in each region are defined [in this map](https://ourworldindata.org/world-region-map-definitions). Additional BP regions are ignored, since they belong to other regions already included (e.g. the data for "Other Western Africa (BP)" is included in "Other Africa (BP)"). Finally, income groups are constructed following the definitions [in this map](https://ourworldindata.org/grapher/world-banks-income-groups).
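The "adding up (when possible) the contributions from the countries in the region" step described above can be sketched with pandas. The country lists, column names, and figures below are hypothetical placeholders, not the actual region definitions or BP data:

```python
import pandas as pd

# Hypothetical minimal table; the real dataset has many countries and years.
df = pd.DataFrame({
    "country": ["Canada", "Mexico", "United States", "Guatemala"],
    "year": [2020] * 4,
    "energy_consumption": [100.0, 50.0, 500.0, 5.0],
})

# Assumed member lists for illustration; the real definitions are given
# in the region map linked above. Note the BP and OWID regions differ:
# OWID's North America also includes Central American countries.
REGION_MEMBERS = {
    "North America (BP)": ["Canada", "Mexico", "United States"],
    "North America": ["Canada", "Mexico", "United States", "Guatemala"],
}

frames = []
for region, members in REGION_MEMBERS.items():
    # Sum the members' contributions per year to build the aggregate row.
    agg = (df[df["country"].isin(members)]
           .groupby("year", as_index=False)["energy_consumption"].sum())
    agg.insert(0, "country", region)
    frames.append(agg)

result = pd.concat(frames, ignore_index=True)
print(result)
```

A real pipeline would also handle the "when possible" caveat, e.g. skipping an aggregate for years where member-country data is missing rather than silently summing an incomplete set.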