-
Notifications
You must be signed in to change notification settings - Fork 0
/
iqisa.html
167 lines (163 loc) · 10.5 KB
/
iqisa.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
<html><head><title>niplav</title>
<link href="./favicon.png" rel="shortcut icon" type="image/png"/>
<link href="main.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<!DOCTYPE HTML>
<style type="text/css">
code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
</style>
<script async="" src="./mathjax/latest.js?config=TeX-MML-AM_CHTML" type="text/javascript">
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function () {
// Change the title to the h1 header
var title = document.querySelector('h1')
if(title) {
var title_elem = document.querySelector('title')
title_elem.textContent=title.textContent + " – niplav"
}
});
</script>
</head><body><h2 id="home"><a href="./index.html">home</a></h2>
<p><em>author: niplav, created: 2022-07-15, modified: 2024-06-27, language: english, status: finished, importance: 6, confidence: certain</em></p>
<blockquote>
<p><strong>Iqisa is <a href="https://github.com/niplav/iqisa">a library</a> for handling and comparing forecasting datasets from different platforms.</strong></p>
</blockquote><div class="toc"><div class="toc-title">Contents</div><ul><li><a href="#Advantages">Advantages</a><ul></ul></li><li><a href="#Requirements">Requirements</a><ul></ul></li><li><a href="#Uses">Uses</a><ul></ul></li><li><a href="#Possible_Projects">Possible Projects</a><ul></ul></li><li><a href="#Known_Bugs">Known Bugs</a><ul></ul></li><li><a href="#Feature_Wishlist">Feature Wishlist</a><ul><li><a href="#Potential_Additional_Sources_for_Forecasting_Data">Potential Additional Sources for Forecasting Data</a><ul></ul></li></ul></li><li><a href="#Acknowledgements">Acknowledgements</a><ul></ul></li><li><a href="#Discussions">Discussions</a><ul></ul></li></ul></div>
<h1 id="Iqisa_A_Library_For_Handling_Forecasting_Datasets"><a class="hanchor" href="#Iqisa_A_Library_For_Handling_Forecasting_Datasets">Iqisa: A Library For Handling Forecasting Datasets</a></h1>
<blockquote>
<p>The eventual success of my archives reinforced my view that
public permission-less datasets are often a bottleneck to
research: you cannot guarantee that people will use your dataset,
but you can guarantee that they won’t use it.</p>
</blockquote>
<p><em>—<a href="https://gwern.net/">Gwern Branwen</a>, <a href="https://gwern.net/newsletter/2019/13">“2019 News”</a>, 2019</em></p>
<p>Iqisa is a collection of forecasting datasets and a simple
library for handling those datasets. Code and data available
<a href="https://github.com/niplav/iqisa">here</a>. Documentation is
<a href="./iqisadoc.html">here</a>.</p>
<p>So far it contains data from:</p>
<ul>
<li>The <a href="https://en.wikipedia.org/wiki/The_Good_Judgment_Project">Good Judgment Project</a>: ~790k market trades+~3.14m survey predictions≈3.9m forecasts</li>
<li>The <a href="https://www.metaculus.com/api2/schema/redoc/">Metaculus public API</a>: ~390k forecasts</li>
<li><a href="https://predictionbook.com/">PredictionBook</a>: ~64k forecasts</li>
</ul>
<p>for a total of ~4.38m forecasts, as well as code for handling private
<a href="https://metaculus.com">Metaculus</a> data (available to researchers on
request to Metaculus), but I plan to also add data from various other
sources.</p>
<p>The documentation can be found <a href="./iqisadoc.html">here</a>, but a simple
example for using the library is seeing whether traders with more than
100 trades have a better Brier score than traders in general:</p>
<pre><code>import gjp
import iqisa as iqs
market_fcasts=gjp.load_markets()
def brier_score(probabilities, outcomes):
return np.mean((probabilities-outcomes)**2)
def brier_score_user(user_forecasts):
user_right=(user_forecasts['outcome']==user_forecasts['answer_option'])
probabilities=user_forecasts['probability']
return np.mean((probabilities-user_right)**2)
trader_scores=iqs.score(market_fcasts, brier_score, on=['user_id'])
filtered_trader_scores=iqs.score(market_fcasts.groupby(['user_id']).filter(lambda x: len(x)>100), brier_score, on=['user_id'])
</code></pre>
<p>And we can see:</p>
<pre><code>>>> np.mean(trader_scores)
score 0.159194
dtype: float64
>>> np.mean(filtered_trader_scores)
score 0.159018
dtype: float64
</code></pre>
<p>Concluding that more experienced traders are only very slightly better
at trading.</p>
<h2 id="Advantages"><a class="hanchor" href="#Advantages">Advantages</a></h2>
<p>Iqisa offers some advantages over existing datasets:</p>
<ul>
<li>Ease of use: data can be loaded using two lines of code</li>
<li>Unified data format: the data from different platforms and projects has been brought into the same format that enables easier comparisons and cross-platform analyses</li>
<li>Pre-defined functionalities for analysis: Some functionalities (such as for scoring or aggregation) have been implemented for easier use</li>
</ul>
<h2 id="Requirements"><a class="hanchor" href="#Requirements">Requirements</a></h2>
<ul>
<li><a href="https://numpy.org/">NumPy</a></li>
<li><a href="https://pandas.pydata.org">Pandas</a></li>
<li><a href="https://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a></li>
</ul>
<h2 id="Uses"><a class="hanchor" href="#Uses">Uses</a></h2>
<ul>
<li><a href="./range_and_forecasting_accuracy.html#Analysis__Results">Range and Forecasting Accuracy (niplav, 2023)</a></li>
<li><a href="./precision.html">Precision of Sets of Forecasts (niplav, 2023)</a></li>
</ul>
<h2 id="Possible_Projects"><a class="hanchor" href="#Possible_Projects">Possible Projects</a></h2>
<ul>
<li>Take questions from different platforms that are close to each other on <a href="https://github.com/stanleyfok/sentence2vec">sentence2vec</a>, and check which platform made the better predictions on that question.</li>
</ul>
<h2 id="Known_Bugs"><a class="hanchor" href="#Known_Bugs">Known Bugs</a></h2>
<p>Since this is a project I'm now doing in my free time, it might not be
as polished as it should be. Sorry :-/</p>
<p>If you decide to work with this library, feel free to <a href="./about.html#Contact">contact me</a>.</p>
<ul>
<li>Issues with the time fields
<ul>
<li>Not all time-related fields have timezone information attached to them.</li>
</ul></li>
<li>Some predictions in the dataset have occurred after question resolution. There should be a way to filter those out programmatically.</li>
<li>The columns of the datasets are not sorted the same way for question DataFrames and forecast DataFrames.</li>
<li>I fear that despite my best efforts, not all data frome the GJP data has been transferred<sub>20%</sub>.</li>
<li>The default fields in the Metaculus & PredictionBook data should be <code>NA</code> more often than they are right now.</li>
<li>The documentation is still <em>slightly</em> spotty, and tests are mostly nonexistent.</li>
<li>Some variables shouldn't be exposed, but are.</li>
</ul>
<h2 id="Feature_Wishlist"><a class="hanchor" href="#Feature_Wishlist">Feature Wishlist</a></h2>
<ul>
<li>Create a pip package</li>
<li>Add data from more platforms</li>
</ul>
<h3 id="Potential_Additional_Sources_for_Forecasting_Data"><a class="hanchor" href="#Potential_Additional_Sources_for_Forecasting_Data">Potential Additional Sources for Forecasting Data</a></h3>
<ul>
<li><a href="https://metaforecast.org/api/graphql?query=%23%0A%23+Welcome+to+Yoga+GraphiQL%0A%23%0A%23+Yoga+GraphiQL+is+an+in-browser+tool+for+writing%2C+validating%2C+and%0A%23+testing+GraphQL+queries.%0A%23%0A%23+Type+queries+into+this+side+of+the+screen%2C+and+you+will+see+intelligent%0A%23+typeaheads+aware+of+the+current+GraphQL+type+schema+and+live+syntax+and%0A%23+validation+errors+highlighted+within+the+text.%0A%23%0A%23+GraphQL+queries+typically+start+with+a+%22%7B%22+character.+Lines+that+start%0A%23+with+a+%23+are+ignored.%0A%23%0A%23+An+example+GraphQL+query+might+look+like%3A%0A%23%0A%23+++++%7B%0A%23+++++++field%28arg%3A+%22value%22%29+%7B%0A%23+++++++++subField%0A%23+++++++%7D%0A%23+++++%7D%0A%23%0A%23+Keyboard+shortcuts%3A%0A%23%0A%23++Prettify+Query%3A++Shift-Ctrl-P+%28or+press+the+prettify+button+above%29%0A%23%0A%23+++++Merge+Query%3A++Shift-Ctrl-M+%28or+press+the+merge+button+above%29%0A%23%0A%23+++++++Run+Query%3A++Ctrl-Enter+%28or+press+the+play+button+above%29%0A%23%0A%23+++Auto+Complete%3A++Ctrl-Space+%28or+just+start+typing%29%0A%23%0A">Metaforecast API</a></li>
<li><a href="https://github.com/andyzoujm/autocast">Autocast Dataset</a></li>
<li><a href="https://www.cset-foretell.com/">Foretell (CSET)</a></li>
<li><a href="https://www.gjopen.com/">Good Judgment Open</a></li>
<li><a href="https://www.hypermind.com">Hypermind</a></li>
<li><a href="https://augur.net/">Augur</a></li>
<li><a href="https://www.foretold.io/">Foretold</a></li>
<li><a href="https://www.fsu.gr/en/fss/omen">Omen</a></li>
<li><a href="https://www.givewell.org/">GiveWell</a></li>
<li><a href="https://www.openphilanthropy.org/">Open Philanthropy Project</a></li>
<li><a href="https://www.predictit.org/">PredictIt</a></li>
<li><a href="https://elicit.org/">Elicit</a></li>
<li><a href="https://polymarket.com/">PolyMarket</a></li>
<li><a href="https://en.wikipedia.org/wiki/Iowa_Electronic_Markets">Iowa Electronic Markets</a></li>
<li><a href="https://www.infer-pub.com/">INFER</a></li>
<li><a href="https://manifold.markets/home">Manifold</a></li>
<li><a href="https://smarkets.com/">Smarkets</a></li>
<li><a href="https://the-odds-api.com/">The Odds API</a></li>
<li><a href="https://kalshi.com/">Kalshi</a></li>
<li><a href="http://www.ideosphere.com/">Foresight Exchange</a></li>
</ul>
<h2 id="Acknowledgements"><a class="hanchor" href="#Acknowledgements">Acknowledgements</a></h2>
<p>Credits go to <a href="https://arbresearch.com/">Arb Research</a> for funding the
first 80% of this work, and Misha Yagudin in particular for guidance
and mentorship.</p>
<p>Thanks also to <a href="https://github.com/hrosspet">hrosspet</a> for making the
library more usable.</p>
<h2 id="Discussions"><a class="hanchor" href="#Discussions">Discussions</a></h2>
<ul>
<li><a href="https://www.lesswrong.com/posts/CPJeoLZiqbzKtxLYF/iqisa-a-library-for-handling-forecasting-datasets">LessWrong</a></li>
<li><a href="https://forum.effectivealtruism.org/posts/T7PC22Yc9heFL5Ay4/iqisa-a-library-for-handling-forecasting-datasets">Effective Altruism Forum</a></li>
</ul>
</body></html>