<!DOCTYPE HTML>
<html><head><title>niplav</title>
<link href="./favicon.png" rel="shortcut icon" type="image/png"/>
<link href="main.css" rel="stylesheet" type="text/css"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<style type="text/css">
code.has-jax {font: inherit; font-size: 100%; background: inherit; border: inherit;}
</style>
<script async="" src="./mathjax/latest.js?config=TeX-MML-AM_CHTML" type="text/javascript">
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function () {
// Change the title to the h1 header
var title = document.querySelector('h1')
if(title) {
var title_elem = document.querySelector('title')
title_elem.textContent=title.textContent + " – niplav"
}
});
</script>
</head><body><h2 id="home"><a href="./index.html">home</a></h2>
<p><em>author: niplav, created: 2019-04-10, modified: 2020-07-20, language: english, status: in progress, importance: 3, confidence: possible</em></p>
<blockquote>
<p><strong>A while ago, I became interested in personal estimates
of life expectancy. I wanted to know how accurate people are
at estimating their own life expectancy (by checking <a href="https://en.wikipedia.org/wiki/Life_table">actuarial
tables</a>) and whether accuracy
correlates with age or gender. I went out and collected data, which is
shared and analysed here.</strong></p>
</blockquote><div class="toc"><div class="toc-title">Contents</div><ul><li><a href="#Terminology">Terminology</a><ul></ul></li><li><a href="#Data_Collection_Method">Data Collection Method</a><ul></ul></li><li><a href="#Analysis">Analysis</a><ul><li><a href="#Age">Age</a><ul></ul></li><li><a href="#Subjective_Life_Expectancy">Subjective Life Expectancy</a><ul></ul></li><li><a href="#Linear_Regression__Correlation">Linear Regression & Correlation</a><ul></ul></li><li><a href="#Comparison_With_Actuarial_Tables">Comparison With Actuarial Tables</a><ul></ul></li></ul></li><li><a href="#Problems">Problems</a><ul><li><a href="#Hope_Instead_of_Estimates">Hope Instead of Estimates</a><ul></ul></li><li><a href="#Joke_Answers">Joke Answers</a><ul></ul></li><li><a href="#Some_Ages_are_Missing">Some Ages are Missing</a><ul></ul></li><li><a href="#Selection_Bias_and_NonRepresentative_Data">Selection Bias and Non-Representative Data</a><ul></ul></li><li><a href="#Gender_was_Determined_by_Appearance">Gender was Determined by Appearance</a><ul></ul></li><li><a href="#No_Tracking_of_the_Answering_Rate">No Tracking of the Answering Rate</a><ul></ul></li></ul></li></ul></div>
<h1 id="Subjective_Estimates_of_Life_Expectancy"><a class="hanchor" href="#Subjective_Estimates_of_Life_Expectancy">Subjective Estimates of Life Expectancy</a></h1>
<!--TODO: at ≥1000 data points, put results into the abstract.-->
<p>Actuarial tables are of great interest to statisticians, gerontologists
and policymakers. In this piece, data about the subjective life expectancy
of urban Germans is presented, analyzed using linear regression and
compared to actuarial tables. Gender and age differences in the accuracy of
assessing one's own life expectancy are also considered.</p>
<h2 id="Terminology"><a class="hanchor" href="#Terminology">Terminology</a></h2>
<p>This text uses the terms "subjective life expectancy" and "estimated
life expectancy" interchangeably.</p>
<h2 id="Data_Collection_Method"><a class="hanchor" href="#Data_Collection_Method">Data Collection Method</a></h2>
<p>~500 random people were approached (the
exact number of approaches was <a href="./estimated_life_expectancy.html#No_Tracking_of_the_Answering_Rate">regrettably not
recorded</a>)
during the day in the streets of the center of a major German city. They
were asked the following questions (in the presented order):</p>
<ol>
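<li>"Entschuldigung, darf ich Ihnen kurz zwei Fragen stellen?" ("Excuse me, may I briefly ask you two questions?")</li>
<li>"Wie alt, schätzen Sie, werden Sie werden?" ("How old do you estimate you will become?")</li>
<li>"Und wie alt sind Sie?" ("And how old are you?")</li>
<li>"Vielen Dank, schönen Tag noch." ("Thank you very much, have a nice day.")</li>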
</ol>
<p>If the respondent was unsure after the second question, they were told:
"Nur eine grobe Schätzung" ("just a rough estimate") to indicate that they
weren't expected to make a perfect estimate. If at any point the respondent
seemed uncomfortable, the interview was ended immediately with step 4.</p>
<p>If the respondent did not seem to understand German after the first
question, they were asked the following questions instead (in that order):</p>
<ol>
<li>"Do you speak English?"</li>
<li>"How old, do you think, will you become?"</li>
<li>"And how old are you?"</li>
<li>"Thank you very much, have a nice day."</li>
</ol>
<p>Similarly, if the respondent seemed unsure after step 2, they were told
to only give "a rough estimate" of the number.</p>
<p>The perceived gender of the respondent was then noted together with
their age and estimated age.</p>
<p>Data was collected from February 2019 to August 2019<!--(TODO:
Finish, collect at least 1000 data points, then do analysis)-->.</p>
<p>The raw data is available as a CSV file
<a href="./data/estimated_life_expectancy.csv">here</a>.</p>
<h2 id="Analysis"><a class="hanchor" href="#Analysis">Analysis</a></h2>
<!--TODO: Add p-values!-->
<p>Code to load the data from the CSV file:</p>
<pre><code> .l("csv")
.l("nstat")
.fc(.ic("./data/estimated_life_expectancy.csv"));data::csv.load()
m::1:$'{1_x}'flr({"m"~*x};data)
f::1:$'{1_x}'flr({"f"~*x};data)
data::1:$'{1_x}'1_data
</code></pre>
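<p>For readers without a Klong interpreter, the same loading step can be
sketched in Python. This is only a sketch, not the code used for the
analysis: it assumes that every CSV row has the form
<code>gender,age,estimate</code> after one header line (inferred from how
the Klong code above splits the rows). The helper name
<code>load_rows</code> and the sample rows are hypothetical.</p>

```python
import csv
import io

def load_rows(text):
    """Parse CSV text into (gender, age, estimate) tuples, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # drop the header line, like 1_data in the Klong code
    return [(g, int(a), int(e)) for g, a, e in reader]

# Hypothetical sample rows, NOT taken from the real data set.
SAMPLE = "gender,age,estimate\nm,25,80\nf,40,90\nm,60,85\n"

rows = load_rows(SAMPLE)
m = [(a, e) for g, a, e in rows if g == "m"]     # male (age, estimate) pairs
f = [(a, e) for g, a, e in rows if g == "f"]     # female (age, estimate) pairs
data = [(a, e) for _, a, e in rows]              # all pairs, gender dropped
```

<p>To use it on the real file, the text of
<code>./data/estimated_life_expectancy.csv</code> would be passed to
<code>load_rows</code> instead of <code>SAMPLE</code>.</p>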
<h3 id="Age"><a class="hanchor" href="#Age">Age</a></h3>
<p>Of the 250 respondents, 125 were male and 125 were female. The
mean age was 38.228 years (37.192 years for men and 39.264 years for women),
with the youngest respondents being 15 years old and the oldest being
89 years old (youngest/oldest male: 15/82, youngest/oldest female: 16/89).
The standard deviation for age was 17.5 (17.17 for men, 17.76 for women).</p>
<pre><code> dm::mu(*+data)
38.228
fm::mu(*+f)
39.264
mm::mu(*+m)
37.192
ages::(*+data)@&lt;*+data
mages::(*+m)@&lt;*+m
fages::(*+f)@&lt;*+f
*ages
15
*|ages
89
*mages
15
*|mages
82
*fages
16
*|fages
89
sd(ages)
17.4981146412977875
sd(mages)
17.1700651134467163
sd(fages)
17.7597945934067604
</code></pre>
<p>Code for the image:</p>
<pre><code>.l("nplot")
.l("./load.kg")
cgrid("age (female red, male blue) (N=",($#data),")";[0 100 20])
setdot(3)
fillrgb(0;0;1)
scplot(mages)
fillrgb(1;0;0)
scplot(fages)
setrgb(0;0;0)
segplot(100:^dm)
setrgb(0;0;1)
segplot(100:^mm)
setrgb(1;0;0)
segplot(100:^fm)
draw()
</code></pre>
<p><img alt="The ages of the participants" src="./img/estimated_life_expectancy/ages.png" title="The ages of the participants: Scatterplot where the data points are sorted, male age grows a little bit slower than female age, and female age has a gap between 42 and 50."/></p>
<!--TODO: there is an annoying red line at the border of the image here
and in est_ages.png. Fix that.-->
<p>As one can see in the image, the ages of the respondents by gender
follow a very similar pattern, except for a lack of women aged 40 to 50
and a lack of women older than 75. This may be a bias on my side, but I
do remember noticing that older women were less likely to answer,
especially questions about age.</p>
<h3 id="Subjective_Life_Expectancy"><a class="hanchor" href="#Subjective_Life_Expectancy">Subjective Life Expectancy</a></h3>
<p>Estimates of life expectancy were interesting: The mean estimate was
83.248 years, 82.016 years for men and 84.48 years for women. The lowest
estimate of life expectancy was only 30 years, the highest was 200 years
(lowest/highest for males: 30/120, lowest/highest for females: 39/200).</p>
<p>On average, respondents estimated that they had 45.02 years left to
live (45.216 years for women, 44.824 years for men). The lowest estimate
for years of life left was 1, the highest
was 169 (lowest/highest for males: 1/86, lowest/highest for females: 1/169).</p>
<p>The standard deviation for the estimated age was 14.32 (13.29 for men,
15.18 for women).</p>
<pre><code> estages::(*|+data)@&lt;*|+data
mu(estages)
83.248
mestages::(*|+m)@&lt;*|+m
festages::(*|+f)@&lt;*|+f
mu(mestages)
82.016
mu(festages)
84.48
*estages
30
*|estages
200
*mestages
30
*|mestages
120
*festages
39
*|festages
200
:"yl: years left"
yl::{(*|x)-*x}'data
yl::yl@&lt;yl
muyl::mu(yl)
45.02
fyl::{(*|x)-*x}'f
fyl::fyl@&lt;fyl
mufyl::mu(fyl)
45.216
myl::{(*|x)-*x}'m
myl::myl@&lt;myl
mumyl::mu(myl)
44.824
*yl
1
*|yl
169
*myl
1
*|myl
86
*fyl
1
*|fyl
169
sd(estages)
14.3246115479617805
sd(mestages)
13.2941996374358693
sd(festages)
15.1860989065658333
</code></pre>
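<p>The "years left" calculation above (estimated age minus current age,
then sorted) can be sketched in Python as follows. This is a hedged
illustration rather than the original analysis: the pairs are hypothetical,
and the population form of the standard deviation is an assumption, since it
is not checked here which form the <code>sd</code> function of
<code>nstat</code> uses.</p>

```python
import statistics

def years_left(pairs):
    """Sorted differences estimate - age for (age, estimate) pairs."""
    return sorted(est - age for age, est in pairs)

# Hypothetical (age, estimate) pairs, NOT taken from the real data set.
pairs = [(25, 80), (40, 90), (60, 85), (70, 71)]

yl = years_left(pairs)
mean_yl = statistics.fmean(yl)                         # mean years left
lowest, highest = yl[0], yl[-1]                        # like *yl and *|yl
sd_est = statistics.pstdev([est for _, est in pairs])  # assumed population sd
```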
<p>Code for the image:</p>
<pre><code>.l("nplot")
.l("./load.kg")
cgrid("estimated age (female red, male blue) (N=",($#data),")";[0 200 20])
setdot(3)
fillrgb(0;0;1)
scplot(mestages)
fillrgb(1;0;0)
scplot(festages)
setrgb(0;0;0)
segplot(100:^dem)
setrgb(0;0;1)
segplot(100:^mem)
setrgb(1;0;0)
segplot(100:^fem)
setwidth(1)
setrgb(0;0;0)
draw()
</code></pre>
<p><img alt="Subjective life expectancies of the participants" src="./img/estimated_life_expectancy/est_ages.png" title="Subjective life expectancies of the participants: Scatterplot of the estimated ages shows that most people estimate to become around 80 years old, with some outliers on the lower & upper end. Male and female plots grow equally fast."/></p>
<p>The black line between the blue line for the male average estimated age
and the red line for female average estimated age is for the overall
average estimated age.</p>
<h3 id="Linear_Regression__Correlation"><a class="hanchor" href="#Linear_Regression__Correlation">Linear Regression & Correlation</a></h3>
<p>One can now do a linear regression on the data and try to find out what
the relation between estimated age and real age is.</p>
<pre><code> lreg(data)
[0.143075581058418136 77.7785066872987915]
lreg(f)
[0.110387262427545795 80.1457545280448419]
lreg(m)
[0.17043090258300147 75.6773338711330093]
</code></pre>
<p>The regression shows a positive relation between age and subjective life
expectancy, which is stronger for men than for women.</p>
<p>One can now also calculate the correlation between age and estimated age,
which is 0.1431 for the whole data set, 0.1104 for women and 0.1704 for
men, suggesting that older men give somewhat more optimistic estimates than
younger men, and that this tendency is weaker for women.</p>
<pre><code> cor@+data
0.14307558105841815
cor@+f
0.110387262427545804
cor@+m
0.170430902583001477
</code></pre>
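<p>As a cross-check outside Klong, both quantities can be computed from
first principles. The following Python sketch uses the textbook
least-squares and Pearson formulas; the function names and the sample pairs
are hypothetical, and it is not known whether <code>lreg</code> and
<code>cor</code> are implemented the same way. One property worth keeping in
mind when reading the numbers: the slope equals the correlation multiplied
by the ratio of the standard deviations (estimated age over age), so slope
and correlation coincide only when both variables have the same standard
deviation.</p>

```python
import math

def sums(pairs):
    """Centered sums of squares and cross products for (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return mx, my, sxx, syy, sxy

def linreg(pairs):
    """Least-squares (slope, intercept) of y on x."""
    mx, my, sxx, _, sxy = sums(pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson(pairs):
    """Pearson correlation coefficient of x and y."""
    _, _, sxx, syy, sxy = sums(pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (age, estimate) pairs, NOT taken from the real data set.
pairs = [(20, 80), (40, 85), (60, 82), (80, 90)]
slope, intercept = linreg(pairs)
r = pearson(pairs)
```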
<p>This can be shown in a scatter plot of the data.</p>
<pre><code>.l("nplot")
.l("./load.kg")
grid([0 100 20];[0 220 20])
fillrgb(0;0;1)
setdot(3)
scplot2(m)
fillrgb(1;0;0)
scplot2(f)
xtitle("age")
ytitle("estimated age")
setrgb(0;0;1)
plot(lr(;lreg(m)))
setrgb(1;0;0)
plot(lr(;lreg(f)))
setrgb(0;0;0)
plot(lr(;lreg(data)))
draw()
</code></pre>
<p><img alt="A scatter plot of the whole data, with linear regression" src="./img/estimated_life_expectancy/all_scatter.png" title="A scatter plot of the whole data, with linear regressions. The linear regression for men starts at 77 years, and grows to ~90 years, the linear regression for women starts at age 80 and also grows to age ~90. The overall linear regression lies between these two datapoints."/></p>
<p>The red dots represent data points from women, the blue ones are from
men. The red line is the linear regression for the data by women, the
black line is the linear regression for the whole data, and the blue
line is the linear regression for the data by men. Outliers are clearly
visible, and the difference between the linear regressions is also visible.</p>
<h3 id="Comparison_With_Actuarial_Tables"><a class="hanchor" href="#Comparison_With_Actuarial_Tables">Comparison With Actuarial Tables</a></h3>
<p>One can also compare the data collected to existing actuarial tables,
and determine if there is a systematic deviation between subjective life
expectancy and the values given by actuarial tables.</p>
<p>For this, one can calculate averages for a given age, both separately
for genders and for the whole data set:</p>
<pre><code> grf::=*+f
abaf::mu'(*|+f)@grf
:"female averages, by age"
rf::(*'(*+f)@grf),'abaf
grm::=*+m
abam::mu'(*|+m)@grm
:"male averages, by age"
rm::(*'(*+m)@grm),'abam
gr::=*+data
aba::mu'(*|+data)@gr
:"averages, by age"
r::(*'(*+data)@gr),'aba
</code></pre>
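<p>The grouping step above (collect all estimates given by respondents of
the same age, then average them) can be sketched in Python like this. The
function name and the sample pairs are hypothetical, and unlike the Klong
version, which keeps the ages in order of first appearance, this sketch
sorts the result by age for readability.</p>

```python
from collections import defaultdict

def mean_estimate_by_age(pairs):
    """Return sorted (age, mean estimate) tuples for (age, estimate) pairs."""
    groups = defaultdict(list)
    for age, est in pairs:
        groups[age].append(est)
    return sorted((age, sum(v) / len(v)) for age, v in groups.items())

# Hypothetical (age, estimate) pairs, NOT taken from the real data set.
pairs = [(30, 80), (30, 90), (45, 85), (60, 88), (60, 92), (60, 90)]
r = mean_estimate_by_age(pairs)
```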
<p>Next, one needs actuarial tables to compare the collected data
to. Because the data was collected in Germany,
the actuarial tables were taken from the <a href="https://en.wikipedia.org/wiki/Federal_Statistical_Office_of_Germany">Federal Statistical Office of
Germany</a>.
The <a href="https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bevoelkerung/Sterbefaelle-Lebenserwartung/_inhalt.html">actuarial
tables</a>
were downloaded, converted from <a href="https://en.wikipedia.org/wiki/ISO/IEC_8859">ISO/IEC
8859</a> to
<a href="https://en.wikipedia.org/wiki/UTF-8">UTF-8</a> and arranged in the
following form:</p>
<pre><code>age,life expectancy m,life expectancy f
</code></pre>
<p>The resulting data can be viewed <a href="./data/de_actuarial_tables.csv">here</a>.</p>
<p>Now, one can start comparing the collected estimates to the actuarial
tables. To do this, one first loads the data into Klong:</p>
<pre><code> known::{(1:$*x),(1.0:$x@1),(1.0:$x@2)}'1_known
mk::{x@[0 1]}'known
fk::{x@[0 2]}'known
kr::{(*x),((x@1)+x@2)%2}'known
</code></pre>
<p>The variable <code>kr</code> contains the mean of the female and male actuarial values.</p>
<p>Since one has already calculated the average estimates for life
expectancy by age (the variables <code>abam</code>, <code>abaf</code> and <code>aba</code>), one can now
compute correlations between the estimates and the actuarial tables. The
correlation between the estimates and the actuarial values is 0.0957 for
the whole data set, 0.0772 for men and 0.0348 for women.</p>
<pre><code> cord::cor(aba;*|+kr@*'(*+data)@gr)
0.0957337523144531185
corm::cor(abam;*|+mk@*'(*+m)@grm)
0.0772309898403104608
corf::cor(abaf;*|+fk@*'(*+f)@grf)
0.0347646631118574528
</code></pre>
<p>One can now also visualize the estimates and actuarial values.</p>
<p>For the mean estimate:</p>
<pre><code>.l("nplot")
.l("nstat")
.l("./load.kg")
grid([0 100 20];[0 150 20])
setdot(3)
fillrgb(0;0;0)
scplot2(kr)
fillrgb(0;0;1)
scplot2(r)
xtitle("age")
ytitle("average estimated life expectancy (blue), average life expectancy (black)")
draw()
</code></pre>
<p><img alt="Mean estimate per year to actuarial table" src="./img/estimated_life_expectancy/avg_est.png" title="Mean estimate per year to actuarial table. The values for the actuarial table follow a standard distribution, starting at 80 years of age. They don't really change until the age of ~50, where they slowly start rising somewhat linearly. They end with 102 expected years for 100 year olds. The subjective life expectancy for a given age is clustered around the actuarial data, sometimes with 20 years difference (probably due to a small sample size)."/></p>
<!--TODO: Write something about the [Gompertz
distribution](https://en.wikipedia.org/wiki/Gompertz_distribution)
in the tooltip-->
<p>For men:</p>
<pre><code>.l("nplot")
.l("nstat")
.l("./load.kg")
grid([0 100 20];[0 150 20])
setdot(3)
fillrgb(0;0;0)
scplot2(mk)
fillrgb(0;0;1)
scplot2(rm)
xtitle("age")
ytitle("male: average estimated life expectancy (blue), average life expectancy (black)")
draw()
</code></pre>
<p><img alt="Mean estimate per year to actuarial table for men" src="./img/estimated_life_expectancy/m_res.png" title="Mean estimate per year to actuarial table for men. The graph shows the life expectancy for a person with a given age (actuarial values) (how old the person is to how old they will become on average) and the average subjective estimate for life expectancy per age (how old the person is to how old they think they will become). The actuarial data for men rises with the standard Gompertz curve, starting at 80 and beginning to rise at around the person reaching the age of 60. The mean estimate is clustered around this curve, but still quite noisy."/></p>
<p>For women:</p>
<pre><code>.l("nplot")
.l("nstat")
.l("./load.kg")
grid([0 100 20];[0 150 20])
setdot(3)
fillrgb(0;0;0)
scplot2(fk)
fillrgb(1;0;0)
scplot2(rf)
xtitle("age")
ytitle("female: average estimated life expectancy (red), average life expectancy (black)")
draw()
</code></pre>
<p><img alt="Mean estimate per year to actuarial table for women" src="./img/estimated_life_expectancy/f_res.png" title="Mean estimate per year to actuarial table for women. The graph shows the life expectancy for a person with a given age (actuarial values) (how old the person is to how old they will become on average) and the average subjective estimate for life expectancy per age (how old the person is to how old they think they will become). The actuarial data for women also rises with the standard Gompertz curve, starting at 82 and beginning to rise at around the person reaching the age of 60. The mean estimate is clustered around this curve, but still quite noisy."/></p>
<!--TODO: difference between estimated age and actuarial value, is there
a correlation? Would that even be a useful measure?-->
<h2 id="Problems"><a class="hanchor" href="#Problems">Problems</a></h2>
<p>As with most self-reported data, there are numerous problems with this
approach. The data presented here represents estimates often made in a
very short amount of time, and is prone to misunderstandings.</p>
<h3 id="Hope_Instead_of_Estimates"><a class="hanchor" href="#Hope_Instead_of_Estimates">Hope Instead of Estimates</a></h3>
<p>Very often, people seemed to misunderstand the question and gave the
age they hoped to reach rather than the age they expected to reach. It
would not have been practical to explain the difference to people,
since the response rate was already quite low and there were sometimes
severe difficulties in understanding, but this seems to be the strongest
bias in the data. Interestingly, a surprising number of people seemed
to hope to die earlier than the average person, most notably a woman aged
29 who hoped to die at 39.</p>
<h3 id="Joke_Answers"><a class="hanchor" href="#Joke_Answers">Joke Answers</a></h3>
<p>Keeping in line with the <a href="http://slatestarcodex.com/2013/04/12/noisy-poll-results-and-reptilian-muslim-climatologists-from-mars/">lizardman
constant</a>,
there were a number of joke answers and sometimes even straight lies. Most
of the people who answered that they expected to become 120 years old
fit into this category (the woman who estimated that she would become
200 years old seemed convinced of that, though). There were also a number
of people who either looked much older than the age they gave or lied about it.</p>
<h3 id="Some_Ages_are_Missing"><a class="hanchor" href="#Some_Ages_are_Missing">Some Ages are Missing</a></h3>
<p>The visualisation of the data shows a clear lack of women aged between
40 and 50 in the data set, as well as a lack of people older than 70. One
possible explanation for this observation is an age-related bias on the
part of the interviewer. Another possible explanation could be that people
older than 70 don't like to answer questions in the street, don't go outside
very often, are rare, or lie about their age regularly. I note that it
seemed to me that older women were most reluctant to answer my questions,
but that may just be another manifestation of personal bias.</p>
<h3 id="Selection_Bias_and_NonRepresentative_Data"><a class="hanchor" href="#Selection_Bias_and_NonRepresentative_Data">Selection Bias and Non-Representative Data</a></h3>
<p>As with any street poll, this survey got responses from
people willing to answer questions from strangers on the street. This
could select for high openness, which could function as a proxy for
optimism. Also, all people were interviewed in a big city, which
could have biased the data (I am not sure in which direction this bias
would tend). One could also argue that self-reported data is <a href="https://guzey.com/statistics/dont-believe-self-reported-data/">notoriously
unreliable</a>,
but since the point of this data collection was to aggregate subjective
estimates, this should not be an issue here.</p>
<h3 id="Gender_was_Determined_by_Appearance"><a class="hanchor" href="#Gender_was_Determined_by_Appearance">Gender was Determined by Appearance</a></h3>
<p>The gender of the respondent was determined by a subjective estimate
by the interviewer, which can lead to some problems in regard to
incorrect perception by the interviewer (for example misgendering trans
people). If there had been uncertainty about the gender, the respondent
would have been asked about their identified gender. This was not done
for all respondents, since it would have increased the response time and
therefore decreased the number of usable responses.</p>
<p>One can do a quick statistical estimate of the probability of
encountering and misgendering a trans person. According to <a href="./doc/sociology/how_many_adults_identify_as_transgender_us_flores_et_al_2016.pdf" title="How Many Adults Identify as Transgender in the United States">Flores et al.
2016</a>,
0.6% of the U.S. adult population identifies as transgender. It is
assumed that the numbers for Germany are similar. <!--TODO: Find a
german source--> It is also assumed that the probability for a trans
person passing is around 90% (I am just making this number up).</p>
<p>One can then use the <a href="https://en.wikipedia.org/wiki/Binomial_distribution">binomial
distribution</a> to
estimate the probability of misgendering at least one trans person. The
code is in <a href="http://t3x.org/klong/index.html">Klong</a>. The probability is
calculated by summing, over every possible number k≤maxt of trans people
encountered, the probability of encountering exactly k trans people
multiplied by the probability of misgendering at least one of them.</p>
<pre><code> .l("nstat")
:"probability of being trans"
pt::0.006
:"probability of passing"
pp::0.9
:"number of people asked"
n::250
:"maxt is the number so that the probability that I met"
:"more than maxt trans people is less than 10e-9 (1 in 100 million)."
maxt::{(1-b.cdf(x;n;pt))>10e-9}{x+1}:~1
:"probability of encountering at least 1 trans person"
1-b.pmf(0;n;pt)
0.777875686117172762
:"probability of encountering at least 1 trans person and misgendering them"
+/{(1-pp^x)*b.pmf(x;n;pt)}'1+!maxt
0.139330765507831189
</code></pre>
<p>The result shows that the probability of encountering and misgendering at
least one trans person (and therefore of having at least one incorrect data
point) is ~14%.</p>
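<p>The same calculation can be reproduced in Python as a sanity check. This
is a sketch under the assumptions stated above (0.6% prevalence, 90% passing
rate, 250 respondents), with hypothetical helper names. Instead of
truncating the sum at <code>maxt</code> it sums over all possible counts,
and it also evaluates the closed form that follows from the binomial
theorem, 1-(1-pt*(1-pp))^n, which should agree with the sum.</p>

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

pt = 0.006  # assumed probability that a respondent is trans
pp = 0.9    # assumed probability of passing (made-up number, as in the text)
n = 250     # number of respondents

# Probability of encountering at least one trans person.
p_meet = 1 - binom_pmf(0, n, pt)

# Probability of encountering k trans people and misgendering at least one,
# summed over every possible k.
p_mis = sum((1 - pp**k) * binom_pmf(k, n, pt) for k in range(1, n + 1))

# Closed form of the same sum via the binomial theorem.
p_mis_closed = 1 - (1 - pt * (1 - pp)) ** n

print(round(p_meet, 4), round(p_mis, 4), round(p_mis_closed, 4))
```

<p>Both <code>p_mis</code> and <code>p_mis_closed</code> come out at about
0.139, matching the Klong output.</p>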
<!--TODO Question: How many people pass as gender X, but identify as gender Y≠X?-->
<h3 id="No_Tracking_of_the_Answering_Rate"><a class="hanchor" href="#No_Tracking_of_the_Answering_Rate">No Tracking of the Answering Rate</a></h3>
<p>While answers were tracked, I did not think of writing down the number
and gender of the people approached. It would have been interesting
to see whether there are any gender-specific differences in answering
rates, and it would generally have been better practice to write them down
(also to determine the overall answering rate).</p>
<!--
Conclusion
-----------
### Comparison with Existing Results
External Links
--------------
-->
<!--
TODO: read:
/usr/local/doc/unread/sle/*
-->
</body></html>