# my_util
# Bayes' theorem: posterior probability that a condition is truly present
# given a positive test result.
#   Pr_Positive_True  = P(positive | condition)      (sensitivity)
#   Pr_Positive_False = P(positive | no condition)   (false-positive rate)
#   Pr_True           = P(condition)                 (prior prevalence)
function find_True_positive(Pr_Positive_True::Float64, Pr_Positive_False::Float64, Pr_True::Float64)
    # Total probability of testing positive
    Pr_Positive = Pr_Positive_True * Pr_True + Pr_Positive_False * (1 - Pr_True)
    posterior = Pr_Positive_True * Pr_True / Pr_Positive
    return posterior
end
find_True_positive(0.95, 0.01, 0.001)
# 0.08683729433272395
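# As a quick illustration (assuming the two tests are independent), the
# posterior from the first positive result can serve as the prior for a
# second positive test:
find_True_positive(0.95, 0.01, 0.08683729433272395)
# ≈ 0.9003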
using Distributions
using StatsBase
using StatsPlots
function find_posterior(successes::Int, trials::Int, grid_points::Int, num_samples::Int)
    # Grid of candidate probabilities
    p_grid = range(0, stop=1, length=grid_points)
    # Uniform prior
    prob_p = fill(1.0, grid_points)
    # Binomial likelihood of observing `successes` in `trials` at each grid point
    prob_data = [pdf(Binomial(trials, p), successes) for p in p_grid]
    # Unnormalized posterior, then normalize so it sums to 1
    posterior = prob_data .* prob_p
    posterior ./= sum(posterior)
    # Draw samples from the grid, weighted by the posterior
    samples = sample(p_grid, Weights(posterior), num_samples, replace=true)
    # Density plot of the samples (display() is needed for the plot to
    # render when the call happens inside a function)
    display(density(samples, xlabel="Probability", ylabel="Density", title="Posterior Density"))
    # Posterior probability that p < 0.5
    prob_less_than_half = sum(posterior[p_grid .< 0.5])
    # Proportion of samples between 0.5 and 0.75
    proportion = sum((samples .> 0.5) .& (samples .< 0.75)) / length(samples)
    return p_grid, posterior, samples, prob_less_than_half, proportion
end
p_grid, posterior, samples, prob_less_than_half, proportion = find_posterior(6, 9, 1000, 10000)
println("Probability that p < 0.5: ", prob_less_than_half)
println("Proportion of samples between 0.5 and 0.75: ", proportion)
quantile(samples, [0.1, 0.9])  # endpoints of the middle 80% credible interval
"""
Yes, there are several additional metrics you can consider when assessing the election outcome based on your Bayesian model:
1. Posterior mean or median: The mean or median of the posterior distribution can provide a point estimate of the proportion of voters supporting each candidate. This can give you a single "best guess" of the election outcome based on your model.
2. Probability of winning: You can calculate the probability that a candidate will win the election by computing the proportion of posterior samples where their vote share exceeds 50%. This gives you a direct measure of the likelihood of each candidate winning based on your model.
3. Posterior probability of a tie: You can also compute the probability of a tie by calculating the proportion of posterior samples where the vote shares are exactly equal (or within a small margin to account for rounding).
4. Marginal likelihood or Bayes factor: If you want to compare different models (e.g., with different priors or likelihood functions), you can compute the marginal likelihood or Bayes factor. These metrics provide a way to assess the relative fit of different models to the data while accounting for model complexity.
5. Sensitivity analysis: You can explore how sensitive your results are to the choice of prior or other model assumptions. This can help you assess the robustness of your conclusions and identify which assumptions are most critical.
6. Predictive accuracy: If you have historical data from previous elections, you can assess the predictive accuracy of your model by comparing its predictions to the actual outcomes. This can help you calibrate your model and understand how well it performs in practice.
These additional metrics can provide a more comprehensive understanding of the election outcome and the uncertainty associated with your predictions. They can also help you communicate your results to a wider audience and make more informed decisions based on your analysis. The first three can be computed directly from posterior samples, as sketched below.
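For example, metrics 1-3 can be read directly off the `samples` vector returned by `find_posterior` above (a minimal sketch; the tie margin of 0.005, i.e. half a percentage point, is an arbitrary choice):
```julia
using Statistics  # mean, median

post_mean   = mean(samples)             # metric 1: posterior mean point estimate
post_median = median(samples)           # metric 1: posterior median
prob_win    = mean(samples .> 0.5)      # metric 2: P(vote share exceeds 50%)
tie_margin  = 0.005                     # metric 3: "tie" means within ±0.5 points
prob_tie    = mean(abs.(samples .- 0.5) .< tie_margin)
```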
Certainly! In Julia, you can use the DynamicLinearModels.jl package for Kalman filtering and state-space modeling. This package provides a flexible and efficient framework for working with dynamic linear models, including the Kalman filter.
To install the DynamicLinearModels.jl package, you can use the following command in the Julia REPL:
```julia
using Pkg
Pkg.add("DynamicLinearModels")
```
Once the package is installed, you can use it to implement Kalman filtering for your poll aggregation. Here's a sketch of how a Kalman filter for tracking the true state of the race over time might look (the function names `kalman_filter`, `update_kalman_filter!`, and `get_filtered_state` below are illustrative and may not match the package's actual exports; check its documentation for the real API):
```julia
using DynamicLinearModels
# Define the state-space model: a local level (random walk) for the true state
model = LocalLevel(1.0)
# Initialize the Kalman filter with an initial estimate of the state
kf = kalman_filter(model, 1.0)
# `polls` is assumed to be a collection of records with fields
# `date`, `sample_size`, and `candidate_support`
for poll in polls
    date = poll.date
    sample_size = poll.sample_size
    candidate_support = poll.candidate_support
    # Update the Kalman filter with the new poll observation
    update_kalman_filter!(kf, date, candidate_support, sample_size)
end
# Get the filtered estimates of the true state
filtered_state = get_filtered_state(kf)
```
In this example, we define a `LocalLevel` model, which assumes that the true state of the race follows a random walk over time. We then initialize a Kalman filter with an initial estimate of the state.
For each poll, we extract the relevant information (date, sample size, and candidate support) and update the Kalman filter using the `update_kalman_filter!` function. The function takes the Kalman filter object, the poll date, the candidate support (as a proportion), and the sample size.
After processing all the polls, we can retrieve the filtered estimates of the true state using the `get_filtered_state` function.
DynamicLinearModels.jl provides a wide range of options for customizing the state-space model, handling missing data, and estimating model parameters; refer to the package documentation for details and examples.
Keep in mind that Kalman filtering assumes that the polls are noisy observations of the true state, and it tries to estimate the true state by balancing the information from the polls with the assumed dynamics of the state. The performance of the Kalman filter will depend on the quality of the polls, the appropriateness of the state-space model, and the choice of model parameters.
As with any model, it's important to validate the Kalman filter's assumptions and performance using historical data or out-of-sample predictions before relying on it for real-time poll aggregation.
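If the package API above doesn't match what your installed version exports, the underlying filter is simple enough to write by hand. Here is a self-contained sketch of a scalar local-level Kalman filter, where each poll's observation noise comes from its binomial sampling error; the process noise `Q`, the prior `x0`/`P0`, and the example `polls` data are illustrative assumptions:
```julia
# polls: vector of (support, sample_size) pairs, in time order
function local_level_filter(polls; Q = 1e-4, x0 = 0.5, P0 = 0.25)
    x, P = x0, P0                        # state estimate and its variance
    filtered = Float64[]
    for (support, n) in polls
        P += Q                           # predict: random-walk variance grows
        R = support * (1 - support) / n  # poll sampling variance p(1-p)/n
        K = P / (P + R)                  # Kalman gain
        x += K * (support - x)           # update: pull estimate toward the poll
        P *= 1 - K
        push!(filtered, x)
    end
    return filtered
end

local_level_filter([(0.52, 800), (0.49, 1200), (0.51, 600)])
```
Larger polls get smaller `R` and therefore pull the estimate harder, which is exactly the sample-size weighting described above.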
"""