diff --git a/12th.md b/12th.md
index b240de9..9cae4d5 100644
--- a/12th.md
+++ b/12th.md
@@ -1,5 +1,6 @@
 +++
 title = "Twelfth Amendment"
+tags = ["tag1", "tag2"]
 +++
 
 ## Tie scenario
diff --git a/2020model.md b/2020model.md
new file mode 100644
index 0000000..90b9092
--- /dev/null
+++ b/2020model.md
@@ -0,0 +1,132 @@
++++
+title = "Election of 2020"
++++
+
+## Purpose
+
+The model used to assess monthly polls begins with the assumption that the 2024 election will resemble the 2020 election in that
+
+* Biden will win the same states as in 2020, except for the seven swing states
+* Trump will win the same states as in 2020, except for North Carolina
+
+The model of the 2020 election for each of the seven swing states considers the number of votes won by Biden as a starting point. The results of each model feed forward as partial inputs to the monthly poll models. In turn, the results of those models become inputs for subsequent months.
+
+~~~
+<table>
+  <thead>
+    <tr><th>State</th><th>Biden Proportion</th><th>Biden Votes</th><th>Both Candidate Votes</th></tr>
+  </thead>
+  <tbody>
+    <tr><td>AZ</td><td>0.5016</td><td>1,672,143</td><td>3,333,829</td></tr>
+    <tr><td>GA</td><td>0.5012</td><td>2,473,633</td><td>4,935,487</td></tr>
+    <tr><td>MI</td><td>0.5141</td><td>2,804,040</td><td>5,453,892</td></tr>
+    <tr><td>NV</td><td>0.5122</td><td>703,486</td><td>1,373,376</td></tr>
+    <tr><td>NC</td><td>0.4932</td><td>2,684,292</td><td>5,443,067</td></tr>
+    <tr><td>PA</td><td>0.5059</td><td>3,458,229</td><td>6,835,903</td></tr>
+    <tr><td>WI</td><td>0.5032</td><td>1,630,866</td><td>3,241,050</td></tr>
+  </tbody>
+</table>
+~~~
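+
+The feed-forward step can be sketched with the two models defined in `_assets/scripts/models.jl`. The sketch below is illustrative only: summarizing the 2020 posterior as a `Beta` distribution before passing it to the poll model is an assumption made for this sketch, not the production code.
+
+```julia
+using Distributions, Turing
+
+@model function election_model(num_votes, num_wins)
+    p ~ Beta(2, 2)                     # prior equivalent to a close race
+    num_wins ~ Binomial(num_votes, p)  # likelihood
+end
+
+@model function poll_model(num_votes, num_wins, prior_dist)
+    p ~ prior_dist                     # prior carried over from the previous stage
+    num_wins ~ Binomial(num_votes, p)
+end
+
+# Stage 1: posterior for AZ from the 2020 result (numbers from the table above)
+chain = sample(election_model(3_333_829, 1_672_143), NUTS(0.65), 1_000)
+
+# Stage 2: feed that posterior forward as the prior for a monthly poll model
+# (moment-matching the samples to a Beta is an assumption for this sketch)
+prior_dist = fit(Beta, vec(chain[:p]))
+poll_chain = sample(poll_model(800, 420, prior_dist), NUTS(0.65), 1_000)
+```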
+
+## Explanation
+
+- **Median**: Half of the random samples have probabilities greater than the median and half less.
+- **Mean**: The average of the 10,000 random samples constructed. It coincides with the Biden Proportion in the table above.
+- **Mode**: The probability that occurs most often.
+- **q025**: The value below which 2.5% of the probabilities occur.
+- **q975**: The value below which 97.5% of the probabilities occur, so that 2.5% lie above it.
+- **MCSE**: Measures the precision of Markov Chain Monte Carlo (MCMC) estimates by quantifying the variability due to finite sampling. Smaller values indicate better precision. Values close to zero indicate that little of the variability in the estimate arises from using MCMC.
+- **R-hat**: Assesses the convergence of MCMC chains by comparing within-chain and between-chain variances. Values close to 1 indicate convergence.
+
+These diagnostics assess the usefulness of models estimated with MCMC by indicating whether the chains have run long enough to provide accurate and stable estimates of the posterior distributions.
+
+The density plots show the number of observations on the $y$-axis and the probabilities on the $x$-axis. The shaded area in the center shows the credible interval within which 95% of the probabilities fall. The unshaded areas cover the proportions of the vote that fall below or above the credible interval.
+
+## Rationale
+
+Although there will be a different electorate, for the reasons explained [here](/typology), most voters in 2024 are highly likely to have voted in 2020. A high degree of political polarization makes it likely that most of those voters will vote the same way. However, it is unrealistic to expect that they will vote exactly the same way. An approach to adjusting for this is to introduce mathematical uncertainty into the results of the 2020 election, explained in more detail [here](/prior).
+
+## Results
+
+### Pennsylvania
+
+### Georgia
+
+### North Carolina
+
+### Michigan
+
+### Arizona
+~~~
+<table>
+  <thead>
+    <tr><th>median</th><th>mean</th><th>mode</th><th>q025</th><th>q975</th><th>mcse</th><th>rhat</th></tr>
+  </thead>
+  <tbody>
+    <tr><td>0.5016</td><td>0.5016</td><td>0.5015</td><td>0.501</td><td>0.5021</td><td>0.0</td><td>1.0002</td></tr>
+  </tbody>
+</table>
+~~~
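+
+The row above is assembled from the chain's summary statistics. A minimal sketch, mirroring `_assets/scripts/first_posterior.jl`, with `election_model` as defined in `_assets/scripts/models.jl` and a short toy run standing in for the real 10,000-sample chains:
+
+```julia
+using DataFrames, MCMCChains, Statistics, Turing
+
+chain = sample(election_model(3_333_829, 1_672_143), NUTS(0.65), 1_000)
+
+posterior_interval = quantile(chain[:p], [0.025, 0.975])
+stats = summarystats(chain)
+
+DataFrame(median = median(chain[:p]),
+          mean   = mean(chain[:p]),
+          mode   = mode(chain[:p]),
+          q025   = posterior_interval[1],
+          q975   = posterior_interval[2],
+          mcse   = stats[1, :mcse],
+          rhat   = stats[1, :rhat])
+```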
+
+### Wisconsin
+
+### Nevada
diff --git a/_assets/election_priors.csv b/_assets/election_priors.csv
new file mode 100644
index 0000000..413f82e
--- /dev/null
+++ b/_assets/election_priors.csv
@@ -0,0 +1,8 @@
+st,biden_pop,tot
+AZ,1672143,3333829
+GA,2473633,4935487
+MI,2804040,5453892
+NV,703486,1373376
+NC,2684292,5443067
+PA,3458229,6835903
+WI,1630866,3241050
diff --git a/_assets/img/models/AZ_2020.png b/_assets/img/models/AZ_2020.png
new file mode 100644
index 0000000..3e8673c
Binary files /dev/null and b/_assets/img/models/AZ_2020.png differ
diff --git a/_assets/objs/AZ_2020_p_sample.bson b/_assets/objs/AZ_2020_p_sample.bson
new file mode 100644
index 0000000..562acb0
Binary files /dev/null and b/_assets/objs/AZ_2020_p_sample.bson differ
diff --git a/_assets/objs/apr_[Polls.bson b/_assets/objs/apr_[Polls.bson
deleted file mode 100644
index a8724f5..0000000
Binary files a/_assets/objs/apr_[Polls.bson and /dev/null differ
diff --git a/_assets/objs/election_priors.csv b/_assets/objs/election_priors.csv
new file mode 100644
index 0000000..6378d59
--- /dev/null
+++ b/_assets/objs/election_priors.csv
@@ -0,0 +1,8 @@
+st,num_wins,num_votes
+AZ,1672143,3333829
+GA,2473633,4935487
+MI,2804040,5453892
+NV,703486,1373376
+NC,2684292,5443067
+PA,3458229,6835903
+WI,1630866,3241050
diff --git a/_assets/scripts/bayes_head.jl b/_assets/scripts/bayes_head.jl
index ab80c51..9ea4004 100644
--- a/_assets/scripts/bayes_head.jl
+++ b/_assets/scripts/bayes_head.jl
@@ -67,3 +67,14 @@ prior_probs = Dict(
     WI => 1630866 / (1630866 + 1610184),
     NV => 703486 / ( 703486 + 669890)
 )
+
+
+prior_probs = Dict(
+    AZ => 1672143 / (1672143 + 1661686),
+    GA => 2473633 / (2473633 + 2461854),
+    MI => 2804040 / (2804040 + 2649852),
+    NC => 2684292 / (2684292 + 2758775),
+    PA => 3458229 / (3458229 + 3377674),
+    WI => 1630866 / (1630866 + 1610184),
+    NV => 703486 / ( 703486 + 669890)
+)
\ No newline at end of file
diff --git a/_assets/scripts/commons.jl b/_assets/scripts/commons.jl
new file mode 100644
index 0000000..406d8e6
--- /dev/null
+++ b/_assets/scripts/commons.jl
@@ -0,0 +1,226 @@
+using BSON: @load, @save
+using Colors
+using Combinatorics
+using CSV
+using DataFrames
+using Format
+using HTTP
+using GLMakie
+using KernelDensity
+using LinearAlgebra
+using MCMCChains
+using Missings
+using PlotlyJS
+using Plots
+using PrettyTables
+using Printf
+using Serialization
+using Statistics
+using StatsPlots
+using Turing
+#------------------------------------------------------------------
+@enum Month mar apr may jun jul aug sep oct nov
+@enum State PA GA NC MI AZ WI NV
+STATE = State
+@enum Pollster begin
+    bi2
+    bi3
+    bl2
+    bl3
+    cb2
+    cb3
+    cn2
+    cn3
+    ec2
+    ec3
+    fm2
+    fm3
+    fo2
+    fo3
+    hi2
+    hi3
+    ma2
+    ma3
+    mi2
+    mi3
+    mr2
+    mr3
+    qi2
+    qi3
+    sp2
+    sp3
+    su2
+    su3
+    wa2
+    wa3
+    ws2
+    ws3l
+    ws3s
+end
+#------------------------------------------------------------------
+const states = ["NV", "WI", "AZ", "GA", "MI", "PA", "NC"]
+const FLAGRED = "rgb(178, 34, 52)"
+const FLAGBLUE = "rgb(60, 59, 110)"
+const PURPLE = "rgb(119, 47, 81)"
+const GREENBAR = "rgb(47, 119, 78)"
+#------------------------------------------------------------------
+mutable struct MetaFrame
+    meta::Dict{Symbol, Any}
+    data::DataFrame
+end
+#------------------------------------------------------------------
+struct Poll
+    biden_support::Float64
+    trump_support::Float64
+    sample_size::Int
+end
+#------------------------------------------------------------------
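+# The nested poll collections used throughout these scripts are keyed by State
+# and then Pollster; a sketch of the expected shape (illustrative values only):
+#
+#     polls = Dict(PA => Dict(bi2 => [Poll(48.0, 46.0, 800)],
+#                             fo2 => [Poll(47.5, 47.0, 1200)]))
+#
+#------------------------------------------------------------------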
+"""
+    filter_empty_entries(dict::Dict{Pollster, Vector{Poll}}) -> Dict{Pollster, Vector{Poll}}
+
+Filter out entries in a dictionary where the values are empty vectors.
+
+# Arguments
+- `dict::Dict{Pollster, Vector{Poll}}`: A dictionary where the keys are of type `Pollster` and the values are vectors of type `Poll`.
+
+# Returns
+- `Dict{Pollster, Vector{Poll}}`: A new dictionary containing only the entries from the input dictionary where the vectors are not empty.
+
+# Description
+The `filter_empty_entries` function iterates over each key-value pair in the provided dictionary. It constructs a new dictionary that includes only those entries where the value (a vector of `Poll` objects) is not empty.
+
+# Example
+```julia
+poll1 = Poll(48.0, 46.0, 800)
+poll2 = Poll(47.5, 47.0, 1200)
+
+dict = Dict(bi2 => [poll1, poll2], fo2 => Poll[])
+
+# Filter out entries with empty vectors
+filtered_dict = filter_empty_entries(dict)
+# Dict{Pollster, Vector{Poll}} with 1 entry:
+#   bi2 => [Poll(48.0, 46.0, 800), Poll(47.5, 47.0, 1200)]
+```
+"""
+function filter_empty_entries(dict::Dict{Pollster, Vector{Poll}})
+    return Dict(k => v for (k, v) in dict if !isempty(v))
+end
+#------------------------------------------------------------------
+"""
+    metahelp()
+
+Show the MetaFrame structure and give a usage example.
+
+mutable struct MetaFrame
+    meta::Dict{Symbol, Any}
+    data::DataFrame
+end
+
+# Example usage
+df = DataFrame(name=["John", "Jane"], age=[28, 34])
+meta_info = Dict(:source => "Survey Data", :year => 2021)
+
+df = MetaFrame(meta_info, df)
+
+meta_info = Dict(
+    :source => "Census Bureau, Current Population Survey, November 2022",
+    :url => "https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-586.html",
+    :title => "Table 4c. Reported Voting and Registration of the Total Voting-Age Population, by Age, for States: November 2022")
+"""
+function metahelp()
+    println("Display with ?metahelp")
+end
+#------------------------------------------------------------------
+"""
+    radix(df::DataFrame)
+
+Format numerals in a DataFrame as strings with thousands separators.
+
+# Arguments
+- `df::DataFrame`: The DataFrame whose integer columns will be formatted.
+
+# Description
+The `radix` function iterates over each column in the provided DataFrame. If a column's element type is an integer type, it formats the numerals in that column as strings with thousands separators (commas).
+
+# Example
+```julia
+using DataFrames
+
+df = DataFrame(A = [1000, 2000, 3000], B = ["text1", "text2", "text3"], C = [4000, 5000, 6000])
+radix(df)
+println(df)
+# Output:
+# 3×3 DataFrame
+#  Row │ A       B       C
+#      │ String  String  String
+# ─────┼─────────────────────────
+#    1 │ 1,000   text1   4,000
+#    2 │ 2,000   text2   5,000
+#    3 │ 3,000   text3   6,000
+```
+"""
+function radix(df::DataFrame)
+    for col in names(df)
+        if eltype(df[!, col]) <: Integer
+            df[!, col] = format.(df[!, col], commas=true)
+        end
+    end
+end
+#------------------------------------------------------------------
+"""
+    format_table(summary_df::DataFrame)
+
+Format floating-point numbers in a DataFrame to four decimal places and display the table using PrettyTables.jl.
+
+# Arguments
+- `summary_df::DataFrame`: The DataFrame containing the data to be formatted and displayed.
+
+# Description
+The `format_table` function formats all floating-point numbers in the provided DataFrame to four decimal places. It uses the PrettyTables.jl package to display the formatted table in HTML format. The function applies a formatter to each cell in the DataFrame, checking if the value is of type `Float64` and formatting it accordingly. Non-floating-point values are left unchanged.
+
+# Example
+```julia
+using DataFrames, PrettyTables, Printf
+
+# Create a sample DataFrame
+summary_df = DataFrame(A = [1.123456, 2.234567, 3.345678], B = [4.456789, 5.567890, 6.678901])
+
+# Format and display the table
+format_table(summary_df)
+# Formats floating-point numbers to four decimal places in the table
+```
+"""
+function format_table(summary_df::DataFrame)
+    formatter = (v, i, j) -> isa(v, Float64) ? @sprintf("%.4f", v) : v
+    # Apply the formatter to all columns of the table
+    pretty_table(summary_df,
+                 backend = Val(:html),
+                 header = names(summary_df),
+                 formatters = formatter,
+                 standalone = false)
+end
+#------------------------------------------------------------------
\ No newline at end of file
diff --git a/_assets/scripts/count_bayes.jl b/_assets/scripts/count_bayes.jl
new file mode 100644
index 0000000..2aec956
--- /dev/null
+++ b/_assets/scripts/count_bayes.jl
@@ -0,0 +1,66 @@
+using BSON: @load
+using CSV
+using DataFrames
+include("election_priors.jl")
+@enum Month mar apr may jun jul aug sep oct nov
+@enum State PA GA NC MI AZ WI NV
+@enum Pollster begin
+    bi2
+    bi3
+    bl2
+    bl3
+    cb2
+    cb3
+    cn2
+    cn3
+    ec2
+    ec3
+    fm2
+    fm3
+    fo2
+    fo3
+    hi2
+    hi3
+    ma2
+    ma3
+    mi2
+    mi3
+    mr2
+    mr3
+    qi2
+    qi3
+    sp2
+    sp3
+    su2
+    su3
+    wa2
+    wa3
+    ws2
+    ws3l
+    ws3s
+end
+
+struct Poll
+    biden_support::Float64
+    trump_support::Float64
+    sample_size::Int
+end
+
+@load "../objs/apr_polls.bson" months
+
+
+function process_polls(polls::Vector{Poll})
+    # Use the first poll in the vector and convert its percentage support
+    # into a (wins, sample size) pair of integers
+    p = first(polls)
+    wins = Int64(floor(p.biden_support / 100 * p.sample_size))
+    return [wins, p.sample_size]
+end
+
+
+# `months`, loaded from the BSON above, is assumed to hold the
+# state => pollster => polls mapping for the month being processed
+processed_polls = Dict(state => Dict(pollster => process_polls(polls) for (pollster, polls) in pollsters) for (state, pollsters) in months)
+
+
+processed_polls_totals = Dict(state => Dict(
+    "num_wins" => sum(polls[1] for polls in values(pollsters)),
+    "num_votes" => sum(polls[2] for polls in values(pollsters))
+) for (state, pollsters) in processed_polls)
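+
+# Worked example (hypothetical numbers): a Poll with biden_support = 52.0 and
+# sample_size = 800 becomes floor(52.0 / 100 * 800) = 416 "wins" out of 800
+# "votes", the (successes, trials) form that the Binomial likelihood in
+# models.jl expects.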
diff --git a/_assets/scripts/election_priors.jl b/_assets/scripts/election_priors.jl
new file mode 100644
index 0000000..c064f2a
--- /dev/null
+++ b/_assets/scripts/election_priors.jl
@@ -0,0 +1,11 @@
+using CSV
+using DataFrames
+const STATES = ["NV", "WI", "AZ", "GA", "MI", "PA", "NC"]
+
+votes = CSV.read("../objs/2020vote.csv", DataFrame)
+election_priors = filter(row -> row.st in STATES, votes)
+election_priors = election_priors[:,[1,2,3]]
+election_priors.tot = election_priors.biden_pop .+ election_priors.trump_pop
+election_priors = election_priors[:,[:st,:biden_pop,:tot]]
+rename!(election_priors,["st","num_wins","num_votes"])
+# CSV.write("../objs/election_priors.csv", election_priors)
\ No newline at end of file
diff --git a/_assets/scripts/first_posterior.jl b/_assets/scripts/first_posterior.jl
new file mode 100644
index 0000000..262c5bd
--- /dev/null
+++ b/_assets/scripts/first_posterior.jl
@@ -0,0 +1,36 @@
+include("first_posterior_forepart.jl")
+
+ST = last_election[1, "st"]
+num_wins = last_election[1, "num_wins"]
+num_votes = last_election[1, "num_votes"]
+
+chain = sample(election_model(num_votes, num_wins), sampler, num_samples, init_params=init_params)
+
+posterior_interval = quantile(chain[:p], [0.025, 0.975])
+p_mean = summarystats(chain)[1, :mean]
+p_mcse = summarystats(chain)[1, :mcse]
+p_rhat = summarystats(chain)[1, :rhat]
+p_df = DataFrame(median = median(chain[:p]),
+                 mean = mean(chain[:p]),
+                 mode = mode(chain[:p]),
+                 q025 = posterior_interval[1],
+                 q975 = posterior_interval[2],
+                 mcse = p_mcse,
+                 rhat = p_rhat)
+
+# Extract the :p parameter from the chain object
+p_samples = chain[:p]
+
+# Flatten the p_samples array into a 1D vector
+p_vec = vec(p_samples)
+
+# Compute the density estimate
+kde_result = kde(p_vec)
+
+include("first_posterior_aftpart.jl")
+# Draw and keep the posterior density plot
+fig = draw_density()
+deep = deepcopy(chain)
+@save ("../objs/$ST" * "_2020_p_sample.bson") deep
+save(("../img/models/$ST" * "_2020.png"), fig)
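+
+# The saved chain can be reloaded in a later stage and reused as the prior for
+# a monthly poll model; a sketch under the assumption that the posterior is
+# summarized by moment-matching a Beta to its samples:
+#
+#     @load ("../objs/$ST" * "_2020_p_sample.bson") deep
+#     prior_dist = fit(Beta, vec(deep[:p]))
+#     poll_chain = sample(poll_model(num_votes, num_wins, prior_dist), sampler, num_samples)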
diff --git a/_assets/scripts/first_posterior_aftpart.jl b/_assets/scripts/first_posterior_aftpart.jl
new file mode 100644
index 0000000..28ad8d6
--- /dev/null
+++ b/_assets/scripts/first_posterior_aftpart.jl
@@ -0,0 +1,60 @@
+"""
+    draw_density()
+
+Draw a density plot of the parameter `p` with a shaded credible interval.
+
+# Returns
+- `fig`: A `Figure` object containing the density plot.
+
+# Description
+This function creates a density plot of the parameter `p` using the `kde_result` and `p_vec` variables,
+which are assumed to be available in the current scope. The plot includes a shaded region representing
+the credible interval specified by `posterior_interval`.
+
+The density curve is plotted in green (`#a3b35c`) with a stroke width of 8. The shaded credible interval
+is filled in orange (`#e1aa6e`).
+
+The x-axis label is set to "p", the y-axis label is set to "Density", and the plot title is set to
+"Density Plot of p for $ST", where `$ST` is a string variable assumed to be available in the current scope.
+
+The x-axis limits are set to the extrema of `p_vec`, and the y-axis limits are set to start from 0.
+
+A legend is added to the plot using `axislegend(ax)`.
+
+# Example
+```julia
+# Assume kde_result, p_vec, posterior_interval, and ST are available in the current scope
+fig = draw_density()
+```
+"""
+function draw_density()
+    # Create a new figure with specified resolution
+    fig = Figure(size = (600, 400))
+
+    # Add an axis to the figure
+    ax = Axis(fig[1, 1], xlabel = "p", ylabel = "Density", title = "Density Plot of p for $ST")
+
+    # Plot the full density curve
+    lines!(ax, kde_result.x, kde_result.density, color = "#a3b35c", strokewidth = 8, strokecolor = "#a3b35c", label = "Density")
+
+    # Find the indices corresponding to the posterior interval
+    indices = findall((posterior_interval[1] .<= kde_result.x) .& (kde_result.x .<= posterior_interval[2]))
+
+    # Extract the x and y values within the posterior interval
+    x_region = kde_result.x[indices]
+    y_region = kde_result.density[indices]
+
+    # Fill the specific area under the curve
+    band!(ax, x_region, fill(0, length(x_region)), y_region, color = ("#e1aa6e"), label = "Credible Interval")
+
+    # Add a legend to the plot
+    axislegend(ax)
+
+    # Adjust the plot limits to fit the density line
+    Makie.xlims!(ax, extrema(p_vec))
+    Makie.ylims!(ax, 0, nothing)
+
+    # Return the figure
+    fig
+end
+#------------------------------------------------------------------
diff --git a/_assets/scripts/first_posterior_forepart.jl b/_assets/scripts/first_posterior_forepart.jl
new file mode 100644
index 0000000..8ef19a4
--- /dev/null
+++ b/_assets/scripts/first_posterior_forepart.jl
@@ -0,0 +1,31 @@
+using BSON: @load, @save
+using CSV
+using DataFrames
+using Format
+using GLMakie
+using KernelDensity
+using LinearAlgebra
+using MCMCChains
+using Missings
+using PrettyTables
+using Printf
+using Serialization
+using Statistics
+using Turing
+#------------
+
+# Load the numbers of votes and wins from the 2020 election
+last_election = CSV.read("../objs/election_priors.csv", DataFrame)
+
+# Define the model
+include("models.jl")
+
+# Set up the sampler
+sampler = NUTS(0.65)
+
+# Specify the number of samples and chains
+num_samples = 10000
+num_chains = 4
+
+# Initial parameter values for the sampler
+init_params = [Dict(:p => 0.5)]
\ No newline at end of file
diff --git a/_assets/scripts/func.jl b/_assets/scripts/func.jl
index cf8c969..e5eaaef 100644
--- a/_assets/scripts/func.jl
+++ b/_assets/scripts/func.jl
@@ -12,22 +12,23 @@ Find and return rows from the `outcomes` DataFrame where the `combo` column does
 # Examples
 ```julia
 without_states(["PA", "NC"])
+header = ["Scenario", "Electoral Votes", "Biden Total", "Trump Total", "Result"]
+pretty_table(without_states(["PA"]); backend = Val(:html), header = header, standalone = false)
 """
 function without_states(lost::Vector{String})
     filter(row -> all(!occursin(state, row.combo) for state in lost) && row.result == "Biden", outcomes)
 end
+#------------------------------------------------------------------
+"""
+    metahelp()
 
-header = ["Scenario", "Electoral Votes", "Biden Total", "Trump Total", "Result"]
-pretty_table(without_states["PA"]; backend = Val(:html), header = header, standalone = false)
-
-using DataFrames
+shows the MetaFrame structure and gives an example
 
 mutable struct MetaFrame
     meta::Dict{Symbol, Any}
     data::DataFrame
 end
-"""
 
 # Example usage
 df = DataFrame(name=["John", "Jane"], age=[28, 34])
 meta_info = Dict(:source => "Survey Data", :year => 2021)
@@ -40,6 +41,10 @@ meta_info = Dict(
     :url => "https://www.census.gov/data/tables/time-series/demo/voting-and-registration/p20-586.html",
     :title => "Table 4c. Reported Voting and Registration of the Total Voting-Age Population, by Age, for States: November 2022")
 """
+function metahelp()
+    println("Display with ?metahelp")
+end
+#------------------------------------------------------------------
 
 function radix(df::DataFrame)
     for col in names(df)
@@ -49,5 +54,3 @@ function radix(df::DataFrame)
     end
 end
 
-# Display the formatted DataFrame
-display(kids)
\ No newline at end of file
diff --git a/_assets/scripts/metrics.jl b/_assets/scripts/metrics.jl
new file mode 100644
index 0000000..3fb1a08
--- /dev/null
+++ b/_assets/scripts/metrics.jl
@@ -0,0 +1,64 @@
+"""
+Yes, there are several additional metrics you can consider when assessing the election outcome based on your Bayesian model:
+
+1. Posterior mean or median: The mean or median of the posterior distribution can provide a point estimate of the proportion of voters supporting each candidate. This can give you a single "best guess" of the election outcome based on your model.
+
+2. Probability of winning: You can calculate the probability that a candidate will win the election by computing the proportion of posterior samples where their vote share exceeds 50%. This gives you a direct measure of the likelihood of each candidate winning based on your model.
+
+3. Posterior probability of a tie: You can also compute the probability of a tie by calculating the proportion of posterior samples where the vote shares are exactly equal (or within a small margin to account for rounding).
+
+4. Marginal likelihood or Bayes factor: If you want to compare different models (e.g., with different priors or likelihood functions), you can compute the marginal likelihood or Bayes factor. These metrics provide a way to assess the relative fit of different models to the data while accounting for model complexity.
+
+5. Sensitivity analysis: You can explore how sensitive your results are to the choice of prior or other model assumptions. This can help you assess the robustness of your conclusions and identify which assumptions are most critical.
+
+6. Predictive accuracy: If you have historical data from previous elections, you can assess the predictive accuracy of your model by comparing its predictions to the actual outcomes. This can help you calibrate your model and understand how well it performs in practice.
+
+These additional metrics can provide a more comprehensive understanding of the election outcome and the uncertainty associated with your predictions. They can also help you communicate your results to a wider audience and make more informed decisions based on your analysis.
+
+Certainly! In Julia, you can use the DynamicLinearModels.jl package for Kalman filtering and state-space modeling. This package provides a flexible and efficient framework for working with dynamic linear models, including the Kalman filter.
+
+To install the DynamicLinearModels.jl package, you can use the following command in the Julia REPL:
+
+```julia
+using Pkg
+Pkg.add("DynamicLinearModels")
+```
+
+Once the package is installed, you can use it to implement Kalman filtering for your poll aggregation. Here's a basic example of how you can use DynamicLinearModels.jl to create a Kalman filter for tracking the true state of the race over time:
+
+```julia
+using DynamicLinearModels
+
+# Define the state-space model
+model = LocalLevel(1.0)
+
+# Initialize the Kalman filter
+kf = kalman_filter(model, 1.0)
+
+# Iterate over the polls
+for poll in polls
+    # Extract the poll date, sample size, and candidate support
+    date = poll.date
+    sample_size = poll.sample_size
+    candidate_support = poll.candidate_support
+
+    # Update the Kalman filter with the new poll
+    update_kalman_filter!(kf, date, candidate_support, sample_size)
+end
+
+# Get the filtered estimates of the true state
+filtered_state = get_filtered_state(kf)
+```
+
+In this example, we define a `LocalLevel` model, which assumes that the true state of the race follows a random walk over time. We then initialize a Kalman filter with an initial estimate of the state.
+
+For each poll, we extract the relevant information (date, sample size, and candidate support) and update the Kalman filter using the `update_kalman_filter!` function. The function takes the Kalman filter object, the poll date, the candidate support (as a proportion), and the sample size.
+
+After processing all the polls, we can retrieve the filtered estimates of the true state using the `get_filtered_state` function.
+
+DynamicLinearModels.jl provides a wide range of options for customizing the state-space model, handling missing data, and estimating model parameters. You can refer to the package documentation for more details and examples: https://github.com/LAMPSPUC/DynamicLinearModels.jl
+
+Keep in mind that Kalman filtering assumes that the polls are noisy observations of the true state, and it tries to estimate the true state by balancing the information from the polls with the assumed dynamics of the state. The performance of the Kalman filter will depend on the quality of the polls, the appropriateness of the state-space model, and the choice of model parameters.
+
+As with any model, it's important to validate the Kalman filter's assumptions and performance using historical data or out-of-sample predictions before relying on it for real-time poll aggregation.
+"""
\ No newline at end of file
diff --git a/_assets/scripts/models.jl b/_assets/scripts/models.jl
new file mode 100644
index 0000000..60287a5
--- /dev/null
+++ b/_assets/scripts/models.jl
@@ -0,0 +1,18 @@
+
+@model function election_model(num_votes, num_wins)
+    # Prior: Beta(2, 2), equivalent to a close race going in
+    p ~ Beta(2, 2)
+
+    # Likelihood: Binomial(num_votes, p)
+    num_wins ~ Binomial(num_votes, p)
+end
+
+# election_posterior = sample(election_model(num_votes, num_wins), sampler, num_samples, init_params=init_params)
+
+@model function poll_model(num_votes, num_wins, prior_dist)
+    # Define the prior using the posterior from the previous analysis
+    p ~ prior_dist
+
+    # Define the likelihood
+    num_wins ~ Binomial(num_votes, p)
+end
\ No newline at end of file
diff --git a/_assets/scripts/outcomes.jl b/_assets/scripts/outcomes.jl
index c198502..dbc839c 100644
--- a/_assets/scripts/outcomes.jl
+++ b/_assets/scripts/outcomes.jl
@@ -14,7 +14,7 @@ include("constants.jl")
 include("utils.jl")
 
 # see college_table.jl for production of 2024vote.csv
-base = CSV.read("../objs/2020vote.csv", DataFrame)
+# base = CSV.read("../objs/2020vote.csv", DataFrame)
 base = CSV.read("/Users/ro/projects/swingwatch/_assets/objs/2024vote.csv", DataFrame)
 
 # see CreateOutcomes.jl for production of outcome.csv
diff --git a/_assets/scripts/state_level.jl b/_assets/scripts/state_level.jl
index 869ff35..7b182c9 100644
--- a/_assets/scripts/state_level.jl
+++ b/_assets/scripts/state_level.jl
@@ -19,7 +19,8 @@ prior_probs[ST]
 summarystats(current_samples)
 
 # Trace plot
-plot(current_samples)
+plot(
+    current_samples)
 
 # Autocorrelation
 autocor(current_samples)
diff --git a/_assets/scripts/utils.jl b/_assets/scripts/utils.jl
index c3fa6b1..ec20858 100644
--- a/_assets/scripts/utils.jl
+++ b/_assets/scripts/utils.jl
@@ -14,3 +14,52 @@ function lx_baz(com, _)
     # do whatever you want here
     return uppercase(brace_content)
 end
+
+function hfun_custom_taglist()::String
+    # -----------------------------------------
+    # Part1: Retrieve all pages associated with
+    #        the tag & sort them
+    # -----------------------------------------
+    # retrieve the tag string
+    tag = locvar(:fd_tag)
+    # recover the relative paths to all pages that have that
+    # tag, these are paths like /blog/page1
+    rpaths = globvar("fd_tag_pages")[tag]
+    # you might want to sort these pages by chronological order
+    # you could also only show the most recent 5 etc...
+    sorter(p) = begin
+        # retrieve the "date" field of the page if defined, otherwise
+        # use the date of creation of the file
+        pvd = pagevar(p, :date)
+        if isnothing(pvd)
+            return Date(Dates.unix2datetime(stat(p * ".md").ctime))
+        end
+        return pvd
+    end
+    sort!(rpaths, by=sorter, rev=true)
+
+    # --------------------------------
+    # Part2: Write the HTML to plug in
+    # --------------------------------
+    # instantiate a buffer in which we will write the HTML
+    # to plug in the tag page
+    c = IOBuffer()
+    write(c, "<ul>")
+    # go over all paths
+    for rpath in rpaths
+        # recover the url corresponding to the rpath
+        url = get_url(rpath)
+        # recover the title of the page if there is one defined,
+        # if there isn't, fallback on the path to the page
+        title = pagevar(rpath, "title")
+        if isnothing(title)
+            title = "/$rpath/"
+        end
+        # write some appropriate HTML
+        write(c, "<li><a href=\"$url\">$title</a></li>")
+    end
+    # finish the HTML
+    write(c, "</ul>")
+    # return the HTML string
+    return String(take!(c))
+end
diff --git a/_layout/tag.html b/_layout/tag.html
index 4b423f2..b0ed97e 100644
--- a/_layout/tag.html
+++ b/_layout/tag.html
@@ -10,7 +10,7 @@
 {{insert sidebar.html}}
 
 Tag: {{fill fd_tag}}
 
-  {{taglist}}
+  {{custom-taglist}}
 {{insert page_foot.html}}
diff --git a/my_util b/my_util
index 8e290c4..deeb326 100644
--- a/my_util
+++ b/my_util
@@ -109,7 +109,8 @@ For each poll, we extract the relevant information (date, sample size, and candi
 After processing all the polls, we can retrieve the filtered estimates of the true state using the `get_filtered_state` function.
 
-DynamicLinearModels.jl provides a wide range of options for customizing the state-space model, handling missing data, and estimating model parameters. You can refer to the package documentation for more details and examples: https://github.com/LAMPSPUC/DynamicLinearModels.jl
+DynamicLinearModels.jl provides a wide range of options for customizing the state-space model, handling missing data, and estimating model parameters. You can refer to the package documentation for more details and examples:
+
 
 Keep in mind that Kalman filtering assumes that the polls are noisy observations of the true state, and it tries to estimate the true state by balancing the information from the polls with the assumed dynamics of the state. The performance of the Kalman filter will depend on the quality of the polls, the appropriateness of the state-space model, and the choice of model parameters.
diff --git a/naive_prior.jl b/naive_prior.jl
new file mode 100644
index 0000000..9818bfc
--- /dev/null
+++ b/naive_prior.jl
@@ -0,0 +1,55 @@
+
+using MCMCChains
+using StatsPlots
+using Statistics
+using Turing
+
+# Set the number of votes and wins for PA
+num_votes = 6835903
+num_wins = 3458229
+
+# Define the model
+@model function election_model(num_votes, num_wins)
+    # Prior: Beta(2, 2), equivalent to a close race going in
+    p ~ Beta(2, 2)
+
+    # Likelihood: Binomial(num_votes, p)
+    num_wins ~ Binomial(num_votes, p)
+end
+
+# Set up the sampler
+sampler = NUTS(0.65)
+
+# Specify the number of samples and chains
+num_samples = 10000
+num_chains = 4
+
+# Initial parameter values, then sample from the posterior
+init_params = [Dict(:p => 0.5)]
+chain = sample(election_model(num_votes, num_wins), sampler, MCMCThreads(), num_samples, num_chains)
+
+
+# Plot the posterior density
+plot(chain, xlabel="Probability of Winning", ylabel="Density",
+     title="Posterior Distribution of Winning Probability")
+
+# Compute the posterior mean and 95% credible interval
+posterior_mean = mean(chain[:p])
+posterior_interval = quantile(chain[:p], [0.025, 0.975])
+
+println("Posterior mean: ", posterior_mean)
+println("95% credible interval: ", posterior_interval)
+
+# Other central credible levels (67%, 89%, 97%) can be inspected with, e.g.,
+# quantile(chain[:p], [0.165, 0.835]) for the 67% interval
+
+# Summary statistics
+summarystats(chain)
+
+# Trace plot
+plot(chain)
+
+# Autocorrelation
+autocor(chain)
+
+# Effective sample size
+ess(chain)
\ No newline at end of file
diff --git a/prior.md b/prior.md
new file mode 100644
index 0000000..c04128b
--- /dev/null
+++ b/prior.md
@@ -0,0 +1,25 @@
++++
+title = "The Bayesian Prior"
++++
+
+The model considered two starting points. In the first, every Biden share of the 2020 vote in each swing state, from 0% to 100%, was treated as equally likely. This is known as a naive, or uninformative, prior. It is unreasonable to assume that one candidate could have been expected to take all of the votes. The second starting point proceeded from the pre-election observation that the races in the swing states were largely within survey margins of error. It was therefore reasonable to expect the results to cluster around a 0.5 proportion of votes for Biden.
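+
+The difference between the two starting points shows up directly in the prior densities. A small illustrative sketch (not part of the production scripts, which define the prior inside the Turing model):
+
+```julia
+using Distributions
+
+flat  = Beta(1, 1)   # uninformative: every proportion equally likely
+close = Beta(2, 2)   # mild belief in a close race, peaked at 0.5
+
+pdf(flat, 0.5), pdf(flat, 0.99)    # (1.0, 1.0): no value favored
+pdf(close, 0.5), pdf(close, 0.99)  # (1.5, 0.0594): mass pulled toward 0.5
+```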
+
+Hypothetically, many contingencies could have affected voter turnout. Voters who had intended to vote may have decided not to at the last minute due to conflicting commitments, while others who had not intended to vote may have decided late to show up. This is one reason for adopting the 2020 results, with some statistical noise, as the starting point.
+
+The main difference between using `Beta(1, 1)` and `Beta(2, 2)` as priors in a Bayesian analysis lies in the shape and concentration of the prior distribution. Let's discuss each case separately:
+
+1. `Beta(1, 1)` prior:
+   - This is a uniform distribution over the interval [0, 1].
+   - It assigns equal probability to all possible values of the parameter between 0 and 1.
+   - The `Beta(1, 1)` prior is often referred to as a "flat" or "uninformative" prior because it does not express any strong prior belief or preference for any particular value of the parameter.
+   - This prior is commonly used when there is little or no prior knowledge about the parameter, and all values are considered equally likely.
+   - The posterior distribution obtained using this prior will be heavily influenced by the observed data.
+
+2. `Beta(2, 2)` prior:
+   - This distribution has a symmetric bell-shaped curve with a peak at 0.5.
+   - It assigns higher probability to values around 0.5 and lower probability to values near 0 and 1.
+   - The `Beta(2, 2)` prior expresses a prior belief that the parameter is more likely to be close to 0.5 compared to extreme values.
+   - This prior is more informative than the `Beta(1, 1)` prior because it incorporates some prior knowledge or expectation about the parameter.
+   - The posterior distribution obtained using this prior will be a compromise between the prior belief and the observed data.
+
+In summary, the choice between `Beta(1, 1)` and `Beta(2, 2)` as priors depends on the prior knowledge or belief about the parameter being estimated. If there is no strong prior information, the `Beta(1, 1)` prior can be used as a non-informative prior, allowing the data to dominate the posterior distribution. On the other hand, if there is some prior expectation that the parameter is more likely to be around 0.5, the `Beta(2, 2)` prior can be used to incorporate that prior belief into the analysis.
diff --git a/typology.md b/typology.md
index c045758..2115421 100644
--- a/typology.md
+++ b/typology.md
@@ -8,6 +8,7 @@ Voters who did not vote in 2020 are not reflected in the results of the 2020 ele
 ## Did not vote in 2020
 
 ### Will not vote in 2024
+
 * Ineligible
     + died
     + moved away
@@ -15,7 +16,9 @@ Voters who did not vote in 2020 are not reflected in the results of the 2020 ele
     + still too young
     + still not a naturalized citizen
 * Still not politically engaged
+
 ### Will vote in 2024
+
 * Became eligible
     - moved in
     - voting rights restored
@@ -26,13 +29,17 @@ Voters who did not vote in 2020 are not reflected in the results of the 2020 ele
 ## Did vote in 2020
 
 ### Will not vote in 2024
+
 These voters are captured in the priors part of the model but are not reflected, for the most part, in the polling.
+
 * Became ineligible
-    - died
-    - moved away
-    - incarcerated
+    - died
+    - moved away
+    - incarcerated
 * Became politically unengaged
+
 ### Will vote in 2024
+
 These voters may be picked up by polling.
 
 * Still politically engaged and will flip