Model Documentation

Methodology

How the Bayesian MCMC model combines structural fundamentals with polling to generate probabilistic election forecasts.

Overview

This forecast is derived from an open-source Bayesian model built and maintained by @thisismactan. The model is implemented in Stan, a probabilistic programming language for MCMC simulation, and orchestrated by R scripts that handle data ingestion, cleaning, and simulation synthesis.

The core thesis is that election outcomes are driven by two partially correlated signals: structural fundamentals (the partisan baseline of a district or state, adjusted for macroeconomic and political environment) and public polling (direct measurement of voter intent). Neither signal is perfect. The model quantifies the uncertainty in each and weights them accordingly.

01. Historical ETL
process_data.R ingests district-level results back to 1976, calculates partisan baselines, and estimates each seat's sensitivity to the national environment (elasticity).
02. Polling ETL
process_polls.R ingests daily polling from scraped NYT aggregates, applies quality weights, likely-voter adjustments, and time-decay functions.
03. Stan MCMC
The Stan model samples from the posterior distribution of district/state outcomes, propagating uncertainty through both the fundamentals and polling layers.
04. Simulation Synthesis
house_sim.R / senate_sim.R aggregate N simulations per seat into the final posterior CSV, which this dashboard reads and visualizes.

The r2p Metric

All forecasts are expressed as Republican Two-Party Vote Share (r2p). Third-party votes are excluded, so every race collapses to a zero-sum, two-way contest with r2p on the interval [0.0, 1.0].

r2p = R_votes / (R_votes + D_votes)

Win condition: r2p > 0.5 → Republican wins the seat
Win condition: r2p < 0.5 → Democrat wins the seat
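
As a minimal sketch of the definition above, the function below computes r2p from raw vote counts and maps it to a winner. The function names and the vote counts are hypothetical, and reporting an exact 0.5 tie separately is an assumption, since the text only defines the two strict inequalities.

```python
def two_party_share(r_votes: int, d_votes: int) -> float:
    """Republican share of the two-party vote; third-party votes are ignored."""
    total = r_votes + d_votes
    if total == 0:
        raise ValueError("no two-party votes recorded")
    return r_votes / total


def winner(r2p: float) -> str:
    """Map r2p to a winner. Reporting an exact tie is an assumption."""
    if r2p > 0.5:
        return "R"
    if r2p < 0.5:
        return "D"
    return "tie"


# Hypothetical counts: any third-party votes simply never enter the formula.
share = two_party_share(r_votes=156_000, d_votes=144_000)
print(round(share, 3), winner(share))
```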

Win probability in the final output is calculated as the fraction of MCMC simulation draws where r2p exceeded 0.50. This is a direct Monte Carlo integration over the posterior predictive distribution — no normal approximation is applied to the tails.

P(R wins) = mean(r2p_sim > 0.5) for sim_id in 1..N

Where N ≈ several thousand posterior draws per seat
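
To illustrate the difference between counting draws and using a normal approximation, here is a small sketch. The draws are made up and far fewer than the several thousand the model produces, and `normal_approx` is a hypothetical comparison helper, not part of the model.

```python
from statistics import NormalDist, mean, stdev


def win_probability(draws: list[float]) -> float:
    """Fraction of posterior draws in which r2p exceeds 0.5."""
    if not draws:
        raise ValueError("need at least one draw")
    return sum(d > 0.5 for d in draws) / len(draws)


def normal_approx(draws: list[float]) -> float:
    """Win probability if the draws were summarized by a fitted normal curve."""
    return 1.0 - NormalDist(mean(draws), stdev(draws)).cdf(0.5)


# Made-up draws for one seat.
draws = [0.47, 0.49, 0.51, 0.52, 0.53, 0.55, 0.56, 0.58]
print("empirical:", win_probability(draws))  # 6 of 8 draws exceed 0.5
print("normal approximation:", round(normal_approx(draws), 3))
```

The two numbers generally differ, and the gap grows when the posterior is skewed or heavy-tailed, which is why counting draws directly is the safer summary.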

Fundamentals Prior

The fundamentals prior is the model's best guess for a district's outcome before any polling is observed. It is constructed from several structural components:

Partisan Baseline

Each seat has a historical partisan lean — how much more Republican or Democratic it ran relative to the national average, averaged across multiple election cycles to reduce cycle-specific noise.
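
One plausible way to compute such a lean, sketched under assumptions: take the seat's r2p minus the national r2p in each cycle and average the differences. The equal weighting across cycles, the function name, and all numbers below are hypothetical. The actual calculation lives in process_data.R and may weight cycles differently.

```python
from statistics import mean


def partisan_lean(seat_r2p: dict[int, float], national_r2p: dict[int, float]) -> float:
    """Average of (seat r2p - national r2p) over cycles present in both inputs."""
    cycles = sorted(set(seat_r2p) & set(national_r2p))
    if not cycles:
        raise ValueError("no overlapping cycles")
    return mean(seat_r2p[c] - national_r2p[c] for c in cycles)


# Hypothetical results for one seat and for the nation, keyed by election year.
seat = {2018: 0.56, 2020: 0.58, 2022: 0.60}
national = {2018: 0.46, 2020: 0.49, 2022: 0.51}

lean = partisan_lean(seat, national)
print(f"lean: {lean:+.3f}")  # positive means more Republican than the nation
```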

National Environment

The generic congressional ballot (the gap between voters who prefer a generic Democrat vs. a generic Republican) provides an estimate of the national partisan tide. This is estimated from available polling at any given forecast date.

Midterm Penalty

Historically, the party holding the White House loses seats in midterm elections. The 2026 elections are midterms, so the model applies a penalty to the president's party, scaled to historical midterm patterns.

Adjustment Variables

Variable | Effect | Direction
Incumbency advantage | +3 to +5 points of r2p toward the incumbent party | Positive for incumbent
Open seat | Regression toward national environment | Neutral
Post-redistricting | Wider uncertainty interval on baseline | Increases variance
Midterm penalty | −2 to −4 points for the White House party | Negative for president's party

Polling Model

Polls are not treated as ground truth. The model weights each poll by several quality-correction factors before computing a polling estimate with an associated variance.

Poll Weighting Hierarchy

Factor | High-quality signal | Low-quality signal
Methodology | Probability panel | Opt-in/online panel
Population | Likely Voters (LV) | Adults (A)
Recency | Within 2 weeks | >60 days old
Sample size | N > 800 | N < 300
Pollster track record | Low historical bias | Significant house effect

Time Decay

Polls are subject to exponential time decay. A poll conducted 90 days before election day carries substantially less weight than an identical poll conducted one week before election day. This is modeled as:

w_time = exp(-λ · days_before_election)

λ is a decay constant estimated from historical predictive accuracy
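
A short sketch of the decay formula above. The value of λ used here is a placeholder chosen for illustration, not the model's estimate, and the function names are hypothetical.

```python
import math


def time_weight(days_before_election: float, lam: float) -> float:
    """w_time = exp(-lambda * days_before_election)."""
    return math.exp(-lam * days_before_election)


def half_life(lam: float) -> float:
    """Days over which a poll's weight falls by half."""
    return math.log(2) / lam


LAMBDA = 0.02  # placeholder; the model estimates this from historical accuracy

w_90 = time_weight(90, LAMBDA)
w_7 = time_weight(7, LAMBDA)
print(f"90-day weight relative to 7-day weight: {w_90 / w_7:.2f}")
print(f"half-life: {half_life(LAMBDA):.1f} days")
```

With this placeholder λ, the 90-day poll carries roughly a fifth of the weight of the 7-day poll. Only the ratio matters, because the weights are normalized when polls are averaged.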

House Effects (Pollster Bias)

Certain polling firms systematically over- or under-estimate Republican support. The model estimates a house effect for each pollster based on their historical deviation from final outcomes, and applies this as a correction to their reported numbers.
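
A minimal sketch of this idea, assuming the house effect is the plain average of (polled r2p minus actual r2p) across a firm's past polls. The pollster names, the numbers, and the unweighted average are all hypothetical, and the model may estimate house effects jointly with other parameters rather than in a separate step like this.

```python
from collections import defaultdict
from statistics import mean


def house_effects(past_polls: list[tuple[str, float, float]]) -> dict[str, float]:
    """past_polls holds (pollster, polled_r2p, actual_r2p) triples."""
    errors: dict[str, list[float]] = defaultdict(list)
    for pollster, polled, actual in past_polls:
        errors[pollster].append(polled - actual)
    return {pollster: mean(errs) for pollster, errs in errors.items()}


def corrected(polled_r2p: float, pollster: str, effects: dict[str, float]) -> float:
    """Subtract the pollster's average error; unknown pollsters get no correction."""
    return polled_r2p - effects.get(pollster, 0.0)


# Hypothetical track records.
history = [
    ("Firm A", 0.53, 0.51),
    ("Firm A", 0.50, 0.48),
    ("Firm B", 0.47, 0.49),
]
effects = house_effects(history)
print(round(corrected(0.52, "Firm A", effects), 3))  # Firm A has run about 2 points too Republican
```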

Synthesis: Inverse-Variance Weighting

The key step in the model is how it combines the fundamentals prior with the polling estimate. Rather than a simple average, it uses inverse-variance weighting (IVW), which gives more weight to whichever signal has lower uncertainty. For two normally distributed estimates, this matches the Bayesian posterior mean.

pred = (fund_pred / var_fund + poll_pred / var_poll) / (1/var_fund + 1/var_poll)

var_pred = 1 / (1/var_fund + 1/var_poll)

If var_poll → ∞ (no polls exist): pred → fund_pred
If var_poll → 0 (many perfect polls): pred → poll_pred

This means the model handles the full spectrum from well-polled competitive races (where polls dominate) to deeply red or blue seats with no polling (where fundamentals dominate). You will notice that unpolled safe seats often have narrow uncertainty intervals, because the fundamentals variance for those seats is small, while toss-up races with conflicting polls have much wider intervals.

MCMC Simulation

The final prediction is not a point estimate but a full probability distribution. Stan samples thousands of values from the posterior predictive distribution for each seat, capturing correlations across seats (e.g., a strong Democratic wave affects all seats simultaneously).
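
To show why cross-seat correlation matters, here is a toy generator in which every seat shares one national shock per draw. This is an illustration only: the shared-shock structure, the normal noise, the function names, and all numbers are assumptions, and the Stan model may induce correlation in a different way.

```python
import random


def simulate_seats(
    seat_means: list[float],
    n_draws: int,
    sd_national: float,
    sd_seat: float,
    seed: int = 0,
) -> list[list[float]]:
    """Each draw adds one shared national shock plus independent seat noise."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        shock = rng.gauss(0.0, sd_national)  # same shock for every seat in this draw
        draws.append([m + shock + rng.gauss(0.0, sd_seat) for m in seat_means])
    return draws


def seats_won(draws: list[list[float]]) -> list[int]:
    """Number of seats with r2p above 0.5 in each draw."""
    return [sum(r2p > 0.5 for r2p in row) for row in draws]


# Three hypothetical toss-up seats, 2,000 toy draws.
toy = simulate_seats([0.50, 0.50, 0.50], 2000, sd_national=0.03, sd_seat=0.02)
sweeps = sum(n in (0, 3) for n in seats_won(toy)) / len(toy)
print(f"share of draws where one party sweeps all three: {sweeps:.2f}")
```

With independent seats, a sweep of three toss-ups would happen about a quarter of the time. A shared shock makes sweeps more common, which widens the distribution of total seats. The actual posterior draws come from Stan, not from this toy generator.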

From these draws, this dashboard calculates:

r_prob = mean(draw > 0.5) // Win probability
r2p_avg = mean(draw) // Expected r2p
r2p_p05 = quantile(draw, 0.05) // 5th percentile
r2p_p95 = quantile(draw, 0.95) // 95th percentile

The raw posterior file (house_district_posterior.csv, ~124MB) is processed daily by the update_data.py worker script, which compresses the thousands of rows per district into the four summary statistics above and writes a lightweight JSON file for the dashboard to serve.
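
A hedged sketch of what such a compression step could look like. The column names `district` and `r2p`, the district label, and the linear-interpolation quantile are assumptions, and this is not the actual update_data.py.

```python
import csv
import io
import json
from collections import defaultdict


def quantile(sorted_vals: list[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize(csv_file) -> dict[str, dict[str, float]]:
    """Collapse one row per (district, draw) into four statistics per district."""
    by_district: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(csv_file):  # assumed columns: district, r2p
        by_district[row["district"]].append(float(row["r2p"]))
    summary = {}
    for district, draws in by_district.items():
        draws.sort()
        summary[district] = {
            "r_prob": sum(d > 0.5 for d in draws) / len(draws),
            "r2p_avg": sum(draws) / len(draws),
            "r2p_p05": quantile(draws, 0.05),
            "r2p_p95": quantile(draws, 0.95),
        }
    return summary


# Tiny in-memory stand-in for the real posterior file.
sample = io.StringIO(
    "district,r2p\nXX-01,0.48\nXX-01,0.52\nXX-01,0.54\nXX-01,0.56\nXX-01,0.60\n"
)
summary = summarize(sample)
print(json.dumps(summary, indent=2))
```

This version holds every draw in memory before summarizing, which is simple but may need to be replaced with a chunked or streaming approach for a file of the size described.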

Data Sources

Source | Used For | Update Frequency
MIT Election Lab (1976–2024) | Historical district-level results and baselines | Static
Daily Kos Elections | Presidential vote margins by district | Static
NYT Poll Aggregator (scraped) | Generic ballot + district/state polls | Daily
Ballotpedia | Incumbency and candidate filings | Periodic
GitHub (thisismactan/US-2026) | Final simulation outputs served by this dashboard | Daily

Limitations & Caveats

All forecasts are probabilistic estimates, not deterministic predictions. A 90% win probability is not a certainty: it means the model would expect the favored candidate to win about 9 times out of 10 under similar conditions. The remaining 1-in-10 scenario is entirely plausible.

Key limitations include: the model cannot anticipate late-breaking news (scandal, candidate withdrawal, economic shocks); polling in low-salience House races is sparse and sometimes of poor quality; district-level structural changes from the 2020 redistricting cycle introduce additional uncertainty for some seats; and the model does not account for third-party candidates who could tip results in close races.

This dashboard is a visualization layer on top of publicly available research. The underlying model code is open-source and available at github.com/thisismactan/US-2026.