Overview
This forecast is derived from an open-source Bayesian model built and maintained by @thisismactan. The model is implemented in Stan, a probabilistic programming language for MCMC simulation, and orchestrated by R scripts that handle data ingestion, cleaning, and simulation synthesis.
The core thesis is that election outcomes are driven by two partially correlated signals: structural fundamentals (the partisan baseline of a district or state, adjusted for macroeconomic and political environment) and public polling (direct measurement of voter intent). Neither signal is perfect. The model quantifies the uncertainty in each and weights them accordingly.
process_data.R ingests district-level results back to 1976, calculates partisan baselines, and estimates each seat's sensitivity to the national environment (elasticity).
process_polls.R ingests daily polling from scraped NYT aggregates and applies quality weights, likely-voter adjustments, and time-decay functions.
house_sim.R / senate_sim.R aggregate N simulations per seat into the final posterior CSV, which this dashboard reads and visualizes.
The r2p Metric
All forecasts are expressed as Republican Two-Party Vote Share (r2p). Third-party votes are excluded, collapsing every race into a two-way contest: r2p lies on the interval [0.0, 1.0], and the Democratic two-party share is 1 − r2p.
Win condition: r2p > 0.5 → Republican wins the seat
Win condition: r2p < 0.5 → Democrat wins the seat
Win probability in the final output is calculated as the fraction of MCMC simulation draws where r2p exceeded 0.50. This is a direct Monte Carlo integration over the posterior predictive distribution — no normal approximation is applied to the tails.
P(R wins) = (1/N) × Σ 1[r2p_i > 0.5], summed over draws i = 1, …, N
where N ≈ several thousand posterior draws per seat
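The win-probability calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's own code: the function name `win_probability` and the synthetic normal draws are assumptions made for the example, and real draws would come from the posterior CSV.

```python
import random


def win_probability(draws, threshold=0.5):
    """Return the fraction of posterior draws in which r2p exceeds the threshold."""
    if not draws:
        raise ValueError("need at least one posterior draw")
    return sum(1 for d in draws if d > threshold) / len(draws)


# Synthetic draws for illustration only (hypothetical seat leaning Republican).
rng = random.Random(0)
draws = [rng.gauss(0.52, 0.03) for _ in range(4000)]

p_rep = win_probability(draws)
print(f"P(Republican win) = {p_rep:.3f}")
```

Because the probability is a simple count over draws, it inherits whatever skew or heavy tails the posterior has, with no distributional shape assumed.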
Fundamentals Prior
The fundamentals prior is the model's best guess for a district's outcome before any polling is observed. It is constructed from several structural components:
Partisan Baseline
Each seat has a historical partisan lean — how much more Republican or Democratic it ran relative to the national average, averaged across multiple election cycles to reduce cycle-specific noise.
National Environment
The generic congressional ballot (the gap between voters who prefer a generic Democrat vs. a generic Republican) provides an estimate of the national partisan tide. This is estimated from available polling at any given forecast date.
Midterm Penalty
Historically, the party holding the White House loses seats in midterm elections. The 2026 elections are a midterm, and the model applies a penalty to the president's party that scales with historical midterm patterns.
Adjustment Variables
| Variable | Effect | Direction |
|---|---|---|
| Incumbency advantage | +3–5 points r2p toward incumbent party | Positive for incumbent |
| Open seat | Regression toward national environment | Neutral |
| Post-redistricting | Wider uncertainty interval on baseline | Increases variance |
| Midterm penalty | −2 to −4 pts for WH party | Negative for president's party |
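To make the table concrete, here is a rough sketch of how such components could be added together into a prior mean. Everything in it is an assumption made for illustration: the additive form, the function name `fundamentals_prior`, the use of range midpoints (4 and 3 points), and the reading of those points as r2p points on a 0 to 1 scale. The actual Stan model may combine these terms differently.

```python
def fundamentals_prior(baseline_lean, national_env, elasticity,
                       incumbent_party, wh_party,
                       incumbency_bonus=0.04, midterm_penalty=0.03):
    """Hypothetical additive prior mean for r2p (0-1 scale).

    baseline_lean:   seat's Republican lean relative to the nation, e.g. +0.05
    national_env:    national Republican two-party share minus 0.5
    elasticity:      seat's sensitivity to the national environment
    incumbent_party: "R", "D", or None for an open seat
    wh_party:        party holding the White House, "R" or "D"
    """
    r2p = 0.5 + baseline_lean + elasticity * national_env

    # Incumbency advantage moves r2p toward the incumbent's party.
    if incumbent_party == "R":
        r2p += incumbency_bonus
    elif incumbent_party == "D":
        r2p -= incumbency_bonus

    # Midterm penalty moves r2p away from the president's party.
    if wh_party == "R":
        r2p -= midterm_penalty
    elif wh_party == "D":
        r2p += midterm_penalty

    # Keep the result inside the valid vote-share range.
    return min(max(r2p, 0.0), 1.0)


example = fundamentals_prior(baseline_lean=0.05, national_env=-0.01,
                             elasticity=1.0, incumbent_party="R", wh_party="R")
print(f"prior mean r2p = {example:.3f}")
```

The sketch only produces a prior mean. The variance-related rows of the table (open seats, post-redistricting) would widen the prior's uncertainty rather than shift its center.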
Polling Model
Polls are not treated as ground truth. The model weights each poll by several quality-correction factors before computing a polling estimate with an associated variance.
Poll Weighting Hierarchy
| Factor | High-quality signal | Low-quality signal |
|---|---|---|
| Methodology | Probability panel | Opt-in/online panel |
| Population | Likely Voters (LV) | Adults (A) |
| Recency | Within 2 weeks | >60 days old |
| Sample size | N > 800 | N < 300 |
| Pollster track record | Low historical bias | Significant house effect |
Time Decay
Polls are subject to exponential time decay. A poll conducted 90 days before election day carries substantially less weight than an identical poll from one week prior. This is modeled as:
w = exp(−λ · t)
where t is the time elapsed since the poll was conducted and λ is a decay constant estimated from historical predictive accuracy.
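Exponential decay of this kind is commonly written as weight = exp(−λ × age). The sketch below shows how such weights would feed a weighted polling average. The value of `LAMBDA`, the use of days as the unit of age, and the poll numbers are all made up for illustration and are not the model's fitted values.

```python
import math


def decay_weight(age_days, lam):
    """Exponential time-decay weight for a poll that is age_days old."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return math.exp(-lam * age_days)


LAMBDA = 0.03  # hypothetical decay constant per day

# (reported r2p, age in days); illustrative numbers only
polls = [
    (0.48, 90),
    (0.51, 7),
    (0.50, 30),
]

weights = [decay_weight(age, LAMBDA) for _, age in polls]
weighted_avg = sum(w * r2p for w, (r2p, _) in zip(weights, polls)) / sum(weights)

for (r2p, age), w in zip(polls, weights):
    print(f"age={age:>3}d  r2p={r2p:.2f}  weight={w:.3f}")
print(f"decay-weighted r2p = {weighted_avg:.4f}")
```

A convenient way to read λ is through the half-life ln(2)/λ, the age at which a poll's weight has dropped to one half.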
House Effects (Pollster Bias)
Certain polling firms systematically over- or under-estimate Republican support. The model estimates a house effect for each pollster based on their historical deviation from final outcomes, and applies this as a correction to their reported numbers.
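One simple way to picture a house-effect correction is as a mean signed error that gets subtracted from new polls. The sketch below assumes exactly that, which is a simplification: the function names and the pollster history are hypothetical, and the actual model may estimate house effects jointly inside Stan rather than as a plain average.

```python
from statistics import mean


def estimate_house_effect(past_polls):
    """Mean signed error (polled r2p minus actual r2p) across a pollster's past polls.

    A positive value means the pollster has tended to overstate Republican support.
    """
    return mean(polled - actual for polled, actual in past_polls)


def correct_poll(reported_r2p, house_effect):
    """Remove the estimated house effect from a newly reported r2p."""
    return reported_r2p - house_effect


# (polled r2p, actual r2p) pairs for one hypothetical pollster
history = [(0.53, 0.51), (0.49, 0.48), (0.56, 0.53)]

effect = estimate_house_effect(history)
adjusted = correct_poll(0.52, effect)
print(f"house effect = {effect:+.3f}, adjusted r2p = {adjusted:.3f}")
```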
Synthesis: Inverse-Variance Weighting
The central step of the model is how it combines the fundamentals prior with the polling estimate. Rather than taking a simple average, it uses inverse-variance weighting (IVW), a Bayesian-coherent method that gives more weight to whichever signal has lower uncertainty.
pred = var_pred × (fund_pred/var_fund + poll_pred/var_poll)
var_pred = 1 / (1/var_fund + 1/var_poll)
If var_poll → ∞ (no polls exist): pred → fund_pred
If var_poll → 0 (many perfect polls): pred → poll_pred
This means the model handles the full spectrum from well-polled competitive races (where polls dominate) to deeply red or blue seats with no polling (where fundamentals dominate). Unpolled safe seats can still show very narrow intervals, because the fundamentals variance for those seats is small, while toss-up races with conflicting polls tend to show much wider intervals.
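The inverse-variance combination can be sketched as a small function. This is an illustration of standard IVW under the assumption that both signals are summarized by a mean and a variance. The name `combine_ivw` and the example numbers are hypothetical, and the production model performs this step inside Stan.

```python
import math


def combine_ivw(fund_pred, var_fund, poll_pred=None, var_poll=math.inf):
    """Combine a fundamentals prior and a polling estimate by inverse-variance weighting.

    Returns (pred, var_pred). With no polls (poll_pred is None or var_poll is
    infinite) the fundamentals prior is returned unchanged.
    """
    if var_fund <= 0:
        raise ValueError("var_fund must be positive")
    if poll_pred is None or math.isinf(var_poll):
        return fund_pred, var_fund
    if var_poll <= 0:
        raise ValueError("var_poll must be positive")

    precision_fund = 1.0 / var_fund
    precision_poll = 1.0 / var_poll
    var_pred = 1.0 / (precision_fund + precision_poll)
    pred = var_pred * (fund_pred * precision_fund + poll_pred * precision_poll)
    return pred, var_pred


# Fundamentals say 0.55 with sd 0.04; polls say 0.50 with sd 0.02.
pred, var_pred = combine_ivw(0.55, 0.04 ** 2, 0.50, 0.02 ** 2)
print(f"pred = {pred:.4f}, sd = {math.sqrt(var_pred):.4f}")

# A seat with no polling falls back to the fundamentals.
print(combine_ivw(0.62, 0.03 ** 2))
```

In the first example the polls have one quarter of the fundamentals' variance, so they receive four times the weight and pull the combined estimate most of the way toward 0.50. Note also that the combined variance is always smaller than either input variance.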
MCMC Simulation
The final prediction is not a point estimate but a full probability distribution. Stan samples thousands of values from the posterior predictive distribution for each seat, capturing correlations across seats (e.g., a strong Democratic wave affects all seats simultaneously).
From these draws, this dashboard calculates:
r2p_avg = mean(draw) // Expected r2p
r2p_p05 = quantile(draw, 0.05) // 5th percentile
r2p_p95 = quantile(draw, 0.95) // 95th percentile
The raw posterior file (house_district_posterior.csv, ~124 MB) is processed daily by the update_data.py worker script, which compresses the thousands of rows per district into five summary statistics (including those listed above) and writes a lightweight JSON file for the dashboard to serve.
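The kind of reduction the worker performs can be sketched with the standard library alone. This is not the contents of update_data.py: the column names `district` and `r2p`, the output key `p_rep_win`, and the tiny inline CSV are assumptions made for the example, and only the three statistics named above plus the win probability are computed.

```python
import csv
import io
import json
from collections import defaultdict
from statistics import mean


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize(rows):
    """Collapse per-draw rows into one summary record per district."""
    by_district = defaultdict(list)
    for row in rows:
        by_district[row["district"]].append(float(row["r2p"]))

    summary = {}
    for district, draws in by_district.items():
        draws.sort()
        summary[district] = {
            "r2p_avg": mean(draws),
            "r2p_p05": quantile(draws, 0.05),
            "r2p_p95": quantile(draws, 0.95),
            "p_rep_win": sum(d > 0.5 for d in draws) / len(draws),
        }
    return summary


# Tiny stand-in for the posterior CSV (hypothetical column names and districts).
sample_csv = (
    "district,r2p\n"
    "XX-01,0.40\nXX-01,0.50\nXX-01,0.60\n"
    "XX-02,0.70\nXX-02,0.80\n"
)
summary = summarize(csv.DictReader(io.StringIO(sample_csv)))
payload = json.dumps(summary, indent=2)
print(payload)
```

For a file of roughly 124 MB, streaming rows through `csv.DictReader` as above avoids loading the raw text twice, although the per-district draw lists still have to fit in memory.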
Data Sources
| Source | Used For | Update Frequency |
|---|---|---|
| MIT Election Lab (1976–2024) | Historical district-level results and baselines | Static |
| Daily Kos Elections | Presidential vote margins by district | Static |
| NYT Poll Aggregator (scraped) | Generic ballot + district/state polls | Daily |
| Ballotpedia | Incumbency and candidate filings | Periodic |
| GitHub (thisismactan/US-2026) | Final simulation outputs served by this dashboard | Daily |
Limitations & Caveats
All forecasts are probabilistic estimates, not predictions. A race with a 90% probability is not a certainty — it means the model would expect the favored candidate to win 9 out of 10 times under similar conditions. The remaining 1-in-10 scenario is entirely plausible.
Key limitations include: the model cannot anticipate late-breaking news (scandal, candidate withdrawal, economic shocks); polling in low-salience House races is sparse and sometimes of poor quality; district-level structural changes from the 2020 redistricting cycle introduce additional uncertainty for some seats; and the model does not account for third-party candidates who could tip results in close races.
This dashboard is a visualization layer on top of publicly available research. The underlying model code is open-source and available at github.com/thisismactan/US-2026.