Overview
This dashboard visualizes Mac Tan's forecast for the 2026 United States congressional elections. Unlike simple poll averages or pundit ratings, the model produces a full joint probability distribution across all House and Senate seats — not just a prediction of the outcome in each individual race.
In short, it combines what we know about a district from recent history with what polling is telling us right now about the national and local political mood, weights each source according to how certain its prediction is, and then runs thousands of simulated elections to generate win probabilities and credible intervals for the outcome in each race.
Every number on this site is the direct output of that simulation. When you see "R Win: 63.4%", that means Republicans won the seat in about 6,340 out of 10,000 simulations.
- process_data.R takes in historical district- and state-level election results and processes them to make them digestible for modeling.
- process_polls.R ingests daily polling from The New York Times' poll tracking project, applies weights for poll recency and quality, and computes polling averages at the national, state, and district level.
- house_sim.R / senate_sim.R run the 2026 district- and state-level data through the posterior draws from the Stan models to obtain a posterior distribution for each House and Senate race. These individual results can then be aggregated up to produce forecasts for the entire House and Senate.

The 2PV Metric
All forecasts are expressed as two-party vote share (2PV). Votes for third-party and independent candidates are excluded, so every race reduces to a single number on the interval [0, 1] in which one party's gain is exactly the other party's loss.
R2PV = R_votes / (R_votes + D_votes)
Win condition: R2PV > 0.5 → Republican wins the seat
Win condition: R2PV < 0.5 → Democrat wins the seat
Win probability in the final output is calculated as the fraction of posterior draws in which a party's 2PV exceeded 0.50. This is a Monte Carlo estimate of the corresponding integral over the posterior distribution, computed from draws that Stan generates using Hamiltonian Monte Carlo.
P(R wins) = mean(R2PV > 0.5) for sim_id in 1...N
where N = 10,000 posterior draws per seat
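The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the vote and draw values are made up for the example and are not taken from the model's code.

```python
def two_party_share(r_votes: int, d_votes: int) -> float:
    """Republican share of the two-party vote; all other candidates are ignored."""
    total = r_votes + d_votes
    if total == 0:
        raise ValueError("no two-party votes cast")
    return r_votes / total


def win_probability(r2pv_draws: list[float]) -> float:
    """Fraction of posterior draws in which the Republican 2PV exceeds 0.5."""
    return sum(draw > 0.5 for draw in r2pv_draws) / len(r2pv_draws)


# Illustrative numbers only, not real results.
share = two_party_share(r_votes=150_000, d_votes=140_000)
draws = [0.48, 0.51, 0.53, 0.49, 0.52]  # the real model uses 10,000 draws per seat
prob = win_probability(draws)
print(f"R2PV = {share:.4f}, P(R wins) = {prob:.2f}")
```

Here three of the five toy draws exceed 0.5, so the estimated Republican win probability is 0.60.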
National Swing Model
The national swing model produces a posterior distribution for each district, based on an estimate of its baseline partisanship and the national political environment. Several factors go into this model:
Partisan Baseline
Each district and state starts from a baseline 2PV result, which is the 2PV result in the previous regularly scheduled House or Senate election (excluding special elections). However, the prior election result often isn't a reasonable baseline. Common reasons are that the incumbent is retiring, the previous election was uncontested, or (in the case of House elections) the district's boundaries have been redrawn since the last election. In these cases the model relies much more heavily on the most recent presidential election result in the district or state to form a partisan baseline.
National Environment
The generic congressional ballot measures the gap between voters who prefer a generic Democrat and voters who prefer a generic Republican. Its polling average provides an estimate of the national political environment and the extent to which it favors Democrats or Republicans, and it is estimated from the polling available at any given forecast date. When the generic congressional ballot swings toward one party relative to the previous election, that party will generally perform better in House and Senate races across the board, although polling and other district- and state-specific factors can strengthen, weaken, or overpower this effect.
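As a rough illustration of how a generic ballot shift feeds into a seat-level expectation, the sketch below applies a simple uniform swing to a baseline 2PV. It is a deliberate simplification and not the Stan model itself: the function name, the `sensitivity` parameter, and the example numbers are all hypothetical, and it assumes the generic ballot margin can be treated as a two-party margin. Under that assumption, a margin shift of x points moves each party's share by x/2 points.

```python
def swing_adjusted_r2pv(
    baseline_r2pv: float,
    prev_gb_r_margin_pts: float,
    current_gb_r_margin_pts: float,
    sensitivity: float = 1.0,
) -> float:
    """Shift a baseline Republican 2PV by the change in the generic ballot.

    Margins are Republican minus Democratic, in percentage points.
    `sensitivity` is a hypothetical multiplier (for example, above 1.0 if
    seats respond more strongly to national swings in a midterm).
    """
    margin_swing_pts = current_gb_r_margin_pts - prev_gb_r_margin_pts
    # Dividing by 200 converts a margin in points to a share as a fraction:
    # half the margin swing goes to each party, and 100 points = 1.0.
    share_swing = margin_swing_pts / 200.0
    shifted = baseline_r2pv + sensitivity * share_swing
    return min(1.0, max(0.0, shifted))


# Illustrative only: a seat Republicans won with 53% of the two-party vote,
# with the generic ballot moving from R+2 to D+4 (a 6-point swing).
projected = swing_adjusted_r2pv(0.53, prev_gb_r_margin_pts=2.0, current_gb_r_margin_pts=-4.0)
print(f"{projected:.3f}")
```

In this toy case a 6-point swing in the margin moves the seat's expected 2PV by 3 points, from 0.53 to roughly 0.50.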
Incumbency
Incumbents generally do better than generic candidates from their party, although this advantage has weakened over time. When there is no incumbent running for re-election, states and districts tend to revert to their presidential voting habits. A race is assumed to have an incumbent running for re-election until the incumbent has either suspended their campaign or lost renomination.
Midterm
District- and state-level election results tend to be more sensitive to swings in the national political environment during midterms than in presidential elections.
Redistricting
When a House district's boundaries are redrawn (as is happening in a lot of places these days), the previous election results are less useful as a partisan baseline. In districts that have been redrawn since the previous election, the model relies much more heavily on the most recent presidential election result than on the previous House election result. The model only considers new district boundaries once they have survived any legal challenges to them. As of May 7, 2026, for example, House forecasts for California and Texas use the new district boundaries, while House forecasts for Virginia and Florida use the old ones.
Polling Model
Polls go through adjustment and weighting before being averaged.
Partisan Adjustment
The poll partisanship adjustment is very basic: it adjusts polling results only if the New York Times poll tracker identifies the poll as a partisan poll. This is a hard criterion. It isn't enough for the pollster to generally produce results that lean toward one party; a poll is considered partisan only if it is conducted for a political party, an explicitly partisan organization, or a candidate. Polls that are sponsored by one party have their results adjusted slightly away from that party and are given much less weight in the average. For generic ballot polls, this is the only adjustment performed.
National Environment Adjustment
Polls are snapshots of the political mood at a point in time. A poll conducted six months ago might produce very different results if it were conducted today, even if it was conducted according to the same methodology on the same population using the same questions. To keep Senate and House district-level polling relevant, each state- and district-level poll's result is adjusted by the amount by which the generic ballot average has shifted since the poll was conducted. For example, if a Senate poll conducted six months ago showed a tie, but the generic ballot average has shifted six points more Democratic since then, the Senate poll would be adjusted to be six points more Democratic as well.
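The adjustment described above amounts to adding the change in the generic ballot average to the poll's margin. A minimal sketch follows, using the example from the text; the function and argument names are hypothetical, and margins are expressed as Democratic minus Republican in percentage points.

```python
def adjust_poll_margin(
    poll_d_margin_pts: float,
    gb_d_margin_at_poll_pts: float,
    gb_d_margin_now_pts: float,
) -> float:
    """Shift a state or district poll by the generic ballot movement since it was taken."""
    gb_shift = gb_d_margin_now_pts - gb_d_margin_at_poll_pts
    return poll_d_margin_pts + gb_shift


# A tied Senate poll from six months ago; the generic ballot average has since
# moved six points toward Democrats (the D+1 and D+7 levels are illustrative).
adjusted = adjust_poll_margin(
    poll_d_margin_pts=0.0,
    gb_d_margin_at_poll_pts=1.0,
    gb_d_margin_now_pts=7.0,
)
print(adjusted)
```

The tied poll is treated as a six-point Democratic lead, matching the worked example in the paragraph above.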
Poll Weighting Factors
| Factor | High weight | Low weight |
|---|---|---|
| Methodology | True random sampling, probability panels | Opt-in/non-probability panel |
| Pollster sponsorship | Independent poll conducted for a nonpartisan media organization or university | Poll conducted on behalf of a candidate or party |
| Population | Likely Voters (LV) | Registered Voters (RV) |
| Recency | Poll conducted in the past few days | Poll more than 90 days old |
| Sample size | N > 1,000 | N < 300 |
| Time in field | Poll conducted over several days | Poll in the field for only one day |
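
To show how factors like these can combine into a single polling average, here is a small sketch of a weighted mean. The specific weighting functions are assumptions made for illustration (an exponential recency decay with a 30-day half-life, a square-root sample-size term capped at 1,000, and a flat penalty for partisan sponsorship); the source does not publish its actual formulas here, and methodology, population, and time in field are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    r2pv: float        # Republican two-party share reported by the poll
    age_days: float    # days since the poll left the field
    sample_size: int
    partisan: bool     # conducted for a party, candidate, or partisan group


def poll_weight(poll: Poll, half_life_days: float = 30.0) -> float:
    """Hypothetical weight: newer, larger, nonpartisan polls count for more."""
    recency = 0.5 ** (poll.age_days / half_life_days)
    size = min(poll.sample_size, 1000) ** 0.5
    sponsor = 0.25 if poll.partisan else 1.0
    return recency * size * sponsor


def weighted_average(polls: list[Poll]) -> float:
    """Weighted mean of the polls' Republican two-party shares."""
    weights = [poll_weight(p) for p in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("no polls with positive weight")
    return sum(w * p.r2pv for w, p in zip(weights, polls)) / total


# Illustrative polls only.
polls = [
    Poll(r2pv=0.52, age_days=2, sample_size=1200, partisan=False),
    Poll(r2pv=0.46, age_days=95, sample_size=280, partisan=True),
]
average = weighted_average(polls)
print(f"{average:.3f}")
```

With these toy inputs the recent, large, independent poll dominates, so the average lands much closer to 0.52 than to 0.46.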
MCMC Simulation
The final prediction is not a point estimate but a full probability distribution. Stan samples thousands of values from the posterior predictive distribution for each seat, capturing correlations across seats (e.g., a strong Democratic wave affects all seats simultaneously).
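Because each simulation is a joint draw across seats, chamber-level totals are best computed within each simulation rather than by combining per-seat probabilities as if races were independent. The sketch below illustrates the idea on a toy three-seat chamber; the data layout (one dict of seat shares per simulation) and the seat names are assumptions for the example, not the format of the model's output.

```python
# Each entry is one simulated election: seat name -> Republican 2PV in that simulation.
# Illustrative values only.
sims = [
    {"A": 0.55, "B": 0.52, "C": 0.47},
    {"A": 0.53, "B": 0.49, "C": 0.45},
    {"A": 0.56, "B": 0.54, "C": 0.51},
    {"A": 0.52, "B": 0.48, "C": 0.44},
]


def republican_seat_counts(simulations: list[dict[str, float]]) -> list[int]:
    """Count Republican-won seats separately within each simulation."""
    return [sum(share > 0.5 for share in sim.values()) for sim in simulations]


counts = republican_seat_counts(sims)
majority_threshold = 2  # seats needed for a majority of this toy 3-seat chamber
majority_prob = sum(c >= majority_threshold for c in counts) / len(counts)
print(counts, majority_prob)
```

Counting seats inside each simulation preserves the correlation between races: in the toy data, seats B and C tend to move together, and the majority probability reflects that.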
From these draws, this dashboard calculates:
r_prob = mean(draw > 0.5) // Win probability
r2p_avg = mean(draw) // Expected r2p
r2p_p05 = quantile(draw, 0.05) // 5th percentile
r2p_p95 = quantile(draw, 0.95) // 95th percentile
The large raw posterior file (house_district_posterior.csv, ~124 MB) is processed daily by the update_data.py worker script, which compresses the thousands of rows per district into these summary statistics and writes a lightweight JSON file for the dashboard to serve.
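The compression step can be sketched with the Python standard library alone. This is not the actual update_data.py: the function name, the district key, and the toy draws are hypothetical, and the choice of the "inclusive" interpolation method for the percentiles is an assumption.

```python
import json
import statistics


def summarize(draws: list[float]) -> dict[str, float]:
    """Collapse one seat's posterior draws into dashboard summary statistics."""
    # With n=20, statistics.quantiles returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(draws, n=20, method="inclusive")
    return {
        "r_prob": sum(d > 0.5 for d in draws) / len(draws),
        "r2p_avg": statistics.fmean(draws),
        "r2p_p05": cuts[0],
        "r2p_p95": cuts[-1],
    }


# Toy draws from 0.40 to 0.60; the real file holds thousands of rows per district.
draws = [i / 100 for i in range(40, 61)]
summary = summarize(draws)

# "XX-01" is a placeholder district key, not a real identifier.
payload = json.dumps({"XX-01": summary}, indent=2)
print(payload)
```

Serving a few numbers per seat instead of the full set of draws is what keeps the dashboard's JSON small relative to the raw posterior file.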
Aggregation
For each House and Senate race, the model always produces a posterior distribution based on national swing and, where race-specific polling exists, a second posterior distribution based on that polling. The two are averaged using inverse-variance weights, which is to say:
pred = (national_swing_pred / national_swing_var + poll_pred / poll_var) / (1 / national_swing_var + 1 / poll_var)
If poll_var → ∞ (no polls exist): pred → national_swing_pred
If poll_var → 0 (many perfect polls): pred → poll_pred
For independent, unbiased estimates, these weights minimize the variance of the combined prediction, and they are also consistent with Bayesian updating of a normal prior with normally distributed data.
This means the model gracefully handles the full spectrum from well-polled competitive races (where polls dominate) to deeply red or blue seats with no polling (where the model prediction based on national swing dominates).
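A minimal sketch of the inverse-variance average follows. The function name and the example numbers are hypothetical. Returning the combined variance as the reciprocal of the summed precisions is the standard result for independent estimates, not something stated in the source.

```python
import math
from typing import Optional


def combine(
    national_swing_pred: float,
    national_swing_var: float,
    poll_pred: Optional[float] = None,
    poll_var: float = math.inf,
) -> tuple[float, float]:
    """Inverse-variance weighted average of the two predictions.

    Returns (combined prediction, combined variance). With no polls
    (poll_pred is None or poll_var is infinite) the national swing
    prediction is returned unchanged.
    """
    if national_swing_var <= 0:
        raise ValueError("national_swing_var must be positive")
    if poll_pred is None or math.isinf(poll_var):
        return national_swing_pred, national_swing_var
    if poll_var <= 0:
        raise ValueError("poll_var must be positive")
    precision = 1 / national_swing_var + 1 / poll_var
    pred = (national_swing_pred / national_swing_var + poll_pred / poll_var) / precision
    return pred, 1 / precision


# Illustrative: national swing says 0.52 (sd 0.04), polls say 0.48 (sd 0.02).
pred, var = combine(0.52, 0.04 ** 2, poll_pred=0.48, poll_var=0.02 ** 2)
print(f"pred = {pred:.3f}, sd = {math.sqrt(var):.4f}")
```

In this toy case the polls are four times as precise as the national swing estimate, so they receive 80% of the weight and the combined prediction is about 0.488.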
Data Sources
| Source | Used For | Update Frequency |
|---|---|---|
| MIT Election Lab (1976–2024) | Historical district-level results and baselines | Static |
| Daily Kos Elections | Presidential vote margins in historical congressional districts | Static |
| New York Times Poll Tracking Project | Generic ballot + district/state polls | Daily |
| Ballotpedia | Incumbency and candidate filings | Periodic |
| GitHub (thisismactan/US-2026) | Forecast outputs used by this dashboard | Daily |
Limitations & Caveats
All forecasts are probabilistic estimates, not guarantees. A race with a 90% probability of one candidate winning is not a certainty: it means the model would expect the favored candidate to win 9 out of 10 times under similar conditions. Out of every 10 races where the leading candidate has a 90% probability of winning, it should not be a surprise when the underdog wins one.
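The arithmetic behind the 9-in-10 statement can be checked directly. The sketch below assumes the ten races are independent, which is a simplification made only for this example (the model's simulations allow outcomes to be correlated across seats).

```python
n_races = 10
p_favorite = 0.9

# Expected number of upsets among ten 90% favorites.
expected_upsets = n_races * (1 - p_favorite)

# Probability that at least one underdog wins, assuming independent races.
p_at_least_one_upset = 1 - p_favorite ** n_races

print(f"expected upsets: {expected_upsets:.1f}")
print(f"P(at least one upset): {p_at_least_one_upset:.3f}")
```

Under the independence assumption, at least one upset among ten such races is more likely than not, at roughly 65%.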
Key limitations include the following:
- The model does not directly take into account things that happen in the news, like scandals or economic shocks. The forecast will reflect these things only to the extent that they are reflected in the polls.
- Polling in low-salience House races is sparse and often of poor quality: typically the only polls of a House race are those conducted for one of the campaigns that the campaign chooses to release. As you might imagine, this can lead to significant selection bias in which House polls we see.
- The model does not account for third-party and independent candidates who could tip results in close races. For the most part the impact of these candidates is limited, but in some races they may have a large impact or may even be running as the de facto Democratic nominee (see Nebraska's Senate election).
This dashboard is a visualization layer on top of publicly available research. The underlying model code is open-source and available at github.com/thisismactan/US-2026.