2026-04-09 · 6 min read

Why AI Predictions Should Be Scored Publicly

Every AI tool makes implicit predictions about the future. Almost none of them track whether those predictions were right. That needs to change.

The Hidden Prediction Problem

Every intelligence tool, market newsletter, and AI-powered research platform makes predictions. Some are explicit: "The Fed will cut rates in June." Most are implicit: "Tensions are escalating" (implies things will get worse), "This sector is undervalued" (implies it will go up), "Supply chain disruptions are easing" (implies prices will stabilize).

These are all claims about the future. But almost no one tracks whether they were right.

Why This Matters

When you read an analysis that says "70% chance of escalation," you're making decisions based on that number. Maybe you're adjusting portfolio exposure. Maybe you're advising a client. Maybe you're briefing leadership on geopolitical risk.

But how do you know if the source saying "70%" actually means it? Have their 70% predictions historically come true about 70% of the time? Or do they cluster every uncertain event at 65-75% because it sounds confident without being committal?

You have no way to know. Because they don't track it. And they don't track it because tracking accuracy is hard, expensive, and — most importantly — risky. Publishing your track record means publishing your mistakes.

The Incentive Problem

The current incentive structure in intelligence analysis rewards boldness over accuracy.

  • Bold prediction + correct = "visionary analyst"
  • Bold prediction + wrong = quietly forgotten
  • Cautious prediction + correct = "obvious in hindsight"
  • Cautious prediction + wrong = quietly forgotten

Notice that being wrong has no cost. The analyst moves on to the next prediction, the reader has already forgotten the last one, and the cycle continues.

This is why most analysis converges on the same safe consensus takes. There's no reward for being precisely calibrated — only for being memorably bold or safely aligned with the consensus.

What Calibration Actually Looks Like

A well-calibrated forecaster isn't someone who's always right. That's impossible. A well-calibrated forecaster is someone whose probabilities match reality over time:

  • Their 30% predictions happen about 30% of the time
  • Their 70% predictions happen about 70% of the time
  • Their 90% predictions happen about 90% of the time

This sounds obvious, but it's extraordinarily rare. Research from Philip Tetlock's Good Judgment Project — the largest forecasting study ever conducted — found that most experts are poorly calibrated. They assign probabilities that don't match outcomes. They're overconfident in some domains and underconfident in others. And without systematic tracking, they never correct these biases.
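Checking calibration is mechanically simple once forecasts are tracked: group resolved forecasts by their stated probability and compare each group's stated probability with how often the event actually happened. Here is a minimal sketch in Python (the function name and data shape are illustrative, not any particular platform's API):

```python
from collections import defaultdict

def calibration_report(resolved):
    """Bucket resolved forecasts by stated probability (nearest 10%)
    and report the observed hit rate per bucket.

    `resolved` is a list of (probability, happened) pairs,
    where `happened` is True if the predicted event occurred.
    A calibrated forecaster's 0.7 bucket should show observed ~ 0.7.
    """
    buckets = defaultdict(list)
    for prob, happened in resolved:
        buckets[round(prob, 1)].append(happened)
    return {
        bucket: {"n": len(hits), "observed": sum(hits) / len(hits)}
        for bucket, hits in sorted(buckets.items())
    }

# Toy history: three ~70% calls, three ~30% calls
resolved = [(0.70, True), (0.72, True), (0.68, False),
            (0.30, False), (0.31, True), (0.29, False)]
report = calibration_report(resolved)
# report[0.7]["observed"] is 2/3 — slightly below the stated 70%
```

With enough resolved forecasts, plotting stated probability against observed frequency gives the calibration curve; a well-calibrated forecaster's curve hugs the diagonal.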

How We Built an Accountability System

VORENTH's Research Desk produces autonomous forecasts on geopolitical, market, economic, and policy events. Every forecast is:

  1. Discrete and falsifiable — Not "tensions will rise" but "The EU will impose additional sanctions on Russian energy exports before September 2026"
  2. Time-bounded — Every forecast has a specific target date
  3. Probability-weighted — Not just yes/no but a calibrated probability reflecting genuine uncertainty
  4. Verifiable — Each forecast includes clear criteria that can be checked against real-world outcomes
  5. Publicly tracked — Every forecast, every resolution, every miss is published on our track record page

When a forecast reaches its target date, the system evaluates the outcome against real-world evidence and scores it using Brier scoring — the same method used by IARPA's forecasting tournaments and academic research.
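The Brier score itself is just the mean squared error between the stated probability and the binary outcome, so it rewards both accuracy and honest uncertainty. A minimal sketch (illustrative code, not the production scorer):

```python
def brier_score(forecasts):
    """Mean squared error between forecast probabilities and outcomes.

    `forecasts` is a list of (probability, outcome) pairs, where
    outcome is 1 if the event occurred and 0 otherwise.
    0.0 is perfect; 0.25 is what always saying "50%" earns.
    Lower is better.
    """
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Three resolved forecasts: a good 70% call, a good 90% call,
# and a correct "probably not" at 30%
history = [(0.70, 1), (0.90, 1), (0.30, 0)]
score = brier_score(history)  # (0.09 + 0.01 + 0.09) / 3 ≈ 0.0633
```

Note the asymmetry that makes overconfidence expensive: a wrong 90% call costs 0.81 on that forecast, while a wrong 60% call costs only 0.36.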

The Compounding Effect

Here's what makes public accountability more than just transparency theater: it makes the system better over time.

Every resolved forecast becomes a data point. Did we assign the right probability? Were we overconfident in this category? Underconfident in that one? The system uses historical accuracy data to anchor future probabilities — automatically correcting for systematic biases.

If our predictions in a given category have historically been overconfident, future probabilities are adjusted accordingly. Not by a human deciding to "be more careful" — by a quantitative correction based on measured performance.
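One simple form such a correction can take is shrinkage: pull each new raw probability partway toward the historically observed frequency for its bucket. This is a hedged sketch of the idea, not the actual adjustment VORENTH uses; the function name, the `shrink` weight, and the bucket granularity are all assumptions:

```python
def recalibrate(prob, bucket_observed, shrink=0.5):
    """Nudge a raw probability toward the historically observed
    frequency for its probability bucket (nearest 10%).

    `bucket_observed` maps bucket -> observed hit rate, e.g. {0.7: 0.55}.
    `shrink` sets how much weight history gets; a real system would
    scale it by bucket sample size so thin buckets barely move.
    """
    bucket = round(prob, 1)
    if bucket not in bucket_observed:
        return prob  # no history for this bucket; leave untouched
    observed = bucket_observed[bucket]
    return prob + shrink * (observed - prob)

# If our "70%" calls have historically come true only 55% of the time,
# a fresh 0.70 forecast gets pulled partway toward 0.55:
adjusted = recalibrate(0.70, {0.7: 0.55})  # 0.70 + 0.5 * (0.55 - 0.70) = 0.625
```

The key property is that the correction is mechanical: it runs on measured performance, not on an analyst's resolution to "be more careful."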

Over hundreds of resolved forecasts, this creates a forecasting system that genuinely improves. Not because the AI gets smarter, but because the feedback loop forces calibration toward reality.

Why No One Else Does This

Building a public accountability system is hard for three reasons:

It's technically complex. You need structured prediction extraction, automated resolution against real-world data, Brier scoring, calibration curves, base rate computation, and a public dashboard that updates in real time. Most platforms would rather ship a new feature than build scoring infrastructure.

It's commercially risky. Publishing your misses alongside your hits means some visitors will see incorrect predictions and leave. In the short term, this looks worse than showing no track record at all.

It requires intellectual honesty. You have to separate scored forecasts from vague directional signals. You have to resist the temptation to claim "we said this would happen" when your actual prediction was hedged. You have to show the ugly calibration data alongside the flattering hit rate.

We think the long-term value of a verified track record outweighs the short-term cost of transparency. In a world where every platform claims to be "AI-powered intelligence," the one that can prove its accuracy has a structural advantage that compounds over time.

What We're Asking the Industry

We don't think VORENTH should be the only platform that scores its predictions publicly. We think this should be table stakes.

If your platform makes claims about the future — explicit or implicit — you should track whether those claims were right. You should publish the results. You should use them to improve.

Until that happens, "AI intelligence" is just a more expensive way to read the news.
