Somewhere in the past year, a statistic has reached your board. More than 80% of AI projects fail. Or 95% of pilots, if the deck cited MIT. Or 42% of initiatives, if it cited S&P Global. The figures usually arrive stripped of everything that gives a number meaning — who was asked, what counted as failure, over what period — and are then deployed to argue for urgency, for caution, or for budget, depending on who is presenting.
This article quotes the same numbers with their methodology and their limits attached, because the caveats are not academic hygiene. They are where the lesson lives. Read properly, the figures do not contradict each other — they measure different things — and they converge on a conclusion considerably more useful than alarm: most AI project failure is not a technology problem. It is settled by decisions a board controls — what the project is for, who owns it, what success means, and when to stop.
Key takeaways
- The famous failure figures — 80%, 95%, 42% — measure different things, over different windows, with different methods. None of them is "the" AI failure rate.
- With their caveats restored, the studies agree on a shape: adoption is near-universal, most pilots never reach production, and roughly one organisation in twenty reports value at scale.
- RAND's five root causes of AI project failure are overwhelmingly organisational rather than technical: four of the five are settled before a vendor or model is chosen, and the fifth is a matter of which problem was picked.
- A sixth cause runs through all the datasets: a governance vacuum, in which nobody owned the decision, success was never defined, and no evidence trail existed to support stopping.
- Boards change the odds before procurement: success metrics first, a named owner, explicit kill criteria, and governance built into the system rather than audited after it.
What the headline numbers actually measure
"More than 80% of AI projects fail." The most-quoted figure comes from RAND's August 2024 report (RRA2680-1), and RAND itself is more careful with it than the people who repeat it. The report's framing is that "by some estimates, more than 80% of AI projects fail" — twice the rate of comparable IT projects that do not involve AI. That is prior-literature context, not a failure rate RAND measured. The report's own evidence base is 65 interviews with experienced data scientists and engineers, and its real contribution is not the headline but the cause analysis those interviews produced, which we come to below.
42% and 46%. The sharpest recent measurement comes from S&P Global Market Intelligence's 451 Research, in a Voice of the Enterprise survey with fieldwork in October–November 2024 and 1,006 respondents across North America and Europe. The share of companies abandoning most of their AI initiatives jumped from 17% to 42% year on year, and respondents scrapped on average 46% of AI proofs-of-concept before they reached production. The caveats: it is self-reported, "most initiatives" is not "all", and a single year's movement is a data point rather than a trend. But as a structured survey with a real sample, it is sturdier evidence than the 80% figure it is often quoted alongside.
Gartner's 30% — a prediction, and what became of it. In July 2024 Gartner predicted that at least 30% of generative-AI projects would be abandoned after proof of concept by the end of 2025, citing poor data quality, inadequate risk controls, escalating costs and unclear business value. That was a forecast, not a measurement — and it is worth being honest about how it aged. Gartner's subsequent analysis suggests the realised abandonment rate exceeded half. The prediction was not alarmist. It was conservative.
"95% of GenAI pilots fail." The figure that dominated headlines comes from MIT's NANDA initiative and its August 2025 "GenAI Divide" report, covered by Fortune. The caveats here are heavy, and they matter. "Failure" was defined as no measurable profit-and-loss impact within six months — a short window for any enterprise system to show financial return. The sample was a convenience sample: 153 survey responses gathered at conferences, 52 interviews, and around 300 publicly reported deployments. The authors themselves describe the figures as directional. The number says far less than the headlines claimed; the direction, though, matches everything else in this list.
The value studies. Two large surveys ask the question from the other end — not "did the project die?" but "did it produce value?". BCG's September 2025 study, "The Widening AI Value Gap", surveyed more than 1,250 senior executives and found only around 5% of companies achieving AI value at scale, while roughly 60% reported no material value. McKinsey's State of AI, published November 2025 with 1,993 respondents, found 88% of organisations now using AI in at least one function, but only 39% reporting EBIT impact at the enterprise level, and about 6% qualifying as "high performers". Both rest on executive self-assessment, and attributing EBIT movement to AI is a judgement, not an audit. Even so, two independent samples landing on roughly 5–6% at the top is the most informative convergence in the whole dataset.
| Figure | Source | What it measured | What it does not tell you |
|---|---|---|---|
| More than 80% of AI projects fail | RAND, Aug 2024 | Prior estimates plus 65 expert interviews | Not a measured survey; RAND's own framing is "by some estimates" |
| 42% abandoned most AI initiatives, up from 17% | S&P Global / 451 Research, fieldwork Oct–Nov 2024 | Self-report, n=1,006, North America and Europe | One year's movement; "most" is not "all" |
| At least 30% of GenAI projects abandoned after proof of concept | Gartner, Jul 2024 | A prediction for end-2025, not a measurement | Gartner's later analysis suggests the outcome exceeded half |
| 95% of GenAI pilots fail | MIT NANDA, Aug 2025 | No P&L impact within six months; convenience sample of 153 conference responses, 52 interviews, ~300 public deployments | Authors call the figures directional; six months is a short window |
| ~5% achieve value at scale; ~60% no material value | BCG, Sept 2025 | Executive self-assessment, n=1,250+ | "Value at scale" is a high bar, not a failure rate |
| 88% adoption; 39% EBIT impact; ~6% high performers | McKinsey, Nov 2025 | Self-report, n=1,993 | Attributing EBIT to AI is judgement, not audit |
None of these is the AI failure rate, because there is no such single number. Together, though, they describe a consistent shape: adoption near-universal, attrition between pilot and production severe, and value at scale rare. That shape — not any single percentage — is what a board should govern against.
Five root causes, and a sixth the data keeps pointing at
The RAND report is worth a board's attention precisely because it asked why, not just how often. From its 65 interviews, the report identifies five leading root causes:
- Problem-definition misalignment. Stakeholders misunderstand, or miscommunicate, the problem the AI is meant to solve. The project optimises for something nobody needed.
- Insufficient data. The organisation lacks the data required to train or ground an adequate system, and discovers this after committing.
- Technology-first orientation. The organisation pursues the technology rather than a problem — acquiring AI because AI is what one acquires this year.
- Infrastructure gaps. There is no foundation for managing data and deploying finished models, so the pilot has nowhere to live.
- Beyond current capability. The problem chosen is genuinely harder than what today's AI can do.
Notice what is absent: "the model was not good enough" appears only once, at number five, and the first four are organisational failures that are settled before any vendor is selected. Gartner's stated abandonment drivers — poor data quality, inadequate risk controls, escalating costs, unclear business value — tell the same story from a different dataset: three of the four are failures of governance, not engineering.
Which points at the sixth cause, the one the failure literature keeps circling without always naming: a governance vacuum. Nobody owned the decision — there was committee enthusiasm but no named individual accountable for the outcome. Success was never defined — no baseline, no target metric, no time window, which means the project can neither succeed nor fail, only continue. And there was no evidence trail — nothing recorded that would let anyone decide, defensibly, to stop or to scale.
The UK's National Audit Office found exactly this pattern when it examined the use of AI in government in March 2024. The NAO did not publish a failure rate, and it would be wrong to press one onto it. What it found was a governance gap: adoption at an early, pilot-heavy stage, limited evaluation of whether deployed systems actually worked, and unresolved questions of strategy and accountability. The same vacuum, in public-sector form. We set out what filling that vacuum looks like in our companion piece on building a UK AI governance framework.
What failure looks like from the inside
The statistics describe failure from orbit. From inside an organisation, it rarely looks like an explosion. It looks like a pilot that never gets a production date.
The pattern has a name in delivery teams: proof-of-concept purgatory. A demonstration is built on curated data and it impresses — demos almost always impress, because the demonstration was designed around what the system does well. Production is a different test: the system has to survive the worst data on the worst day, integrate with systems that were never designed to receive it, pass a security review, and answer the question nobody asked during the demo — who is accountable when it is wrong? The S&P Global figure of 46% of proofs-of-concept scrapped is the visible end of this; the invisible version is the pilot still "being evaluated" four quarters later, consuming attention and licence fees while quietly going nowhere.
The MIT NANDA work, for all its sampling caveats, is genuinely useful here: its interviews attribute most pilot stalls to what the authors call a learning gap — tools that do not retain context or adapt to the workflow they sit in, so usage decays once novelty does. That is RAND's problem-definition and infrastructure causes, observed in the wild.
One distinction keeps boards from misreading all of this: abandonment is not the same as failure. A proof of concept killed early, against criteria agreed before it started, is governance working — that is what a proof of concept is for. A project killed after eighteen months because nobody can say what success was supposed to look like is pure loss. Gartner's worse-than-predicted abandonment rate contains both kinds, and from outside they are indistinguishable. From inside, the difference is whether the kill decision was designed in advance or improvised in embarrassment.
What boards change that engineers cannot
Every cause above is upstream of engineering. A delivery team cannot retrofit a problem definition, conjure data the organisation never collected, or invent the success metric after the budget is spent. Four decisions sit squarely with the board, and they map directly onto the failure causes.
Success metrics before procurement. Define the measurable outcome, its current baseline, the target, and the time window — before a vendor is selected, not after. This is the direct counter to RAND's first cause and to Gartner's "unclear business value". If the outcome cannot be stated as a number with a date, the project is not ready to be approved.
Named ownership. One accountable individual, not a steering committee. A project no one's name is attached to can be neither properly defended nor properly killed, and the abandonment behind S&P Global's 42% figure, which counts companies rather than projects, is plausibly in part a census of orphaned projects.
Kill criteria, set at approval. Decide in advance what evidence, by what date, triggers a stop. This converts proof-of-concept attrition from a slow, reputationally awkward drift into a cheap, fast, designed outcome — and it is the difference between the two kinds of abandonment above.
Governance in the build, not after it. Evidence trails, decision records and boundary controls specified as requirements of the system, not bolted on for the auditors. In the systems we build, the decision ledger and source-verification controls are part of the architecture, because a system that cannot show its working cannot be governed — and, as the value studies suggest, the organisations capturing value at scale are distinguished by operating discipline of this kind, not by access to better models.
None of this requires a board to understand model weights. All of it is the ordinary grammar of governance — objectives, ownership, evidence, review — applied before the purchase order rather than after the post-mortem.
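For teams that want these four decisions on paper, they can be captured in a single approval record. The sketch below is a minimal illustration in Python, not a description of any particular system: the class name `ProjectApproval`, its fields, the `review` method and the example figures are all hypothetical, and it assumes a metric where higher is better.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class ProjectApproval:
    """Hypothetical approval record; every name here is illustrative."""
    owner: str          # one named individual, not a committee
    metric: str         # the measurable outcome
    baseline: float     # value of the metric before the project starts
    target: float       # value that counts as success
    kill_floor: float   # minimum acceptable value once the review date arrives
    review_date: date   # date by which the evidence is due
    # Evidence trail: (date of review, observed value, outcome)
    decisions: List[Tuple[date, float, str]] = field(default_factory=list)

    def review(self, observed: float, today: date) -> str:
        """Apply the rules agreed at approval and record the result."""
        if observed >= self.target:
            outcome = "scale"
        elif today >= self.review_date and observed < self.kill_floor:
            outcome = "stop"
        else:
            outcome = "continue"
        self.decisions.append((today, observed, outcome))
        return outcome


# Illustrative figures only, not drawn from any of the studies above.
approval = ProjectApproval(
    owner="Named owner",
    metric="cases resolved per hour",
    baseline=40.0,
    target=60.0,
    kill_floor=45.0,
    review_date=date(2026, 9, 30),
)
print(approval.review(observed=42.0, today=date(2026, 10, 1)))  # prints "stop"
```

The design choice worth noticing is that the stop rule is written down at approval, so the review becomes a lookup rather than a negotiation, and each review leaves an entry in `decisions` that can later support a decision to stop or to scale.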
Read with their caveats, the numbers do not say that AI does not work. They say that organisations adopt faster than they govern. McKinsey's 88% adoption against 39% reported EBIT impact is not a verdict on the technology; it is the measured size of the gap between deploying a capability and governing it. The figure worth steering by is not the 80% — it is the 5–6% who defined success before they spent, named an owner, and kept the evidence. Nothing in any of these studies suggests they were lucky.
Last reviewed: 12 June 2026.
If you have an AI project that has stalled in pilot, or one that has already failed and nobody can quite say why, that is exactly the conversation our AI project rescue service exists for. If you are earlier in the cycle, see how we approach governance-first delivery, the systems we have built and the controls inside them, or start with our board guides.



