Goldenberry

A five-dimension framework for AI readiness.

28 April 2026

[Image: architectural blueprint of a building plan · Photo: Amsterdam City Archives · Unsplash]

We get asked, on a fairly regular basis, to give a company “an AI maturity score.” A single number. Usually because the board wants one to put in a deck.

We refuse, politely. The number is not useful. A composite score is the average of things that are not commensurable. A company can be in the top decile on engineering velocity and the bottom decile on data hygiene. The average is a number that describes neither. The board reads it as good or bad. The team reads it as imprecise. Nobody knows what to do on Monday morning.
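
A toy illustration, with invented numbers, makes the point. Two companies with identical composite scores can be in completely different situations; only the per-dimension view shows which one.

```python
# Invented scores for two hypothetical companies, on a 1-5 scale per dimension.
company_a = {"strategy": 3, "delivery": 5, "operations": 3, "data": 1, "governance": 3}
company_b = {"strategy": 3, "delivery": 3, "operations": 3, "data": 3, "governance": 3}

for name, scores in [("A", company_a), ("B", company_b)]:
    composite = sum(scores.values()) / len(scores)
    print(f"Company {name}: composite {composite:.1f}, per dimension {scores}")

# Both composites print 3.0. Only the per-dimension view shows that company A
# is top-tier on delivery and in trouble on data, while company B is uniformly
# mid. The average describes neither.
```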

What we do instead is run a benchmark across five dimensions, separately scored, with three to five concrete moves recommended per dimension. The score is a side effect. The action list is the deliverable. This document is a description of the framework. We use it to open most strategy engagements, and we are publishing it because clients keep asking for it as a standalone piece of work.

The five dimensions

The dimensions are not arbitrary. They are the five places, in our experience, where a company can be ahead of the rest of itself, and where the gap actually matters for what gets shipped next quarter.

One. Business and Product Strategy. Whether AI is a core pillar of how you compete, or a feature shelf. Whether the roadmap has AI-specific outcomes attached to it. Whether the board sees AI metrics that are tied to revenue and retention, or whether AI lives in an “innovation” sub-budget that nobody reports on. Whether the company has a thesis on monetisation that survives the model getting cheaper next year.

Two. Engineering and Product Delivery. What percentage of the engineering workflow is AI-augmented today, and what percentage could be by the end of the quarter if you pushed. Whether evaluations are continuous and tied to deploys. Whether AI features have SLAs the way regular features do, or whether they are exempted from the on-call rotation. The percentage of the team that has shipped a model-driven feature on their own.

Three. Operational Excellence. AI inside the business, not in the product. Finance, legal, marketing, sales, customer support, HR. Whether each function has a champion. Whether the productivity gains are measured. Whether the workflows are custom and owned by the team, or whether the team is paying for SaaS that the SaaS vendor will rebuild on top of any model in eighteen months.

Four. Data and AI Infrastructure. The pipelines. The quality of the data going in. Whether the company has a data flywheel that gets stronger with every customer interaction, or whether each interaction is forgotten. Self-service analytics maturity. Whether the data platform makes new AI features cheap to build, or whether each feature has its own ETL.

Five. Governance and Enablement. Security, legal, risk, talent, the operating model. Whether the EU AI Act is a checkbox or a system. Whether human-in-the-loop is real or theatrical. Whether PII handling has been audited or assumed. The talent pipeline for the next two years.

These five do not interact symmetrically. Strong infrastructure compensates for some delivery weaknesses. Strong delivery does not compensate for missing strategy. We score them independently anyway, because the act of separating them is most of the value.

The scoring

We use the CMMI levels, one through five. We did not invent this. The Software Engineering Institute did, starting with the original Capability Maturity Model in the early 1990s, for assessing software process maturity. The reason we use it is that it has reasonable definitions for each level that have survived several decades of contact with practice.

In our adaptation, the levels read approximately like this.

Level one. Ad hoc. Some people on the team are using AI tools. There is no shared definition of what good looks like. Outcomes are uneven and depend on which person you talked to.

Level two. Repeatable. There are a few patterns that work, and a few people who can repeat them. The patterns are not yet documented or shared widely. New hires reinvent the wheel.

Level three. Defined. The patterns are written down. Onboarding includes them. The team can describe its AI practice without referring to specific people. Most projects start from a known baseline.

Level four. Managed. The practice has metrics attached. You can answer, with data, the question “are we getting better.” Outcomes are predictable enough that planning works.

Level five. Optimising. The practice changes itself. New patterns surface from the team without leadership pushing. Failures become formal learnings. The metrics evolve as the work does.
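
If you want to carry the rubric into a spreadsheet or an internal tool, a minimal encoding might look like the sketch below. The dimension and level names come straight from this document; the structure around them, one record per dimension with a written justification and three to five recommendations, is our assumption about how you would store it, not a prescribed format.

```python
from dataclasses import dataclass, field

DIMENSIONS = [
    "Business and Product Strategy",
    "Engineering and Product Delivery",
    "Operational Excellence",
    "Data and AI Infrastructure",
    "Governance and Enablement",
]

LEVELS = {
    1: "Ad hoc: some individual tool use, no shared definition of good",
    2: "Repeatable: a few patterns work, but they live in a few heads",
    3: "Defined: patterns written down, onboarding includes them",
    4: "Managed: metrics answer 'are we getting better' with data",
    5: "Optimising: the practice changes itself",
}

@dataclass
class DimensionScore:
    dimension: str           # one of DIMENSIONS
    level: int               # 1..5, see LEVELS; scored independently, never averaged
    justification: str       # the written rationale travels with the number
    recommendations: list[str] = field(default_factory=list)  # three to five moves

    def __post_init__(self) -> None:
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension!r}")
        if self.level not in LEVELS:
            raise ValueError("level must be between 1 and 5")
```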

We have never scored a client at level five on all five dimensions. We have never scored one at level five on more than two. The honest distribution we see, across the companies we have benchmarked, is a two or a three on most dimensions, with one outlier in either direction. The outlier is the most useful piece of information from the benchmark.

How a benchmark runs

A benchmark takes one week. Five working days. We schedule it as one week because that is how long it takes us to do the work properly, and not longer because the marginal value of week two is small once the interviews are in.

Monday and Tuesday are interviews. We talk to between ten and fifteen stakeholders across the company. The CEO, the CTO, two or three engineering leads, one or two product managers, the heads of the functions we score in dimension three (finance, marketing, customer success, sales), one person from data, one person from legal or compliance. We do not run them as workshops. They are one-on-one, forty-five to sixty minutes, with a structured guide that we adapt as the picture sharpens. The interviews are the most expensive thing in the engagement and we do not skimp on them. They are also the deliverable nobody sees, which is fine.

Wednesday is artifacts. We read the codebase, specifically the AI-adjacent parts: model call paths, evaluation frameworks, retrieval, observability. We read the data platform documentation. We look at the most recent quarter of incidents that touched AI features. We look at the talent pipeline. We do not produce a survey for the team to fill in. The survey is a way of pretending to do this work without doing it.

Thursday is the draft. Five dimensions. Each gets a score with a written justification. Each gets a list of three to five recommendations, ordered by impact-over-effort. The recommendations are concrete enough that an engineering lead could turn one into a Jira ticket the same day they read it.
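
Impact-over-effort is a simple rule, and a hypothetical sketch makes it concrete. The field names and the one-to-five scales below are invented for illustration; the example recommendations echo ones that appear elsewhere in this document.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    title: str
    impact: int   # invented scale: 1 (marginal) to 5 (moves the dimension's score)
    effort: int   # invented scale: 1 (a ticket) to 5 (a quarter of work)

recs = [
    Recommendation("Tie evaluations to the deploy pipeline", impact=5, effort=3),
    Recommendation("Put AI features on the on-call rotation", impact=3, effort=1),
    Recommendation("Write down the retrieval patterns that already work", impact=4, effort=2),
]

# Highest impact-over-effort first; ties broken by raw impact.
recs.sort(key=lambda r: (r.impact / r.effort, r.impact), reverse=True)
for r in recs:
    print(f"{r.impact / r.effort:.1f}  {r.title}")
```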

Friday is review. We sit with the leadership team and walk through the dimensions one by one. Half the time, one of the recommendations changes because leadership knows something we did not. The other half, we hold the recommendation as written and explain why we still believe it after hearing the counterargument. By the end of Friday the report is shared.

The artifact is short. Twenty pages, maximum. Most of the value is in the action items, not the scores. We tell every client this on day one. They almost always look at the scores first anyway. After a few weeks of working from the action items, the scores stop mattering. That is the goal.

What the company gets at the end of the week is, in order: a five-dimension scorecard, a written justification per score, and a prioritised list of moves the team can start on Monday. There is no follow-on commitment built into the benchmark. About half the companies we run this for hire us afterwards for a longer engagement. About half do not. Both outcomes are fine. The action items work either way.

What the benchmark is not

It is not a beauty contest. We do not benchmark across companies in a public ranking. We do not let CEOs use the score in fundraising decks unless we have explicitly cleared the framing.

It is not a product roadmap. The strategic direction is the company’s call, not ours. The benchmark gives them the inputs.

It is not a substitute for working with a team. A score is a snapshot. A snapshot does not reflect that an engineering lead is leaving in three months, or that the data platform is six weeks from a major rewrite. The real work happens after the benchmark, when we know each other well enough that those conversations are honest.

Why publish this

Two reasons.

The first is that we keep getting asked, on consultative calls, what we use to assess a company before recommending an engagement. The answer is this framework. Sending people a link is faster than describing it.

The second is that the framework is more useful when more companies use it. Even when they do not work with us. The scoring vocabulary becomes shared. The conversation about “what does level three on Operational Excellence look like” becomes less idiosyncratic. We do not gain anything by keeping the methodology private. We have written it up so that anyone can pick it up.

If you want the full guide for running a benchmark inside your own company, including the interview script and the scoring rubric, write to us. We will send the current version. It is, like the score it produces, work in progress.