Methodology

How we built a task-level AI readiness benchmark for procurement roles

Why This Benchmark Exists

Most commentary on AI and procurement is speculative. This benchmark takes a different approach: instead of asking “will AI replace procurement?”, we asked “what exactly does a procurement professional do, and can AI do each part of it?” The result is a ground-up, task-level assessment of AI readiness across Sourcing and Contracting roles.

How We Decomposed the Roles

We started by breaking each procurement role into its constituent tasks — the discrete, repeatable units of work that make up the role. For each task, we went further: mapping the trigger that initiates it, the inputs required, the step-by-step actions performed, the expected output, and the conditions that define success.
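To make the decomposition concrete, the sketch below shows one way such a task record could be structured. This is an illustrative Python schema; the field names are our own shorthand, not the benchmark's internal format.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One discrete, repeatable unit of work within a procurement role."""
    name: str
    trigger: str                    # the event that initiates the task
    inputs: list[str]               # artifacts or data required before work starts
    actions: list[str]              # the step-by-step actions performed
    expected_output: str            # the deliverable the task produces
    success_conditions: list[str]   # the conditions that define success
```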

We did not stop at the happy path. For every task, we identified the exceptions — the things that go wrong, why they go wrong, and how a skilled practitioner resolves them. This is where most AI assessments fall short. Exception handling is where human judgment, relationship knowledge, and tacit expertise show up most clearly, and it is precisely what separates a benchmark grounded in operational reality from one based on job descriptions.
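Exception scenarios fit the same structured treatment. Below is a hypothetical extension of the Task sketch above, again with field names of our own choosing rather than the benchmark's.

```python
from dataclasses import dataclass

@dataclass
class ExceptionScenario:
    """A documented deviation from a task's happy path."""
    symptom: str      # what goes wrong
    root_cause: str   # why it goes wrong
    resolution: str   # how a skilled practitioner resolves it

# In the Task sketch above, each task would then carry a list of these
# alongside the happy path, e.g. an `exceptions: list[ExceptionScenario]` field.
```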

What We Assessed

For each task, we assessed two dimensions:

AI Readiness — whether AI can perform the task autonomously, with human oversight, or not at all. We applied this at the task level, not the role level, because AI readiness varies significantly within a single role. A Strategic Sourcing Analyst, for example, may have tasks AI can run end-to-end sitting alongside tasks that require relationship judgment no AI can replicate today.

Human Effort — how much practitioner time and expertise the task typically demands. This matters because displacing a high-effort task has more operational impact than displacing a low-effort one. Our scoring reflects this: the benchmark is effort-weighted, so the role-level score tracks real-world time displacement rather than raw task count (a minimal sketch of this weighting follows below).
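To illustrate how effort weighting changes the picture, here is a minimal scoring sketch. The numeric readiness values and the effort unit (hours) are assumptions for illustration; the benchmark's published scale may differ.

```python
# Hypothetical numeric values for the three readiness levels.
READINESS_VALUE = {
    "autonomous": 1.0,   # AI can perform the task end-to-end
    "oversight": 0.5,    # AI performs the task with human review
    "not_ready": 0.0,    # AI cannot perform the task today
}

def role_readiness_score(tasks: list[dict]) -> float:
    """Effort-weighted role score: high-effort tasks move the score more.

    Each task is a dict like {"readiness": "oversight", "effort_hours": 6.0}.
    """
    total_effort = sum(t["effort_hours"] for t in tasks)
    weighted = sum(
        READINESS_VALUE[t["readiness"]] * t["effort_hours"] for t in tasks
    )
    return weighted / total_effort if total_effort else 0.0

# One autonomous low-effort task next to one not-ready high-effort task:
# by task count the role looks 50% ready, but the effort-weighted score
# is (1.0 * 2 + 0.0 * 10) / 12, roughly 0.17.
tasks = [
    {"readiness": "autonomous", "effort_hours": 2.0},
    {"readiness": "not_ready", "effort_hours": 10.0},
]
print(round(role_readiness_score(tasks), 2))  # 0.17
```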

What Trips AI Up

Where AI falls short, we recorded why, not just that it does. We categorized AI limitations into six ceiling types: Judgment, Relationship, Tacit Knowledge, Context, Legal Accountability, and Creativity. The “What Trips AI Up” view of the dashboard shows which ceiling types appear most frequently and where they cluster by role, giving practitioners a clear picture of where human expertise remains irreplaceable.
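As a sketch of how that frequency-and-clustering view could be computed, the snippet below tallies ceiling types per role. The six ceiling names come from the benchmark; everything else (the function name, the record format) is illustrative.

```python
from collections import Counter
from enum import Enum

class Ceiling(Enum):
    JUDGMENT = "Judgment"
    RELATIONSHIP = "Relationship"
    TACIT_KNOWLEDGE = "Tacit Knowledge"
    CONTEXT = "Context"
    LEGAL_ACCOUNTABILITY = "Legal Accountability"
    CREATIVITY = "Creativity"

def ceilings_by_role(records: list[tuple[str, Ceiling]]) -> dict[str, Counter]:
    """Tally how often each ceiling type appears, clustered by role.

    Each record pairs a role with one recorded limitation, e.g.
    ("Strategic Sourcing Analyst", Ceiling.RELATIONSHIP).
    """
    tallies: dict[str, Counter] = {}
    for role, ceiling in records:
        tallies.setdefault(role, Counter())[ceiling] += 1
    return tallies
```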

Scope

This MVP covers 5 Sourcing and Contracting roles, comprising 58 tasks and several hundred exception scenarios. Coverage will expand to all 35 procurement roles as demand from the practitioner community warrants.

Research Credit

This benchmark was developed independently by the procurement.news research team. The methodology was designed in collaboration with AllCaps.