The agentic commerce decision test

*Four structural criteria now determine whether a commerce system carries scripted-automation accountability or the elevated liability profile of an autonomous agent under European regulatory frameworks.*

Context

Why This Distinction Matters Now

European procurement and compliance practice has operated for years under governance frameworks calibrated to deterministic, auditable automation. Robotic Process Automation, the dominant vendor category, executes predefined rules against structured data and produces outputs that are fully traceable to authored logic [3]. The liability assignment is straightforward: the organisation that specifies the rules owns the decision.

That calibration is now under pressure from two directions. On the technology side, commercial AI systems increasingly exhibit at least one of the four agentic properties described in this brief, and vendors do not consistently distinguish their products from RPA in procurement documentation [8][9]. On the regulatory side, the EU AI Act, GDPR Article 22, and PSD2 each carry obligations that are triggered by the presence of specific system properties rather than by the label a vendor applies. A compliance team that assesses a vendor submission on the basis of marketing category rather than structural criteria will systematically misclassify risk.

At the vendor-assessment stage, the gap between marketing label and structural reality translates directly into misassigned contractual liability. Public and private procurement officers in Europe are under increasing pressure to professionalise AI governance within acquisition cycles [5], yet the technical vocabulary for distinguishing scripted from agentic systems has not been standardised across industries [2][4]. The four criteria in this brief draw on the most current peer-reviewed taxonomic work [8][9] and translate that work into a checklist applicable at the vendor-assessment stage, before a system is deployed in a live commerce or payment environment.

Four Tests at a Glance

MULTI-AGENT COLLABORATION: The system delegates sub-tasks to distinct agent modules whose outputs feed back into a shared decision state, rather than routing inputs through a fixed processing chain.
DYNAMIC TASK DECOMPOSITION: The system decomposes an objective into steps at runtime based on observed state, rather than executing a predetermined sequence regardless of intermediate results.
PERSISTENT MEMORY: The system retains and updates a cross-session knowledge state that shapes future decisions, distinct from a stateless lookup or a session-scoped variable cache.
COORDINATED AUTONOMY: The system initiates external actions (queries, payments, commitments) across platform or service boundaries without a human-confirmation checkpoint at each handoff.
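
For assessment teams that record findings in a structured form, the four tests can be captured as a simple checklist object. The following is a minimal sketch, not a prescribed schema: the `VendorAssessment` class, its field names, and the `needs_structural_interrogation` helper are hypothetical, and the one-criterion trigger reflects this brief's threshold for elevated scrutiny rather than any regulatory rule.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VendorAssessment:
    """Assessor's findings on the four structural criteria (hypothetical schema)."""
    multi_agent_collaboration: bool
    dynamic_task_decomposition: bool
    persistent_memory: bool
    coordinated_autonomy: bool

    def criteria_present(self) -> list[str]:
        """Names of the criteria the assessor found present, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def needs_structural_interrogation(self) -> bool:
        """True if at least one criterion is present (this brief's trigger)."""
        return len(self.criteria_present()) >= 1


# Example: a system marketed as automation that retains cross-session state.
assessment = VendorAssessment(
    multi_agent_collaboration=False,
    dynamic_task_decomposition=False,
    persistent_memory=True,
    coordinated_autonomy=False,
)
print(assessment.criteria_present())                # ['persistent_memory']
print(assessment.needs_structural_interrogation())  # True
```

The booleans record an assessor's structural finding, not the vendor's marketing category; the object deliberately has no field for the vendor's own label.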

The Four Criteria in Structural Detail

Each criterion isolates a distinct architectural property that scripted automation structurally cannot exhibit [8][9].

Multi-agent collaboration is present when the system instantiates specialist sub-agents and coordinates their outputs toward a shared goal. A scripted automation pipeline may route data across modular services, but each module's behaviour is fully specified at design time; no module revises its logic based on the emergent state of another. An agentic system, by contrast, allows sub-agents to produce outputs that modify the task framing for sibling agents within the same run. The distinction is architectural: fixed graph versus adaptive graph.

Dynamic task decomposition is present when the system generates a task plan after observing the current environment state, rather than executing a plan authored at design time. Robotic Process Automation, the dominant scripted paradigm, operates on explicit rule sets and predefined decision trees [3]; it cannot produce a novel step sequence it was not pre-programmed to emit. When a system constructs its own intermediate steps in response to live data, the resulting behaviour is structurally opaque to static audit.

Persistent memory is present when prior interaction history, inferred preferences, or accumulated context influence a decision made in a later session. This is distinct from a configuration file or a session cookie: the memory state is written by the system's own inference process, changes over time, and functions as an input to future decisions. GDPR Article 22 attaches to any solely automated decision producing legal or similarly significant effects. Persistent memory is one property that can make Article 22 explanation obligations harder to discharge: a memory state written by inference is not easily reconstructed from a static rule set, so decisions shaped by accumulated inference across sessions may resist the kind of deterministic audit trail that an explanation presupposes.

Coordinated autonomy is present when the system initiates consequential external actions across service or platform boundaries without a synchronous human confirmation at each handoff. This is the criterion most directly relevant to payment compliance: a system that can instruct a payment service provider, select a fulfilment partner, and confirm a purchase order in a single autonomous run has exercised agency across at least three regulatory perimeters simultaneously [8][9]. The absence of a contractual delegation chain at each such handoff constitutes an independent compliance failure regardless of how the other three criteria score.
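
The two conditions named in this criterion, an unconfirmed handoff and a missing delegation, can be checked against a run's handoff log. The sketch below assumes such a log exists and is trustworthy; the `Handoff` record, its fields, and the reference string `"MSA-4.2"` are hypothetical illustrations, not a vendor or regulatory format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handoff:
    """One external action within a single run (hypothetical log record)."""
    target: str                    # e.g. "payment_service_provider"
    human_confirmed: bool          # synchronous human confirmation at this handoff
    delegation_ref: Optional[str]  # contract clause authorising the action, if any


def exhibits_coordinated_autonomy(run: list[Handoff]) -> bool:
    """True if any handoff in the run proceeded without human confirmation."""
    return any(not h.human_confirmed for h in run)


def missing_delegation(run: list[Handoff]) -> list[str]:
    """Targets of unconfirmed handoffs that carry no contractual delegation reference."""
    return [
        h.target
        for h in run
        if not h.human_confirmed and h.delegation_ref is None
    ]


# Example run crossing three perimeters; only the purchase order is human-confirmed.
run = [
    Handoff("payment_service_provider", human_confirmed=False, delegation_ref="MSA-4.2"),
    Handoff("fulfilment_partner", human_confirmed=False, delegation_ref=None),
    Handoff("purchase_order", human_confirmed=True, delegation_ref=None),
]
print(exhibits_coordinated_autonomy(run))  # True
print(missing_delegation(run))             # ['fulfilment_partner']
```

A `human_confirmed` flag records only that a confirmation step occurred; whether that step provides meaningful oversight is a separate question the log alone cannot answer.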

Essentials to Consult

Sapkota, Roumeliotis & Karkee (2025, Information Fusion) [9]: the most current peer-reviewed taxonomy distinguishing individual AI agents from multi-agent agentic systems, supplying the definitional ground for criteria one and four.
Syed et al. (2019) [3]: the standard account of Robotic Process Automation's architectural constraints, establishing the scripted-automation baseline against which all four criteria are measured.
Sela (2018) [11]: an analysis of how automated versus human-adjudicated processes affect procedural legitimacy, directly relevant to accountability design in agentic commerce dispute resolution.
Huang & Rust (2020) [2]: the mechanical/thinking/feeling AI taxonomy, providing comparative vocabulary for situating agentic commerce systems within a broader capability classification that compliance officers can reference across vendor categories.

What Changes When You Use This Test

Once a system crosses into agentic territory on even one criterion, three specific compliance vectors shift in ways that scripted-automation governance frameworks do not address.

First, GDPR Article 22 automated-decision safeguards become directly applicable wherever a solely automated decision produces legal or similarly significant effects. Persistent memory makes those safeguards harder to discharge: an agentic system's memory state is dynamic and potentially non-reproducible across sessions, so an audit trail designed for RPA, which records transactions against fixed rules, is structurally insufficient. Compliance teams must contract for memory-state logging at the vendor level, not merely for transaction-level audit records.
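
One way to picture the difference between transaction-level and memory-state logging is a log entry that binds each decision to the memory state that informed it. This is a minimal sketch under stated assumptions: the memory state is assumed to be JSON-serialisable, and the function names, field names, and example values are hypothetical rather than drawn from any vendor's logging interface.

```python
import copy
import hashlib
import json
from datetime import datetime, timezone


def memory_state_digest(memory_state: dict) -> str:
    """SHA-256 digest of a canonical JSON serialisation of the memory state."""
    canonical = json.dumps(memory_state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def decision_log_entry(transaction_id: str, decision: str, memory_state: dict) -> dict:
    """Transaction-level record extended with the memory state behind the decision."""
    return {
        "transaction_id": transaction_id,
        "decision": decision,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "memory_digest": memory_state_digest(memory_state),
        # Deep copy so later memory updates cannot alter the logged snapshot.
        "memory_snapshot": copy.deepcopy(memory_state),
    }


# Hypothetical memory state written by the system's own inference process.
state = {"preferred_supplier": "supplier-a", "sessions_observed": 12}
entry = decision_log_entry("tx-001", "reorder", state)

# The live memory moves on; the logged snapshot and digest do not.
state["sessions_observed"] = 13
print(entry["memory_snapshot"]["sessions_observed"])         # 12
print(entry["memory_digest"] == memory_state_digest(state))  # False
```

The digest lets an auditor confirm that a stored snapshot is the one in force at decision time; the snapshot is what makes the decision input reconstructible after the memory has changed.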

Second, eligibility for PSD2 strong customer authentication exemptions turns on whether the transaction was initiated by an authenticated human or by an autonomous system acting on stored credentials. PSD2 Article 97 and the EBA's RTS on SCA do not explicitly address agentic-system-initiated transactions. The exemption categories were written against a model of human-confirmed payment acts, and how they apply to a system exercising coordinated autonomy across payment boundaries is an open interpretive question rather than a settled compliance position. Pending authoritative regulatory guidance, acquiring banks and payment service providers should treat this as an identified gap that requires explicit contractual delegation, rather than relying on existing exemption categories.

Third, EU AI Act high-risk classification obligations can attach when a system makes or materially influences decisions in use cases the Act designates as high-risk, some of which fall within financial services. A system that exhibits dynamic task decomposition and coordinated autonomy in a procurement context may meet the functional profile of a high-risk system irrespective of the vendor's own classification, so that classification should be tested rather than assumed. Procurement teams that accept a vendor's self-classification as automation without applying the four criteria independently risk exposing the contracting organisation to the obligations of the AI Act without the governance infrastructure those obligations require [1][7].

Counterpoint

The Classification Skeptic's View

The four-criterion framework presupposes that a binary classification between agentic and scripted systems is achievable in practice. The strongest objection to this presupposition runs as follows: autonomy, memory, and task decomposition all exist on continuous scales, and any threshold drawn across those scales is a policy choice rather than a structural fact. A scripted system with a large decision tree and session-scoped context can exhibit behaviour that is functionally indistinguishable from a system with persistent memory and dynamic decomposition, depending on the complexity of the authored rules. Defining the boundary by criterion count rather than by demonstrated emergent behaviour invites vendors to architect their systems just below the threshold, using sophisticated scripting to replicate agentic outputs while retaining the compliance profile of deterministic automation [2][3].

The criteria most exposed to this boundary-gaming risk are coordinated autonomy and persistent memory. Coordinated autonomy is particularly vulnerable because the distinction between a human-confirmed handoff and an agent-initiated one can be obscured by inserting a nominal confirmation step that adds no meaningful oversight. Persistent memory is similarly susceptible because a sufficiently large session-scoped context store can replicate cross-session influence without satisfying the technical definition of a persistent knowledge state. Practitioners should scrutinise these two criteria with the greatest care during vendor assessment.

This objection also has a regulatory-design dimension. Compliance frameworks built on binary classification tend to concentrate risk at the boundary: entities just outside the agentic category bear no elevated obligation, while entities just inside it bear the full obligation stack. That concentration creates incentives for misclassification that no checklist alone can eliminate. A graduated instrument, one that assigns partial obligations when one or two criteria are present and full obligations only when all four are met, would be more resistant to boundary manipulation, though the EU AI Act's current risk-tier structure does not accommodate such granularity. To reconcile these positions: the one-criterion threshold governs when elevated scrutiny is warranted and when structural interrogation of a vendor system must begin; a graduated obligation structure governs how much obligation ultimately attaches, and that second determination remains for supervisory and legislative development. Practitioners should treat the four-criterion test as the trigger for structural interrogation, not as a self-executing classification rule [11].
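
The reconciliation above separates two determinations: whether scrutiny is triggered, and how much obligation attaches. A minimal sketch of that separation follows. It is illustrative only: the function names and tier labels are hypothetical, the graduated instrument is the counterpoint's proposal rather than anything in the EU AI Act, and the three-criterion case is returned as unspecified because the text assigns it no tier.

```python
def scrutiny_triggered(criteria_present: int) -> bool:
    """One-criterion threshold: structural interrogation of the vendor system begins."""
    _validate(criteria_present)
    return criteria_present >= 1


def obligation_tier(criteria_present: int) -> str:
    """Graduated instrument as described in the text (hypothetical tier labels)."""
    _validate(criteria_present)
    if criteria_present == 0:
        return "none"
    if criteria_present in (1, 2):
        return "partial"
    if criteria_present == 4:
        return "full"
    # Three criteria: the text does not say which tier applies.
    return "unspecified"


def _validate(criteria_present: int) -> None:
    """The test has exactly four criteria, so the count must lie in 0..4."""
    if not 0 <= criteria_present <= 4:
        raise ValueError("criteria_present must be between 0 and 4")


for count in range(5):
    print(count, scrutiny_triggered(count), obligation_tier(count))
```

Keeping the two functions separate mirrors the argument: the trigger is something a practitioner can apply today, while the tier mapping stands in for a determination that remains with supervisors and legislators.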

Unresolved Frontiers

How a compliance officer can verify, during a live vendor assessment, whether a system's task sequencing is generated at runtime or retrieved from an authored decision tree remains an unresolved methodological question; the operational audit procedures for doing so have not been established.
When an agentic system initiates a PSD2-governed payment transaction, the allocation of primary liability for SCA non-compliance among the developer, deployer, acquiring bank, and PSP within the delegation chain lacks authoritative guidance and has not been settled by published regulatory decisions.
Whether the presence of a single criterion is sufficient to trigger EU AI Act high-risk classification in a financial services procurement context, or whether a threshold number of criteria must be met, awaits definitive interpretive resolution; current regulatory guidance does not address the question directly.
The application of GDPR Article 22 explanation obligations when the inference state driving a decision is distributed across multiple sub-agents and no single agent holds the complete decision logic requires further supervisory clarification before practitioners can rely on any single compliance approach.
The extent of documented regulatory outcomes from European supervisory actions against commerce systems that were deployed as scripted automation but exhibited agentic properties in operation remains an open empirical question, as no comprehensive body of published enforcement decisions on this specific pattern has yet emerged.

The four criteria are not a classification convenience: each one present in a deployed system marks a specific point at which European regulatory obligations escalate, contractual delegation chains must be explicit, and audit infrastructure designed for scripted automation becomes structurally inadequate.

References

[1] Dwivedi, Y. K., Hughes, L., Ismagilova, E., Aarts, G., Coombs, C., & Crick, T. (2019). Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. Elsevier BV.

[2] Huang, M., & Rust, R. T. (2020). A strategic framework for artificial intelligence in marketing. Springer Science+Business Media.

[3] Syed, R., Suriadi, S., Adams, M., Bandara, W., Leemans, S. J. J., & Ouyang, C. (2019). Robotic Process Automation: Contemporary themes and challenges. Elsevier BV.

[4] Alter, S. (2008). Defining information systems as work systems: implications for the IS field. Palgrave Macmillan.

[5] OECD. (2023). Professionalising the public procurement workforce. Public Governance Policy Papers.

[6] Yaqub, M. Z., & Alsabban, A. (2023). Industry-4.0-Enabled Digital Transformation: Prospects, Instruments, Challenges, and Implications for Business Strategies. Multidisciplinary Digital Publishing Institute.

[7] Bartneck, C., Lütge, C., Wagner, A. R., & Welsh, S. (2020). An Introduction to Ethics in Robotics and AI. Springer International Publishing.

[8] Sapkota, R., Roumeliotis, K. I., & Karkee, M. (2025). AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges. SuperIntelligence - Robotics - Safety & Alignment.

[9] Sapkota, R., Roumeliotis, K. I., & Karkee, M. (2025). AI Agents vs. Agentic AI: A Conceptual taxonomy, applications and challenges. Elsevier BV (Information Fusion).

[10] Singh, A., Dwivedi, A., Agrawal, D., & Singh, D. (2023). Identifying issues in adoption of AI practices in construction supply chains: towards managing sustainability. Springer Science+Business Media.

[11] Sela, A. (2018). Can Computers Be Fair? How Automated and Human-Powered Online Dispute Resolution Affect Procedural Justice in Mediation and Arbitration. The Ohio State University.