Distinguishing agentic commerce from algorithmic trading and robotic process automation

Abstract

Agentic commerce, algorithmic trading, and robotic process automation are frequently grouped under the label of automated decision-making, yet their structural properties diverge in ways that render common regulatory treatment technically incoherent. This paper argues that the divergence is not a matter of degree but of kind: agentic commerce combines adaptive goal-pursuit, autonomous payment execution across multiple decision cycles, and dynamic task decomposition in ways that neither MiFID II's market-conduct rules nor conventional RPA governance frameworks anticipate. The contribution is conceptual rather than prescriptive: the paper constructs a differentiation matrix grounded in four structural criteria (scope of decision authority, modification without human intervention, feedback-loop architecture, and deployment context) and uses that matrix to identify the specific mechanisms within existing EU instruments that fail when applied to agentic systems. The analysis demonstrates that MiFID II's pre-trade and post-trade transparency obligations presuppose a fixed rule-set authored by an identifiable human principal; that RPA governance treats auditability as a property of deterministic scripts; and that neither framework contains a workable theory of harm for emergent, multi-agent coordination. Acknowledging agentic commerce as a distinct regulatory category is a precondition for coherent policy design, including a technology-neutral behavioural threshold test, a macro-prudential oversight layer calibrated to aggregate transaction volume, and workable liability attribution rules for emergent agent behaviour. Without categorical clarity, regulatory instruments will continue to produce enforcement gaps that opacity-dependent deployment patterns actively exploit.

Conceptual Boundaries in Autonomous Commerce Systems

The proliferation of automated systems in financial markets and commercial transactions has outpaced the development of taxonomic clarity adequate to support regulatory design. Three categories of automation are now routinely deployed across EU-regulated payments and capital markets: algorithmic trading systems, robotic process automation platforms, and an emerging class of systems that this paper designates as agentic commerce. The three categories share a surface property, the substitution of software execution for human action at the moment of transaction, but they differ along structural dimensions that determine what regulatory instruments can meaningfully address them.

This paper advances a single principal claim: agentic commerce constitutes a category distinct from both algorithmic trading and robotic process automation, and that distinctness is structural rather than scalar. The claim is not that agentic systems are more powerful or more dangerous in some unspecified sense. The claim is that agentic commerce exhibits properties, specifically adaptive goal-pursuit, multi-cycle autonomous payment execution, and dynamic task decomposition, that the foundational assumptions of existing EU regulatory instruments were not designed to accommodate. Misclassifying agentic commerce as an extension of algorithmic trading or an advanced form of RPA does not merely create definitional imprecision; it generates concrete enforcement gaps by directing regulatory instruments at structural features that agentic systems do not share.

The paper's contribution is analytical rather than prescriptive. No new regulatory text is proposed, and no enforcement mechanism is designed. The contribution is a structured account of why the boundary matters, what criteria locate it, and which specific provisions within existing instruments fail to cross it. This analytical layer is, in the paper's assessment, a precondition for prescriptive work: governance instruments cannot be designed with precision for a category whose properties have not been specified.

The analysis proceeds as follows. The motivation section grounds urgency in current deployment patterns and in documented instances of regulatory misclassification. The related work section positions the paper against extant literature on algorithmic regulation, RPA governance, and AI agency. The methodology section specifies the four structural criteria used to distinguish the three categories: scope of decision authority, modification without human intervention, feedback-loop architecture, and deployment context. The results section applies those criteria to generate a differentiation matrix and maps specific EU regulatory provisions against the matrix to identify structural coverage gaps. The discussion section explains the functional mechanisms by which existing frameworks fail, not merely that they fail. The conclusion elaborates the implications of categorical clarity for specific policy vectors without compressing those implications into rhetorical summary.

Two definitional notes are necessary before proceeding. First, the paper uses the term agentic commerce to refer to systems that autonomously identify commercial objectives, decompose them into transaction sub-tasks, execute payments across multiple decision cycles without per-transaction human confirmation, and revise their task plans in response to environmental feedback [22]. This definition excludes systems that execute a fixed rule against a trigger condition, even if that execution is fast or complex. Second, the paper uses the term regulatory framework to refer to the complete instrument including its operative provisions, its recitals (where these constrain interpretive extension), and the technical standards issued under delegation from the primary instrument. Regulatory framework does not refer to the general policy intent of a directive or regulation, which is often expressed at a level of abstraction sufficient to encompass many things the instrument itself cannot operationalise.

The EU context is chosen because the Union has produced the densest concentration of operative regulatory instruments applicable to automated financial systems, including MiFID II, PSD2, GDPR, eIDAS, the AI Act, and the Digital Markets Act. That density makes the EU the most productive jurisdiction in which to test the hypothesis that existing frameworks are necessary but insufficient. The necessity claim acknowledges that MiFID II's conduct-of-business rules and PSD2's strong customer authentication requirements do apply to some features of agentic commerce deployments. The insufficiency claim is that those applications leave structural coverage gaps that are not remediable through doctrinal extension alone [17].

Why Agentic Commerce Demands Regulatory Clarity Now

The deployment of systems that qualify as agentic commerce under the definition adopted in this paper is no longer prospective. Commercial deployments in e-commerce, financial services, and business-to-business procurement already exhibit the structural properties that distinguish agentic commerce from its predecessors: autonomous multi-cycle transaction execution, adaptive plan revision, and persistent memory across sessions [21]. The regulatory frameworks that apply to these deployments were designed for earlier automation architectures, and the gap between what those frameworks can address and what deployed systems actually do is actively widening as model capabilities increase [6].

The urgency has three concrete dimensions. The first is operational opacity. Agentic commerce systems make transaction decisions through processes that are not inspectable by the counterparties those decisions affect, nor in many cases by the operators who deployed the systems [1]. Existing audit and reporting obligations under MiFID II and PSD2 were designed against systems whose decision logic is fixed at deployment and therefore auditable in principle, even if auditing is resource-intensive in practice. A system that revises its own decision logic in response to transactional feedback across operational cycles presents an audit surface that changes between the moment of authorisation and the moment of examination [15]. This is not a gap in enforcement intensity; it is a gap in the structural fit between the audit mechanism and the system being audited.

The second dimension is market integrity. Algorithmic trading regulation under MiFID II developed partly in response to documented episodes in which coordinated automated selling produced rapid price dislocations. Those episodes involved systems operating at high frequency within fixed parametric bounds. Agentic commerce systems operating across procurement, payment, and logistics functions could produce analogous coordination effects, but the coordination mechanism would be goal-directed adaptation rather than simultaneous rule-triggering. Current market surveillance instruments are calibrated to detect the latter; they have no established methodology for detecting the former [7].

The third dimension is consumer protection. A consumer who delegates purchasing authority to an agentic system has not merely automated a fixed preference; the consumer has delegated authority to a system that may revise its interpretation of that preference in response to information the consumer has not reviewed [21]. The consent and information disclosure frameworks under PSD2 and the GDPR were constructed around the assumption that the scope of authorised action is determinable at the moment of consent. When the system acts under a delegated objective rather than a specified instruction set, that assumption fails, and the consumer's practical ability to exercise the rights the framework formally grants is correspondingly diminished [19].

Misclassification compounds all three problems. When agentic commerce deployments are classified as algorithmic trading, operators face conduct-of-business obligations that were designed for securities markets and do not translate to general commerce. When they are classified as RPA, operators face lighter governance obligations that assume deterministic scripts and fixed audit trails, which agentic systems do not produce. In neither case does the regulatory treatment match the structural properties of the system. The result is that compliant operators are subject to obligations that do not address the actual risk profile of their systems, and non-compliant operators have no clearly applicable standard against which enforcement action can be grounded [16]. Both outcomes reduce the coherence and effectiveness of EU financial market governance at a moment when deployment scale is increasing [20].

Prior Frameworks for Automation, Agency, and Fintech Governance

The literature relevant to this paper spans four distinct strands: algorithmic trading regulation and its academic commentary; RPA governance scholarship; AI agency and multi-agent systems research; and FinTech governance analysis. Each strand addresses a partial view of the problem; none addresses the structural distinctness of agentic commerce as defined in this paper.

Algorithmic trading regulation. MiFID II (Directive 2014/65/EU and its delegated regulations) constitutes the primary EU instrument governing algorithmic trading in financial instruments. Its provisions require firms to implement pre-trade controls, maintain algorithmic trading logs, and obtain authorisation for high-frequency trading strategies. Academic commentary on MiFID II has examined the adequacy of circuit-breaker mechanisms, the definition of high-frequency trading, and the challenges of cross-border enforcement within the single market [14]. Darolles [14] provides a useful overview of the regulatory response to FinTech more broadly, noting that regulators have consistently responded to technological change by extending existing instrument categories rather than creating new ones. This paper builds on that observation but extends it: the present argument is not merely that extension is a regulatory tendency, but that extension fails for specific structural reasons when the new category exhibits emergent rather than parametric behaviour. What the algorithmic trading literature does not address is any system that revises its own decision logic during operation, because MiFID II's design assumes that the algorithm is fixed and authorisable at a discrete moment.

RPA governance. Scholarship on RPA governance addresses the deployment of deterministic software robots that execute rule-based tasks by interacting with existing application interfaces [9]. Enholm et al. [9] review the literature on AI and business value, noting that RPA is characterised by its non-invasive integration architecture and its dependence on stable process definitions. The governance literature treats auditability as a property of the script: because an RPA bot does what its script specifies, a complete audit trail can in principle be reconstructed from the script plus the execution log. This paper identifies that assumption as the structural point of failure when RPA governance is applied to agentic systems, which do not operate from fixed scripts and which may not produce execution logs that correspond to a stable decision logic.

AI agency and multi-agent systems. Rahwan et al. [7] establish the concept of machine behaviour as a scientific domain, arguing that the behavioural patterns of AI systems require empirical study structurally similar to the study of animal or human behaviour. Their framework is relevant because it draws attention to the fact that system-level behaviour may diverge from the intentions of any individual designer, a property that becomes legally significant when liability attribution is required. Brohi et al. [15] survey the landscape of agentic AI and large language models, documenting the technical properties of agentic systems including persistent memory, tool use, and multi-agent coordination. This paper draws on that technical characterisation but applies it to a regulatory analysis that the survey does not itself undertake. Balaskas [21] provides a systematic review of agentic AI in e-commerce, mapping the transition from recommendation systems to delegation architectures and identifying consumer governance mechanisms as the primary near-term safeguard. This paper engages directly with that finding but identifies its scope limitation: user-facing governance mechanisms are inadequate for systemic or market-level risks that no individual consumer can perceive or contest.

FinTech governance and AI regulation. The EU AI Act (Regulation (EU) 2024/1689) establishes a risk-classification schema for AI systems and imposes conformity assessment obligations on high-risk applications. Governance and digital economy scholarship [17] examines how EU data governance instruments interact with commercial AI deployment. Sardana et al. [20] analyse the deployment of agentic AI in compliance functions, documenting a case in which agentic systems achieved measurable reduction in audit workload and pre-emptive violation detection, while also noting that the systems' decision paths were not fully transparent to the compliance officers they assisted. Nwafor and Ayodele [16] examine regulatory challenges in FinTech, identifying fraud detection and liability allocation as persistent gaps. This paper extends those analyses by showing that the regulatory gaps they identify are not remediable through incremental instrument extension because they arise from a structural mismatch between the instrument's foundational assumptions and the system's architecture. Dwivedi et al. [3] and Duan et al. [5] address the broader challenges of AI in decision-making contexts, providing conceptual grounding for the argument that AI systems operating in consequential domains require governance frameworks calibrated to their specific decision architecture, not simply to their domain of deployment.

The present paper is distinguished from all prior work by its specific focus on the three-way boundary. No prior source has constructed a structured comparison of all three categories (algorithmic trading, RPA, and agentic commerce) against a common set of structural criteria with the explicit aim of identifying the mechanism by which existing EU regulatory instruments fail at each boundary.

Boundary Definition Through Structural Criteria

The differentiation matrix developed in this paper rests on four structural criteria. Each criterion is selected because it is observable in principle by a regulator or auditor with access to system documentation and execution logs, and because it generates a discrete value, or a position on a defined scale, rather than a continuous variable that requires calibration against a contested benchmark. The four criteria are: (1) scope of decision authority, (2) modification without human intervention, (3) feedback-loop architecture, and (4) deployment context.

Criterion 1: Scope of decision authority. This criterion asks what range of transactional outcomes a system can produce without a per-action human authorisation. The criterion is measured along two axes: the breadth axis (how many distinct transaction types the system can initiate) and the depth axis (the maximum financial or contractual commitment the system can make within a single operational session). A system with narrow breadth and shallow depth, one that executes a single order type against a single instrument within a pre-set position limit, falls at one end of the scale. A system that can initiate purchases, negotiate terms, select counterparties, and commit payment across multiple product categories and multiple sessions falls at the other. The criterion is operationally observable through the system's authorisation parameters as documented at deployment and through the transaction records generated during operation [22].

Criterion 2: Modification without human intervention. This criterion asks whether the system's decision logic changes during operation in response to environmental inputs, and if so, whether those changes are logged in a form that allows post-hoc reconstruction of the decision path. The criterion distinguishes three positions: static logic (the system's decision rules are fixed at deployment and do not change during operation), parameterised adaptation (the system adjusts numerical parameters, such as price thresholds or timing intervals, within a fixed rule structure without altering the rule structure itself), and structural adaptation (the system revises the rules themselves, or the task decomposition plan, in response to feedback, producing decision logic at time T+n that was not determinable from the system state at time T) [15]. The criterion is observable through comparison of deployment documentation with operational execution logs; structural adaptation is identified when the execution log cannot be fully reconstructed from the deployment specification.
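The observability test for structural adaptation can be illustrated with a minimal sketch. It assumes, hypothetically, that the deployment specification can be expressed as a pure function from an observed input to an action (here `deployed_rule`) and that each execution-log entry records the input and the action taken. Neither the log format nor the rule is drawn from any instrument or deployed system; both are placeholders for illustration.

```python
from typing import Callable, Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    """One hypothetical execution-log record: the observed input and the action taken."""
    observed_price: float
    action: str


def deployed_rule(observed_price: float, threshold: float = 100.0) -> str:
    """Illustrative deployment specification: buy below a fixed price threshold."""
    return "buy" if observed_price < threshold else "hold"


def unreconstructable_entries(
    log: Iterable[LogEntry], spec: Callable[[float], str]
) -> List[int]:
    """Return the indices of log entries whose action the specification does not reproduce."""
    return [
        index
        for index, entry in enumerate(log)
        if spec(entry.observed_price) != entry.action
    ]


log = [
    LogEntry(95.0, "buy"),    # consistent with the deployed rule
    LogEntry(105.0, "hold"),  # consistent with the deployed rule
    LogEntry(110.0, "buy"),   # not derivable from the deployed rule
]
divergent = unreconstructable_entries(log, deployed_rule)
print(divergent)
```

On this simplified reading, a non-empty result indicates that the execution log cannot be fully reconstructed from the deployment specification. A system exhibiting parameterised adaptation would pass the same test only if its specification also documented the parameter-update rule, which this sketch does not model.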

Criterion 3: Feedback-loop architecture. This criterion asks whether the system incorporates a mechanism by which prior outcomes influence subsequent decisions, and whether that mechanism operates within a single transaction cycle or across multiple cycles. A system with no feedback loop executes each transaction independently. A system with an intra-cycle feedback loop adjusts within a single transaction sequence, for example by revising an order quantity in response to a partial fill. A system with a cross-cycle feedback loop carries information from completed transaction cycles into the planning and execution of subsequent cycles, enabling the system to revise its objectives, not merely its execution parameters [7]. Cross-cycle feedback is the property that most directly distinguishes agentic commerce from both algorithmic trading and RPA as those categories are defined in operative regulatory instruments.

Criterion 4: Deployment context. This criterion asks in which regulatory domain the system operates and whether that domain has instrument-specific obligations that the system's operation engages. The relevant domains for this analysis are: regulated financial instruments markets (engaging MiFID II obligations); payment initiation and account information services (engaging PSD2 obligations); general commercial transactions not regulated as financial instruments (engaging consumer protection and contract law); and mixed contexts in which the system operates across two or more domains within a single operational session. Mixed-context deployment is identified when the system's transaction sequence crosses a regulatory domain boundary, for example by initiating a payment service to complete a purchase that it also negotiated, without a human transition step between domains [17].
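The identification of mixed-context deployment can be sketched as a scan over a session's transaction sequence. The representation below is an assumption made for illustration: each step is a pair of the regulatory domain it falls in and a flag recording whether a human transition step preceded it. The names `Domain`, `domain_crossings`, and `is_mixed_context` are hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class Domain(Enum):
    FINANCIAL_INSTRUMENTS = "MiFID II"
    PAYMENT_SERVICES = "PSD2"
    GENERAL_COMMERCE = "consumer protection and contract law"


# A step is (domain of the action, whether a human transition step preceded it).
Step = Tuple[Domain, bool]


def domain_crossings(session: List[Step]) -> List[Tuple[int, Domain, Domain]]:
    """Return (step index, from-domain, to-domain) for each crossing made without a human step."""
    crossings = []
    for index in range(1, len(session)):
        previous_domain = session[index - 1][0]
        domain, human_step_before = session[index]
        if domain != previous_domain and not human_step_before:
            crossings.append((index, previous_domain, domain))
    return crossings


def is_mixed_context(session: List[Step]) -> bool:
    """A session is mixed-context if it contains at least one unattended crossing."""
    return bool(domain_crossings(session))


# Negotiate a purchase, then initiate the payment with no human transition step.
session = [
    (Domain.GENERAL_COMMERCE, False),
    (Domain.GENERAL_COMMERCE, False),
    (Domain.PAYMENT_SERVICES, False),
]
print(is_mixed_context(session))
```

In practice the difficult step is assigning each action to a domain, which depends on the definitions in each instrument; the sketch takes that assignment as given.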

The decision rule for classification is applied hierarchically. A system is classified as algorithmic trading if it operates exclusively on regulated financial instruments with static or parameterised-adaptive logic. A system is classified as RPA if it operates with static logic regardless of domain. A system is classified as agentic commerce when it combines structural adaptation (Criterion 2) with cross-cycle feedback (Criterion 3), regardless of whether Criterion 1 breadth or Criterion 4 domain coverage is narrow or wide. The combination of structural adaptation and cross-cycle feedback is the necessary and sufficient condition for the agentic commerce classification because those two properties jointly produce the emergent behaviour that existing regulatory frameworks cannot address: decision logic that changes in ways that were not determinable at the moment the system was authorised to operate [5].
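The hierarchical decision rule can be written out as a short sketch. The type and field names (`Logic`, `Feedback`, `SystemProfile`, `classify`) are illustrative assumptions, and the residual `"unclassified"` outcome is likewise an assumption of the sketch: the rule as stated does not assign a category to, for example, a parameterised-adaptive system operating outside regulated financial instruments.

```python
from dataclasses import dataclass
from enum import Enum


class Logic(Enum):
    STATIC = "static logic"
    PARAMETERISED = "parameterised adaptation"
    STRUCTURAL = "structural adaptation"


class Feedback(Enum):
    NONE = "no feedback loop"
    INTRA_CYCLE = "intra-cycle"
    CROSS_CYCLE = "cross-cycle"


@dataclass(frozen=True)
class SystemProfile:
    logic: Logic                             # Criterion 2
    feedback: Feedback                       # Criterion 3
    exclusively_regulated_instruments: bool  # Criterion 4, reduced to one flag


def classify(profile: SystemProfile) -> str:
    """Apply the classification rule in the order in which the text states it."""
    if profile.exclusively_regulated_instruments and profile.logic in (
        Logic.STATIC,
        Logic.PARAMETERISED,
    ):
        return "algorithmic trading"
    if profile.logic is Logic.STATIC:
        return "RPA"
    if profile.logic is Logic.STRUCTURAL and profile.feedback is Feedback.CROSS_CYCLE:
        return "agentic commerce"
    return "unclassified"  # residual case, not specified by the rule itself


print(classify(SystemProfile(Logic.STRUCTURAL, Feedback.CROSS_CYCLE, True)))
```

Criterion 1 does not appear in the function because the rule makes the agentic commerce classification independent of the breadth of decision authority.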

Three Distinct Categories and Their Regulatory Gaps

Applying the four structural criteria to the three categories produces a differentiation matrix that clarifies both the properties of each category and the regulatory instruments each property engages.
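One compact way to lay the matrix out is sketched below. The cell values paraphrase the structural profiles described in the remainder of this section; the data layout and the names `CRITERIA`, `MATRIX`, and `render_matrix` are illustrative assumptions rather than part of the paper's method.

```python
CRITERIA = (
    "Scope of decision authority",
    "Modification without human intervention",
    "Feedback-loop architecture",
    "Deployment context",
)

# One tuple per category, with cells ordered as in CRITERIA.
MATRIX = {
    "Algorithmic trading": (
        "Bounded by instrument type and position limits",
        "Static or parameterised adaptation",
        "Predominantly intra-cycle",
        "Regulated financial instruments markets only",
    ),
    "RPA": (
        "Narrow in breadth and depth",
        "Static",
        "Absent",
        "Varies across domains",
    ),
    "Agentic commerce": (
        "Broad",
        "Structural adaptation",
        "Cross-cycle",
        "Mixed-context",
    ),
}


def render_matrix() -> str:
    """Render one block per criterion, listing each category's position on it."""
    lines = []
    for index, criterion in enumerate(CRITERIA):
        lines.append(criterion)
        for category, cells in MATRIX.items():
            lines.append(f"  {category}: {cells[index]}")
    return "\n".join(lines)


print(render_matrix())
```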

Algorithmic trading: structural profile. An algorithmic trading system, as that term is used in MiFID II Article 4(1)(39), operates on regulated financial instruments by executing orders according to pre-defined parameters without human intervention at the moment of each order. The decision authority of such a system is bounded by instrument type (the system operates only on the instruments specified in its authorisation) and by position limits set at deployment. Decision logic is static or parameterised-adaptive: the system may adjust order timing or size in response to market microstructure signals, but the rule structure governing those adjustments is fixed. Feedback loops are predominantly intra-cycle: order routing logic responds to fill data within a single order sequence, but the system does not carry information from completed sessions into the re-design of its rule structure. Deployment context is exclusively regulated financial instruments markets.

The MiFID II provisions that apply to this profile are structurally well-matched to it. The requirement to maintain algorithmic trading logs (Article 17(1)) functions on the assumption that the log records an execution sequence that can be fully reconstructed from a fixed rule-set. The circuit-breaker obligations require the system to cease operation when a defined parameter threshold is crossed. The market-making strategy obligations assume that the strategy is authorisable and stable. These obligations are necessary for agentic commerce because any agentic system that operates on regulated financial instruments incidentally engages them. They are insufficient because none of them addresses a system that revises its rule structure during operation: the log requirement does not mandate logging of rule changes; the circuit-breaker does not engage on the basis of behavioural divergence from an original specification; and market-making strategy obligations cannot be assessed against a strategy that has changed since authorisation.

Robotic process automation: structural profile. An RPA system automates deterministic, rule-based tasks by interacting with application interfaces in the same manner as a human user would, but without human intervention at the moment of each interaction [9]. Decision authority is narrow in both breadth and depth: the system executes the steps specified in its script. Decision logic is static: the script does not change during operation, and the system produces no output that was not determinable from the script plus the input data. Feedback loops are absent: each process execution is independent of prior executions at the logic level, though input data may differ. Deployment context varies across domains; RPA is applied in financial services, healthcare, logistics, and general administration.

RPA governance frameworks, where they exist as formalised instruments rather than operational practice guidelines, address this profile through change management controls, which regulate when and how scripts may be modified; access controls, which regulate who may authorise a script or modify its parameters; and audit trail requirements, which are satisfiable precisely because the script is static and the execution log fully reconstructs the decision path. These governance controls are necessary for agentic commerce where an agentic system interacts with the same application interfaces as an RPA bot, engaging the same access control and audit trail obligations. They are insufficient because the change management controls presuppose that logic changes occur at discrete, human-authorised moments: they have no mechanism for capturing or constraining logic changes that occur continuously during operation as a consequence of the system's own feedback processing.

Agentic commerce: structural profile. An agentic commerce system combines structural adaptation, cross-cycle feedback, broad decision authority, and mixed-context deployment. The system identifies a commercial objective supplied by a principal, decomposes that objective into a sequence of sub-tasks, executes transactions to advance those sub-tasks, revises the task plan in response to transactional outcomes, and carries the revised plan into subsequent operational cycles without per-cycle human confirmation [22]. The system may initiate payment service operations, negotiate commercial terms, select among counterparties, and commit the principal to contractual obligations, all within a single delegated session that may span hours or days [21].

The specific regulatory gaps that this profile produces against existing EU instruments are as follows. Against MiFID II: where an agentic commerce system operates on regulated instruments, the algorithmic trading log obligation does not capture rule-structure changes; the circuit-breaker mechanism does not engage on behavioural divergence; and the system's authorisation cannot be assessed against a strategy that has evolved since authorisation. Against PSD2: the strong customer authentication requirement (Article 97) applies at the moment of payment initiation; an agentic system that was authenticated at session initiation and subsequently executes multiple payments under the same session token operates outside the per-transaction authentication model that PSD2's fraud liability allocation assumes [19]. Against GDPR: the right to explanation under Article 22 for automated decision-making applies to decisions that produce legal or similarly significant effects; an agentic system that makes a sequence of individually minor commercial decisions that jointly produce a significant outcome distributes the decision across cycles in a way that may fall below the individual-decision threshold while exceeding it in aggregate. Against the EU AI Act: the risk-classification schema (Annex III) designates specific application domains as high-risk; an agentic commerce system that operates across domain boundaries, initiating a payment service to complete a commercial negotiation, may fall between the designated categories without clearly engaging the high-risk conformity assessment obligations in either [17].
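The aggregation gap described above in relation to GDPR can be made concrete with a small arithmetic sketch. It rests on a strong simplifying assumption: that the significance of a decision can be proxied by a single monetary value compared against a numeric threshold. Article 22 sets no such figure, so `threshold`, the sample values, and the function name `aggregation_gap` are purely hypothetical.

```python
from typing import Sequence


def aggregation_gap(decision_values: Sequence[int], threshold: int) -> bool:
    """True when every decision is below the threshold but their sum reaches it."""
    each_below = all(value < threshold for value in decision_values)
    aggregate_reaches = sum(decision_values) >= threshold
    return each_below and aggregate_reaches


# Four individually minor purchases made across cycles of one delegated session.
purchases = [40, 35, 30, 45]
print(aggregation_gap(purchases, threshold=100))
```

A per-decision test applied to the same sequence would flag nothing, which is the structural point: the threshold is exceeded only when the cycles are read together.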

The cross-domain deployment property of agentic commerce is the specific mechanism that produces regulatory seam failures. Each EU instrument is designed around a defined operational domain. A system that crosses domain boundaries within a single operational session engages the obligations of each domain at the point of crossing but may not satisfy any of them fully, because each instrument's obligations were designed for a system whose entire operation lies within the domain [20].

Why Existing Frameworks Fail to Contain Agentic Commerce

The regulatory gaps identified in the results section are not artefacts of drafting imprecision or enforcement under-investment. They arise from foundational assumptions embedded in the architecture of each instrument, assumptions that were appropriate for the automation categories those instruments were designed to govern and that are structurally incompatible with agentic commerce as defined in this paper.

The human-at-the-decision-boundary assumption. MiFID II's algorithmic trading provisions are constructed around the concept of a human principal who designs the algorithm, specifies its parameters, authorises its deployment, and bears responsibility for its market conduct. The human principal is not required to intervene at each order, but the system's behaviour is legally attributable to a design choice that the principal made at a determinable moment. Article 17(1)'s requirement that firms ensure their algorithmic trading systems are resilient and have adequate capacity presupposes that the firm has sufficient knowledge of what the system will do to assess those properties. When the system's decision logic adapts structurally across operational cycles, the knowledge required to make that assessment does not exist at any single moment; it would have to be reconstructed continuously, which the inspection and review intervals contemplated by the directive do not support [12].

PSD2's strong customer authentication framework rests on the same assumption in a payment context. The authentication event is designed to bind a specific human principal to a specific payment or category of payment at the moment of authorisation. The framework's liability allocation, which shifts liability from the payment service user to the provider when authentication is absent, presupposes that authentication establishes a causal link between principal intent and transaction execution. An agentic commerce system that receives a single authenticated delegation and subsequently executes multiple payment transactions across a session executes those transactions without the per-transaction causal link that the liability framework requires. The principal cannot be said to have authenticated each transaction, and the payment service provider cannot rely on the session-level authentication to satisfy per-transaction liability standards [19].
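The distinction between session-level and per-transaction authentication can be sketched as a check over a payment record. The record layout is an assumption made for illustration: each payment carries the identifier of the authentication performed at delegation and, where one exists, the identifier of an authentication event tied to that specific payment. The sketch models the argument, not the legal test under PSD2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Payment:
    payment_id: str
    session_auth_id: str                # authentication performed once, at delegation
    transaction_auth_id: Optional[str]  # authentication tied to this payment, if any


def lacking_per_transaction_link(payments: List[Payment]) -> List[str]:
    """Return the ids of payments backed only by session-level authentication."""
    return [p.payment_id for p in payments if p.transaction_auth_id is None]


delegated_session = [
    Payment("pay-1", "sess-A", "sca-001"),  # authenticated at initiation
    Payment("pay-2", "sess-A", None),       # executed later under the same session
    Payment("pay-3", "sess-A", None),
]
print(lacking_per_transaction_link(delegated_session))
```

Every payment returned by the check is one for which, on the argument above, the causal link between principal intent and transaction execution is missing.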

The static-logic auditability assumption. Every existing EU instrument that imposes audit or transparency obligations does so on the assumption that the system being audited operates according to a logic that can be reconstructed from its specification documents and execution logs. GDPR Article 22's right to explanation presupposes that an explanation of the decision logic exists in a form that can be provided to the data subject. The AI Act's transparency obligations for high-risk systems presuppose that the system's operation can be documented with sufficient completeness that a conformity assessment body can evaluate it. Arrieta et al. [1] document the fundamental challenge of explainability for systems whose decision logic is not fixed: the explanation of a decision made at time T by a system that has adapted its logic since deployment is an explanation of a logic that no longer fully exists in the form it had when the decision was made.

This creates an enforcement problem that is distinct from the technical difficulty of producing explanations for complex models. Even if the technical problem of post-hoc explanation were fully solved, no current EU instrument establishes whether an explanation of a decision made under an earlier version of a system's logic remains legally sufficient once the system has revised that logic. The auditability assumption breaks not because auditing is technically impossible but because the reference document, the system specification against which the execution is compared, is not stable [15].

The single-domain intervention design assumption. MiFID II, PSD2, GDPR, and the AI Act each define their scope by reference to a specific operational domain: financial instruments markets, payment services, personal data processing, and AI system deployment respectively. The intervention mechanisms in each instrument (circuit-breakers, authentication requirements, data subject rights, and conformity assessment obligations) are calibrated to systems whose operation lies entirely within the instrument's domain. A system that operates across domains does not simply face multiple regulatory obligations in sequence; it faces obligations that were designed to interlock within each domain and that may produce contradictory or overlapping requirements when applied simultaneously to a cross-domain system [17].

Consider a concrete structural example. An agentic commerce system that autonomously searches for a supplier, negotiates a purchase price, and initiates a payment crosses from unregulated commercial negotiation into PSD2-regulated payment initiation within a single operational session. The moment of domain crossing is not announced; it occurs when the system takes an action that constitutes payment initiation under PSD2's definitions. At that moment, PSD2's obligations engage retroactively, in the sense that the session-level authentication that preceded the crossing must be evaluated against per-transaction authentication standards that apply from the moment of initiation onward. The system was not designed within the PSD2 framework; it was designed to pursue a commercial objective across whatever domains that objective required. The domain boundary is an artefact of the regulatory structure, not of the system's architecture, and the system has no mechanism for recognising or respecting it [22].

The reversibility assumption. A further assumption embedded in existing frameworks, less explicitly stated but operationally important, is that regulated decisions are reversible within a timeframe sufficient for supervisory intervention. MiFID II's circuit-breaker mechanism presupposes that ceasing trading is an effective response to a detected anomaly; the mechanism is premised on the reversibility of the market position. GDPR's safeguards against solely automated decision-making, including the rights to obtain human intervention and to contest the decision, presuppose that the decision can be suspended, reconsidered, and revised. Consumer protection frameworks presuppose that a consumer who objects to a transaction outcome can invoke a dispute mechanism that reverses or compensates for the outcome.

Agentic commerce disrupts the reversibility assumption in two ways. First, the system's decisions may produce contractual commitments before the transaction is visible to the principal in a form that allows exercise of oversight rights. A payment commitment made during an autonomous negotiation session may be legally binding before the session log is reviewed. Second, the system's cross-cycle adaptation means that an intervention that stops a particular decision does not prevent structurally similar decisions in future cycles, because the system can adapt its approach in response to the intervention itself [21]. The regulatory assumption that intervention at time T prevents recurrence at time T+n holds for static-logic systems and fails for structurally adaptive ones. None of MiFID II, PSD2, or the AI Act contains provisions that address this specific failure mode, and the absence of such provisions is not a drafting gap remediable by interpretive extension: the provisions would require a different model of what intervention achieves [3].

Opacity as an enforcement blocker. The interaction between structural adaptation and opacity warrants separate analysis because opacity is both a consequence of the system's architecture and the mechanism by which the foregoing failures become enforcement-resistant. Arrieta et al. [1] document the general challenge of explainability for adaptive AI systems. In the agentic commerce context, opacity has a specific operational manifestation: the system's decision logic at the moment a harm-producing transaction occurred cannot be fully reconstructed, which means that liability attribution requires an account of causation that the available evidence cannot provide. This is not a problem of insufficient data; it is a problem of evidence structure. The data that exists describes what the system did, not which version of its adapted logic produced the decision to do it. This structural opacity renders existing enforcement tools (audit rights, explainability mandates, and liability attribution rules) operationally ineffective in the specific case where they are most needed [20].

Agentic Commerce as a Regulatory Category: Structural Findings and Implications

The analysis presented in this paper establishes that agentic commerce differs from algorithmic trading and robotic process automation in four structural dimensions: scope of decision authority extending across transaction types and sessions, structural adaptation of decision logic during operation, cross-cycle feedback that carries prior outcomes into the revision of objectives, and mixed-context deployment that crosses regulatory domain boundaries within a single operational session. None of these properties individually constitutes the category-defining difference; the combination of structural adaptation and cross-cycle feedback is the necessary and sufficient condition, because their conjunction produces decision logic that changes during operation in ways that were not determinable at the moment of authorisation.

Existing EU regulatory instruments, specifically MiFID II, PSD2, GDPR, and the EU AI Act, are necessary for agentic commerce governance in the sense that agentic systems do engage their operative provisions at specific transactional moments. They are structurally insufficient in the sense that their foundational assumptions (a human at the decision boundary, static-logic auditability, single-domain intervention design, and decision reversibility within supervisory timeframes) fail when applied to systems that exhibit structural adaptation and cross-cycle feedback. These failures are not remediable through doctrinal extension or enforcement investment directed at the existing instrument architecture; the instruments would require redesign around different foundational assumptions to address them.

Acknowledging agentic commerce as a distinct regulatory category enables specific policy advances that the current categorical ambiguity forecloses. The first is a technology-neutral behavioural threshold test. A test that classifies a system as agentic commerce when it combines structural adaptation with autonomous payment execution across at least two decision cycles without human confirmation provides a binary classification with observable inputs. Such a test could be authored by a technical standards body and ratified by national competent authorities, and would give operators a clear compliance target that does not depend on the domain-specific definitions of any single existing instrument.
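The binary character of such a test can be sketched as a predicate over observable inputs. The Python below is a minimal illustration, not a specified proposal: `DecisionCycle`, `SystemProfile`, and `is_agentic_commerce` are hypothetical names, and the sketch assumes that the two qualifying decision cycles need not be consecutive, a point the test as stated leaves open.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DecisionCycle:
    executed_payment: bool   # the system executed a payment in this cycle
    human_confirmed: bool    # a human confirmed the cycle's outcome


@dataclass(frozen=True)
class SystemProfile:
    adapts_logic_in_operation: bool      # structural adaptation observed
    cycles: Sequence[DecisionCycle]      # observed decision cycles


# "At least two decision cycles" in the test as stated in the text.
MIN_AUTONOMOUS_PAYMENT_CYCLES = 2


def is_agentic_commerce(profile: SystemProfile) -> bool:
    """Binary threshold test: adaptation AND >= 2 unconfirmed payment cycles."""
    autonomous = sum(
        1 for c in profile.cycles
        if c.executed_payment and not c.human_confirmed
    )
    return (profile.adapts_logic_in_operation
            and autonomous >= MIN_AUTONOMOUS_PAYMENT_CYCLES)


# Illustrative profiles (assumed, not drawn from any deployed system).
rpa_bot = SystemProfile(False, [DecisionCycle(True, False)] * 3)
assisted = SystemProfile(True, [DecisionCycle(True, True)] * 3)
agent = SystemProfile(True, [DecisionCycle(True, False),
                             DecisionCycle(False, False),
                             DecisionCycle(True, False)])

print(is_agentic_commerce(rpa_bot))   # False: no structural adaptation
print(is_agentic_commerce(assisted))  # False: every payment is human-confirmed
print(is_agentic_commerce(agent))     # True
```

The predicate refers to no instrument-specific definition, which is the sense in which the test is technology-neutral; whether the two inputs are observable in practice is a separate question that the sketch does not address.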

The second enabled policy advance is a macro-prudential oversight layer calibrated to aggregate transaction volume rather than individual-system risk classification. Individual-system risk classification, the model used by the AI Act, cannot detect systemic risk that arises from the coordinated behaviour of multiple agentic systems each of which individually falls below a risk threshold. A macro-prudential layer that treats aggregate agentic commerce market penetration, measured in transaction volume terms, as a systemic variable would trigger enhanced audit obligations, liability attribution protocols, and cross-border coordination requirements at defined penetration thresholds. This layer is structurally similar to the financial stress-testing frameworks applied to systemically important financial institutions, with the critical difference that the unit of measurement is market penetration by a system architecture rather than the balance sheet of a single institution.
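The difference between individual-system and aggregate measurement can be shown with a small numerical sketch. All figures below are invented for illustration: the per-system limit, the penetration thresholds in `TIER_THRESHOLDS`, and the mapping of tiers to obligations are assumptions introduced here, and the function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical penetration thresholds, as shares of total transaction volume.
TIER_THRESHOLDS = [0.05, 0.15]
# Hypothetical mapping of tiers to the obligations named in the text.
TIER_OBLIGATIONS = [
    "baseline supervision",
    "enhanced audit obligations",
    "enhanced audit, liability attribution protocols, cross-border coordination",
]


def penetration(agentic_volumes, total_volume):
    """Aggregate agentic share of total transaction volume."""
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return sum(agentic_volumes) / total_volume


def oversight_tier(share):
    """Index of the oversight tier triggered by an aggregate share."""
    return bisect_right(TIER_THRESHOLDS, share)


# Three systems, each small relative to the market (illustrative units).
volumes = [2.0, 1.5, 2.5]
total = 100.0
per_system_limit = 0.05  # hypothetical individual-system threshold

each_below_limit = all(v / total < per_system_limit for v in volumes)
tier = oversight_tier(penetration(volumes, total))

print(each_below_limit)         # True: no single system crosses the limit
print(TIER_OBLIGATIONS[tier])   # enhanced audit obligations
```

In this assumed configuration no individual system would attract attention under a per-system test, yet the aggregate share crosses the first threshold, which is the detection gap the macro-prudential layer is meant to close.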

The third enabled advance is a workable liability attribution rule for emergent, multi-agent behaviour. Current EU product liability doctrine and financial services liability frameworks both presuppose an identifiable human decision at the origin of the harm pathway. A liability framework calibrated to agentic commerce would need to assign responsibility at the level of the system's design principles, including its adaptation mechanisms, rather than at the level of any specific decision produced by those mechanisms. Developer accountability for emergent behaviour, rather than accountability only for specified behaviour, is the structural shift that this attribution rule would require.

These three policy vectors cannot be activated without the categorical clarity that this paper's analysis provides. Regulatory design for a category whose structural properties are undefined will either over-specify, by importing the assumptions of an adjacent category, or under-specify, by treating the category as a residual outside existing classifications. Both outcomes produce the enforcement gaps that current deployment patterns exploit. The analytical contribution of this paper is to specify the structural properties precisely enough that prescriptive regulatory work has a grounded foundation from which to proceed.

Boundaries of This Analysis

No prescriptive governance solutions are offered. This paper identifies structural regulatory gaps but does not design the instruments required to close them. The threshold test and macro-prudential overlay described in the conclusion are named as logical implications of the analysis, not as fully specified proposals. Instrument design requires extensive stakeholder consultation, technical standards development, and jurisdictional harmonisation analysis, all of which fall outside the scope of a conceptual boundary analysis.

No technical implementation detail is examined. The paper characterises agentic commerce systems at the level of their structural properties, including structural adaptation, cross-cycle feedback, and mixed-context deployment, without examining the specific technical architectures that produce those properties. Different large language model architectures, tool-use frameworks, and multi-agent coordination protocols may produce the same structural properties through different mechanisms; the paper cannot adjudicate among those mechanisms because doing so would require empirical analysis of specific deployed systems, to which this analysis has no access.

The analysis is limited to the EU regulatory framework. No comparative international analysis is conducted. The United States, United Kingdom, Singapore, and other jurisdictions with significant AI and FinTech regulatory activity may have developed instruments or interpretive approaches that partially address the gaps identified here. The evidentiary gap is the absence of a systematic cross-jurisdictional review of operative provisions rather than stated policy intentions.

Sectoral variation within agentic commerce is not resolved. The differentiation matrix is applied at the level of the three categories as wholes. Within agentic commerce, risk profiles vary across sectors: agentic systems operating in retail consumer payments present different harm pathways from those operating in business-to-business procurement or financial instrument markets. This paper does not produce a within-category risk taxonomy, and the policy implications stated in the conclusion may require sectoral refinement before they can be operationalised.

The analysis rests on structural rather than empirical evidence. The regulatory gaps identified are derived from comparison of structural system properties against operative regulatory provisions, not from empirical observation of harm events attributable to agentic commerce deployments in EU-regulated markets. Field evidence of actual harm pathways would strengthen the case for the specific gaps identified and might reveal additional gaps or close some that the structural analysis identifies.

Vectors for Extended Analysis

Four specific research directions would extend the boundary analysis in this paper toward prescriptive frameworks.

Empirical harm pathway mapping. The structural gaps identified require empirical grounding in documented harm events. A systematic review of national competent authority enforcement actions, financial ombudsman decisions, and consumer dispute records in EU member states, coded against the structural criteria developed in this paper, would establish whether the gaps produce observable harm at current deployment scales. The specific data instrument required is a coded incident database drawn from public regulatory enforcement records across at least five EU jurisdictions over a defined period.

Comparative jurisdictional analysis. A structured comparison of operative regulatory provisions in the United States (SEC, CFPB, and FTC frameworks), United Kingdom (FCA and CMA regulatory positions on AI), and Singapore (MAS Technology Risk Management Guidelines and AI governance frameworks) against the EU instruments examined here would identify whether any jurisdiction has developed threshold tests or liability attribution rules that address the structural gaps this paper identifies. The method required is systematic provision-by-provision comparison against the four structural criteria, not high-level policy intent comparison.

Technical architecture classification. A collaboration with system developers and deployment operators to map specific large language model tool-use architectures against the structural criteria, particularly Criterion 2 (modification without human intervention) and Criterion 3 (feedback-loop architecture), would establish whether the classification decision rule is operationally implementable by regulators without access to proprietary model weights. The instrument required is a standardised technical disclosure template that elicits the observable properties the criteria require.

Macro-prudential threshold modelling. Establishing the transaction volume threshold at which aggregate agentic commerce market penetration becomes a systemic variable requires modelling of contagion pathways specific to agentic commerce, as distinguished from those applicable to algorithmic trading. The data required includes payment system transaction volume data from the European Central Bank's payment statistics, disaggregated by transaction initiation type, over a time series sufficient to establish penetration trajectory.

References

[1] Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., & Barbado, A. (2019). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Elsevier BV.

[2] Sarker, I. H. (2021). Machine Learning: Algorithms, Real-World Applications and Research Directions. Springer Nature.

[3] Dwivedi, Y. K., Hughes, L., Ismagilova, E., Aarts, G., Coombs, C., & Crick, T. (2019). Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. Elsevier BV.

[4] Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E., Jeyaraj, A., & Kar, A. K. (2023). Opinion Paper: "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Elsevier BV.

[5] Duan, Y., Edwards, J. S., & Dwivedi, Y. K. (2019). Artificial intelligence for decision making in the era of Big Data: evolution, challenges and research agenda. Elsevier BV.

[6] Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. O., Kaplan, J., et al. (2021). Evaluating Large Language Models Trained on Code. Cornell University.

[7] Rahwan, I., Cebrián, M., Obradovich, N., Bongard, J., Bonnefon, J.-F., & Breazeal, C. (2019). Machine behaviour. Nature Portfolio.

[8] Feuerriegel, S., Hartmann, J., Janiesch, C., & Zschech, P. (2023). Generative AI. Springer Nature.

[9] Enholm, I. M., Papagiannidis, E., Mikalef, P., & Krogstie, J. (2021). Artificial Intelligence and Business Value: a Literature Review. Springer Science+Business Media.

[10] Li, K., Kim, D. J., Lang, K. R., Kauffman, R. J., & Naldi, M. (2020). How should we understand the digital economy in Asia? Critical assessment and research agenda. Elsevier BV.

[11] Benbya, H., Pachidi, S., & Järvenpää, S. L. (2021). Special Issue Editorial: Artificial Intelligence in Organizations: Implications for Information Systems Research. Association for Information Systems.

[12] Herath Pathirannehelage, S., Shrestha, Y. R., & von Krogh, G. (2024). Design principles for artificial intelligence-augmented decision making: An action design research study. Palgrave Macmillan.

[13] OECD. (2020). Advancing the Digital Financial Inclusion of Youth. Organisation for Economic Cooperation and Development.

[14] Darolles, S. (2016). The rise of FinTechs and their regulation. Université Paris-Sud.

[15] Brohi, S. N., Mastoi, Q., Jhanjhi, N. Z., & Pillai, T. R. (2025). A Research Landscape of Agentic AI and Large Language Models: Applications, Challenges and Future Directions. Multidisciplinary Digital Publishing Institute.

[16] Nwafor, K. C., & Ayodele, E. A. (2024). Regulatory Challenges and Innovations in Financial Technology: Safeguarding Against Fraud While Maximizing ROI. International Journal of Research Publication and Reviews.

[17] Pastor Sempere, M. C. (2025). Governance and Control of Data and Digital Economy in the European Single Market. Springer International Publishing.

[18] Zafar, A. (2025). Quantum Computing in Finance: Regulatory Readiness, Legal Gaps, and the Future of Secure Tech Innovation. Cambridge University Press.

[19] Prayuti, Y., Lany, A., Marpaung, Y. E., & Lorentzon, E. (2024). Legal Protection of Consumers from Personal Data Security Risks, Threats of Fraud and Phishing (Cybercrime) in E-Wallet Payment Systems. Unram Law Review.

[20] Sardana, A., Sethuraman, S., & Kalyanasundaram, P. D. (2024). Compliance-as-Code 2.0: Orchestrating Regulatory Operations with Agentic AI. Journal of Artificial Intelligence General Science (JAIGS).

[21] Balaskas, S. (2026). From Recommendations to Delegation: A Systematic Review Mapping Agentic AI in E-Commerce and Its Consumer Effects. Inf.

[22] Dusad, K. (2025). Agentic Commerce: The Paradigm Shift from Human-Mediated to Autonomous AI-Driven Transactions in Digital Payment Systems. International Journal of Computational and Experimental Science and Engineering.