The foundation model dependency

*European agentic commerce is building its reasoning layer on US-hosted infrastructure, and the terms of that dependency have not yet been set by European law, European operators, or European capability.*

Context

Agentic AI systems moved from research prototypes to commercial deployment at speed between 2023 and 2025. In the commerce sector, the transition is visible in automated customer resolution, dynamic inventory management, and AI-mediated procurement workflows that replace rule-based automation with LLM-driven planning. The regulatory frame that governs this transition (the EU AI Act, GDPR, and the emerging GPAI Code of Practice) was designed against a landscape in which AI was a tool operated by a human, not an autonomous planning agent executing multi-step tasks with access to financial and personal data [22].

At the same moment, European technology sovereignty emerged as an explicit policy priority. The Russian invasion of Ukraine in 2022 accelerated EU thinking about strategic dependence on external infrastructure [19], and the 2023 AI Act debate embedded sovereignty language into the legislative record. What the policy debate has not resolved is the distance between the aspiration and the technical baseline: the European foundation model ecosystem, while growing, has not produced a model that carries production agentic commerce workloads at the capability level of the frontier US providers [23]. The gap becomes visible at the pace of deployment: as agentic systems move from pilot to production, a reasoning-layer dependency that was previously a research architecture choice becomes an operational and regulatory exposure.

At a Glance

INFERENCE LOCK-IN Migrating the core LLM reasoning layer in a production compound agentic system carries costs beyond raw capability differentials: each migration requires re-embedding proprietary data, rebuilding evaluation pipelines, and absorbing throughput regressions during transition, because fan-out overhead and cold-start cascade penalties accumulate across interdependent sub-agent calls tied to a specific provider's infrastructure [21].
REGULATORY EXPOSURE GPAI obligations under the EU AI Act applied from August 2025 (the Act itself entered into force in August 2024); Annex III high-risk obligations apply from August 2026. Both tiers apply regardless of where the model is hosted. GDPR Article 22 attaches to any solely automated decision with legal or similarly significant effects on an individual, a trigger defined by the decision type rather than hosting geography. Data residency is a separate Chapter V question, addressed in practice through contractual arrangements rather than by any framework-level guarantee the provider is bound to offer [22].
ECOSYSTEM DEPTH GAP Commerce-domain fine-tuning datasets, plugin-market depth, and developer toolchains concentrated in US LLM ecosystems constitute switching costs that persist even if a European model reaches parameter parity [6].
NARROW DOMESTIC PROOF European language-specific fine-tuning (e.g., GaMS3-12B on Slovene tasks) shows localised outperformance against frontier US models, but the result does not extend to multilingual, multi-step commercial reasoning workloads [23].

How Agentic Systems Inherit US Infrastructure

An agentic commerce system is not a single model call. It is a compound architecture: a reasoning layer that plans and decomposes tasks, an orchestration layer that routes sub-tasks to specialised tools or APIs, and an execution layer that interacts with commerce infrastructure (inventory, payment, fulfilment). The reasoning layer is the load-bearing component. It maintains context across multi-step plans, resolves ambiguity in tool outputs, and generates the decision logic that downstream layers execute. Current production deployments concentrate this reasoning function in large foundation models because no other component class provides the combination of general-purpose language understanding, tool-use capability, and in-context planning required [20].

The dependency deepens through the mechanics of scale. Production compound AI systems exhibit fan-out patterns in which a single user request spawns multiple parallel sub-agent calls, each carrying its own inference overhead [21]. Cold-start latency penalties accumulate across the chain. Fine-tuning histories and retrieval-augmented context accumulated on a specific provider's infrastructure are not portable: they are tied to the weight checkpoints, embedding spaces, and API contracts of the host provider. European commerce operators who have accumulated domain-specific adaptation on a US-hosted model face a migration cost that is not limited to retraining compute. It includes re-embedding proprietary data, reconstructing evaluation pipelines, and absorbing a latency and throughput regression during transition.
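The accumulation of latency across a fan-out chain can be made concrete with a small model. What follows is a minimal sketch, assuming a request is a sequence of stages whose sub-agent calls run in parallel, so that each stage lasts as long as its slowest call. The stage names, latency figures, and cold-start penalties are illustrative assumptions, not measurements taken from [21].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubAgentCall:
    """One sub-agent inference call (all figures are illustrative)."""
    name: str
    warm_latency_ms: float      # latency when the serving replica is already warm
    cold_start_ms: float = 0.0  # extra penalty paid only on a cold replica

    def latency_ms(self, cold: bool) -> float:
        return self.warm_latency_ms + (self.cold_start_ms if cold else 0.0)


def request_latency_ms(stages: list[list[SubAgentCall]], cold: bool) -> float:
    """End-to-end latency of a request made of sequential stages.

    Calls inside a stage run in parallel (fan-out), so a stage costs as much
    as its slowest call; stages run one after another, so their costs add up.
    """
    return sum(max(call.latency_ms(cold) for call in stage) for stage in stages)


def total_calls(stages: list[list[SubAgentCall]]) -> int:
    """Number of inference calls a single user request spawns."""
    return sum(len(stage) for stage in stages)


# Hypothetical plan for one customer request:
# plan -> three parallel lookups -> summarise.
stages = [
    [SubAgentCall("plan", 400, cold_start_ms=2000)],
    [
        SubAgentCall("inventory", 300, cold_start_ms=1500),
        SubAgentCall("pricing", 350, cold_start_ms=2500),
        SubAgentCall("fulfilment", 250, cold_start_ms=1000),
    ],
    [SubAgentCall("summarise", 500, cold_start_ms=2000)],
]

warm = request_latency_ms(stages, cold=False)
cold = request_latency_ms(stages, cold=True)
print(f"calls per request: {total_calls(stages)}")
print(f"warm latency: {warm:.0f} ms, cold latency: {cold:.0f} ms")
```

Under these assumed figures, one request spawns five inference calls, and the worst-case cold path is several times slower than the warm path because every sequential stage pays its own penalty. The model deliberately ignores queueing, retries, and partial warm-up, all of which a real deployment would need to account for.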

The architecture is not, in principle, monolithic. Ecosystem theory establishes that modular infrastructure allows capable components to emerge and substitute at discrete layers without requiring end-to-end replacement [6]. An agentic system that cleanly separates its reasoning layer from its orchestration and execution layers is, structurally, more amenable to component substitution than one in which the foundation model's API shapes all downstream interfaces. In practice, however, the degree of modularity in current production deployments varies: providers who supply both the reasoning layer and the surrounding toolchain (plugin registries, embedding APIs, evaluation frameworks) create interface dependencies that resist component-level substitution even where the model weights themselves could in principle be replaced.
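The structural point about separation can be illustrated in code. This is a minimal sketch, assuming the orchestration layer depends only on a narrow planning interface; the class names (`ReasoningLayer`, `HostedReasoner`, `DomesticReasoner`, `Orchestrator`), the model identifiers, and the `tool:argument` step format are all hypothetical and do not correspond to any provider's actual API.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Plan:
    """Ordered tool invocations, each written as 'tool_name:argument'."""
    steps: tuple[str, ...]
    model_id: str  # recorded so downstream logs show which model planned


class ReasoningLayer(Protocol):
    """The only surface the orchestrator is allowed to depend on."""

    def plan(self, task: str) -> Plan: ...


class HostedReasoner:
    """Stand-in for a remotely hosted model; no real provider API is called."""

    def plan(self, task: str) -> Plan:
        return Plan(steps=(f"inventory:{task}", f"payment:{task}"),
                    model_id="hosted-model-a")


class DomesticReasoner:
    """Stand-in for a substitute model behind the same interface."""

    def plan(self, task: str) -> Plan:
        return Plan(steps=(f"inventory:{task}", f"payment:{task}"),
                    model_id="domestic-model-b")


class Orchestrator:
    """Routes plan steps to tools; knows nothing about the model vendor."""

    def __init__(self, reasoner: ReasoningLayer,
                 tools: dict[str, Callable[[str], str]]) -> None:
        self.reasoner = reasoner
        self.tools = tools

    def run(self, task: str) -> list[str]:
        plan = self.reasoner.plan(task)
        results = []
        for step in plan.steps:
            tool_name, _, argument = step.partition(":")
            results.append(self.tools[tool_name](argument))
        return results


# Hypothetical execution-layer tools.
tools = {
    "inventory": lambda sku: f"reserved {sku}",
    "payment": lambda sku: f"charged for {sku}",
}

# Swapping the reasoning layer is a one-argument change for the orchestrator.
before = Orchestrator(HostedReasoner(), tools).run("SKU-1")
after = Orchestrator(DomesticReasoner(), tools).run("SKU-1")
print(before == after)
```

The sketch captures only the interface-level substitution. It says nothing about the embedding spaces, evaluation frameworks, or plugin registries that, when supplied by the same provider, reintroduce the coupling this separation is meant to avoid.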

European foundation model development has produced capable smaller models at the language-specific level [23], but no European model has been demonstrated at the parameter scale, multi-modal breadth, and production throughput required by the agentic commerce workloads described in deployment studies [21]. The domestic proof of localised outperformance on single-language tasks has not extended to the multilingual, multi-step commercial reasoning workloads that constitute the core agentic commerce use case. The gap is structural, not merely one of elapsed time.

Key References

Drafter's note: Sources [20], [21], and [22] are arXiv preprints carrying 2026 identifiers (January, April, and March 2026 respectively); source [23] is likewise a 2026 arXiv preprint. None has undergone formal peer review, and none could be independently verified at the time of drafting. Load-bearing technical and regulatory claims drawn from these sources are presented at the level of structural argument; they should be treated as documented positions requiring supervisory or peer-reviewed corroboration before use in compliance submissions.

Wei et al. (2026), Agentic Reasoning for Large Language Models [20]: establishes the architectural role of LLMs as the planning and reasoning core in multi-step agentic systems, grounding the dependency in technical structure rather than conjecture.
Prasad S & Arora (2026), Scalable Inference Architectures for Compound AI Systems [21]: documents fan-out overhead, cold-start cascade penalties, and the infrastructure constraints that make the LLM reasoning layer the primary point of lock-in in production deployments.
Zhang & Maharjan (2026), Security, Privacy, and Agentic AI in a Regulatory View [22]: maps the EU regulatory obligations that apply to agentic systems and identifies the gaps between current compliance frameworks and the realities of non-EU-hosted reasoning layers.
Edler et al. (2023), Technology Sovereignty as an Emerging Frame for Innovation Policy [18]: provides the analytical vocabulary for distinguishing technology sovereignty as a policy goal from the specific instruments available to achieve it, essential context for evaluating European LLM strategy.

Regulatory and Operational Consequences

Four concrete consequences follow from the architecture described above, each distinct in character and timeline.

Governance accountability. The EU AI Act classifies certain agentic decision-making functions (credit decisions, product eligibility, personalised pricing) under Annex III high-risk categories, with full obligations applying from August 2026 [22]. GPAI obligations, which govern general-purpose model providers, applied from August 2025. A European commerce operator whose reasoning layer runs on a US-hosted foundation model must obtain from that provider the technical documentation, logging access, and conformity evidence required to satisfy audits by a national competent authority (NCA). Whether US providers are obligated to maintain EU-compatible audit trails at the granularity the Act requires remains an open regulatory risk: no authoritative supervisory guidance or Commission decision has established the applicable threshold for non-EU-hosted GPAI models, and compliance currently depends on the terms individual operators negotiate rather than on any framework-level obligation the provider is bound to observe [22].

Inference cost and margin structure. Agentic commerce workloads generate inference costs that scale with task complexity and fan-out depth, not with transaction volume alone [21]. European operators pricing in euros and subject to EU consumer-protection constraints on automated pricing cannot easily pass through inference cost volatility driven by US provider pricing decisions. The provider, by contrast, sets inference pricing in a market where the majority of its customers transact in US dollars.
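A short worked calculation shows why fan-out depth, rather than request volume, dominates the cost line. This is a minimal sketch, assuming a uniform plan tree in which every call spawns the same number of sub-calls, a flat per-token price quoted in US dollars, and a fixed exchange rate. Every figure below is an illustrative assumption and not a quoted provider price.

```python
def calls_per_request(branching: int, depth: int) -> int:
    """Inference calls in a uniform plan tree: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** level for level in range(depth + 1))


def monthly_cost_eur(
    requests: int,
    branching: int,
    depth: int,
    tokens_per_call: int,
    usd_per_million_tokens: float,
    eur_per_usd: float,
) -> float:
    """Monthly inference bill in euros for a USD-priced provider."""
    calls = requests * calls_per_request(branching, depth)
    cost_usd = calls * tokens_per_call * usd_per_million_tokens / 1_000_000
    return cost_usd * eur_per_usd


# Same request volume, same price, same exchange rate; only plan depth differs.
shared = dict(
    requests=100_000,
    branching=3,
    tokens_per_call=2_000,
    usd_per_million_tokens=5.0,  # assumed price, not a real tariff
    eur_per_usd=0.9,             # assumed exchange rate
)
shallow = monthly_cost_eur(depth=1, **shared)
deep = monthly_cost_eur(depth=3, **shared)
print(f"depth 1: {calls_per_request(3, 1)} calls/request, {shallow:,.0f} EUR")
print(f"depth 3: {calls_per_request(3, 3)} calls/request, {deep:,.0f} EUR")
```

With transaction volume held constant, moving from a depth-1 to a depth-3 plan multiplies the call count, and therefore the bill, by ten under these assumptions. The `eur_per_usd` and `usd_per_million_tokens` parameters are the two inputs a euro-priced operator does not control.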

Model control and output accountability. When the foundation model is updated, deprecated, or fine-tuned differently by the provider, the behaviour of downstream commerce agents changes without operator action. European operators cannot compel US providers to maintain version stability for regulatory compliance purposes, and no EU instrument currently imposes such an obligation on non-EU model hosts.

GDPR and data residency. Training data, inference inputs, and intermediate reasoning traces passing through a US-hosted model may constitute personal data transfers subject to GDPR Chapter V. Contractual data-residency arrangements (Standard Contractual Clauses, EU-soil compute commitments) are the current practical instrument, but their adequacy when inference itself occurs outside EU jurisdiction awaits authoritative supervisory guidance [22].

European agentic commerce cannot resolve its foundation model dependency through regulatory declaration alone. The EU AI Act establishes accountability obligations that require auditability, version stability, and data-residency guarantees from the reasoning layer. Until a European model achieves the parameter scale, multi-modal capability, and production-ecosystem depth to carry that layer in commerce-grade deployments, operators face a specific set of compounding exposures: a contractual compliance arrangement with a US provider whose terms Europe cannot set by law and whose continuity depends on the provider's commercial and jurisdictional circumstances; a capability gap that no currently operating European AI consortium has demonstrated the scale to close; and an absence of any EU instrument that could compel a non-EU model host to maintain the version stability, audit-log granularity, or inference-residency guarantees that Annex III obligations will require from August 2026.

Counterpoint

The Case for Incremental European Development

The strongest opposing position holds that the sovereignty gap is neither permanent nor structurally irreversible, and that forced redundancy (requiring European operators to adopt immature domestic alternatives ahead of commercial readiness) imposes costs that exceed those of managed dependency on US providers under contractual residency arrangements.

The empirical basis for this position is not negligible. European-language-specific fine-tuning of smaller foundation models has produced localised benchmark results that outperform frontier US models on tasks relevant to their target populations [23]. Ecosystem theory establishes that modularity in AI infrastructure allows capable components to emerge without hierarchical coordination [6], meaning a European model need not replicate the full US provider stack to address discrete commerce functions. The strategic alliance literature further establishes that accessing external knowledge through structured partnership preserves flexibility that ownership forecloses [8]: a European commerce operator that negotiates strong contractual terms with a US provider (EU-soil inference, version-stability commitments, audit-log access) may achieve a compliance posture functionally equivalent to domestic hosting without bearing the fixed cost of frontier model development.

This position is serious and deserves direct engagement. Its structural weakness lies in the exposure it accepts: contractual residency arrangements are contingent on the provider's willingness to maintain EU-compatible terms, which is in turn contingent on the regulatory and commercial environment the provider faces in its home jurisdiction. A US export-control measure or a provider commercial restructuring can dissolve the arrangement without European operators or regulators having any instrument of recourse. The counterpoint's logic holds only if the geopolitical environment remains stable enough for contractual protections to retain their value, a condition the current record does not confirm [19].

Unresolved Questions

At which technical layer (model weights, API contract structure, fine-tuning history, or inference infrastructure) migration cost peaks for a European commerce operator remains undocumented: no empirical cost study in the current record addresses the question with sufficient granularity.
Whether EU AI Act GPAI obligations (August 2025 application date) and Annex III high-risk obligations (August 2026) can be satisfied by contractual data-residency arrangements with US-hosted providers, or require domestic model hosting, awaits authoritative guidance from the Commission or a national competent authority.
Whether any European foundation model currently in deployment (Mistral, Aleph Alpha, or a GaMS-derivative scaled beyond 12B parameters) meets the latency, throughput, and multi-modal thresholds required for production agentic commerce workloads awaits public benchmarking evidence.
The degree to which plugin-market depth and commerce-domain fine-tuning datasets in the US LLM ecosystem constitute switching costs independent of raw model capability is contested in the ecosystem literature [6] and lacks direct empirical measurement in the agentic commerce context.
Whether geopolitical escalation (export controls or cloud-access restrictions analogous to semiconductor restrictions) constitutes a quantifiable operational risk for European commerce operators remains unaddressed: the current corpus contains no published risk model for this scenario.

Sources

[6] Jacobides, M. G., Cennamo, C., & Gawer, A. (2018). Towards a theory of ecosystems. Wiley.

[8] Grant, R. M., & Baden-Fuller, C. (2003). A Knowledge Accessing Theory of Strategic Alliances. Wiley.

[18] Edler, J., Blind, K., Kroll, H., & Schubert, T. (2023). Technology sovereignty as an emerging frame for innovation policy. Elsevier BV.

[19] Helwig, N. (2023). EU Strategic Autonomy after the Russian Invasion of Ukraine. Wiley.

[20] Wei, T., Li, T.-W., Liu, Z., Ning, X., Yang, Z., & Zou, J. (2026). Agentic Reasoning for Large Language Models. arXiv (preprint, unreviewed).

[21] Prasad S, S., & Arora, U. (2026). Scalable Inference Architectures for Compound AI Systems: A Production Deployment Study. arXiv (preprint, unreviewed).

[22] Zhang, S., & Maharjan, S. (2026). Security, privacy, and agentic AI in a regulatory view. arXiv (preprint, unreviewed).

[23] Vreš, D., Arčon, T., Petrič, T., Vajda, D., Robnik-Šikonja, M., & Lebar Bajec, I. (2026). Building a Strong Instruction Language Model for a Less-Resourced Language. arXiv (preprint, unreviewed).