From Chips to Agents: The New Shape of the AI Stack
AI is rapidly evolving from a layered technology stack into an integrated “AI factory” system, where compute, models, and applications converge to drive real economic output.
The artificial intelligence (AI) stack is no longer just a conceptual framework; it has become a strategic battleground. In recent speeches and keynotes, particularly at NVIDIA’s GTC events, Jensen Huang has reframed the stack not as isolated layers but as an integrated “AI factory” system, where compute, models, and applications converge into a single economic engine. This shift is critical to understanding how value is being created and captured across the AI ecosystem today.
At the foundation lies energy, which supplies the electricity needed to train and run AI models. It powers the GPUs, chips, and data centres that provide the raw horsepower behind AI. Utilities and energy providers such as NextEra Energy, Duke Energy, and National Grid deliver the massive and increasingly critical volumes of electricity that data centres demand. As AI scales, this layer becomes a bottleneck not just in chip availability but in energy consumption, making efficiency, power infrastructure, and sustainable energy sources central to the future of AI growth.
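To get a feel for the scale involved, a back-of-the-envelope calculation helps. The figures below are illustrative assumptions (roughly 700 W per data-centre GPU and a power usage effectiveness of about 1.3), not numbers from Huang's keynotes:

```python
# Rough sketch of why energy is a bottleneck. All figures are illustrative
# assumptions: ~700 W per high-end data-centre GPU, and a power usage
# effectiveness (PUE) of ~1.3 to account for cooling and facility overhead.

GPUS = 10_000              # a mid-sized AI training cluster (assumed)
WATTS_PER_GPU = 700        # assumed draw of one data-centre GPU
PUE = 1.3                  # assumed facility overhead multiplier
HOURS_PER_YEAR = 24 * 365

it_load_mw = GPUS * WATTS_PER_GPU / 1e6
facility_mw = it_load_mw * PUE
annual_gwh = facility_mw * HOURS_PER_YEAR / 1e3

print(f"IT load: {it_load_mw:.1f} MW; with overhead: {facility_mw:.1f} MW")
print(f"Annual energy: {annual_gwh:.0f} GWh")
```

Even a single cluster of this assumed size draws megawatts continuously, around the clock, which is why grid capacity, not just chip supply, increasingly sets the pace.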
The infrastructure layer sits above raw energy and turns individual chips into scalable, usable AI systems by integrating servers, storage, networking, and cooling into full data centre environments. This layer is responsible for orchestrating massive clusters of GPUs so they can work together efficiently, enabling the training of large models and high-throughput inference. Companies like Supermicro, Dell Technologies, and Hewlett Packard Enterprise build and deploy AI-optimized servers and racks, while NVIDIA (with DGX systems and InfiniBand networking) and Cisco provide high-speed interconnects that allow thousands of chips to function as a single system. Together, this layer ensures reliability, scalability, and performance, effectively transforming raw compute into production-ready AI infrastructure.
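To make the idea concrete, here is a minimal sketch of a collective operation, the primitive that lets a cluster behave as one machine. It assumes PyTorch's distributed package with the NCCL backend and a `torchrun` launch; the article itself names InfiniBand but no particular software stack:

```python
import os

import torch
import torch.distributed as dist

# Each process drives one GPU; a single all_reduce over the interconnect
# (e.g., InfiniBand via the NCCL backend) gives every GPU the same
# cluster-wide result. Launch with: torchrun --nproc_per_node=<N> demo.py

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

# Each GPU contributes its global rank; after all_reduce, all hold the sum.
x = torch.tensor([float(dist.get_rank())], device="cuda")
dist.all_reduce(x, op=dist.ReduceOp.SUM)
print(f"rank {dist.get_rank()}: cluster-wide sum = {x.item()}")

dist.destroy_process_group()
```

The same pattern, scaled to thousands of GPUs and layered with schedulers and fault tolerance, is what this layer sells.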
The compute layer builds on this infrastructure to deliver the raw processing power that runs AI workloads, aggregating chips into high-performance computing environments capable of handling large-scale training and real-time inference. This includes GPU clusters, supercomputers, and AI-optimized compute platforms that execute complex mathematical operations across vast datasets. Companies like NVIDIA lead with integrated compute ecosystems (GPUs, CUDA software, DGX systems), while Advanced Micro Devices and Intel provide alternative compute architectures. Hyperscalers such as Amazon Web Services, Microsoft Azure, and Google Cloud operationalize this compute at scale by offering on-demand access to massive GPU clusters. This layer transforms hardware into usable computational capacity, determining the speed, scale, and cost at which AI systems can be deployed.
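In practice, "usable computational capacity" means the same numerical kernel runs wherever capacity is available, whether a laptop or a rented cluster. A minimal sketch (PyTorch is an illustrative choice here; the article names CUDA but no specific framework):

```python
import torch

# The core operation behind both training and inference is the dense
# matrix multiplication. The same line of code runs on a CPU or, if one
# is available, a CUDA GPU: exactly the abstraction this layer provides.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two large random matrices standing in for model weights and activations.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

c = a @ b  # one matmul; training runs billions of these
print(f"Ran a {a.shape[0]}x{a.shape[1]} matmul on {device}")
```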
The model layer is where raw compute and infrastructure are transformed into usable intelligence, consisting of large pretrained systems that can understand, generate, and reason across text, images, and other data types. Organizations like OpenAI, Google DeepMind, Meta, and Anthropic develop these foundation models by training them on massive datasets using advanced architectures and techniques. The models act as reusable “engines” that developers can fine-tune or build upon, dramatically reducing the time and cost of creating AI applications, while also becoming a key competitive layer where performance, safety, and proprietary data create differentiation. Huang has also highlighted the rise of agentic AI: systems that do not just respond to prompts but execute tasks, signalling a shift from passive intelligence to active digital labour.
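The agentic pattern is easiest to see as a control loop: the model picks an action, the system executes it, and the loop repeats until the task is done. The toy sketch below stubs the model with a scripted plan; `call_model` is a hypothetical placeholder, not any vendor's API:

```python
# Toy sketch of an agent loop. In a real system, call_model would hit a
# hosted foundation model and return a structured tool call; here it is a
# hypothetical stub that walks through a fixed plan for illustration.

def call_model(task: str, history: list[str]) -> str:
    plan = ["search: quarterly revenue", "summarize findings", "done"]
    return plan[len(history)]  # next step, given what has been done so far

def run_agent(task: str) -> list[str]:
    history: list[str] = []
    while True:
        action = call_model(task, history)
        if action == "done":      # the model signals task completion
            return history
        history.append(action)    # "execute" the action and record it

print(run_agent("Draft a revenue summary"))
```

The leap from chatbot to agent is precisely this loop: the model's output is no longer the product itself, but an instruction the system acts on.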
The application layer is where AI capabilities are translated into real-world products that users and businesses interact with directly, turning underlying models into practical tools that solve specific problems or enhance productivity. Companies like OpenAI (with ChatGPT), Microsoft (with Copilot integrations), Adobe (with generative design tools), and Notion Labs embed AI into workflows across writing, coding, design, and business operations. This layer focuses on usability, distribution, and user experience, and it captures the most visible value in the stack by directly monetizing AI through subscriptions, enterprise software, and consumer applications.
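In code, this layer is often a thin, product-shaped wrapper around a model call. A minimal sketch, assuming the OpenAI Python SDK (version 1 or later), an `OPENAI_API_KEY` in the environment, and an illustrative model name:

```python
from openai import OpenAI

# A product feature as a wrapper around a foundation model. The client
# reads OPENAI_API_KEY from the environment; the model name is illustrative.

client = OpenAI()

def summarize_meeting(notes: str) -> str:
    """Turn raw meeting notes into a short, user-facing summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; substitute your available model
        messages=[
            {"role": "system", "content": "Summarize meeting notes in three bullet points."},
            {"role": "user", "content": notes},
        ],
    )
    return response.choices[0].message.content

print(summarize_meeting("Discussed Q3 roadmap; hiring freeze lifted; demo on Friday."))
```

The differentiation lies less in the call itself than in distribution, workflow fit, and user experience, which is exactly where this layer monetizes.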
However, the strategic implication is clear: the boundaries between layers of the AI stack are collapsing. NVIDIA’s push to “own the full stack”, from training and inference to deployment and tooling, illustrates a broader consolidation trend across the industry. What was once a modular ecosystem is becoming vertically integrated, with a few dominant players attempting to control multiple layers simultaneously. In this new paradigm, the winners may not be those with the best models or applications alone, but those who can orchestrate the entire stack as a unified system of production, distribution, and intelligence.
In essence, Huang’s vision suggests that the AI stack is evolving into something closer to an operating system for the global economy, where compute is the engine, models are the intelligence, and applications are merely interfaces to a much deeper, interconnected system.