As AI accelerators merge with consumer GPUs and CPUs gain dedicated NPUs, hardware is changing faster than at any time since the 3D graphics revolution. We spoke with Jake Morrison, a senior silicon engineer, about what these changes mean for everyday PC builders.

Introduction

For over 25 years, Computer Heaven has brought you insights from the bleeding edge of PC hardware. Today, we’re incredibly fortunate to sit down with Jake Morrison, a veteran in the silicon trenches. At 42, Jake boasts 15 years of experience as a Senior Silicon Engineer, having contributed to the design of both GPUs and SoCs at some of the industry’s most prominent chipmakers. His deep understanding of microarchitecture, memory subsystems, and power delivery stems from direct involvement in bringing complex chips from concept to production.

In an era where AI is rapidly reshaping the landscape of computing, and hardware innovation seems to accelerate daily, Jake’s perspective is invaluable. He offers a rare glimpse into the engineering challenges and strategic decisions that define the components we enthusiasts scrutinize and cherish. From the intricacies of GDDR7 to the future of chiplet designs and the contentious 16-pin power connector, Jake pulls back the curtain on what truly matters in silicon design. Prepare for a frank, technically detailed, and opinionated discussion that will undoubtedly reshape your understanding of modern PC hardware.

Q: How significantly has the rise of AI influenced fundamental GPU and CPU design over the last few years?

The impact of AI on both GPU and CPU design has been nothing short of transformative, moving beyond just adding dedicated “AI cores” to fundamentally altering architectural priorities. For GPUs, it’s pushed the envelope on floating-point precision mixtures. While FP32 is still crucial for many rendering tasks, AI training and inference often leverage FP16, BF16, and even FP8 or FP4. This necessitates specialized tensor cores or matrix multiplication units (like NVIDIA’s Tensor Cores or AMD’s AI Accelerators) capable of efficient mixed-precision arithmetic. We’re talking about massive throughput increases for these specific operations, often measured in hundreds of TFLOPS for FP16, which significantly outstrips FP32 performance. This isn’t just about adding more ALUs; it’s about optimizing data paths, cache hierarchies, and memory controllers to feed these hungry compute units with vast amounts of data quickly.

Industry-wide coverage of the chiplet revolution is also tracked at i-actu (broader tech news from France), with weekly summaries that complement Jake’s design-engineer perspective.

Jake’s emphasis on memory bandwidth lines up with what we found in our own buyer’s guide to the best CPUs for gaming and productivity workloads.

On the CPU side, the influence is more nuanced but equally profound. While GPUs excel at parallel matrix operations, CPUs are increasingly incorporating vector extensions and dedicated blocks for AI inference. Intel’s AMX (Advanced Matrix Extensions) in its Xeon line, and the AVX-512 VNNI (Vector Neural Network Instructions) extension that Intel introduced and AMD has supported since Zen 4, are prime examples. These instructions accelerate common AI operations like dot products and convolutions directly on the CPU, making it more efficient for certain lightweight inference tasks, especially when data locality is critical or when a discrete GPU isn’t present. Furthermore, the overall memory subsystem design is evolving to better handle the irregular memory access patterns common in AI workloads, with larger caches and more sophisticated prefetchers. My opinion is that this trend will only intensify; future CPUs will feature ever more robust integrated AI capabilities, blurring the lines for many common AI tasks.
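To make the "dot products directly on the CPU" point concrete, here is a minimal Python sketch of what a VNNI-style instruction does in a single 32-bit lane: it multiplies four unsigned 8-bit values by four signed 8-bit values and adds the sum to a 32-bit accumulator. The function name is hypothetical, and the wraparound behavior is an assumption modeled on the non-saturating form of the instruction; real hardware does this across many lanes at once.

```python
def dot_accumulate_lane(acc, a_u8, b_s8):
    """Emulate one 32-bit lane: acc += sum(a[i] * b[i]) over four byte pairs.

    a_u8 holds unsigned 8-bit values, b_s8 holds signed 8-bit values.
    The result wraps to a signed 32-bit integer (assumed non-saturating).
    """
    if len(a_u8) != 4 or len(b_s8) != 4:
        raise ValueError("each lane consumes exactly four byte pairs")
    if not all(0 <= x <= 255 for x in a_u8):
        raise ValueError("a_u8 values must fit in an unsigned byte")
    if not all(-128 <= x <= 127 for x in b_s8):
        raise ValueError("b_s8 values must fit in a signed byte")

    total = acc + sum(x * y for x, y in zip(a_u8, b_s8))
    total &= 0xFFFFFFFF  # keep the low 32 bits
    # Reinterpret the 32-bit pattern as a signed value.
    return total - (1 << 32) if total >= (1 << 31) else total


# Quantized activations (unsigned) times quantized weights (signed).
result = dot_accumulate_lane(10, [1, 2, 3, 4], [1, -1, 2, -2])
```

Fusing the multiply, the four-way sum, and the accumulate into one instruction is what saves work compared with doing each step separately on wider integer types.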

Q: We’re seeing more dedicated NPUs (Neural Processing Units) in consumer CPUs. Do you see NPUs eventually eclipsing GPUs for general consumer AI tasks, or will GPUs maintain their dominance?

NPUs and GPUs serve distinct, albeit sometimes overlapping, roles in the AI landscape, and I firmly believe GPUs will maintain their dominance for general consumer AI tasks that demand high throughput and flexibility. NPUs, like those found in Intel’s Core Ultra ‘Meteor Lake’ or AMD’s Ryzen 8040 series, are purpose-built for extreme efficiency at specific, often lower-precision, AI inference tasks. They excel at “always-on” or “bursty” workloads like background noise suppression, real-time video effects, local LLM inference for system features, or even gaze detection, typically operating within a tight power budget (often under 10-15W). Their strength lies in their specialized instruction sets and optimized data paths for neural network operations, often delivering tens of TOPS (Trillions of Operations Per Second) for INT8 or INT4 workloads with remarkable power efficiency.

However, GPUs, particularly discrete GPUs, offer orders of magnitude more raw compute power and memory bandwidth. A high-end GPU like an NVIDIA RTX 4090 or AMD RX 7900 XTX can deliver hundreds of TOPS at INT8, and importantly, they offer the flexibility to run a vast array of AI models, from complex image generation (Stable Diffusion) to large language model inference (LLMs with billions of parameters), and even local AI training. They are not restricted to specific network topologies or precision levels in the same way an NPU might be. My opinion is that NPUs will become indispensable for making our everyday devices smarter and more reactive, handling the low-latency, low-power AI tasks that enhance user experience without taxing the main CPU or GPU. But when it comes to the heavy lifting – the demanding creative AI applications, high-resolution upscaling, or complex local LLMs – the sheer parallel processing capability and memory bandwidth of a discrete GPU will remain unchallenged in the consumer space for the foreseeable future.

Q: GDDR7 memory is on the horizon. What are the key technical advancements it brings, and how critical is it for next-gen GPUs?

GDDR7 is absolutely critical for next-gen GPUs, and it represents a significant leap forward in memory technology, moving beyond incremental speed bumps. The headline feature is the massive increase in per-pin bandwidth, targeting up to 36 Gbps initially, potentially scaling higher. This is achieved primarily through a shift from GDDR6’s NRZ (Non-Return-to-Zero) signaling to PAM-3 (Pulse Amplitude Modulation 3-level) signaling. PAM-3 encodes three bits in every two symbols by using three voltage levels (e.g., -1, 0, +1), effectively transmitting 1.5 bits per symbol per pin, compared to NRZ’s one bit. This allows for a substantial increase in effective data rate without proportionally increasing the signaling frequency, which helps manage signal integrity and power consumption.
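The 1.5-bits-per-symbol figure follows from simple counting: two three-level symbols give nine combinations, enough to carry the eight values of a 3-bit group with one combination left over. The sketch below illustrates that packing in Python. The specific bit-to-symbol mapping is a hypothetical one chosen for readability, not the mapping defined in the GDDR7 specification.

```python
LEVELS = (-1, 0, 1)  # the three PAM-3 voltage levels, in abstract units


def encode_pam3(bits):
    """Pack each group of 3 bits into 2 three-level symbols (illustrative mapping)."""
    if len(bits) % 3 != 0:
        raise ValueError("bit count must be a multiple of 3")
    symbols = []
    for i in range(0, len(bits), 3):
        value = bits[i] * 4 + bits[i + 1] * 2 + bits[i + 2]  # 0..7
        # Write the value as two base-3 digits, then map digits to levels.
        symbols.append(LEVELS[value // 3])
        symbols.append(LEVELS[value % 3])
    return symbols


def decode_pam3(symbols):
    """Invert encode_pam3; the ninth symbol pair (+1, +1) is unused here."""
    if len(symbols) % 2 != 0:
        raise ValueError("symbol count must be even")
    bits = []
    for i in range(0, len(symbols), 2):
        value = LEVELS.index(symbols[i]) * 3 + LEVELS.index(symbols[i + 1])
        if value > 7:
            raise ValueError("symbol pair does not map to a 3-bit value")
        bits.extend([(value >> 2) & 1, (value >> 1) & 1, value & 1])
    return bits


payload = [1, 0, 1, 0, 1, 1]
line_symbols = encode_pam3(payload)
bits_per_symbol = len(payload) / len(line_symbols)  # 6 bits over 4 symbols
```

Six bits travel in four symbols here, where NRZ would need six, which is where the bandwidth gain at a given symbol rate comes from.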

Beyond raw speed, GDDR7 also brings improvements in power efficiency, with reduced VDD/VDDQ voltages (e.g., 1.1V from GDDR6’s 1.25V), which translates to lower power consumption per bit transferred. Error correction features, such as on-die ECC, are also being enhanced to ensure data integrity at these extreme speeds, a necessity as bit error rates can increase with higher frequencies and denser signaling. For a high-end GPU with a 384-bit memory bus, 36 Gbps GDDR7 could enable a staggering 1.7 TB/s of bandwidth, a 50% increase over a GDDR6X setup at 24 Gbps. My strong opinion is that without GDDR7, the performance of future GPUs would be severely bottlenecked. As compute capabilities continue to scale exponentially with each generation, particularly with the demands of 4K+ gaming, ray tracing, and AI workloads, feeding these massive processing arrays requires an equally massive and ever-increasing amount of bandwidth. GDDR7 isn’t just nice to have; it’s a fundamental requirement to unlock the full potential of next-generation silicon.
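The bandwidth figures quoted above come from one multiplication: bus width times per-pin data rate, divided by eight to convert bits to bytes. A short Python sketch of that arithmetic, using the 384-bit bus and the 36 Gbps and 24 Gbps rates from the interview (the function name is just an illustrative choice):

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


gddr7_gbs = memory_bandwidth_gbs(384, 36)   # 36 Gbps GDDR7 on a 384-bit bus
gddr6x_gbs = memory_bandwidth_gbs(384, 24)  # 24 Gbps GDDR6X on the same bus
uplift = gddr7_gbs / gddr6x_gbs - 1         # fractional gain from the faster memory

print(f"GDDR7:  {gddr7_gbs:.0f} GB/s")
print(f"GDDR6X: {gddr6x_gbs:.0f} GB/s")
print(f"Uplift: {uplift:.0%}")
```

These are theoretical peaks; sustained bandwidth in real workloads is lower because of refresh, bus turnaround, and access patterns.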

Q: With the push for local AI inference, what kind of impact do you foresee on consumer hardware requirements in the next 3-5 years?

The push for local AI inference will profoundly reshape consumer hardware requirements over the next 3-5 years, making dedicated AI accelerators as commonplace as integrated graphics. We’re already seeing the beginnings with NPUs in CPUs, but this will accelerate dramatically. The most immediate impact will be on memory: not just quantity, but also speed and bandwidth. Running sophisticated LLMs locally, even quantized versions, requires significant RAM. A 7B parameter model might need 8-12GB of system RAM once context and runtime overhead are counted, while a 70B model needs roughly 35GB for its weights alone at 4-bit quantization, which in practice means a 64GB system. This will drive up the minimum RAM configuration for many systems and push for faster DDR5 and, eventually, DDR6 modules.
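A rough way to sanity-check RAM requirements for local models is to multiply the parameter count by the bytes stored per weight. The Python sketch below does only that. It gives a lower bound, since it ignores the KV cache, activations, and runtime overhead, all of which push the practical requirement higher. The function name and the chosen bit widths are illustrative assumptions.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


# Weights-only footprint at common precisions (overheads not included).
for params in (7, 70):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{params}B model at {bits}-bit: about {gb:.1f} GB of weights")
```

Leaving headroom above the weights-only figure for context length and the operating system is what turns a 35GB model into a 64GB recommendation.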


Beyond memory, the integration of increasingly powerful NPUs into mainstream CPUs will be critical. These NPUs will evolve to handle a wider range of AI models and higher throughput, moving from tens to hundreds of TOPS, becoming the primary engine for ambient AI features, operating system enhancements, and light creative tasks. Discrete GPUs will also see their AI capabilities further refined, with more powerful tensor cores and larger VRAM pools (e.g., 24GB+ becoming standard for high-end, 16GB for mid-range) to handle more demanding generative AI tasks like high-resolution image and video synthesis, or complex local LLM fine-tuning. My opinion is that by 2028, a system without a powerful NPU will feel as outdated as a system without a multi-core CPU does today. The operating system itself will become an “AI-first” environment, offloading countless background tasks to these dedicated accelerators, demanding a holistic approach to AI hardware integration across the CPU, GPU, and NPU.

Q: Memory bandwidth often gets overshadowed by capacity or clock speed in enthusiast discussions. From an engineering perspective, how underrated is its importance, especially with modern workloads?

Memory bandwidth is, without a doubt, one of the most underrated metrics in enthusiast discussions, and from an engineering perspective, it’s frequently the primary bottleneck for modern workloads. While capacity dictates how much data you can store and clock speed influences latency, bandwidth determines how fast that data can be moved between the processing units (CPU/GPU) and the memory modules. Modern applications, be it high-resolution gaming with complex textures and effects, real-time video editing, scientific simulations, or especially AI training/inference, are incredibly data-hungry. They constantly require fetching and writing vast amounts of data.

Consider a GPU: it might have thousands of shaders, but if the memory interface can’t feed them data fast enough, those shaders sit idle, waiting. This is why HBM (High Bandwidth Memory) has become crucial for high-performance computing and professional GPUs, offering significantly more bandwidth than even the fastest GDDR (a single HBM3 stack delivers roughly 0.8 TB/s, and multi-stack packages reach several TB/s), despite often having lower clock speeds. Similarly, on the CPU side, tasks like database operations, large data analytics, or even certain gaming scenarios can become severely limited if the system RAM bandwidth isn’t sufficient. You can have the fastest CPU cores and a massive cache, but if the main memory bus can’t keep up, you’re essentially starving the processing units. My strong opinion is that system builders often prioritize maximum core count or clock speed over ensuring adequate memory bandwidth, leading to situations where expensive CPUs or GPUs are performing below their potential. For demanding users, investing in faster RAM with tighter timings, or understanding the benefits of wider memory buses, is often a more impactful upgrade than another 100MHz on the CPU.
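The "starved shaders" effect can be expressed with the roofline model: attainable throughput is the lower of the chip's peak compute and what memory bandwidth can feed at a given arithmetic intensity (operations performed per byte moved). The Python sketch below uses a hypothetical GPU with 80 TFLOPS of peak compute and 1 TB/s of bandwidth; both numbers and the function name are assumptions for illustration, not measurements of any real part.

```python
def attainable_tflops(peak_tflops, bandwidth_tbs, flops_per_byte):
    """Roofline estimate: the lower of the compute limit and the memory-fed limit."""
    memory_bound_tflops = bandwidth_tbs * flops_per_byte
    return min(peak_tflops, memory_bound_tflops)


PEAK_TFLOPS = 80.0    # hypothetical peak compute
BANDWIDTH_TBS = 1.0   # hypothetical memory bandwidth in TB/s

for intensity in (10.0, 80.0, 200.0):
    tflops = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_TBS, intensity)
    utilization = tflops / PEAK_TFLOPS
    print(f"{intensity:>5.0f} FLOP/byte -> {tflops:.0f} TFLOPS ({utilization:.0%} of peak)")
```

At 10 FLOP/byte this hypothetical chip reaches only an eighth of its peak, and no amount of extra shader hardware changes that; only more bandwidth or better data reuse does.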

Q: With GPUs pushing 450W+ and CPUs approaching 300W, power delivery and PSU quality have become paramount. What aspects of PSU design are most critical for stability and longevity in such high-power systems?

The escalating power demands of modern CPUs and GPUs have undeniably thrust power delivery and PSU quality into the spotlight. For stability and longevity in high-power systems, several aspects of PSU design are absolutely critical. First and foremost is transient response. Modern components don’t draw power linearly; they exhibit massive, rapid current spikes (transients) that can exceed their average power draw by 2x or even 3x for very short durations. A quality PSU must be able to deliver this burst power without significant voltage droop, which can cause system instability or crashes. This requires robust primary and secondary side capacitors (Japanese brands like Rubycon, Nippon Chemi-Con, Nichicon are often preferred for their quality and longevity), and sophisticated voltage regulation.

Secondly, ripple suppression is vital. Ripple is the AC component remaining in the DC output voltage, and excessive ripple can cause instability, reduce component lifespan, and increase heat. High-quality PSUs use better filtering components and more refined designs to keep ripple well within ATX specifications (a maximum of 120mV peak-to-peak for 12V rails). Thirdly, the quality of the internal components and overall topology matters immensely. A well-designed PSU uses high-grade MOSFETs, transformers, and proper soldering, often employing topologies like full-bridge LLC resonant converters for efficiency and stability. Lastly, over-current, over-voltage, and short-circuit protections (OCP, OVP, SCP) are non-negotiable. These safety features protect not just the PSU, but also your expensive components from catastrophic failure. My opinion is that skimping on a PSU is the single biggest mistake a system builder can make. It’s a false economy. A cheap PSU might technically deliver the wattage, but its poor transient response, high ripple, and inadequate protections can lead to instability, premature component failure, and even fire hazards. Always invest in an 80 Plus Gold or Platinum rated unit from a reputable brand like Seasonic, Corsair, or Cooler Master, with a long warranty.

Q: The 16-pin (12VHPWR/12V-2x6) connector has been controversial. From an engineering perspective, what were the design goals, and where do you think it fell short initially?

The 16-pin (12VHPWR, now refined to 12V-2x6) connector was born out of a clear engineering necessity: density and higher power delivery through a single cable. With GPUs pushing 450-600W, the previous solution of multiple 8-pin PCIe connectors (each rated for 150W, plus 75W from the PCIe slot) became unwieldy, requiring three or four separate cables. The goal was to consolidate this into a single, compact connector that could reliably deliver up to 600W. This not only cleans up cable management but also simplifies power delivery on the GPU PCB itself. The four smaller sense pins were crucial for power supply and GPU communication, letting the power supply signal how much power the cable can deliver and preventing overdraw.

Jake’s reservations about the connector echo what we found in our hands-on RTX 5090 review, where the new 12V-2x6 implementation is a clear improvement.

However, the initial 12VHPWR design undeniably fell short in real-world application, leading to widespread melting issues. From an engineering perspective, the primary failure points were inadequate contact area and insufficient tolerance for user error. The small, closely spaced pins, combined with the need for a full, flush insertion, made it highly susceptible to partial insertion. A partially inserted connector means higher resistance at the contact points, leading to localized heating (I^2R losses). When you’re pushing 50A through the connector, roughly 8A through each of the six 12V pins, even a slight increase in resistance at a tiny contact point can generate enough heat to melt the plastic housing. Furthermore, the cable’s stiffness, especially with adapter cables, often put mechanical stress on the connector, encouraging partial insertion or bending of the internal wires. My opinion is that while the concept of a high-density power connector was sound, the initial implementation of 12VHPWR prioritized compactness over robust user-friendliness and mechanical tolerance. The 12V-2x6 revision attempts to address this with shorter, recessed sense pins (ensuring full power isn’t delivered until the connector is fully seated) and slightly longer power terminals, which is a step in the right direction, but the industry needs to learn from this experience regarding user-proof design for high-power interfaces.
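The I^2R point is easy to quantify. The Python sketch below splits a 600W load at 12V evenly across the connector's six 12V pins, then computes the heat dissipated at one contact for a few contact resistances. The resistance values and the uneven-sharing scenario are hypothetical, picked only to show how quickly heating grows; they are not measurements of any real connector.

```python
def per_pin_current_a(total_watts, volts=12.0, power_pins=6):
    """Current per 12V pin, assuming the load is shared evenly across pins."""
    return total_watts / volts / power_pins


def contact_heat_w(current_a, contact_resistance_ohm):
    """Heat dissipated at a single contact: P = I^2 * R."""
    return current_a ** 2 * contact_resistance_ohm


even_share_a = per_pin_current_a(600)  # about 8.3 A per pin

# Hypothetical contact resistances: a healthy contact versus degraded ones.
for label, ohms in (("well seated", 0.005), ("partially seated", 0.020), ("badly seated", 0.050)):
    watts = contact_heat_w(even_share_a, ohms)
    print(f"{label:>16}: {watts:.2f} W at one contact")

# If poor contact elsewhere forces one pin to carry far more than its share
# (20 A is a hypothetical figure), heating rises with the square of the current.
overloaded_w = contact_heat_w(20.0, 0.020)
print(f"overloaded pin: {overloaded_w:.1f} W at one contact")
```

A few watts sounds small, but it is concentrated in a contact a couple of millimeters across with almost no way to shed heat, which is why the housing is what fails.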

Q: AMD pioneered the chiplet revolution with Ryzen and EPYC, and Intel is now adopting it with Meteor Lake and future designs. What are the biggest advantages and challenges of this approach for consumer hardware?

The chiplet revolution, spearheaded by AMD and now embraced by Intel, is arguably the most significant architectural shift in silicon design in the last decade, particularly for consumer hardware. The biggest advantages are numerous. Firstly, yields and cost. Manufacturing large monolithic dies is incredibly difficult; the larger the die, the higher the probability of a defect, leading to lower yields and higher costs per good chip. By breaking a large chip into smaller, more manageable chiplets (e.g., CPU core complexes, I/O dies, GPU tiles), manufacturers can achieve much higher yields, which matters most on leading-edge process nodes where defect rates are higher. This directly translates to lower manufacturing costs and better price-to-performance ratios for consumers. Secondly, modularity and flexibility. Different chiplets can be manufactured on different process nodes optimized for their specific function (e.g., a CPU core chiplet on a leading-edge 4nm node, an I/O chiplet on a more mature, cost-effective 12nm node). This allows for faster iteration, easier scaling (e.g., adding more CPU core chiplets for higher core counts), and mixing and matching IP from different foundries.
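The yield argument can be illustrated with the simple Poisson yield model, in which the fraction of defect-free dies is exp(-area x defect density). The Python sketch below compares one 600 mm² monolithic die with 75 mm² chiplets at a defect density of 0.1 defects per cm². All three numbers are hypothetical, and the model ignores packaging yield, die-to-die interconnect area, and partially defective dies sold as lower-tier parts.

```python
import math


def die_yield(area_mm2, defects_per_cm2):
    """Poisson yield model: probability that a die of this area has zero defects."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


DEFECT_DENSITY = 0.1   # hypothetical defects per cm^2
MONOLITHIC_MM2 = 600   # hypothetical large monolithic die
CHIPLET_MM2 = 75       # hypothetical chiplet, eight of which match the area above

monolithic_yield = die_yield(MONOLITHIC_MM2, DEFECT_DENSITY)
chiplet_yield = die_yield(CHIPLET_MM2, DEFECT_DENSITY)

print(f"Monolithic {MONOLITHIC_MM2} mm^2 die: {monolithic_yield:.1%} good")
print(f"Chiplet {CHIPLET_MM2} mm^2 die:     {chiplet_yield:.1%} good")
```

Because each chiplet is tested before packaging, a defect scraps only 75 mm² of silicon rather than the full 600 mm², so the share of usable wafer area tracks the chiplet yield rather than the monolithic one.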


However, the chiplet approach also introduces significant challenges. The primary one is inter-chiplet communication. Data transfer between chiplets requires a high-bandwidth, low-latency interconnect (like AMD’s Infinity Fabric, or the die-to-die links in Intel’s Foveros packaging). This interconnect consumes power, adds latency compared to monolithic designs, and occupies valuable die area. Managing this latency and ensuring coherent data access across multiple chiplets is a complex engineering feat. Another challenge is thermal management, as multiple heat sources are now spread across a larger area, often requiring more sophisticated cooling solutions. My opinion is that despite these challenges, chiplets are the undeniable future of high-performance silicon. The benefits in cost, scalability, and design flexibility far outweigh the engineering hurdles, and we’ll see further innovations in packaging and interconnect technologies that continue to mitigate these issues, ultimately bringing more powerful and complex chips to consumers at more reasonable prices.

Q: As a silicon engineer, when you’re building your own PC, what components or aspects do you prioritize that the average enthusiast might overlook?

When I build my own PC, my priorities definitely skew towards stability, reliability, and long-term value, often overlooking the absolute bleeding edge that enthusiasts chase. The average enthusiast might obsess over benchmark numbers, but I prioritize the foundation. First, the PSU. I will never skimp here. I’m looking for a top-tier 80 Plus Platinum or Titanium unit from Seasonic or Corsair, known for excellent ripple suppression, transient response, and high-quality internal components. I’d rather spend $200-$300 on a 1000W PSU that lasts 10 years and protects my components than save $50 and risk instability.

Second, memory quality and stability over raw speed. While I’ll aim for fast DDR5, I’m more concerned with reliable, high-density modules from brands like G.Skill or Crucial that have strong JEDEC compliance and good XMP profiles, rather than pushing for the absolute highest MHz with untested timings. I often consider ECC RAM if the motherboard supports it, even for a consumer build, for data integrity, though it’s rare to find on mainstream platforms. Third, storage redundancy and endurance. I typically opt for enterprise-grade or prosumer NVMe SSDs (like Samsung’s Pro series or WD Black SN850X) known for high TBW (Terabytes Written) ratings and consistent performance, often configured in a RAID 1 for critical data, alongside a larger, reliable HDD for bulk storage. I also pay close attention to cooling solutions – not just the raw performance, but the noise profile and long-term reliability of fans and pumps. I’d rather have a slightly less performant but quiet and robust Noctua air cooler or a high-quality AIO from Arctic or Lian Li than a cheap, loud RGB-laden solution. My opinion is that the average enthusiast often gets caught up in chasing synthetic benchmarks and RGB aesthetics, while overlooking the critical importance of a stable, robust power delivery, reliable memory, and data integrity, which are the true hallmarks of a high-performance and long-lasting system.

Q: Looking back at the chip shortage and recent geopolitical developments, how do you see these factors continuing to impact consumer pricing and availability of PC hardware in the coming years?

The chip shortage and ongoing geopolitical developments have fundamentally altered the landscape of semiconductor manufacturing and, consequently, consumer pricing and availability. I see these factors continuing to exert significant pressure, leading to higher baseline costs and potential supply chain volatility for years to come. The shortage exposed the fragility of a highly centralized global supply chain, with Taiwan’s TSMC being disproportionately critical. This has spurred a global push for reshoring and regionalization of chip manufacturing, with massive investments in new fabs in the US (e.g., Intel’s Arizona plants, TSMC’s Arizona fab) and Europe (e.g., TSMC’s joint-venture fab in Dresden). While this increases resilience, building a new fab costs tens of billions of dollars and takes years, and these costs will inevitably be passed on to consumers.

Furthermore, geopolitical tensions, particularly between the US and China, are driving policies like export controls on advanced manufacturing equipment and cutting-edge chips. This creates market fragmentation and forces companies to design different products for different regions, adding complexity and cost. Tariffs also directly increase import costs, which are then passed on to the consumer. We’re already seeing this with fluctuating NAND and DRAM prices, which are influenced by both supply/demand and geopolitical maneuvering. My opinion is that the era of consistently declining price-per-performance for PC hardware, particularly for leading-edge nodes, is largely over. We will likely see higher average prices for new generations of CPUs and GPUs, with slower price erosion over time. Availability might improve compared to the peak of the shortage, but regional conflicts or natural disasters in key manufacturing hubs could still trigger localized or temporary shortages. The industry is moving towards a more diversified but inherently more expensive manufacturing model.

Q: Finally, looking ahead to 2028, what bold predictions do you have for the state of consumer PC hardware?

By 2028, I predict we’ll see a profound shift in how we perceive and interact with our PCs, driven primarily by ubiquitous AI integration and continued advancements in packaging. My boldest prediction is that every mainstream consumer CPU will integrate an NPU capable of at least 100 TOPS, making dedicated AI acceleration a standard, expected feature, not a premium one. This will enable truly seamless, real-time local AI for everything from advanced operating system features and intelligent assistants to sophisticated content creation and hyper-realistic gaming NPCs, without constant cloud reliance.

Secondly, I foresee discrete GPUs evolving into highly specialized compute engines, primarily targeting high-end gaming, professional content creation, and advanced AI workloads. Integrated graphics (iGPUs) paired with powerful NPUs will handle the vast majority of mainstream computing, including 1080p gaming and casual creative tasks, becoming significantly more capable than current mid-range discrete cards. We’ll see iGPUs leveraging HBM-like on-package memory or incredibly fast system RAM access to achieve this.

Thirdly, chiplet designs will dominate across all tiers of CPUs and GPUs, with advanced 3D stacking (like Intel’s Foveros Direct or AMD’s 3D V-Cache taken to the next level) becoming common. This will enable unprecedented component density and heterogeneous integration, mixing CPU cores, specialized AI accelerators, and even different memory types on the same package. My opinion is that by 2028, the traditional distinction between CPU, GPU, and NPU will be further blurred, with systems becoming highly optimized, interconnected compute platforms where the “brain” is a complex, multi-chiplet SoC designed for parallel processing across diverse workloads, making today’s monolithic designs seem almost quaint.

Closing Thoughts

Our conversation with Jake Morrison has been an illuminating journey into the intricate world of silicon engineering. It’s clear that the forces shaping our PC hardware—from the relentless demands of AI to the geopolitical shifts affecting manufacturing—are complex and multifaceted. His insights underscore that the components we passionately discuss are the result of immense engineering challenges and strategic decisions, often balancing performance, power, and cost in ways we rarely consider.

For developers experimenting with local AI inference on these GPUs, the practical setup guides at codeyourweb.org (developer-focused hardware reading) cover Python toolchain and CUDA stack configuration.

Jake’s emphasis on memory bandwidth, PSU quality, and the foundational stability of a system resonates deeply. His predictions for 2028 paint a picture of an AI-infused, highly integrated future, where the lines between CPU, GPU, and NPU become increasingly indistinct. As enthusiasts, it’s crucial to look beyond raw clock speeds and benchmark numbers, understanding the underlying architectural shifts and the practical implications for our builds. We extend our sincerest thanks to Jake for sharing his invaluable perspective, reminding us that at the heart of every powerful PC lies brilliant, often unseen, engineering.

The chiplet versus monolithic-tile debate plays out concretely in our AMD versus Intel platform comparison — worth a read if you are choosing a platform right now.

For a hands-on look at what tensor-core-heavy designs deliver in 2026, our in-depth RTX 5090 review is the practical companion piece to this interview.