Decoupled by Design: Rethinking Edge AI Architecture

May 08, 2026

Engineers have long had few hardware options for high-performance AI at the edge. Typically, the choice is between repurposed GPUs that require a full system redesign, or running inference directly on embedded CPUs and NPUs at the cost of severe thermal limits and high latency. Earlier USB and M.2 accelerators offered a more modular path, but with limited compute and memory capacity. Developers were left with an expensive balancing act between performance, power consumption and flexibility, often sacrificing one or more in the process.

Decoupled AI architecture balances power, performance and flexibility

The Gateworks GW16168 M.2 card revives the modularity of earlier M.2 accelerators while significantly advancing the underlying technology. Future upgrades and revisions no longer require replacing otherwise capable industrial SBCs: dedicated AI acceleration can be added directly to platforms such as the i.MX 8M Plus or i.MX 95 applications processors via the M.2 interface. On their own, these SBCs can reach 100% utilisation when running inference workloads. The GW16168, with 16GB of LPDDR4 memory, allows those tasks to be offloaded to the card, freeing the host CPU to focus on system logic and I/O. As an added benefit, the out-of-memory errors commonly encountered when running vision transformers or LLMs on standard edge modules are no longer an issue.
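The host/accelerator split described above can be sketched as a simple asynchronous dispatch pattern. This is a conceptual illustration only, not the GW16168's actual runtime API: `run_inference` stands in for whatever vendor call submits work to the card, and the worker thread stands in for the accelerator, leaving the host thread free for system logic and I/O.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_inference(frame):
    # Placeholder for work dispatched to a discrete accelerator; on real
    # hardware this would be a call into the vendor's inference runtime.
    time.sleep(0.01)  # simulated accelerator latency
    return {"frame": frame, "detections": frame % 3}

def main():
    # The "accelerator" consumes queued work asynchronously; the host
    # thread is not blocked per-frame, mirroring the decoupled design.
    with ThreadPoolExecutor(max_workers=1) as accelerator:
        futures = [accelerator.submit(run_inference, f) for f in range(5)]
        # Host-side work (sensor polling, I/O, control logic) runs here
        # while inference is in flight.
        return [fut.result() for fut in futures]

if __name__ == "__main__":
    print(main())
```

The same shape applies whether the queue feeds a local NPU driver or a discrete M.2 card; only the body of `run_inference` changes.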

“The GW16168 illustrates exactly why decoupled AI architectures are the future of edge computing. By combining NXP’s Ara240 DNPU with our industrial-grade design, customers can scale AI performance without redesigning their entire hardware platform,” says Ravi Annavajjhala, Vice President and General Manager, Neural Processing Units, NXP Semiconductors. “This brings flexibility, longevity and cost efficiency to real-world AI deployments.”

Thermal challenges with AI

One of the biggest challenges in AI deployment is thermal management. High-performance AI systems can draw significant power, with demand often spiking during complex tensor operations, so thermals frequently become the limiting factor. This is especially problematic in space-constrained industrial designs, where advanced cooling systems quickly become costly and impractical. The Gateworks M.2 card addresses this by pairing a passively cooled NXP Ara240 Discrete Neural Processing Unit (DNPU) with carefully engineered power circuitry, enabling a typical power consumption of 12 W. This lower power envelope reduces heat build-up, allowing reliable operation in sealed, fanless environments while maintaining thermal characteristics aligned with industrial-grade AI hardware. The card is also rated for a decade-long lifespan, thanks to thermal management that reduces wear on the module.
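The significance of a 12 W envelope for passive cooling can be sanity-checked with a first-order steady-state estimate, ΔT = P × Rθ. The thermal resistance and ambient figures below are illustrative assumptions, not GW16168 specifications:

```python
# First-order steady-state temperature-rise estimate for a passively
# cooled module: delta_t = power * thermal resistance to ambient.
power_w = 12.0          # typical power draw cited in the article
r_theta_c_per_w = 3.0   # ASSUMED case-to-ambient thermal resistance
ambient_c = 40.0        # ASSUMED warm industrial-enclosure ambient

delta_t = power_w * r_theta_c_per_w   # temperature rise above ambient
case_temp_c = ambient_c + delta_t     # estimated steady-state case temp

print(f"Estimated case temperature: {case_temp_c:.1f} degC")
```

Under these assumptions the rise stays within typical industrial ratings; at GPU-class power draws of 75 W or more, the same arithmetic quickly forces active cooling.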

A partnership built on shared silicon

The GW16168 did not emerge from a standing start. Gateworks has designed and manufactured embedded computer boards in California for over 30 years, and every board in its current portfolio carries an NXP processor. That longstanding relationship has evolved into a formal gold partnership: Gateworks designs its single-board computers exclusively with NXP silicon, and NXP shares early roadmap access, design documentation and direct engineering collaboration. When NXP acquired AI inference specialist Kinara in 2024 and absorbed the Ara240 DNPU into its portfolio, Gateworks was positioned to move quickly. The kickoff meeting took place at CES in January 2026; the product ships in June.

“We’ve been working with NXP making computer boards for three generations of products,” says Kelly Peralta, VP of Sales and Business Development at Gateworks. “Every board we make has an NXP processor. We work very closely, mutually sharing early design roadmaps, design guidelines and a lot of collaboration to help our customers deploy faster. When we found out that NXP was going to have their own AI accelerator — when they acquired Kinara — we took that seriously.”
