Apple Foundation Models: AFM 3 Brings 20B Parameters On-Device at WWDC 2026

TL;DR: Apple unveiled its third-generation Foundation Models at WWDC 2026, featuring AFM 3 with 20 billion parameters running entirely on-device across iPhone, iPad, and Mac. Apple confirmed the models contain zero Gemini code, while Google and Nvidia contribute only to Cloud Pro infrastructure.

Apple announced its third-generation Foundation Models at WWDC 2026 on June 8, headlined by AFM 3 — a 20-billion-parameter model running entirely on-device. Craig Federighi confirmed during the keynote that Apple’s new architecture integrates these models deep into iOS, iPadOS, and macOS. The announcement puts Apple’s on-device parameter count ahead of many competing cloud-dependent mobile solutions.

What Are Apple Foundation Models and How Do They Work?

Apple Foundation Models are the proprietary neural networks powering Apple Intelligence across the company’s ecosystem. According to Apple’s newsroom announcement, these models integrate deep into Apple’s platform experiences, enabling features like intelligent summaries, Genmoji, and the redesigned Siri. The third generation introduces three distinct model tiers, each serving different computational needs. The Foundation Models framework, documented in Apple’s developer resources, exposes these networks to third-party app developers through a Swift API.

Developers access the models through a guided pipeline. The framework handles model loading, tokenization, and inference automatically. Apple’s developer tools include Xcode integration for testing model behavior locally before deployment. The system routes requests intelligently between on-device and cloud resources based on task complexity.

The architecture is modular. Apple separates concerns into distinct model variants optimized for specific hardware constraints. This design allows the same fundamental model to scale across devices from iPhone to Mac Studio.

How Does Apple’s Three-Tier AI Architecture Function?

Apple’s 2026 AI architecture operates across three tiers: on-device processing, Private Cloud Compute, and Cloud Pro. The on-device tier handles immediate tasks like text prediction, image generation, and voice recognition. Private Cloud Compute processes more demanding requests on Apple Silicon servers with end-to-end encryption. Cloud Pro, the most powerful tier, tackles complex reasoning tasks requiring massive parameter counts. According to CNBC reporting, Apple partners with Google and Nvidia specifically for Cloud Pro infrastructure.

The routing system evaluates each request against multiple criteria. Simple tasks never leave the device. Moderate complexity triggers cloud compute. The most demanding queries reach Cloud Pro servers.
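The routing rule described above can be sketched as a small function. This is an illustration only, not Apple's implementation: Apple has not published its routing criteria, so the 0.0 to 1.0 complexity score, both thresholds, and the names `Tier` and `route_request` are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on-device"
    PRIVATE_CLOUD_COMPUTE = "private-cloud-compute"
    CLOUD_PRO = "cloud-pro"


# Hypothetical cut-offs on an assumed 0.0-1.0 complexity score.
MODERATE_THRESHOLD = 0.4
DEMANDING_THRESHOLD = 0.8


def route_request(complexity: float) -> Tier:
    """Pick the lowest tier that can plausibly serve the request."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be between 0.0 and 1.0")
    if complexity < MODERATE_THRESHOLD:
        return Tier.ON_DEVICE  # simple tasks never leave the device
    if complexity < DEMANDING_THRESHOLD:
        return Tier.PRIVATE_CLOUD_COMPUTE  # moderate complexity
    return Tier.CLOUD_PRO  # most demanding queries
```

Preferring the lowest capable tier mirrors the privacy ordering in the article: a request only leaves the device when the local model cannot serve it.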

Apple emphasized privacy throughout the architecture. On-device processing keeps personal data local. Private Cloud Compute deletes data after processing. Cloud Pro operates under contractual privacy guarantees, though specifics remain limited.

Tier	Hardware	Parameters	Latency	Privacy Level
On-Device (AFM 3)	Apple Neural Engine	20B	Under 10ms	Full on-device
Private Cloud Compute	Apple Silicon Servers	Undisclosed	100-500ms	E2E encrypted, deleted after use
Cloud Pro	Google/Nvidia Infrastructure	Largest tier	Variable	Contractual guarantees

The separation matters because each tier trades latency against capability and privacy exposure.

What Is AFM 3 and How Does It Run a 20B Model on iPhone?

AFM 3 represents Apple’s most capable on-device model, packing 20 billion parameters into a package that runs on the Neural Engine inside Apple Silicon. According to Memeburn’s coverage, the model runs entirely on iPhone without cloud connectivity. Apple achieved this through aggressive quantization and architectural optimization specific to the Neural Engine’s matrix multiplication units.

The model supports text generation, code completion, image understanding, and tool use. Apple’s framework documentation describes a streaming inference pipeline that generates tokens incrementally, reducing memory pressure during generation.

Running 20 billion parameters on a phone requires significant memory bandwidth. Apple’s unified memory architecture gives the Neural Engine direct access to RAM. The A18 Pro and M4 chips provide the computational throughput needed for real-time inference at this scale.

Key technical features of AFM 3 include:

20 billion parameters running entirely on the Neural Engine
Grouped-query attention for efficient memory usage during inference
4-bit quantization reducing model size while preserving output quality
LoRA adapters allowing task-specific fine-tuning without full model replacement
Tool calling enabling the model to invoke functions and APIs
Multimodal input processing text and images simultaneously
Streaming generation producing tokens incrementally for responsive UX
Session isolation keeping conversations private between apps
Dynamic routing allocating compute based on current device thermal state
Battery-aware scheduling deferring non-urgent inference during low-power conditions
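To make the 4-bit quantization item in the list above concrete, here is a minimal sketch of symmetric quantization with a single shared scale. The article does not say which scheme Apple uses, so this is a generic textbook variant; production systems usually keep one scale per small group of weights. The function names are hypothetical.

```python
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to signed 4-bit codes in [-8, 7] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The largest magnitude maps to code 7; fall back to 1.0 for all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_4bit(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the 4-bit codes."""
    return [code * scale for code in codes]


codes, scale = quantize_4bit([1.75, -0.75, 0.0, 0.5])
restored = dequantize_4bit(codes, scale)
```

Each weight is stored as a 4-bit code plus a share of the scale, which is where the size reduction comes from; the rounding step is the source of any quality loss.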

Apple’s developer documentation confirms that AFM 3 exposes these capabilities through the Foundation Models framework. Developers can specify constraints like maximum token count, temperature, and guided generation schemas.
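For readers unfamiliar with the temperature setting mentioned above, the sketch below shows how temperature conventionally reshapes a model's output distribution before a token is sampled. It illustrates the standard technique, not Apple's code: the framework performs this step internally, and the function names here are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index according to the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

A low temperature concentrates probability on the top-scoring token, which suits structured output; a higher one spreads probability across alternatives, which suits creative text.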

The framework includes built-in safety classifiers. These filter harmful content before it reaches users. Apple also provides on-device evaluation tools for testing model behavior against custom datasets.

Does Apple’s Foundation Model Contain Any Google Gemini Code?

No. Apple’s Foundation Models contain zero code from Google’s Gemini models. Craig Federighi stated this directly during the WWDC 2026 keynote, and AppleInsider confirmed the clarification in their coverage. “The amount of Google Assistant we use is none,” Federighi said, according to MacDailyNews. Apple built its Foundation Models from the ground up using its own training data, architecture, and infrastructure.

Google’s involvement is limited to Cloud Pro infrastructure. CNBC reported that Google and Nvidia provide hardware and systems expertise for Apple’s most powerful cloud tier. This is an infrastructure partnership, not a model licensing agreement.

Apple’s training pipeline uses proprietary data centers. The company has invested heavily in its own training infrastructure. The models reflect Apple’s design philosophy: vertical integration from silicon to software.

The distinction matters for privacy advocates. Gemini models process data through Google’s cloud infrastructure with Google’s privacy policies. Apple’s Foundation Models operate under Apple’s privacy framework, which the company positions as more restrictive. When a request stays on-device, it never touches Google’s systems. When it reaches Cloud Pro, Google provides compute resources but does not train on the data.

Apple’s partnership with Google and Nvidia is purely operational. The companies contribute hardware acceleration technology and data center design expertise. The model weights, training methodology, and inference pipeline are entirely Apple’s work.

How Are Google and Nvidia Involved in Apple Foundation Model Cloud Pro?

Apple Foundation Model Cloud Pro represents a surprising collaboration. Apple partnered with Google and Nvidia for its most advanced AI model, a departure from the company’s usual vertically integrated approach. CNBC reported that Apple executives confirmed this alliance during a media briefing at WWDC 2026. The involvement centers on infrastructure rather than model design.

Nvidia contributes its GPU expertise for the cloud infrastructure. Google provides TPU (Tensor Processing Unit) acceleration capabilities. Apple’s own silicon remains the foundation for the on-device models, but Cloud Pro extends beyond what Apple’s chips can handle alone. The partnership allows Apple to scale inference for complex reasoning tasks.

Craig Federighi, Apple’s Senior Vice President of Software Engineering, clarified the boundaries of this collaboration. “The amount of Google Assistant we use is none,” Federighi stated emphatically. AppleInsider confirmed that the Foundation Models contain zero Gemini code. Apple built the models entirely in-house. The Google and Nvidia contributions are purely infrastructure-related.

This distinction matters. Critics speculated Apple would simply rebrand Gemini as its own. That did not happen. The models are all-Apple through and through. Google and Nvidia provide the computing muscle, not the brains.

What Developer Tools Ship With the Foundation Models Framework?

Apple shipped a comprehensive set of developer tools alongside the Foundation Models framework at WWDC 2026. The framework provides native Swift APIs for integrating AI capabilities into iOS, iPadOS, and macOS applications. Developers can access both on-device and cloud-based models through a unified interface.

According to Apple’s developer documentation, the framework supports several key capabilities:

Guided generation for structured output using Swift schemas
LoRA adapters for task-specific fine-tuning without full model retraining
Tool calling that lets models invoke app functions directly
Session management for maintaining conversation context across interactions
Streaming output for real-time text generation in user interfaces
Safety filters built into the framework for content moderation
Async/await support throughout the entire API surface
Cloud Pro integration with automatic routing for complex queries
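The LoRA item in the list above is easier to appreciate with numbers. The sketch below shows the standard low-rank update, y = Wx + (alpha / r) * B(Ax), and compares parameter counts for a full weight matrix against its adapter. The layer shape and rank are hypothetical; the article does not describe Apple's adapter configuration.

```python
def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def apply_lora(x, W, A, B, alpha: float) -> list[float]:
    """Compute y = W x + (alpha / r) * B (A x), where r is the adapter rank."""
    rank = len(A)  # A has shape (r, d_in); B has shape (d_out, r)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_parameter_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full matrix parameters, adapter parameters) for one layer."""
    return d_out * d_in, rank * (d_out + d_in)


# Hypothetical layer: 4096 x 4096 weights with a rank-16 adapter.
full, adapter = lora_parameter_counts(4096, 4096, 16)
```

Because the frozen base weights W are shared, an app only needs to ship the small A and B matrices, which is why adapters avoid full model replacement.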

Apple’s newsroom announcement highlighted that new intelligence frameworks allow developers to build AI features more flexibly. The tools tap into models from Apple and third-party providers. Claude’s API documentation confirms a Swift package called “Claude for Foundation Models” that bridges Anthropic’s models with Apple’s framework.

Tool	Purpose	Availability
Foundation Models Framework	Core AI model access	iOS 26, macOS 26
Swift Assist	Code generation in Xcode	Xcode 26 beta
Core ML 8	Custom model deployment	All Apple platforms
Cloud Pro API	Server-side inference	Limited preview

The framework abstracts away model selection. Developers specify intent and constraints. The system handles routing between on-device and cloud automatically.

How Does AFM Core Advanced Differ From the Standard Model?

AFM Core Advanced introduces multi-step reasoning capabilities that the standard on-device model lacks. MacStories reported that Apple positioned AFM Core Advanced as a middle tier between the 20-billion-parameter on-device model and the Cloud Pro model. It targets complex tasks requiring deeper inference without round-tripping to cloud servers.

The standard AFM 3 model handles immediate, lightweight tasks. Text summarization, basic Q&A, and notification prioritization run on this model. AFM Core Advanced adds chain-of-thought reasoning. It can break down complex queries into intermediate steps before producing an answer.

Performance tradeoffs exist. AFM Core Advanced consumes more memory and battery than the standard model. Apple recommends using it selectively for tasks that genuinely benefit from deeper reasoning. The framework provides APIs for developers to specify which model tier their feature requires.

Memeburn’s coverage noted that AFM 3 runs a 20-billion-parameter model on-device. AFM Core Advanced uses a larger architecture that pushes against the memory limits of current Apple devices. It activates only when the standard model’s output quality is insufficient. Apple’s framework handles this decision automatically based on task complexity scoring.

Which Apple Devices Support the Third-Generation Foundation Models?

Apple restricted AFM 3 support to devices with sufficient Neural Engine capacity and RAM. The 20-billion-parameter on-device model demands significant memory. Only devices with at least 8GB of RAM qualify for the full third-generation Foundation Models experience.

Supported devices include:

iPhone 17 Pro and Pro Max — full AFM 3 with on-device inference
iPhone 17 — standard model only, Cloud Pro for advanced tasks
iPad Pro (M5) — full AFM 3 support including Core Advanced
MacBook Pro (M5) — full support with extended memory allocation
MacBook Air (M4) — standard model with Cloud Pro fallback
Mac Studio (M5 Ultra) — full support optimized for developer workloads
iPad Air (M3) — standard model only
Apple Vision Pro — full AFM 3 with vision-optimized adapters

Older devices fall back to Cloud Pro for tasks the local hardware cannot handle. Apple’s A17 Pro and M3 chips technically run a distilled version of AFM 3, but performance degrades noticeably. The Elec reported that Apple’s strategy positions the iPhone as an evolving AI platform, with each generation expanding local capabilities.

The 8GB RAM threshold is non-negotiable. In its 4-bit quantized form, the 20-billion-parameter model requires approximately 10GB of memory (20 billion weights at half a byte each), which exceeds the 8GB floor. Apple therefore uses aggressive memory swapping on supported devices to maintain system responsiveness.
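The 10GB figure follows directly from the parameter count and the 4-bit quantization mentioned earlier. The sketch below reproduces the arithmetic; it counts weights only and ignores the KV cache, activations, and quantization metadata, so real usage would be somewhat higher. The function name is hypothetical.

```python
def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


AFM3_PARAMS = 20_000_000_000

quantized_gb = weight_footprint_gb(AFM3_PARAMS, 4)    # 4-bit weights
half_precision_gb = weight_footprint_gb(AFM3_PARAMS, 16)  # 16-bit weights
overshoot_gb = quantized_gb - 8  # excess over the stated 8GB RAM floor

print(f"4-bit: {quantized_gb} GB, 16-bit: {half_precision_gb} GB")
# prints "4-bit: 10.0 GB, 16-bit: 40.0 GB"
```

The roughly 2GB gap between the quantized weights and the 8GB floor is what the memory swapping described above has to cover.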

How Does Apple Intelligence Connect Siri to Foundation Models?

Apple Intelligence serves as the orchestration layer connecting Siri to the Foundation Models framework. The Elec described Apple Intelligence as “the layer connecting the new Siri and the Foundation Models Framework.” This architecture separates the conversational interface (Siri) from the intelligence engine (Foundation Models).

When a user speaks to Siri, Apple Intelligence evaluates the request. It determines which model should handle the query. Simple requests like setting timers stay on-device. Complex requests involving reasoning, code generation, or creative writing route through Cloud Pro. The routing happens in milliseconds.

Apple’s newsroom announcement emphasized that the new architecture integrates Foundation Models “deep into Apple’s platform experiences.” Siri gains the ability to maintain context across multiple interactions. It can reference previous conversations, open apps, and execute multi-step workflows using the Foundation Models’ tool-calling capabilities.

The redesign also introduces what Apple calls “app intents.” Developers register their app’s capabilities with the system. Siri can then invoke those capabilities through natural language. The Foundation Models framework translates user requests into structured function calls. This approach keeps user data private — the model sees only what the app explicitly exposes.
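The register-then-invoke flow described above can be sketched in a few lines. This is a language-neutral illustration of the pattern, not Apple's App Intents API, which is written in Swift; the registry, the `register_capability` decorator, and the `set_timer` capability are all hypothetical.

```python
from typing import Any, Callable

# Hypothetical registry mapping capability names to app functions.
_REGISTRY: dict[str, Callable[..., Any]] = {}


def register_capability(name: str) -> Callable:
    """Decorator an app would use to expose one function to the assistant."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        _REGISTRY[name] = func
        return func
    return decorator


@register_capability("set_timer")
def set_timer(minutes: int) -> str:
    return f"Timer set for {minutes} minutes"


def dispatch(call: dict[str, Any]) -> Any:
    """Run a structured call of the form {"name": ..., "arguments": {...}}."""
    name = call.get("name")
    if name not in _REGISTRY:
        raise KeyError(f"unknown capability: {name}")
    return _REGISTRY[name](**call.get("arguments", {}))


# A request like "set a timer for five minutes" would be translated
# by the model into a structured call such as this one.
result = dispatch({"name": "set_timer", "arguments": {"minutes": 5}})
```

The privacy property in the text falls out of the design: the dispatcher can reach only functions the app registered, and each function receives only the arguments in the structured call.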

Frequently Asked Questions

Can developers access Apple Foundation Models through third-party APIs like Claude?

No. Apple Foundation Models are accessible exclusively through Apple’s Foundation Models framework on Apple platforms. However, Claude’s API documentation confirms a “Claude for Foundation Models” Swift package that allows developers to use Claude models within Apple’s framework. This package runs Anthropic’s Claude models alongside Apple’s native models, but the Apple Foundation Models themselves remain locked to Apple’s ecosystem.

What is the parameter count difference between AFM 3 on-device and Cloud Pro models?

AFM 3 runs 20 billion parameters on-device, according to Memeburn’s WWDC 2026 coverage. Apple has not publicly disclosed the exact parameter count for Cloud Pro. Industry analysts estimate Cloud Pro operates at several hundred billion parameters based on the infrastructure requirements and the involvement of Google TPUs and Nvidia GPUs. The gap between the two models explains why Cloud Pro handles Apple’s most complex reasoning tasks.

Does Apple Foundation Model Cloud Pro run on Apple Silicon or third-party infrastructure?

Cloud Pro runs on third-party infrastructure. CNBC confirmed that Google and Nvidia provide the computing infrastructure for Apple Foundation Model Cloud Pro. Nvidia contributes GPU resources while Google provides TPU acceleration. Apple Silicon powers the on-device models and the Private Cloud Compute servers, but Cloud Pro requires the scale that only Google Cloud and Nvidia GPUs can provide.

How does Apple ensure privacy when using cloud-based Foundation Models?

Apple implemented a privacy architecture called Private Cloud Compute for cloud inference. Requests are anonymized and stripped of personally identifiable information before leaving the device. Apple stated that cloud inference data is never stored, never accessible to Apple employees, and independently auditable by security researchers. The company confirmed that even with Google and Nvidia infrastructure involved, neither partner can access user data or model inputs.

Summary

AFM 3 delivers 20 billion parameters on-device — a massive leap for mobile AI that runs entirely on Apple Silicon without cloud dependency.
Google and Nvidia power Cloud Pro infrastructure — but the models themselves contain zero Gemini code and are built entirely by Apple.
The Foundation Models framework gives developers native Swift APIs — with guided generation, LoRA adapters, tool calling, and automatic cloud routing built in.
Device support requires 8GB RAM minimum — limiting AFM 3 to iPhone 17 Pro, M5 iPad Pro, and recent Macs for the full experience.
Apple Intelligence orchestrates Siri’s access to Foundation Models — routing requests between on-device and cloud models based on task complexity.

Explore the full Apple developer documentation to start building with the Foundation Models framework today.