Muse Spark is the debut model from Meta Superintelligence Labs (MSL), the newly formed AI research unit led by former Scale AI CEO Alexandr Wang. Released on April 8, 2026, it marks a significant strategic pivot for Meta: the first closed frontier model from the company that built open-source Llama.
The model is positioned as a "first step toward personal superintelligence": a natively multimodal reasoning model that integrates vision, tool use, visual chain-of-thought, and a novel multi-agent mode called Contemplating. It runs free for all users on meta.ai and the Meta AI app, and a private API preview is open to select partners.

| Specification | Value |
|---|---|
| Model Name | Muse Spark |
| Provider | Meta / Meta Superintelligence Labs (MSL) |
| Release Date | April 8, 2026 |
| Parameters | Not publicly disclosed |
| Context Window | Not publicly disclosed |
| Max Output Tokens | Not publicly disclosed |
| Input Modalities | Text, Images (visual STEM, entity recognition, localization) |
| Output Modalities | Text, interactive web displays (HTML minigames, annotated images) |
| Architecture | Transformer-based; rebuilt pretraining stack with new model architecture, optimization, and data curation; natively multimodal from the ground up |
| Reasoning Mode | Standard + Contemplating Mode (multi-agent parallel reasoning, rolling out gradually) |
| Tool Use | ✓ Supported natively |
| Visual Chain of Thought | ✓ Supported |
| Multi-Agent Orchestration | ✓ Supported (core architecture feature) |
| Languages | Not fully disclosed; serves Meta's global user base (3B+ users) |
| Training Data Cutoff | Not publicly disclosed |
| Health Training | Co-curated with 1,000+ physicians for factual health reasoning |
| Open Source / Open Weights | ✗ Closed model (departure from Llama strategy) |
| API Availability | Private preview (select partners only); no public API yet |
| License | Not publicly disclosed (closed/proprietary) |
| Safety Framework | Meta Advanced AI Scaling Framework v2; third-party eval by Apollo Research |

| Tier | Cost | Notes |
|---|---|---|
| Consumer (meta.ai, app) | $0.00 / Free | No subscription required; rate limits may apply at heavy usage |
| API Input | Not disclosed | Private preview only; pricing not yet announced |
| API Output | Not disclosed | Private preview only; pricing not yet announced |
| Context Caching | Not disclosed | – |
| Batch API | Not disclosed | – |
| Contemplating Mode | Free (rolling out) | No premium tier required for enhanced reasoning mode |

Source: Meta AI official blog and Artificial Analysis, April 8–11, 2026. API pricing will be updated when the public API launches.
Muse Spark represents a complete break from Meta's Llama lineage: it was built on an entirely new stack rather than as a Llama iteration. Key improvements over Llama 4 Maverick (the previous Meta flagship):

| Dimension | Llama 4 Maverick | Muse Spark |
|---|---|---|
| Architecture | MoE (Mixture of Experts) | New stack (undisclosed architecture) |
| Open Source | ✓ Open weights (Llama license) | ✗ Closed model |
| Compute Efficiency | Baseline | 10× less compute for the same capability level |
| Multimodal | Text + Image (added) | Natively multimodal from ground up |
| Multi-agent | Limited | Core feature (Contemplating mode) |
| Health Reasoning | General | Co-trained with 1,000+ physicians |
| RL Training | Standard RLHF | New RL stack with smooth, predictable scaling |
| Test-time Reasoning | Basic | Thought compression + multi-agent parallel reasoning |
| Developer Lab | Meta AI Research | Meta Superintelligence Labs (new unit) |

| Model | Intelligence Index | GPQA | HLE | ARC-AGI-2 | HealthBench Hard | SWE-bench | Consumer Price | API (Input/Output /1M) |
|---|---|---|---|---|---|---|---|---|
| Muse Spark | 52 | 88.4%–89.5% | 39.9%–50.4%* | 42.5% | 42.8 | 77.4% | Free | Not disclosed |
| Gemini 3.1 Pro | 57 | – | – | 76.5% | 20.6 | – | Free + $20/mo | $2 / $12 |
| GPT-5.4 | 57 | – | – | 76.1% | 40.1 | 75.1 (Terminal-B) | $20/mo (Plus) | $2.50 / $20 |
| Claude Opus 4.6 | 53 | – | – | – | – | – | $20/mo (Pro) | $5 / $25 |
| Grok 4.2 | – | – | – | – | 20.3 | – | – | – |

*HLE range reflects different evaluation configurations. Sources: Artificial Analysis, LLMBase.ai, LushBinary, and the Meta official blog, April 2026. Intelligence Index = Artificial Analysis Intelligence Index v4.0.
Muse Spark completed the full Intelligence Index evaluation using only 58 million output tokens, comparable to Gemini 3.1 Pro (57M) and far below Claude Opus 4.6 (157M) and GPT-5.4 (120M). That efficiency translates directly into faster responses and lower compute cost per query.
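The arithmetic behind that cost claim is simple enough to check directly; the figures below are the token totals quoted above, and the script itself is just illustrative arithmetic:

```python
# Output-token totals reported for the full Intelligence Index run.
tokens_used = {
    "Muse Spark": 58_000_000,
    "Gemini 3.1 Pro": 57_000_000,
    "GPT-5.4": 120_000_000,
    "Claude Opus 4.6": 157_000_000,
}

baseline = tokens_used["Muse Spark"]
for model, tokens in tokens_used.items():
    # Ratio > 1 means the model burned more output tokens than Muse Spark
    # to finish the same evaluation suite.
    print(f"{model}: {tokens / baseline:.2f}x Muse Spark's token budget")
```

Holding per-token price equal, Claude Opus 4.6's run would cost roughly 2.7× Muse Spark's, and GPT-5.4's roughly 2.1×.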
Meta rebuilt its pretraining stack from scratch over nine months, combining a new model architecture, new optimization techniques, and new data curation. The result: Muse Spark can reach the same capability level as Llama 4 Maverick using less than one-tenth the compute, a result verified on internal scaling-law fits, not just a marketing claim.
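Meta has not released the fitted coefficients, so the sketch below uses entirely made-up numbers; it only illustrates the standard scaling-law algebra by which a "10× less compute at equal capability" claim would be derived:

```python
# Illustrative power-law scaling fits: loss(C) = a * C**(-b), with C in FLOPs.
# Coefficients are hypothetical (Meta has not published its fits); they are
# chosen so the efficiency gap works out to roughly 10x.
old_stack = {"a": 11.22, "b": 0.05}   # previous pretraining stack
new_stack = {"a": 10.00, "b": 0.05}   # rebuilt stack: lower loss at equal compute

def loss(stack, compute):
    return stack["a"] * compute ** (-stack["b"])

def compute_for_loss(stack, target):
    # Invert loss(C) = a * C**(-b)  =>  C = (a / target)**(1 / b)
    return (stack["a"] / target) ** (1 / stack["b"])

C_old = 1e25                        # FLOPs spent by the old flagship
target = loss(old_stack, C_old)     # capability level to match
C_new = compute_for_loss(new_stack, target)
print(f"Compute multiplier: {C_old / C_new:.1f}x")   # ~10x with these coefficients
```

Real fits would come from (compute, loss) measurements across many training runs; the point is only that an equal-capability compute ratio falls straight out of the inverted fit.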
Muse Spark's RL training applies a thinking-time penalty that produces a "phase transition" in how the model reasons. The model first learns to think longer; the penalty then drives it to compress its reasoning chains, solving problems in fewer tokens, before it extends them again for harder tasks. This is a novel approach to efficient test-time compute.
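The mechanism can be sketched as a per-token penalty applied to an otherwise fixed task reward; the function and coefficient below are illustrative assumptions, not Meta's disclosed training details:

```python
# Sketch of a thinking-time penalty on an RL reward, per the description above.
# The reward shape and penalty coefficient are hypothetical, not Meta's values.
def shaped_reward(task_reward: float, thinking_tokens: int,
                  penalty_per_token: float = 1e-4) -> float:
    """Correct answers earn task_reward; every reasoning token costs a little."""
    return task_reward - penalty_per_token * thinking_tokens

# Two equally correct solutions: the compressed chain earns more reward,
# pushing the policy toward shorter reasoning.
short_chain = shaped_reward(1.0, 2_000)   # ~0.8
long_chain = shaped_reward(1.0, 8_000)    # ~0.2
print(short_chain > long_chain)           # True
```

Early in training the policy accepts the penalty to gain accuracy (chains lengthen); once accuracy saturates, the penalty dominates and chains compress, matching the phase-transition description above.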
Instead of simply running a single chain longer (standard test-time scaling), Contemplating mode spins up multiple parallel reasoning agents that collaborate. Meta reports 58% on Humanity's Last Exam and 38% on FrontierScience Research in Contemplating mode, competitive with Gemini's Deep Think and GPT's Pro mode, without the latency penalty of serial long-chain reasoning.
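The orchestration details of Contemplating mode are not public; the toy sketch below shows only the general parallel-agents-plus-aggregation pattern, with a stand-in agent and a simple majority vote (both are assumptions, not Meta's design):

```python
# Toy parallel multi-agent pattern: several reasoning agents attempt the
# problem independently, then a majority vote picks the answer.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def reasoning_agent(problem: str, seed: int) -> str:
    # Stand-in for one agent's full reasoning chain; real agents would each
    # run an independent chain of thought over the problem.
    return "42" if seed % 3 else "41"   # most seeds converge on "42"

def contemplate(problem: str, n_agents: int = 5) -> str:
    # Run the agents concurrently rather than as one long serial chain.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda s: reasoning_agent(problem, s),
                                range(n_agents)))
    # Aggregate: majority vote over the parallel chains.
    return Counter(answers).most_common(1)[0][0]

print(contemplate("What is 6 * 7?"))   # majority answer: 42
```

Because the agents run in parallel, wall-clock latency scales with one chain's length rather than with the total reasoning budget, which is the efficiency argument made above.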
Meta collaborated with over 1,000 physicians to curate health-specific training data. This makes Muse Spark the top performer on HealthBench Hard (42.8), outperforming all other frontier models by at least 2.7 points and beating Gemini and Grok by over 20 points. The model can also generate interactive nutritional displays and exercise muscle diagrams.
Third-party evaluator Apollo Research found that Muse Spark demonstrates the highest rate of evaluation awareness of any model they've tested: it frequently identifies scenarios as "alignment traps" and explicitly reasons that it should behave honestly because it's being evaluated. Meta notes this doesn't confirm that awareness alters behavior and concluded it was not a blocking concern for release, but it's flagged as an open research question.
Muse Spark is the first major closed model from Meta, a sharp departure from the open-source Llama strategy. This reflects the influence of Meta Superintelligence Labs leadership and signals that Meta is now competing directly in the frontier closed-model race rather than only in the open-weight space.
The announcement of Muse Spark is inseparable from Meta's $14.3B acquisition of Scale AI and the hiring of Alexandr Wang to lead the new Meta Superintelligence Labs. This isn't just a model launch; it's Meta declaring that it's entering the frontier closed-model race with a new organizational identity and a new leader.
Apollo Research's finding that Muse Spark has the highest observed "evaluation awareness" of any tested model sparked immediate community discussion. Some AI safety researchers view this as a concerning early signal of deceptive-alignment potential; Meta's response was measured, acknowledging that it warrants research while concluding it's not currently hazardous. A full Safety & Preparedness Report was promised at launch.
Some AI commentators described Muse Spark as a "panic deployment" in response to rapid competitive advances from Gemini and GPT, noting that the 5-point gap behind the leaders on the Intelligence Index (52 vs. 57) and the significant ARC-AGI-2 deficit (42.5% vs. ~76%) suggest the model is competitive but not yet #1. Meta's own framing, which emphasizes that this is a "first step" with "larger models in development," supports this reading.
While benchmarks like ARC-AGI-2 showed clear gaps versus competitors, Muse Spark's HealthBench Hard score of 42.8, more than 20 points ahead of Gemini 3.1 Pro, was widely noted as a genuine surprise. The physician-collaboration training pipeline appears to have had a large, measurable impact.
Meta published scaling-law verification supporting its claim that the new pretraining stack achieves the same capability as Llama 4 Maverick with 10× less compute. This is an unusually transparent self-disclosure and, if replicated externally, would be a meaningful efficiency breakthrough.