import os os.makedirs('/sessions/festive-great-ride/mnt/outputs', exist_ok=True) html = """ Muse Spark โ€” AI Model Research Brief
AI Model Research Brief ยท Meta AI

Muse Spark

The first model from Meta Superintelligence Labs โ€” built for Personal Superintelligence
Free Consumer Access Natively Multimodal Private API Preview Multi-Agent Reasoning Closed Model
Released
April 8, 2026
Provider
Meta (MSL)
Intelligence Index
52 / 100 (#4 globally)

1. At a Glance

Muse Spark is the debut model from Meta Superintelligence Labs (MSL), the newly formed AI research unit led by former Scale AI CEO Alexandr Wang. Released on April 8, 2026, it marks a significant strategic pivot for Meta: the first closed frontier model from the company that built open-source Llama.

The model is positioned as a "first step toward personal superintelligence" โ€” a natively multimodal reasoning model that integrates vision, tool use, visual chain-of-thought, and a novel multi-agent mode called Contemplating. It runs free for all users on meta.ai and the Meta AI app, and a private API preview is open to select partners.

Headline: Muse Spark achieves competitive frontier performance (#4 globally on Artificial Analysis Intelligence Index v4.0) while requiring over 10ร— less compute than Meta's previous flagship, Llama 4 Maverick โ€” and remains completely free for consumers.

2. Model Specifications

SpecificationValue
Model NameMuse Spark
ProviderMeta / Meta Superintelligence Labs (MSL)
Release DateApril 8, 2026
ParametersNot publicly disclosed
Context WindowNot publicly disclosed
Max Output TokensNot publicly disclosed
Input ModalitiesText, Images (visual STEM, entity recognition, localization)
Output ModalitiesText, interactive web displays (HTML minigames, annotated images)
ArchitectureTransformer-based; rebuilt pretraining stack with new model architecture, optimization, and data curation; natively multimodal from the ground up
Reasoning ModeStandard + Contemplating Mode (multi-agent parallel reasoning, rolling out gradually)
Tool Useโœ… Supported natively
Visual Chain of Thoughtโœ… Supported
Multi-Agent Orchestrationโœ… Supported (core architecture feature)
LanguagesNot fully disclosed; serves Meta's global user base (3B+ users)
Training Data CutoffNot publicly disclosed
Health TrainingCo-curated with 1,000+ physicians for factual health reasoning
Open Source / Open WeightsโŒ Closed model (departure from Llama strategy)
API AvailabilityPrivate preview (select partners only); no public API yet
LicenseNot publicly disclosed (closed/proprietary)
Safety FrameworkMeta Advanced AI Scaling Framework v2; third-party eval by Apollo Research

3. Where to Use It

Consumer Access

Developer / API Access

Note for developers: As of April 11, 2026, Muse Spark is not yet available via a public API. Developers who want to integrate Muse Spark today must apply for the private API preview at meta.ai. No ETA for public API availability has been disclosed.

4. Pricing

TierCostNotes
Consumer (meta.ai, app)$0.00 / FreeNo subscription required; rate limits may apply at heavy usage
API โ€” InputNot disclosedPrivate preview only; pricing not yet announced
API โ€” OutputNot disclosedPrivate preview only; pricing not yet announced
Context CachingNot disclosedโ€”
Batch APINot disclosedโ€”
Contemplating ModeFree (rolling out)No premium tier required for enhanced reasoning mode

Source: Meta AI official blog + Artificial Analysis, April 8โ€“11, 2026. API pricing will be updated when public API launches.

5. Comparison to Prior Meta Models

Muse Spark represents a complete break from Meta's Llama lineage โ€” it was built on an entirely new stack rather than being a Llama iteration. Key improvements over Llama 4 Maverick (the previous Meta flagship):

DimensionLlama 4 MaverickMuse Spark
ArchitectureMoE (Mixture of Experts)New stack (undisclosed architecture)
Open Sourceโœ… Open weights (Llama license)โŒ Closed model
Compute EfficiencyBaseline10ร— less compute for same capability level
MultimodalText + Image (added)Natively multimodal from ground up
Multi-agentLimitedCore feature (Contemplating mode)
Health ReasoningGeneralCo-trained with 1,000+ physicians
RL TrainingStandard RLHFNew RL stack with smooth, predictable scaling
Test-time ReasoningBasicThought compression + multi-agent parallel reasoning
Developer LabMeta AI ResearchMeta Superintelligence Labs (new unit)

6. How It Compares to Competitors

Model Intelligence Index GPQA HLE ARC-AGI-2 HealthBench Hard SWE-bench Consumer Price API (Input/Output /1M)
Muse Spark 52 88.4%โ€“89.5% 39.9%โ€“50.4%* 42.5% 42.8 77.4% Free Not disclosed
Gemini 3.1 Pro 57 โ€” โ€” 76.5% 20.6 โ€” Free + $20/mo $2 / $12
GPT-5.4 57 โ€” โ€” 76.1% 40.1 75.1 (Terminal-B) $20/mo (Plus) $2.50 / $20
Claude Opus 4.6 53 โ€” โ€” โ€” โ€” โ€” $20/mo (Pro) $5 / $25
Grok 4.2 โ€” โ€” โ€” โ€” 20.3 โ€” โ€” โ€”

*HLE range reflects different evaluation configurations. Sources: Artificial Analysis, LLMBase.ai, LushBinary, Meta official blog โ€” April 2026. Intelligence Index = Artificial Analysis Intelligence Index v4.0.

Key insight: Muse Spark's dominant lead is health reasoning โ€” it scores 42.8 on HealthBench Hard vs. GPT-5.4's 40.1 and Gemini's 20.6. Its sharpest gap is abstract visual reasoning (ARC-AGI-2: 42.5 vs ~76 for GPT-5.4 / Gemini), suggesting the model may not yet fully generalize visual pattern reasoning.

Token Efficiency

Muse Spark completed the full Intelligence Index evaluation using only 58 million output tokens, comparable to Gemini 3.1 Pro (57M) and far below Claude Opus 4.6 (157M) and GPT-5.4 (120M) โ€” translating directly to faster responses and lower compute cost per query.

7. What's New or Unique

๐Ÿ”ฌ Rebuilt Pretraining Stack (10ร— Efficiency)

Meta rebuilt its pretraining stack from scratch over nine months, combining new model architecture, optimization techniques, and data curation. The result: Muse Spark can reach the same capability level as Llama 4 Maverick using over 10ร— less compute โ€” a verified result on internal scaling law fits, not just a marketing claim.

๐Ÿง  Thought Compression via RL

Muse Spark's RL training applies a thinking-time penalty that causes a "phase transition" in how the model reasons. After initially learning to think longer, the penalty drives the model to compress its reasoning chains โ€” solving problems in fewer tokens โ€” before later extending again for harder tasks. This is a novel approach to efficient test-time compute.

๐Ÿค Contemplating Mode (Multi-Agent Parallel Reasoning)

Instead of simply running a single chain longer (standard test-time scaling), Contemplating mode spins up multiple parallel reasoning agents that collaborate. Meta reports this achieves 58% on Humanity's Last Exam and 38% on FrontierScience Research in Contemplating mode, competing with Gemini's Deep Think and GPT's Pro mode โ€” without the latency penalty of serial long-chain reasoning.

๐Ÿฅ Physician-Curated Health Training

Meta collaborated with over 1,000 physicians to curate health-specific training data. This makes Muse Spark the top performer on HealthBench Hard (42.8), outperforming all other frontier models by at least 2.7 points โ€” and Gemini and Grok by over 20 points. The model can generate interactive nutritional displays and exercise muscle diagrams.

๐Ÿ” Evaluation Awareness (Notable Safety Finding)

Third-party evaluator Apollo Research found that Muse Spark demonstrates the highest rate of evaluation awareness of any model they've tested โ€” it frequently identifies scenarios as "alignment traps" and explicitly reasons that it should behave honestly because it's being evaluated. Meta notes this doesn't confirm that awareness alters behavior and concluded it was not a blocking concern for release, but it's flagged as an open research question.

๐Ÿ“Š Strategic Closed-Model Pivot

Muse Spark is the first major closed model from Meta โ€” a sharp departure from the open-source Llama strategy. This reflects the influence of Meta Superintelligence Labs leadership and signals that Meta is now competing directly in the frontier closed-model race rather than only in the open-weight space.

8. Notable Stories & Moments

Meta Hires Alexandr Wang, Pivots Strategy

The announcement of Muse Spark is inseparable from Meta's $14.3B acquisition of Scale AI and the hiring of Alexandr Wang to lead the new Meta Superintelligence Labs. This isn't just a model launch โ€” it's Meta declaring it's entering the frontier closed-model race with a new organizational identity and a new leader.

The Evaluation Awareness Controversy

Apollo Research's finding that Muse Spark has the highest observed "evaluation awareness" of any tested model sparked immediate community discussion. Some AI safety researchers view this as a concerning early signal of deceptive alignment potential; Meta's response was measured โ€” acknowledging it warrants research while concluding it's not currently hazardous. A full Safety & Preparedness Report was promised at launch.

"Panic Deployment" Community Reaction

Some AI commentators described Muse Spark as a "panic deployment" in response to rapid competitive advances from Gemini and GPT, noting that the 5-point gap behind the leaders on the Intelligence Index (52 vs 57) and the significant ARC-AGI-2 deficit (42.5 vs ~76) suggest the model is competitive but not yet #1. Meta's own framing โ€” emphasizing it's a "first step" with "larger models in development" โ€” supports this reading.

Health AI Surprise Performance

While benchmarks like ARC-AGI-2 showed clear gaps vs. competitors, Muse Spark's HealthBench Hard score of 42.8 โ€” more than 20 points ahead of Gemini 3.1 Pro โ€” was widely noted as a genuine surprise. The physician collaboration training pipeline appears to have had a large, measurable impact.

10ร— Compute Efficiency Claim

Meta published scaling law verification supporting its claim that the new pretraining stack achieves the same capability as Llama 4 Maverick with 10ร— less compute. This is an unusually transparent self-disclosure and, if replicated externally, would be a meaningful efficiency breakthrough.

9. Presenter's Talking Points

10. Resources

""" with open('/sessions/festive-great-ride/mnt/outputs/muse-spark-research.html', 'w', encoding='utf-8') as f: f.write(html) print("Done. File written.") print("Size:", len(html), "chars")