Longterm Wiki
Updated 2026-02-10
Summary

An interactive sortable table comparing 16 architecture scenarios across dimensions including likelihood, safety outlook, whitebox access, modularity, and formal verifiability. Supports grouped and unified views by category.
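The grouped/unified views described above amount to sorting a list of records, optionally bucketed by category. A minimal sketch of that data model, with hypothetical field names (not the wiki's actual schema) and illustrative values:

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical schema for one table row; field names and values are
# illustrative, not the wiki's actual data model.
@dataclass
class Scenario:
    name: str
    category: str
    p_dominant: float  # midpoint of the probability range, e.g. 0.10 for "5-15%"
    safety: int        # illustrative 0-10 safety score

rows = [
    Scenario("Bare model access", "Deployment Patterns", 0.10, 5),
    Scenario("Heavy scaffolding", "Deployment Patterns", 0.325, 4),
    Scenario("Dense transformers", "Base Architectures", 0.0, 5),
]

# Unified view: sort every row by likelihood, descending.
unified = sorted(rows, key=lambda s: s.p_dominant, reverse=True)

# Grouped view: bucket rows by category, then sort within each bucket.
# groupby requires its input to be pre-sorted by the grouping key.
grouped = {
    cat: sorted(group, key=lambda s: s.p_dominant, reverse=True)
    for cat, group in groupby(sorted(rows, key=lambda s: s.category),
                              key=lambda s: s.category)
}
```

Range probabilities like "5-15%" would need a convention (here, the midpoint) before they sort cleanly.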


Architecture Scenarios Table

Columns:
- Probability this becomes dominant at TAI
- Trend
- Overall safety assessment (illustrative)
- Interpretability of internals
- Training approach
- Behavior predictability
- Component separation
- Formal verification possible
- Key papers
- Labs
- Safety pros
- Safety cons

Direct model access (Deployment Patterns)
Description: Direct model API/chat with basic prompting. No persistent memory, minimal tools. Like the ChatGPT web interface.
Probability dominant at TAI: 5-15%
Trend: Unlikely to stay dominant; scaffolding adds clear value.
Safety assessment (illustrative): 5/10, Mixed. Easy to study but limited interpretability; low capability ceiling reduces risk.
Interpretability: LOW. Model internals opaque; we see only inputs and outputs.
Training approach: HIGH. Standard RLHF on a base model.
Behavior predictability: MEDIUM. Single forward pass, somewhat predictable.
Component separation: LOW. Monolithic model.
Formal verification: LOW. The model itself is unverifiable.
Safety pros: Simple to analyze; no tool access means limited harm.
Safety cons: Model internals opaque; limited capability ceiling.

Basic scaffolding (Deployment Patterns)
Description: Model + basic tool use + simple chains. RAG, function calling, single-agent loops. Like GPT with plugins.
Probability dominant at TAI: 15-25%
Trend: Current sweet spot, but heavy scaffolding is catching up.
Safety assessment (illustrative): 5/10, Mixed. Tool use adds capability and risk; the scaffold provides some inspection.
Interpretability: MEDIUM. Scaffold code readable; model still opaque.
Training approach: HIGH. Model trained; scaffold is code.
Behavior predictability: MEDIUM. Tool calls add some unpredictability.
Component separation: MEDIUM. Clear tool boundaries.
Formal verification: PARTIAL. Scaffold code can be verified.
Safety pros: Scaffold logic inspectable; tool permissions controllable.
Safety cons: Tool use enables real-world harm; model decisions still opaque.

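The "tool permissions controllable" point is that the scaffold, being ordinary code, can gate every tool call regardless of what the model requests. A minimal sketch; all names here are hypothetical, and real function-calling APIs differ:

```python
# Minimal tool-use scaffold with an explicit allowlist. The scaffold,
# not the model, decides whether a requested tool call actually runs.
ALLOWED_TOOLS = {"search", "calculator"}

def calculator(expr: str) -> str:
    # Toy arithmetic evaluator for the sketch; a real scaffold would
    # sandbox this rather than call eval on filtered input.
    if not set(expr) <= set("0123456789+-*/(). "):
        raise ValueError("unsupported expression")
    return str(eval(expr))

TOOLS = {"calculator": calculator}  # installed tool implementations

def dispatch(tool_name: str, argument: str) -> str:
    """Gate a model-requested tool call through the allowlist."""
    if tool_name not in ALLOWED_TOOLS:
        return f"refused: {tool_name} is not permitted"
    if tool_name not in TOOLS:
        return f"error: {tool_name} not installed"
    return TOOLS[tool_name](argument)
```

Because the gate lives outside the model, it holds even if the model's internal decision process is opaque.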
Heavy scaffolding / agentic systems (Deployment Patterns)
Description: Multi-agent systems, complex orchestration, persistent memory, autonomous operation. Like Claude Code or Devin.
Probability dominant at TAI: 25-40%
Trend: Strong trend; scaffolding is getting cheaper and more valuable.
Safety assessment (illustrative): 4/10, Challenging. High capability with emergent behavior; the scaffold helps, but autonomy is risky.
Interpretability: MEDIUM-HIGH. Scaffold code fully readable; model calls are black boxes.
Training approach: LOW. Models trained separately; scaffold is engineered code.
Behavior predictability: LOW. Multi-step plans diverge unpredictably.
Component separation: HIGH. Explicit component architecture.
Formal verification: PARTIAL. Scaffold verifiable; model calls are not.
Safety pros: Scaffold code auditable; safety checks can be added in code; modular.
Safety cons: Emergent multi-step behavior; autonomy means less oversight; tool-use risk.

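The "can add safety checks in code" advantage can be made concrete: scaffold-level limits like step budgets, deny-rules, and audit logs are enforced in plain code outside the model. A sketch, where `propose_action` is a hypothetical stand-in for a model call:

```python
# Scaffold-level safety checks for an agent loop: a hard step budget,
# an illustrative deny-rule, and an audit log, all enforced in ordinary
# code outside the model. `propose_action` is a hypothetical stub.
def propose_action(step: int) -> str:
    return f"action-{step}"  # a real scaffold would query the model here

def run_agent(max_steps: int = 5) -> list[str]:
    audit_log: list[str] = []
    for step in range(max_steps):         # budget bounds autonomy
        action = propose_action(step)
        if action.startswith("delete"):   # illustrative deny-rule
            audit_log.append(f"BLOCKED {action}")
            continue
        audit_log.append(f"RAN {action}")
    return audit_log
```

These checks constrain what the agent does, not what it "intends"; the multi-step divergence noted above happens within whatever the checks allow.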
Dense transformers (Base Architectures)
Description: Standard transformer architecture, all parameters active. The current GPT/Claude/Llama architecture.
Probability dominant at TAI: (base architecture)
Trend: Orthogonal to deployment; combined with scaffolding choices.
Safety assessment (illustrative): 5/10, Mixed. Most studied but still opaque; interpretability improving, but slowly.
Interpretability: LOW. Weights exist, but mechanistic interpretability is still primitive.
Training approach: HIGH. Well-understood pretraining + RLHF.
Behavior predictability: LOW-MEDIUM. Emergent capabilities, phase transitions.
Component separation: LOW. Monolithic, end-to-end trained.
Formal verification: LOW. Billions of parameters, no formal guarantees.
Safety pros: Most studied architecture; some interpretability tools exist.
Safety cons: Internals still opaque; emergent deception possible; scale makes analysis hard.

Sparse / Mixture-of-Experts (Base Architectures)
Description: Mixture-of-Experts or other sparse architectures. Only a subset of parameters is active per token.
Probability dominant at TAI: (base architecture)
Trend: May become the default for efficiency; orthogonal to scaffolding.
Safety assessment (illustrative): 4/10, Mixed. Efficiency gains are good for the safety research budget, but routing adds complexity.
Interpretability: LOW. Same opacity as dense models, plus routing complexity.
Training approach: HIGH. Standard training plus load balancing.
Behavior predictability: LOW. Routing adds another layer of unpredictability.
Component separation: MEDIUM. Expert boundaries exist but interact.
Formal verification: LOW. Combinatorial explosion of expert paths.
Safety pros: Can study individual experts; more efficient means more testing budget.
Safety cons: Routing is another black box; hard to cover all expert combinations.

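The "combinatorial explosion of expert paths" can be quantified: with top-k routing, each MoE layer activates one of C(E, k) expert subsets, so a model with L such layers has C(E, k)^L distinct per-token paths. A sketch with illustrative numbers:

```python
import math

# With top-k routing, each MoE layer picks one of C(E, k) expert
# subsets, so L layers give C(E, k) ** L distinct expert paths per
# token. Parameter values below are illustrative.
def expert_paths(num_experts: int, top_k: int, num_layers: int) -> int:
    subsets_per_layer = math.comb(num_experts, top_k)
    return subsets_per_layer ** num_layers

# e.g. 8 experts, top-2 routing, 4 MoE layers: 28 ** 4 = 614,656 paths
paths = expert_paths(8, 2, 4)
```

Even this small configuration exceeds 600,000 paths, which is why exhaustive coverage of expert combinations is impractical at production scale.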
State-space models (Base Architectures)
Description: State-space models or SSM-transformer hybrids with linear-time inference.
Probability dominant at TAI: 5-15%
Trend: Promising efficiency, but transformers still dominate benchmarks.
Safety assessment (illustrative): Unknown. Too early to assess; different internals may help or hurt.
Interpretability: MEDIUM. Different internals, less studied.
Training approach: HIGH. Still gradient-based.
Behavior predictability: MEDIUM. Recurrence adds complexity.
Component separation: LOW. Similar to transformers.
Formal verification: UNKNOWN. Recurrence may help or hurt.
Labs: Cartesia, Together AI, Princeton.
Safety pros: More efficient; linear complexity.
Safety cons: Interpretability tools don't transfer; less studied.

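The "linear-time inference" claim comes from the recurrent form of an SSM: one constant-cost state update per token, so a length-n sequence costs O(n) rather than attention's O(n^2) pairwise interactions. A toy scalar sketch; the constants are illustrative, not a real Mamba/S4 parameterization:

```python
# Toy scalar state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
# One O(1) update per token means O(n) for the whole sequence, versus
# O(n^2) for pairwise attention. Constants are illustrative only.
def ssm_scan(xs: list[float], a: float = 0.9, b: float = 1.0,
             c: float = 0.5) -> list[float]:
    h, ys = 0.0, []
    for x in xs:          # single pass: one state update per token
        h = a * h + b * x
        ys.append(c * h)
    return ys
```

The flip side for interpretability is that all history is compressed into the running state `h`, whereas attention keeps every past token individually addressable.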
World models + planning (Base Architectures)
Description: Explicit learned world model with search/planning. More like AlphaGo than GPT.
Probability dominant at TAI: 5-15%
Trend: LeCun advocates it; not yet competitive for general tasks.
Safety assessment (illustrative): 6/10, Mixed. Explicit structure helps inspection, but goal misgeneralization risks are higher.
Interpretability: PARTIAL. World model inspectable but opaque.
Training approach: HIGH. Model-based RL, self-play.
Behavior predictability: MEDIUM. Explicit planning, but model errors compound.
Component separation: MEDIUM. Separate world model, policy, and value function.
Formal verification: PARTIAL. Planning verifiable; world model less so.
Labs: Google DeepMind, Meta FAIR, UC Berkeley.
Safety pros: Explicit goals; can inspect beliefs.
Safety cons: Goal misgeneralization; mesa-optimization.

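The "can inspect beliefs" advantage is that the agent's predictions and goals live in explicit data structures rather than only in weights. A toy one-step lookahead sketch; the transition table and rewards are stand-ins, not a real MuZero-style learned model:

```python
# Planning against an explicit world model: score each action by
# simulating it with a transition model, then pick the best. The model
# and rewards here are toy stand-ins for learned components.
WORLD_MODEL = {  # state -> action -> predicted next state (inspectable)
    "start": {"left": "cliff", "right": "goal"},
}
REWARD = {"cliff": -10.0, "goal": 1.0}

def plan(state: str) -> str:
    """One-step lookahead: the agent's 'beliefs' are plain data we can audit."""
    actions = WORLD_MODEL[state]
    return max(actions, key=lambda a: REWARD[actions[a]])
```

The inspection benefit degrades as the world model itself becomes a large learned network; then the planner is auditable but its inputs are not, which is the "model errors compound" caveat above.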
Neurosymbolic hybrids (Base Architectures)
Description: Neural + symbolic reasoning, knowledge graphs, or program synthesis.
Probability dominant at TAI: 3-10%
Trend: Long promised, rarely delivered at scale.
Safety assessment (illustrative): 7/10, Favorable. Symbolic components enable formal verification; hybrid boundaries remain a challenge.
Interpretability: PARTIAL. Symbolic parts clear, neural parts opaque.
Training approach: COMPLEX. Neural parts trainable; symbolic parts often hand-crafted.
Behavior predictability: MEDIUM. Explicit reasoning is more auditable.
Component separation: HIGH. Clear neural/symbolic separation.
Formal verification: PARTIAL. Symbolic parts formally verifiable.
Key papers: Neural Theorem Provers; AlphaProof (2024).
Labs: IBM Research, Google DeepMind, MIT-IBM Lab.
Safety pros: Auditable reasoning; formal verification possible.
Safety cons: Brittleness; hard to scale; boundary problems.

Provably safe AI (Base Architectures)
Description: Formally verified AI with mathematical safety guarantees. Davidad's agenda.
Probability dominant at TAI: 1-5%
Trend: Ambitious; unclear whether it is achievable for general capabilities.
Safety assessment (illustrative): 9/10, Favorable. If achievable, the best safety properties by design; large uncertainty about feasibility.
Interpretability: HIGH. Designed for formal analysis.
Training approach: DIFFERENT. Verified synthesis, not just SGD.
Behavior predictability: HIGH. Behavior bounded by proofs.
Component separation: HIGH. Compositional by design.
Formal verification: HIGH. This is the point.
Labs: ARIA (Davidad), MIRI.
Safety pros: Mathematical guarantees; auditable by construction.
Safety cons: May not scale; capability tax; world-model verification is hard.

Unknown paradigm (Base Architectures)
Description: Something we haven't thought of yet. A placeholder for model uncertainty.
Probability dominant at TAI: 5-15%
Trend: Epistemic humility; history suggests surprises.
Safety assessment (illustrative): Unknown. Cannot assess; all current safety research may or may not transfer.
Interpretability: ??? Depends on what emerges.
Training approach: ??? Unknown.
Behavior predictability: ??? No basis for prediction.
Component separation: ??? Unknown.
Formal verification: ??? Unknown.
Key papers / Labs: None listed.
Safety pros: Fresh start possible.
Safety cons: All current work may not transfer.

Biological computing (Alternative Compute)
Description: Actual biological neurons, brain organoids, or wetware computing.
Probability dominant at TAI: <1%
Trend: Fascinating, but far from TAI-relevant scale.
Safety assessment (illustrative): 3/10, Challenging. Deeply opaque; no existing safety tools apply; ethical complexities.
Interpretability: LOW. Biological systems are inherently opaque.
Training approach: UNKNOWN. Biological learning rules.
Behavior predictability: LOW. Noisy and variable.
Component separation: LOW. Highly interconnected.
Formal verification: LOW. Too complex.
Labs: Cortical Labs; various academic groups.
Safety pros: May have human-like values; energy efficient.
Safety cons: Ethical concerns; no interpretability tools; slow iteration.

Neuromorphic hardware (Alternative Compute)
Description: Spiking neural networks on specialized chips. Event-driven, analog.
Probability dominant at TAI: 1-3%
Trend: Efficiency gains are real, but not on the path to TAI.
Safety assessment (illustrative): Unknown. A different substrate with different properties; too early to assess.
Interpretability: PARTIAL. Architecture known, dynamics complex.
Training approach: DIFFERENT. Spike-timing plasticity.
Behavior predictability: MEDIUM. More brain-like.
Component separation: MEDIUM. Modular chip designs possible.
Formal verification: LOW. Analog dynamics are hard to verify.
Labs: Intel Labs, IBM Research, SynSense.
Safety pros: Energy efficient; robust.
Safety cons: Current tools don't transfer; less mature.

Whole brain emulation (Non-AI Paradigms)
Description: Upload/simulate a complete biological brain at sufficient fidelity. Requires scanning + simulation technology.
Probability dominant at TAI: <1%
Trend: Probably slower than AI; scanning technology is far away.
Safety assessment (illustrative): 5/10, Mixed. Human values by default, but speed-up and copyability create novel risks.
Interpretability: LOW. Brain structure visible but not interpretable.
Training approach: N/A. Copied from biological learning.
Behavior predictability: LOW. Human-like means unpredictable.
Component separation: LOW. Brains are highly interconnected.
Formal verification: LOW. Too complex, poorly understood.
Labs: Carboncopies; academic neuroscience.
Safety pros: Human values by default; a well-understood type of entity.
Safety cons: Ethics of copying minds; could run faster than real time; identity issues.

Human genetic enhancement (Non-AI Paradigms)
Description: IQ enhancement via embryo selection, polygenic screening, or direct genetic engineering.
Probability dominant at TAI: <0.5%
Trend: Too slow for the TAI race; incremental gains only.
Safety assessment (illustrative): 7/10, Favorable. Slow and controllable; enhanced humans still have human values.
Interpretability: LOW. Genetic effects are poorly understood.
Training approach: N/A. Biological development.
Behavior predictability: MEDIUM. Still human, but smarter.
Component separation: LOW. An integrated biological system.
Formal verification: LOW. Biological complexity.
Labs: Genomic Prediction; academic genetics.
Safety pros: Human values; slow and controllable; socially legible.
Safety cons: Ethical concerns; too slow to matter for TAI; inequality risks.

Brain-computer interfaces (Non-AI Paradigms)
Description: Neural interfaces that augment human cognition with AI/compute. Neuralink-style.
Probability dominant at TAI: <1%
Trend: Bandwidth limits; standalone AI is likely faster.
Safety assessment (illustrative): 5/10, Mixed. Human oversight built in, but security risks and bandwidth limits.
Interpretability: PARTIAL. Interface visible, brain opaque.
Training approach: HYBRID. Human learning + AI training.
Behavior predictability: LOW. Human in the loop means unpredictable.
Component separation: MEDIUM. Clear human/AI boundary.
Formal verification: LOW. The human component is unverifiable.
Labs: Neuralink, Synchron, BrainGate.
Safety pros: Human oversight built in; gradual augmentation.
Safety cons: Bandwidth limits; security risks; human bottleneck.

Collective intelligence (Non-AI Paradigms)
Description: Human-AI teams, prediction markets, deliberative democracy augmented by AI. Intelligence from coordination.
Probability dominant at TAI: (overlay)
Trend: Not exclusive with the other scenarios; already happening.
Safety assessment (illustrative): 7/10, Favorable. Human oversight is natural and the pace slower, but coordination is challenging.
Interpretability: PARTIAL. The process is visible; emergent behavior less so.
Training approach: N/A. Coordination protocols, not training.
Behavior predictability: MEDIUM. Depends on protocol design.
Component separation: HIGH. Explicitly modular by design.
Formal verification: PARTIAL. Protocols can be analyzed.
Key papers: Collective intelligence papers.
Safety pros: Human oversight; diverse perspectives; slower means more controllable.
Safety cons: Coordination failures; vulnerable to manipulation; may not scale.

16 scenarios across 4 categories